"If you're wondering why I'm taking a long time to respond to your patches,", began Theodore Ts'o on the linux-ext4 mailing list, in a thread that offered much insight into how and why to properly submit and test patches. "Patches that are accepted into mainline should do one and only one thing," Ted continued, "so if someone suggests that you make changes to your submitted patch, ideally what you should do is to resubmit the patch with the fixes --- and not submit a patch which is a delta to the previous one." He also noted that patch submitters often greatly outnumber maintainers dictating a higher standard of quality, "consider that for some maintainers, there may be 10 or 20 or 30 or more patch submitters in their subsystem. With that kind of submitter-to-maintainer ratio, the patch submitter simply has to do much more of the work, since otherwise the subsystem maintainer simply can't keep up."
Ted went on to acknowledge, "I happen to believe that we need to encourage newcomers to the kernel developer community, and so I spend more time mentoring people who are new to the process." He noted that his time was finite however, and that patches are accepted more quickly when they are easy to review and integrate. Regarding the filesystem for which the patches had been submitted, he added, "Ext4 is actually quite stable at this point. Very large numbers of people are using it, and most users are quite happy." For this reason, he pointed out that it is even more critical that the patches merged be of high quality. That said, he continued, "there is no such thing as code which is not buggy. For any non-trivial program, it's almost certain there are bugs. [...] Ext4 is not exempt from these fundamental laws of software engineering. 'Code is always buggy until the last user of the program dies'." He tied this back to the importance of testing patches before submitting, "keep in mind that the maxim that code which is not buggy also applies to your patches."
"The _real_ bug is clearly in the hardware design that allows you to brick those things without apparently even having a lock bit. I'm hoping Intel doesn't treat this as just a software bug. Some hw designer should be thinking hard about which orifice they put their head up in."
"As of now there are no known bugs, though I'm sure that will change as more DragonFly users start using the filesystem :-)"
In an announcement for the 2.6.25.10 stable kernel, Greg KH noted, "it contains a number of assorted bugfixes all over the tree. And once again, any users of the 2.6.25 kernel series are STRONGLY encouraged to upgrade to this release." The emphasis on the word strongly led to a lengthy discussion about how security fixes are handled in the Linux Kernel. Linus Torvalds replied, "I personally consider security bugs to be just 'normal bugs'. I don't cover them up, but I also don't have any reason what-so-ever to think it's a good idea to track them and announce them as something special." Later in the thread he went on to explain, "one reason I refuse to bother with the whole security circus is that I think it glorifies - and thus encourages - the wrong behavior. It makes 'heroes' out of security people, as if the people who don't just fix normal bugs aren't as important. In fact, all the boring normal bugs are _way_ more important, just because there's a lot more of them. I don't think some spectacular security hole should be glorified or cared about as being any more 'special' than a random spectacular crash due to bad locking."
Theodore T'so pointed out that other developers had different beliefs about disclosure than Linus and referred to mailing lists such as the private security@ list described in the SecurityBugs documentation, originally created in early 2005. He then described Linus' stance, "if Linus finds out about a security bug, he will fix it and check it into the public git repository right away. But he's very honest in telling you that is what he will do --- so you can choose whether or not to include him in any disclosures that you might choose to make." Regarding whether Full Disclosure is the best policy, Ted highlighted the fact that the debate has been going on for several decades, "it is clear that we're not going settle this debate now, and certainly not on the Linux Kernel Mailing List." Later in the discussion, Linus offered a succinct summary of his viewpoint, "my responsibility is to do a good job. And not pander to the people who want to turn security into a media circus."
"Here's a hint: next time I claim some code of yours is buggy, either just acknowledge the bug, or stay silent. You'll look smarter that way."
"In the early days, the project was conceived as a way of getting fresh blood into kernel development by giving them fairly simple but generally useful tasks and hoping they'd move more into the mainstream," began James Bottomley starting a thread titled Fixing the Kernel Janitors project. He continued, "if we wind forwards to 2008, there's considerable and rising friction being generated by janitorial patches,", references a recent thread complaining about worthless patches hitting the lkml. Later in the thread he added:
"That's why I think we have to change the process. If we keep the Janitors project, then the bar has to be raised so that it becomes more participatory and thought oriented (i.e. eliminate from the outset anyone who is not going to graduate from mechanical changes to more useful ones). That's why I think bug finding and reporting is a better thing to do. There are mechanical aspects to finding bugs but it is a useful service. Bug fixing is participatory because we usually do quite a lot of back and forth between the reporter and the fixer and at the end of the day quite a few people get curious about how the bug was triggered in the first place and actually go off and read the code."
A thread on the Linux Kernel mailing list discussed the process in place for reporting, bisecting and fixing bugs. In response to a suggestion that some of the issues could be solved by introducing new procedures, Al Viro retorted, "we've got ourselves a developing beaurocracy. As in 'more and more ways of generating activity without doing anything even remotely useful'. Complete with tendency to operate in the ways that make sense only to bureaucracy in question and an ever-growing set of bylaws..." Later in the thread, David Miller agreed and noted that ,"the resulting 'bureaucracy' or whatever you want to call it is perceived to undercut the very thing that makes the Linux kernel fun to work on. It's still largely free form, loose, and flexible. And that's a notable accomplishment considering how much things have changed. That feeling is why I got involved in the first place, and I know it's what gets other new people in and addicted too."
Andrew Morton tried to return the discussion to its original topic, "the problem we're discussing here is the apparently-large number of bugs which are in the kernel, the apparently-large number of new bugs which we're adding to the kernel, and our apparent tardiness in addressing them." Al noted that some of the problem is that git is so efficient at merging code, "the patches going in during a merge (especially for a tree that collects from secondaries) are not visible enough. And it's too late at that point, since one has to do something monumentally ugly to get Linus revert a large merge. On the scale of Great IDE Mess in 2.5..." Another suggestion was made to replace bugzilla with something better, to which Andrew replied, "swapping out bugzilla for something else wouldn't help. We'd end up with lots of people ignoring a good bug tracking system just like they were ignoring a bad one."
"The way I see it, the burden of debugging and fixing bugs is mainly on the developers of the code that breaks. You can't blame users for using the code, triggering bugs and then reporting the breakage. Users who report bugs are doing us all a great service regardless of their ability or willingness to do more work than just the initial report."
"This case is a good example to use the next time a stupid thread starts up about bug reports not being looked into. To me it seems clearly more a matter of the quality of the bug report."
"Five years ago I might have said that it's important to fix pre-existing bugs, but all the ACPI and suspend etc problems have long since convinced me that regressions are *much* more important than stuff that never worked."
"This is the listing of the open bugs that are relatively new, around 2.6.22 and up. They are vaguely classified by specific area," Natalie Protasevich said, posting a current list of bugs each linking to an appropriate bugzilla.kernel.org entry. Andrew Morton reviewed the list, noting "no response from developers" in response to many of the bugs. David Miller pointed out that in some cases this wasn't true, referring to 46 bug fixes queued in his networking tree and another 10 already pushed upstream, "when someone like me is bug fixing full time, I take massive offense to the impression you're trying to give especially when it's directed at the networking. So turn it down a notch Andrew." Andrew wasn't convinced, "first we need to work out whether we have a problem. If we do this, then we can then have a think about what to do about it. I tried to convince the 2006 KS attendees that we have a problem and I resoundingly failed. People seemed to think that we're doing OK." He continued:
"This is not a minor matter. If the kernel _is_ slowly deteriorating then this won't become readily apparent until it has been happening for a number of years. By that stage there will be so much work to do to get us back to an acceptable level that it will take a huge effort. And it will take a long time after that for the kerel to get its reputation back. So it is important that we catch deterioration *early* if it is happening."
"Exposing bugs is good for development, bad for business."
A bug report filed by Ingo Molnar regarding a procfs crash in the recently released 2.6.23-rc9 kernel was quickly tracked down by Linus Torvalds as a compiler bug. The bug was ultimately determined to be from a compiler optimization generated with an older version of GCC. Ingo was skeptical at first, "it's 4.0.2. Not the latest & greatest but I've been using it for 2 years and this would be the first time it miscompiles a 32-bit kernel out of tens of thousands of successful kernel bootups." Linus replied, "I am 100% sure. I can look at the disassembly, and point to the fact that your Oops happens on code that is simply totally bogus." He continued on to offer an interesting review of the crash, explaining line by line what should have been generated versus what actually was, causing the crash. In the end, Ingo switched to a distribution compiled GCC 4.1.2 and confirmed that the crash went away, "so you are completely right, it's a compiler bug in 4.0.2."
During the thread, Linus suggested that the optimization made by the compiler wasn't "legal", to which Alan Cox retorted, "pedant: valid. Almost all optimizations are legal, nobody has yet written laws about compilers. Sorry but I'm forever fixing misuse of the word 'illegal' in printks, docs and the like and it gets annoying after a bit." Linus playfully responded, "heh. When I'm ruler of the universe, it *will* be illegal. I'm just getting a bit ahead of myself." When asked how long until he expected to be ruler, Linus added, "I'm working on it, I'm working on it. I'm just as frustrated as you are. It turns out to be a non-trivial problem."
"We don't want to introduce pointless delays in throttle_vm_writeout() when the writeback limits are not yet exceeded, do we?" asked Fengguang Wu as the description of his patch to mm/page-writeback.c. Andrew Morton replied, "this is a pretty major bugfix, explaining, "this patch has the potential to significantly alter the dynamics of the VM behaviour under particular workloads. It might turn up other stuff..." He continued:
"I wonder why nobody noticed this happening. Either a) it turns out that kswapd is doing a good job and such callers don't do direct reclaim much or b) nobody is doing any in-depth kernel instrumentation.
"Now, how _would_ one notice this problem? We don't have very good tools, really. Booting with "profile=sleep" and looking at the profile data would be one way. Repeatedly doing sysrq-T is another. Perhaps the new lockstat-via-lockdep code would allow this to be observed in some fashion, dunno."
Michal Piotrowski sent out an updated list of known regressions in the 2.6.22-git kernel.