"This is starting to get beyond frustrating for me," complained David Miller of the latest merge window, launching what turned into a very lengthy and ongoing discussion about the Linux kernel development process. The concept of a regular "merge window" was first discussed in July of 2005 with the release of the 2.6.14-rc4 kernel, following the 2005 Developers' Summit. From 2.6.14 on, the release of each official 2.6.y kernel has been followed by a two week period during which major changes are merged into the kernel, followed by a 2.6.y-rc1 release. David complained that this particular merge window has been more painful than others, "the tree breaks every day, and it's becoming an extremely non-fun environment to work in. We need to slow down the merging, we need to review things more, we need people to test their [...] changes!"
In the discussion that followed, Linux creator Linus Torvalds explained:
"The notion that we should even _try_ to aim to slow things down, that one I find unlikely to be true, and I don't even understand why anybody would find it a logical goal? Of course, you will have fewer new bugs if you have fewer changes. But that's not a goal, that's a tautology and totally uninteresting. A small program is likely to have fewer bugs, but that doesn't make something small 'better' than something large that does more. Similarly, a stagnant development community will introduce new bugs more seldom. But does that make a stagnant one better than a vibrant one? Hell no. So what I'm arguing against here is not that we should aim for worse quality, but I'm arguing against the false dichotomy of believing that quality is incompatible with lots of change."
Andrew Morton replied to a commit message that made 4K stacks the default: "this patch will cause kernels to crash." Ingo Molnar replied, "what mainline kernels crash and how will they crash? Fedora and other distros have had 4K stacks enabled for years." He added, "we've conducted tens of thousands of bootup tests with all sorts of drivers and kernel options enabled and have yet to see a single crash due to 4K stacks." Later in the discussion, participants suggested that kernel configurations combining NFS, XFS, and RAID, along with the use of ndiswrapper, are the most common causes of overflowing a 4K stack.
Andi Kleen questioned the usefulness of 4K stacks: "as far as I can figure out they are not [a worthy goal]. They might have been a worthy goal on crappy 2.4 VMs, but these times are long gone." Arjan van de Ven argued that although the 2.6 VM is much improved over the 2.4 VM, fragmentation with 8K stacks remains an unsolvable problem: "it's basic math; the Linux VM gets to deal with both short and long lasting allocations; no matter how hard you try to get some degree of fragmentation; especially due to the 15:1 acceleration you get due to the lowmem issue. And before you say 'you should use 64 bit on such machines'; I would love it if more people used 64 bit linux. Sadly the adoption rate of that is not very good still.... by far ;(" In another email, Arjan listed two advantages of 4K stacks: "1) less memory consumption in the lowmem zone (critical for enterprise use, also good for general performance), and 2) kernel stacks at 8K are one of the most prominent order-1 allocations in the kernel; again with big-memory systems the fragmentation of the lowmem zone is a problem (and the distros that ship 4K stacks went there because of customer complaints)".
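Arjan's second point comes down to page-allocator arithmetic. An allocation of order n takes 2^n physically contiguous pages. With 4 KiB pages, a 4K stack is an order-0 request that can use any free page. An 8K stack is an order-1 request that needs two adjacent free pages, and those become harder to find as lowmem fragments. The minimal Python model below assumes 4 KiB pages as on x86. It only mirrors the idea behind the kernel's `get_order()`, and `alloc_order` and `stack_memory` are hypothetical helpers rather than kernel code.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def alloc_order(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order n such that page_size * 2**n >= size.

    An order-n allocation is 2**n physically contiguous pages, which is
    the idea behind the kernel's get_order() (modelled here, not copied).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    order = 0
    while (page_size << order) < size:
        order += 1
    return order


def stack_memory(threads: int, stack_size: int) -> int:
    """Bytes of lowmem consumed by one kernel stack per thread."""
    return threads * (PAGE_SIZE << alloc_order(stack_size))
```

Here `alloc_order(4096)` is 0 and `alloc_order(8192)` is 1. For an arbitrary illustrative load of 10,000 threads, `stack_memory(10_000, 8192)` is 81,920,000 bytes, twice the 40,960,000 bytes needed with 4K stacks. Every one of those 8K stacks is also a contiguous two-page request.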
"While this is probably one of the last days of the merge window, please still consider pulling the 'kgdb light' git tree," began Ingo Molnar, explaining:
"This is a slimmed-down and cleaned up version of KGDB that i've created out of the original patches that we submitted two weeks ago. I went over the kgdb patches with Thomas and we cut out everything that we did not like, and cleaned up the result. KGDB is still just as functional as it was before (i tested it on 32-bit and 64-bit x86) - and any desired extra capability or complexity should be added as a delta improvement, not in this initial merge."
Ingo noted that the previous merge request modified 41 files, while this one modifies only 22. Among the changes he highlighted: "removed _all_ critical path impact, even if KGDB is enabled and active; removed all the lowlevel serial drivers; added a redesigned and cleaned up version of the 'KGDB over polled consoles' approach; removed the longjump code; removed the module symbol hacks; removed the GTOD/clocksource hacks; removed the softlockup hacks; removed the toplevel Makefile changes; removed the might_sleep scheduler hack; and did lots of other cleanups and rewrites as well." Ingo summarized: "as a result, this kgdb series has _obviously_ zero impact on the kernel, because it just does not touch any dangerous codepath. From this point on KGDB can evolve in small, well-controlled baby steps, as all other kernel code as well. And the resulting kgdb is still very functional: it can still break into a kernel (via SysRq-G), can catch crashes, can single-step, etc. It's already a quite usable first step."
"We all wear the brown paper bag on occasion, and with the 'merge maelstrom' during each merge window, I'm quite frankly amazed at how _little_ stuff gets broken overall."
"I re-ran some statistics the other day on our kernel development rate, and changed my formula after Andrew accused me of severely undercounting the rate of change," noted Greg KH during a discussion about how the Linux kernel stays stable while undergoing rapid change. He continued, "turns out that as of 2.6.24-rc8 for the 2.6.24 kernel release we did: lines added per day, 4945; lines removed per day, 2006; lines modified per day, 1702". Greg added:
"And note, that is real stuff, not renames or file moves at all, git handles not reporting that. That's for the 99 days that it took to do 2.6.24-rc8 (I need to re-run the scripts now that 2.6.24 is out.) It's fricken scarily amazing that things are still working at all... Just something to make you all sleep well at night :)"
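Multiplying Greg's per-day figures out over the 99-day cycle gives a sense of scale. The per-day rates are presumably rounded averages, so the totals below are approximations derived from his numbers and not counts taken from git.

```python
DAYS = 99  # length of the 2.6.24 cycle up to -rc8, per Greg KH

rates_per_day = {"added": 4945, "removed": 2006, "modified": 1702}

# Approximate totals for the cycle: rate * days.
totals = {kind: rate * DAYS for kind, rate in rates_per_day.items()}

# Net growth of the tree: lines added minus lines removed.
net_growth_per_day = rates_per_day["added"] - rates_per_day["removed"]
net_growth_cycle = net_growth_per_day * DAYS

for kind, total in totals.items():
    print(f"lines {kind}: ~{total:,}")
print(f"net growth: ~{net_growth_per_day:,} lines/day, ~{net_growth_cycle:,} per cycle")
# lines added: ~489,555
# lines removed: ~198,594
# lines modified: ~168,498
# net growth: ~2,939 lines/day, ~290,961 per cycle
```

Roughly half a million lines added and almost 300,000 lines of net growth in a little over three months is the context for Greg's "scarily amazing" remark.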
Prefacing a series of 196 patches, Greg KH noted, "due to the low level nature of these patches, and because they touch so many different parts of the kernel, a number of the subsystem maintainers have asked me to get them in first to make merging other trees easier." Linus Torvalds agreed and quickly merged the patches into his tree. Greg summarized the many changes:
"They can be broken down into these major areas: Documentation updates (language translations and fixes, as well as kobject and kset documentation updates.); major kset/kobject/ktype rework and fixes; struct bus_type has been reworked to now handle the lifetime rules properly, as the kobject is properly dynamic; struct driver has also been reworked, and now the lifetime issues are resolved; the block subsystem has been converted to use struct device now, and not 'raw' kobjects; nozomi driver is added; lots of class_device conversions to use struct device instead."
Ingo Molnar posted a merge request for his latest scheduler git tree, summarizing: "it contains various enhancements to the scheduler - find the full shortlog is below. 96 commits from 19 authors - scheduler developers have been busy again. :-/" He added, "the scheduling behavior of the kernel to normal users should not change over v2.6.24, but there are a good number of new features and enhancements under the hood." Ingo went on to list a number of these new features, including:
"Various instrumentation and debugging enhancements from Arjan van de Ven; Peter Zijlstra's RT time limit and RT throttling code for the RT scheduling class; Paul E. McKenney's preemptible RCU code; refcount based CPU-hotplug rework by Gautham R Shenoy; there's serious interest in running RT tasks on enterprise-class hardware, so Steven Rostedt and Gregory Haskins wrote a large number of enhancements to the RT scheduling class and load-balancer; Peter Zijlstra's high-resolution scheduler tick code; [...] and a good number of other, smaller enhancements."
"I think the SG stuff looks ok now, but I think we have a lot of 'fix up the rough edges' to go!" Linus Torvalds noted regarding some of the fallout from the recent merge of Jens Axboe's SG chaining patchset. During one of the many discussions, Jens explained:
"It's all about the end goal - having maintainable and resilient code. And I think the sg code will be better once we get past the next day or so, and it'll be more robust. That is what matters to me, not the simplicity of the patch itself."
Boaz Harrosh commented, "thanks Jens for doing all this, The performance gain is substantial and we will all enjoy it." Jens replied, "my pleasure, I just wish it could have been a little less painful. But in a day or two, it should all be behind us and we can move forward with making good use of it."
"This is a request to merge KGDB into the mainline kernel," Jason Wessel announced, posting a series of patches aiming toward that goal. He continued, "as of right now KGDB is comprised of 21 different patches adding in the core api and docs first and then working up to add drivers and arch specific support to KGDB. The patches were broken down into logical pieces for review and comments." He went on to explain:
"The intent of the KGDB patches is to unify the KGDB support across all the architectures that elect to implement the KGDB functionality by providing a common core and an arch specific stub. For quite some time there has been different features and uses of KGDB across the most popular architectures. Having a common core that takes care of protocol parsing and the typical use case of software breakpoints should eliminate the inconsistencies across the archs as well as making it easier to add KGDB support to a new arch."
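The split Jason describes, a shared core for protocol parsing and a small stub per architecture, can be sketched as below. KGDB speaks GDB's remote serial protocol. Packets there have the form `$data#cs`, where `cs` is the modulo-256 sum of the payload bytes written as two hex digits, and an empty reply means "command not supported". Everything else in this sketch is an illustrative assumption and not KGDB's real interface: the `ArchOps` interface, the two commands handled (`g` to read registers, `s` to single-step), and the toy `DemoArch` stub.

```python
from typing import Protocol


class ArchOps(Protocol):
    """Hypothetical per-architecture stub: only machine-specific operations."""

    def read_registers(self) -> bytes: ...

    def single_step(self) -> None: ...


def checksum(data: str) -> str:
    # GDB remote serial protocol: modulo-256 sum of the payload bytes,
    # sent as two lowercase hex digits.
    return f"{sum(data.encode()) % 256:02x}"


def frame(data: str) -> str:
    """Wrap a payload as $data#cs."""
    return f"${data}#{checksum(data)}"


def parse_packet(raw: str) -> str:
    """Validate a '$data#cs' packet and return its payload."""
    if len(raw) < 4 or raw[0] != "$" or raw[-3] != "#":
        raise ValueError("malformed packet")
    data, cs = raw[1:-3], raw[-2:]
    if checksum(data) != cs.lower():
        raise ValueError("bad checksum")
    return data


def handle(raw: str, arch: ArchOps) -> str:
    """Common core: protocol handling here, machine work delegated to `arch`."""
    data = parse_packet(raw)
    if data == "g":  # read all registers, hex-encoded
        return frame(arch.read_registers().hex())
    if data == "s":  # single-step, then report a stop with SIGTRAP (5)
        arch.single_step()
        return frame("S05")
    return frame("")  # empty reply: command not supported


class DemoArch:
    """Hypothetical stub for a toy machine with two one-byte registers."""

    def __init__(self) -> None:
        self.steps = 0

    def read_registers(self) -> bytes:
        return bytes([0x01, 0x02])

    def single_step(self) -> None:
        self.steps += 1
```

With `arch = DemoArch()`, `handle("$g#67", arch)` returns `"$0102#c3"`. Porting this sketch to a new machine would mean writing another small class like `DemoArch`, while `handle()` and the checksum logic stay shared. That is the consistency argument the patch series makes.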
Andrew Morton, who has been supportive of getting a kernel debugger into the mainline kernel, explained that it was too late in the 2.6.24 development cycle to merge KGDB, so it would have to wait for 2.6.25 at the earliest: "this won't work very well. There's a lot of review work to be done here, and a lot of it by busy architecture maintainers. Expecting people to do all this review and test work late in the merge window when they're all madly scrambling to get their bugs^Wpatches into mainline is not reasonable. This should all have started a month ago. So we're looking at a 2.6.25 merge for this work."
"It contains lots of scheduler updates from lots of people - hopefully the last big one for quite some time," began Ingo Molnar, describing his merge request for the linux-2.6-sched git tree. He continued, "most of the focus was on performance (both micro-performance and scalability/balancing), but there's the fair-scheduling feature now Kconfig selectable too. Find the shortlog below." Ingo noted, "code that is touched outside of the scheduler: the KVM bits were acked by Avi, the net/unix change is trivial and only affects sync wakeups, ditto the fs/pipe.c changes - but i can push those separately if it needs an ack from David first." He then concluded:
"Testing status: the changes are chronological and all the interactivity-impacting changes are near the head of the queue and most of them were done weeks ago, and were thus part of the CFS-v22 backport series - which was tested by many people. There are no known regressions at the moment. It's all fully bisectable."
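"Fully bisectable" means that every intermediate commit builds and runs, so a regression can be located by binary search over the series. The sketch below shows that search on a linear history. It is not git's implementation: `git bisect` also handles merges and non-linear history. The `is_bad` callback stands in for "build, boot, and test this commit".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered, linear history for the first bad commit.

    Assumes commits[0] is known good and commits[-1] is known bad (neither
    is re-tested), and that every commit in between can be built and
    tested, which is what a fully bisectable series guarantees.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

For a hypothetical 100-commit history `c0` through `c99` with a regression introduced at `c42`, `first_bad(history, lambda c: int(c[1:]) >= 42)` returns `"c42"` after seven tests. A single commit that fails to build would break this search, which is why bisectability matters for large merge requests.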
Avi Kivity posted numerous KVM updates, which Linus Torvalds merged into his mainline tree for inclusion in the upcoming 2.6.24 kernel. Avi summarized:
"Highlights include in-kernel pic/lapic/ioapic emulation, improved guest support, preemptibility, an improved x86 emulator, and a fair amount of cleanup.
"The changes outside drivers/kvm/ and include/linux/kvm*.h fix the CR8 mask definition (which is not otherwise used in the kernel) and expose some ioapic register definitions even if ioapic support is not compiled in. The diff is appended below."
Andrew Morton posted his first -mm patchset against the recently released 2.6.23 kernel, preparing for a big merge of patches bound for inclusion in the upcoming 2.6.24 kernel. He noted:
"I've been largely avoiding applying anything since rc8-mm2 in an attempt to stabilise things for the 2.6.23 merge.
"But that didn't stop all the subsystem maintainers from going nuts, with the usual accuracy. We're up to a 37MB diff now, but it seems to be working a bit better."
Casey Schaufler posted an updated Smack patchset reworked in response to feedback on the previous posting: "I have broken the Smack patch into the netlabel changes from Paul Moore (1/2) and the Smack LSM (2/2), at Paul's kind suggestion." He added:
"The smackfs symlinks have proven too contentious. I have removed the facility. Al and Alan are correct that the rich set of mount options currently available can handle any of the use cases I was looking at without excessive difficulty."
Smack, the Simplified Mandatory Access Control Kernel, uses the Linux Security Modules (LSM) framework to implement label-based mandatory access control. It is slated for inclusion in the upcoming 2.6.24 mainline kernel during the merge window that ends with 2.6.24-rc1.
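In label-based mandatory access control, every task and object carries a label, and the kernel decides each access from the two labels rather than from file ownership. The sketch below is a simplified model of the access check described in Smack's documentation. It covers the special labels `*` (star), `^` (hat), and `_` (floor), the same-label rule, and an explicit rule table. Access requests use the letters `r`, `w`, `x`, and `a`. The function and the rule-table representation are assumptions made for illustration and are not Smack's kernel code.

```python
from typing import Mapping, Tuple


def smack_access(subject: str, obj: str, request: str,
                 rules: Mapping[Tuple[str, str], str]) -> bool:
    """Simplified model of Smack's access decision, checked in order.

    request: letters from "rwxa" (read, write, execute, append).
    rules:   explicit (subject label, object label) -> permitted letters.
    """
    req = set(request)
    read_exec_only = req <= {"r", "x"}
    if subject == "*":              # tasks labeled star are denied everything
        return False
    if subject == "^" and read_exec_only:   # hat tasks may read/execute anything
        return True
    if obj == "_" and read_exec_only:       # floor objects are readable by all
        return True
    if obj == "*":                  # star objects are open to every task
        return True
    if subject == obj:              # same label: any access
        return True
    # Otherwise only what the loaded rule set explicitly grants.
    return req <= set(rules.get((subject, obj), ""))
```

Given the hypothetical rule `{("Secret", "Unclass"): "r"}`, a task labeled `Secret` may read but not write an object labeled `Unclass`. A task labeled `Unclass` gets no access at all to `Secret` objects, because no rule grants it.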
Noting the approaching 2.6.24 merge window, which will open after the upcoming release of the 2.6.23 kernel, MultiMedia Card (MMC) subsystem maintainer Pierre Ossman described what he plans to push upstream: "this release will probably be one of the biggest ones for the MMC layer so far. The major pieces are SDIO and SPI support, but there are several small nuggets as well." Regarding the new Secure Digital Input Output (SDIO) stack he noted, "gone are the days of having to rely on proprietary stacks for SDIO support in Linux. So no more spotty support for hosts and possible GPL problems. SDIO will now be a standard feature of Linux." He also described three working drivers already ported to the new stack.
Pierre went on to discuss the Serial Peripheral Interface (SPI) stack, "the second large feature is the fact that you can now use your SPI controllers for MMC, SD and SDIO. Yes, even SDIO works nicely over SPI. This means that a lot more systems can get storage and expansion I/O at basically the cost of a connector." He added, "David Brownell is currently marked as providing 'odd fixes' for the mmc_spi driver, but we could really use a proper maintainer. So if you have sufficient experience with Linux' SPI interface and the time, please raise your hand."
Mathieu Desnoyers posted an updated version of his Linux Kernel Markers patchset explaining, "following Christoph Hellwig's suggestion, aiming at a Linux Kernel Markers inclusion for 2.6.24, I made a simplified version of the Linux Kernel Markers. There are no more dependencies on any other patchset." He continued, "the modification only involved turning the immediate values into static variables and adapting the documentation accordingly. It will have a little more data cache impact when disabled than the version based on the immediate values, but it is far less complex." The patch includes documentation which explains:
"A marker placed in code provides a hook to call a function (probe) that you can provide at runtime. A marker can be 'on' (a probe is connected to it) or 'off' (no probe is attached). When a marker is 'off' it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a marker is 'on', the function you provide is called each time the marker is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the marker site)."
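The semantics in that documentation can be modelled in a few lines: a named hook that holds either no probe ("off") or one probe ("on"), and a call site that checks which state it is in. The toy Python model below makes several assumptions. The `Marker` class, the `sched_switch` name, and the `schedule()` call site are all hypothetical, and the patchset's actual C macros and registration functions are not reproduced. The model does keep the documented behavior: an "off" marker costs only a condition check, and an "on" marker calls the probe synchronously in the caller's context before execution continues at the marker site.

```python
from typing import Any, Callable, Optional

Probe = Callable[..., None]


class Marker:
    """Toy model of a kernel marker: a named hook with an optional probe."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.probe: Optional[Probe] = None  # None means the marker is "off"

    def connect(self, probe: Probe) -> None:
        self.probe = probe  # the marker is now "on"

    def disconnect(self) -> None:
        self.probe = None  # back to "off"

    def hit(self, *args: Any) -> None:
        probe = self.probe
        # "Off": a single branch on this condition, nothing else happens.
        if probe is not None:
            # "On": run the probe synchronously in the caller's context;
            # when it returns, the caller continues after the marker site.
            probe(self.name, *args)


sched_marker = Marker("sched_switch")  # hypothetical marker name


def schedule(prev: str, nxt: str) -> str:
    """Hypothetical instrumented function containing one marker site."""
    sched_marker.hit(prev, nxt)
    return nxt
```

Connecting a probe, for example `sched_marker.connect(lambda name, *a: print(name, a))`, turns every call to `schedule()` into a probe invocation. After `sched_marker.disconnect()`, the same calls only pay for the `if` check. Mathieu's simplified version keeps this enabled/disabled state in an ordinary static variable. He notes that this costs a little more data cache than the immediate-values version when markers are disabled, but that it is far less complex.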