Jens Axboe detailed the changes in his linux-2.6-block.git tree that he plans to merge into the upcoming 2.6.24 kernel. Among the changes were the necessary updates to enable SG chaining which is used for large IO commands, "the goal of sg chaining is to allow support for very large sgtables, without requiring that they be allocated from one contigious piece of memory." Andrew Morton asked for more information, "presumably sg chaining means more overhead on the IO submission paths? If so, has this been quantified?"
Jens explained that there is no overhead for existing logic which doesn't use sg chaining, "just cleanups to drivers to use for_each_sg() and so on." He continued:
"For actually using the sg chaining, there's some overhead of course. Say we support 256 entries without chaining, or 1mb with 4kb pages. A request with 1000 entried would require 4 trips to the allocator to allocate the chainable lists and 4 trips when freeing that list again. We don't loop the sg list on setup of freeing, just jump to the correct locations. So even for chaining, the cost isn't that big. It enables us to support much larger IO commands and potentially speed up some devices quite a lot, so CPU cost is less of a concern. And for small sglists, there isn't a noticable overhead."
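The arithmetic in Jens' example can be checked with a short sketch. It assumes that each allocated list holds 256 entries and that every list except the last gives up one slot to hold the link to the next list; the function name and that slot-accounting model are illustrative assumptions, not taken from the patches themselves.

```python
PAGE_SIZE = 4096          # 4kb pages, as in the quoted example
ENTRIES_PER_LIST = 256    # entries supported without chaining, per the quote


def chained_allocations(total_entries, entries_per_list=ENTRIES_PER_LIST):
    """Return how many fixed-size lists are needed to hold total_entries.

    Assumption: every list except the last spends one slot on the chain
    link, so it carries entries_per_list - 1 real entries.
    """
    if total_entries <= 0:
        return 0
    if total_entries <= entries_per_list:
        return 1
    usable = entries_per_list - 1
    remaining = total_entries - entries_per_list
    # Ceiling division: extra lists needed in front of the final full list.
    return 1 + -(-remaining // usable)


# Largest request that fits in a single unchained list: 256 * 4kb = 1mb.
max_unchained_bytes = ENTRIES_PER_LIST * PAGE_SIZE

print(chained_allocations(1000))   # trips to the allocator for 1000 entries
print(max_unchained_bytes)
```

Under these assumptions a 1000-entry request needs four lists, which matches the four allocator trips Jens describes.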
Ying Huang posted an updated version of his kexec-based hibernation patches. Pavel Machek, one of the uswsusp maintainers, responded favorably, suggesting, "seems like good enough for -mm to me." He went on to note that he didn't see the kexec patches being a hibernation solution anytime soon, but that the functionality was useful for other purposes such as simply dumping memory and continuing. TuxOnIce maintainer Nigel Cunningham added, "Andrew, if I recall correctly, you said a while ago that you didn't want another hibernation implementation in the vanilla kernel. If you're going to consider merging this kexec code, will you also please consider merging TuxOnIce?"
Andrew Morton explained what made the kexec solution attractive, "the theory is that kexec-based hibernation will mainly use preexisting kexec code and will permit us to delete the existing hibernation implementation. That's different from replacing it." Rafael Wysocki disagreed, pointing out that there was still quite a bit of work that kexec would have to do which would require more code in the kernel. He also pointed to the complexity of dealing with ACPI systems. Ying replied, "Yes. ACPI is a biggest issue of kexec based hibernation now. I will try to work on that. At least I can prove whether kexec based hibernation is possible with ACPI."
"Recently, the CE Linux forum has been working to revive the Linux-tiny project," stated Tim Bird on the Linux Kernel mailing list, adding that Michael Opdenacker has been selected as the project's new primary maintainer. The project's website explains:
"The linux-tiny patchset is a series of patches against the 2.6 mainline Linux kernel to reduce its memory and disk footprint, as well as to add features to aid working on small systems. Target users are developers of embedded system and users of small or legacy machines such as 386s and handhelds."
Andrew Morton suggested that patches should be sent to him to be merged into his -mm tree, aiming for inclusion in the mainline kernel, "seriously, putting this stuff into some private patch collection should be a complete last resort - you should only do this with patches which you (and the rest of us) agree have no hope of ever getting into mainline." Michael, the project's new maintainer, agreed, "you're completely right... The patches should all aim at being included into mainline or die." Tim added, "the patchkit gives a place for things to live while they are out of mainline, and still have multiple people use and work on them. Optimally the duration of being out-of-mainline would be short, but my experience is that sometimes what an embedded developer considers reasonable to hack off the kernel is not considered so reasonable by other developers (even with config options)."
Following Andrew Morton's recent comment, "this just isn't working any more," Miles Lane asked, "what can be done to reduce the huge number of build fixes required to release an MM tree?" Andrew jokingly replied, "my mind turns to cattle prods." Regarding the suggestion that he could publicly list the offenders he quipped, "I could name names, but it would look like 'grep @ MAINTAINERS' ;))" He continued to say, "I don't think much can be done about it, really," going on to explain:
"See, what I do is to merge probably hundreds of patches into the -mm-only part of the tree and then, after a few days, get down and compile-test it all, then fix it, then runtime test it all, then fix that. Because it is vastly more efficient to do all this work against hundreds of patches than it is to do it against one patch at a time, no?
"And guess what? All the other maintainers do the same thing: someone sends you a patch, it looks good, so you commit it. After you've committed a decent batch of patches, get in there and test it all. Problem is, I often will get in there and do all that testing before the subsystem-tree owner has done his testing."
A frustrated sounding Andrew Morton released the 2.6.23-rc6-mm1 kernel as "a 29MB diff against 2.6.23-rc6." Many patches are merged first into Andrew's -mm tree for testing before being pushed to Linus' mainline tree during the merge window. Andrew suggested that the -mm process wasn't working as well as it could:
"It took me over two solid days to get this lot compiling and booting on a few boxes. This required around ninety fixup patches and patch droppings. There are several bugs in here which I know of (details below) and presumably many more which I don't know of. I have to say that this just isn't working any more."
Randy Dunlap sent a patch to the Linux kernel mailing list described as adding "info about various email clients and their applicability in being used to send Linux kernel patches." The first revision generated quite a bit of discussion, quickly resulting in a second version, and eventually a third version that Andrew Morton referred to as "soon to be merged". In addition to some general suggestions about emailing patches, it also offers some specific configuration suggestions for a number of popular email clients. It begins:
"Patches for the Linux kernel are submitted via email, preferably as inline text in the body of the email. Some maintainers accept attachments, but then the attachments should have content-type 'text/plain'. However, attachments are generally frowned upon because it makes quoting portions of the patch more difficult in the patch review process.
"Email clients that are used for Linux kernel patches should send the patch text untouched. For example, they should not modify or delete tabs or spaces, even at the beginning or end of lines."
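One way to see whether a mail client leaves patch text untouched is to send a patch to yourself and compare the received body with the original. The following is a minimal sketch of such a comparison; the function name and the approach are illustrative assumptions and are not part of Randy's document.

```python
def find_whitespace_damage(sent, received):
    """Compare a patch as sent with the copy that arrived.

    Returns a tuple (damaged_lines, line_count_changed):
    - damaged_lines: 1-based numbers of lines whose visible text is the
      same but whose whitespace differs (tabs expanded, trailing spaces
      stripped, and so on).
    - line_count_changed: True if the number of lines differs, which
      usually points to line wrapping.
    """
    sent_lines = sent.split("\n")
    received_lines = received.split("\n")
    damaged = []
    for number, (before, after) in enumerate(
            zip(sent_lines, received_lines), start=1):
        # Same words, different bytes: only whitespace was altered.
        if before != after and before.split() == after.split():
            damaged.append(number)
    return damaged, len(sent_lines) != len(received_lines)


sent = "-\tint x;\n+\tint y; \n context"
received = "-        int x;\n+\tint y;\n context"
print(find_whitespace_damage(sent, received))
```

In this example the first line had its tab expanded to spaces and the second lost a trailing space, so both are reported, while the line count is unchanged.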
"The cfs core has been enhanced since quite sometime now to understand task-groups and [to] provide fairness to such task-groups," began Srivatsa Vaddagiri, "what was needed was an interface for the administrator to define task-groups and specify group 'importance' in terms of its cpu share. The patch below adds such an interface."
Srivatsa requested that his patch be merged into Andrew Morton's -mm tree to receive more testing, "note that the load balancer needs more work, esp to deal with cases like 2-groups on 4-cpus, one group has 3 tasks and other having 4 tasks. We are working on some ideas, but nothing to share in the form of a patch yet. I felt sending this patch out would help folks start testing the feature and also improve it."
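The case Srivatsa mentions can be worked through numerically. The sketch below computes the ideal CPU time each task would receive if CPU time were divided between groups in proportion to their shares and then evenly among the tasks in each group. The share value of 1024 and the function name are hypothetical, and the model ignores the fact that a single task can never use more than one CPU.

```python
def ideal_cpu_per_task(groups, cpus):
    """Return the ideal CPU allocation per task for each group.

    groups maps a group name to (shares, task_count).
    The result maps a group name to the number of CPUs' worth of time
    each of its tasks would get under perfectly fair group scheduling.
    """
    total_shares = sum(shares for shares, _ in groups.values())
    result = {}
    for name, (shares, task_count) in groups.items():
        group_cpu = cpus * shares / total_shares   # CPUs owed to the group
        result[name] = group_cpu / task_count      # split evenly among tasks
    return result


# Two groups with equal (hypothetical) shares on four CPUs.
allocation = ideal_cpu_per_task({"a": (1024, 3), "b": (1024, 4)}, cpus=4)
print(allocation)
```

With equal shares each group is owed two CPUs, so tasks in the three-task group should get about two thirds of a CPU each and tasks in the four-task group half a CPU each. Neither figure maps onto a whole number of tasks per CPU, which hints at why the load balancer needs more work for this case.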
"Some people seem to be using 'Acked-by' to mean, 'seems good to me', without necessarily doing a full review of the patch, and instead of trying to change the meaning of 'Acked-by', [the plan is] to have a new sign off which is a bit more explicitly about what it means," Theodore Tso explained in a recent thread on the Linux Kernel mailing list. He continued:
"This was proposed by Andrew and discussed at the Kernel Summit; the basic idea is that it is a formal indication that the person has done a *full* review of the patch (a few random comments from the local whitespace police don't count), and is willing to vouch that the patch is correct, safe, extremely unlikely to cause regressions, etc. If the patch does need to be reverted or fixed because it was buggy, then both the original submitter and the reviewer would bear responsibility and subsystem maintainers might take that into account when assessing the reputations of the submitter and reviewer in the future when deciding whether or not to accept a patch."
Andrew Morton noted that the idea isn't fully fleshed out yet, "we will start introducing Reviewed-by: (I haven't yet quite worked out how yet) but it will be a quite formal thing and it would be something which the reviewer explicitly provided. For now, let's please stick with acked-by". Theodore added, "there was also some discussion about whether or not patches would not be accepted at all without a Reviewed-by, but that probably won't happen initially. The general consensus was to gently ease into it and see how well it works first."
A recent patch posted to the lkml aimed to make it possible to use both kdb and kdump at the same time, and instead led to an interesting discussion about RAS (Reliability, Availability, and Serviceability) tools. Vivek Goyal compared the two main philosophies, "so basically there are two kind of users. One who believes that despite the kernel [having] crashed something meaningful can be done," versus, "exec on panic, which thinks that once [the] kernel is crashed nothing meaningful can be done". When the discussion focused on kdb, Keith Owens noted:
"The problem above applies to all the RAS tools, not just kdb. My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd, crash, kdump etc.) should be using a common interface that safely puts the entire system in a stopped state and saves the state of each cpu. Then each tool can do what it likes, instead of every RAS tool doing its own thing and they all conflict with each other, which is why this thread started."
Andrew Morton summarized the current state of affairs, "lots of different groups, little commonality in their desired funtionality, little interest in sharing infrastructure or concepts." In response to an earlier patch Keith posted to a lesser-trafficked mailing list, Andrew suggested it be resubmitted in a working form for a full review, "much of the onus is upon the various RAS tool developers to demonstrate why it is unsuitable for their use and, hopefully, to explain how it can be fixed for them."
"Is anyone testing the kgdb code in here?" Andrew Morton asked in his release announcement for the 2.6.23-rc1-mm2 patchset. Mike Frysinger asked, "does kgdb actually have a chance to get merged? With the history of it, i just assumed it was never going in". In the past, Linus Torvalds has resisted merging kernel debuggers and famously said, "I don't like debuggers. Never have, probably never will," going on to explain why he didn't want it to be too easy to hack the Linux kernel. An earlier push to get kgdb merged in 2004 didn't succeed, though some architectures already have versions of the debugger. The current kgdb patchset in Andrew's tree includes code for the i386, x86_64, ppc, mips, sh and arm architectures.
Andrew replied to Mike's question, "I was hoping for a 2.6.24 merge. But I haven't actually looked at it yet. Hopefully Jason is planning to get it all out for review soonish." He went on to add, "runtime testing isn't actually the most important thing at this time - if is doesn't work, well hey, we fix it, easy - we always have bugs. The main emphasis right now should be on higher-level design/review/integration stuff." Jason Wessel noted, "the KGDB tree is broken up into incremental units each layer adding more functionality and or arch specific pieces."
Neil Horman posted an enhancement to a /proc/sys/kernel interface for redirecting core dumps, "allowing the core_pattern to contain arguments to be passed as an argv array to the userspace helper application. It also adds a format specifier, %c, which allows the RLIM_CORE value of the crashing application to be passed on the command line, since RLIMIT_CORE is reduced to zero when execing the userspace helper". Andrew Morton was skeptical at first, "this all seems to be getting a bit nutty. Who needs this feature and what will they do with it, etc?"
Neil pointed to Ubuntu's Apport, "Ubuntu has implemented lots of their functionality with some patches that they never pushed upstream (and IMHO, have some security issues). This is my attempt to do what their doing sanely, so the other distro's (primarily fedora) can take advantage of this technology." Will Woods reiterated, "we're using it for doing a system-wide crash dump handler. Currently Ubuntu's using it with their Apport tool for this purpose; I'm adapting that for Fedora." He went on to explain, "the Ubuntu approach requires a kernel patch that adds a bunch of process information (process pid, RLIMIT_CORE, etc) to the environment of the crash handler. Most of that information can instead be parsed out of the ELF headers - which is what I wrote code to do. The problem that remains is determining the value of RLIMIT_CORE. (This is used to determine whether the user wants a normal corefile, thus retaining normal core dump behavior)."
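A userspace helper on the receiving end of such a core_pattern might look like the sketch below. It assumes the pattern is set to something like "|/usr/local/bin/core-helper %p %c", so that the kernel passes the pid and the core limit as arguments and pipes the core image to the helper's standard input. The helper path, the output location, and the function names are all hypothetical, and this is not how Apport itself is implemented.

```python
import sys


def parse_helper_args(argv):
    """Extract (pid, core_limit) from an argv of [helper, pid, core_limit]."""
    return int(argv[1]), int(argv[2])


def copy_core(source, sink, core_limit, chunk_size=65536):
    """Copy at most core_limit bytes from source to sink.

    Returns the number of bytes written, so the helper honours the
    limit the crashing process had before it was reset for the helper.
    """
    written = 0
    while written < core_limit:
        chunk = source.read(min(chunk_size, core_limit - written))
        if not chunk:
            break
        sink.write(chunk)
        written += len(chunk)
    return written


def main(argv):
    pid, core_limit = parse_helper_args(argv)
    if core_limit == 0:
        # The user did not want a corefile; a real handler could still
        # record crash metadata here.
        return 0
    # Hypothetical output location for the sketch.
    with open("/tmp/core.%d" % pid, "wb") as sink:
        copy_core(sys.stdin.buffer, sink, core_limit)
    return 0
```

A real helper would call main(sys.argv) when executed. The point of the sketch is that, without the value supplied by %c, the helper has no way to tell whether the user wanted a normal corefile.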
"Lguest is an adventure, with you, the reader, as Hero," began some documentation for lguest recently submitted by Rusty Russell. The documentation continued, "but be warned; this is an arduous journey of several hours or more! And as we know, all true Heroes are driven by a Noble Goal. Thus I offer a Beer (or equivalent) to anyone I meet who has completed this documentation. So get comfortable and keep your wits about you (both quick and humorous). Along your way to the Noble Goal, you will also gain masterly insight into lguest, and hypervisors and x86 virtualization in general."
Andrew Morton noted that he would consider the documentation patches for inclusion in the 2.6.23 kernel, to which Rusty replied, "indeed, no code changes, and I feel strongly that it should go into 2.6.23 because it's *fun*. And (as often complained) there's not enough poetry in the kernel." Linus Torvalds quipped, "there's a reason for that," going on to rhyme, "there once was a lad from Braidwood, with a wife and a hatred for FUD, he hacked kernels for fun, couldn't get them to run, but he always felt that he should." He added, "so when you say 'there's not enough poetry', next time you'll know why. You *really* don't want want poetry." This led to numerous additional poetic submissions about which Rusty noted, "there was a poetic infection, which distorted the kernel's direction, the code got no time, as they all tried to rhyme, and it shipped needing lots of correction."
Ying Huang posted a new version of his hibernation patches that utilize kexec, noting two changes, "1) the kexec jump implementation is put into the kexec/kdump framework instead of software suspend framework. The device and CPU state save/restore code of software suspend is called when needed; and 2) the same code path is used for both kexec a new kernel and jump back to original kernel." Andrew Morton noted that he was still interested but didn't intend to merge the patches right away, "I like the idea but I think I'll let people chat about it a bit more before looking at merging the patches, OK?" TuxOnIce maintainer Nigel Cunningham expressed some strong reservations:
"Please wait until you see a complete implementation that actually works. I'm sitting here quietly, following (and now breaking) the 'If you can't say anything positive, don't say anything at all' line because I think that the more into the implementation details people get, the uglier this is going to show itself to be. I'm perfectly willing to be proven wrong, but haven't seen anything so far that's even begun to convince me otherwise."
In response to another merge request, Andrew Morton retorted, "argh. I have a backlog of maybe 300 patches here which I am cheerfully ignoring while concentrating on preventing 2.6.23 from being less of a disaster than it has already been." He noted that he was not planning to merge any new code into his -mm tree for 2.6.23 inclusion, "the door for new 2.6.23 material shut two weeks ago. Here, at least." He went on to note:
"Please, stop writing patches. Maybe do something to help get 2.6.23 off its back. Like, go review some of the code which people are cheerfully merging five minutes after having written it."
Recent merges into the upcoming 2.6.23 kernel can be found by browsing the gitweb interface to Linus' 2.6 kernel tree. The 2.6.23-rc1 kernel should be released on or shortly after Sunday the 22nd, two weeks after 2.6.22 was released, at which time the merge window closes.
H. Peter Anvin submitted a series of patches rewriting the x86 setup code, "this patch set replaces the x86 setup code, which is currently all in assembly, with a version written in C, using the '.code16gcc' feature of binutils (which has been present since at least 2001.)" He went on to explain why he did this, "the new code is vastly easier to read, and, I hope, debug. It should be noted that I found a fair number of minor bugs while going through this code, and have attempted to correct them."
Linus Torvalds reacted favorably, "I can't really argue against this on any sane grounds - not only is it removing more lines than it adds, but moving from mostly unreadable assembly to C seems a good idea." He went on to note, "so let's just get this merged. But the question is, do we put it in 2.6.23-rc1, or do we put it in -mm for a few weeks, which would imply waiting for the next merge window? Andrew?" Andrew Morton pointed out that the patches have been in -mm already for a couple of months, "this code has been in -mm since 11 May, as git-newsetup.patch. It has caused (for what it is) astonishingly few problems. Maybe a couple of build glitches and one runtime failure, all quickly fixed. I'd say it's ready." Linus agreed, "Ok. That makes it easy. I'll just merge it."