login
Header Space

 
 

Linux: Reviewing the Tickless Kernel for x86-64

July 12, 2007 - 1:47pm
Submitted by Jeremy on July 12, 2007 - 1:47pm.
Linux news

Included in Andrew Morton's potential 2.6.23 merge list [story] were a series of patches to make the x86-64 architecture tickless. Andi Kleen, the x86-64 maintainer replied, "I'm sceptical about the dynticks code. It just rips out the x86-64 timing code completely, which needs a lot more review and testing. Probably not .23." Linus Torvalds agreed, "we are *not* going to do another 'rip everything out, and replace it with new code' again. Over my dead body. We're going to do this thing gradually, or not at all." He went on to explain "the patch-set itself actually looks fine, as far as I'm concerned. But it does seem to have that 'enable everything in one go' problem. I'd much rather see one time source at a time being converted, and enabled then and there, so that when people report problems and do a bisection, if it was HPET that broke, you get the commit that changed HPET."

In response to the pains caused by the original dyntick merge in 2.6.21, Ingo Molnar acknowledged, "we had 12 -hrt/dynticks merge related regressions between 2.6.21-rc1 and -final, and 4 after final." He went on to point out, "it's all pretty quiet today on the dynticks regressions front. (there are no open regressions in either the upstream i386 code or in the devel patches we are aware of)." As to the source of the bugs, he explained, "the majority of the above bugs were in the infrastructure code. (the worst was the generic resume/suspend one fixed in 2.6.21.2) And sadly, a fair number of the infrastructure bugs we introduced during the frentic clockevents/dynticks rewrites/redesigns we did between .20 and .21. That was a royally stupid mistake for us to do - instead of patiently waiting for the bugs to be shaken out we destabilized the infrastructure. (it was a 'lets make this thing so nice that it's impossible to reject' instintic gut reaction.)" Linus replied, "one thing I'll happily talk about is that while 2.6.21 was painful, you and Thomas in particular were both very responsible about the thing, so no, I'm not at all complaining or worried about it in that sense! I just really _really_ wish we could have two fairly stable releases in a row. I think 2.6.22 has the potential to be a pretty good setup, and I'd really like to avoid having another 2.6.21 immediately afterwards."


From:	Andi Kleen [email blocked]
To:	Andrew Morton [email blocked]
Subject: x86 status was Re: -mm merge plans for 2.6.23
Date:	11 Jul 2007 14:43:15 +0200

Andrew Morton [email blocked] writes:

> revert-x86_64-mm-verify-cpu-rename.patch
> add-kstrndup-fix.patch
> xen-build-fix.patch
> fix-x86_64-numa-fake-apicid_to_node-mapping-for-fake-numa-2.patch
> fix-x86_64-mm-xen-xen-smp-guest-support.patch
> more-fix-x86_64-mm-xen-xen-smp-guest-support.patch

> fix-x86_64-mm-sched-clock-share.patch
> fix-x86_64-mm-xen-add-xen-virtual-block-device-driver.patch
> fix-x86_64-mm-add-common-orderly_poweroff.patch
> fix-x86_64-mm-xen-xen-event-channels.patch
> arch-i386-xen-mmuc-must-include-linux-schedh.patch
> tidy-up-usermode-helper-waiting-a-bit-fix.patch
> update-x86_64-mm-xen-use-iret-directly-where-possible.patch


Xen is probably going to be merged. I'm still not fully happy
about the review status of the drivers and xenbus, but there doesn't seem
to be much value in delaying it further.

I'll consolidate the fixes and fixes-to-fixes.


These all need re-review:

> i386-add-support-for-picopower-irq-router.patch
> make-arch-i386-kernel-setupcremapped_pgdat_init-static.patch
> arch-i386-kernel-i8253c-should-include-asm-timerh.patch
> make-arch-i386-kernel-io_apicctimer_irq_works-static-again.patch
> quicklist-support-for-x86_64.patch
> x86_64-extract-helper-function-from-e820_register_active_regions.patch
> x86_64-fix-e820_hole_size-based-on-address-ranges.patch
> x86_64-acpi-disable-srat-when-numa-emulation-succeeds.patch
> x86_64-slit-fake-pxm-to-node-mapping-for-fake-numa-2.patch
> x86_64-numa-fake-apicid_to_node-mapping-for-fake-numa-2.patch
> x86-use-elfnoteh-to-generate-vsyscall-notes-fix.patch
> mmconfig-x86_64-i386-insert-unclaimed-mmconfig-resources.patch
> x86_64-fix-smp_call_function_single-return-value.patch
> x86_64-o_excl-on-dev-mcelog.patch
> x86_64-support-poll-on-dev-mcelog.patch

It's still not clear to me this is any useful. The current code
can run a program on MCE which should be really fast enough
for machine check handling.

> i386-fix-machine-rebooting.patch
> x86-fix-section-mismatch-warnings-in-mtrr.patch
> x86_64-ratelimit-segfault-reporting-rate.patch

I think that one was bogus.

> x86_64-pm_trace-support.patch
> make-alt-sysrq-p-display-the-debug-register-contents.patch
> i386-flush_tlb_kernel_range-add-reference-to-the-arguments.patch
> round_jiffies-for-i386-and-x86-64-non-critical-corrected-mce-polling.patch
> pci-disable-decode-of-io-memory-during-bar-sizing.patch
> mmconfig-validate-against-acpi-motherboard-resources.patch
> x86_64-irq-check-remote-irr-bit-before-migrating-level-triggered-irq-v3.patch
> i386-remove-support-for-the-rise-cpu.patch

> i386-make-arch-i386-mm-pgtablecpgd_cdtor-static.patch
> i386-fix-section-mismatch-warning-in-intel_cacheinfo.patch
> i386-do-not-restore-reserved-memory-after-hibernation.patch
> paravirt-helper-to-disable-all-io-space-fix.patch
> dmi_match-patch-in-rebootc-for-sff-dell-optiplex-745-fixes-hang.patch
> i386-hpet-check-if-the-counter-works.patch
> i386-trim-memory-not-covered-by-wb-mtrrs.patch

Might need more testing? 

More review:

> kprobes-x86_64-fix-for-mark-ro-data.patch
> kprobes-i386-fix-for-mark-ro-data.patch
> divorce-config_x86_pae-from-config_highmem64g.patch
> remove-unneeded-test-of-task-in-dump_trace.patch
> i386-move-the-kernel-to-16mb-for-numa-q.patch
> i386-show-unhandled-signals.patch
> i386-minor-nx-handling-adjustment.patch
> x86-smp-alt-once-option-is-only-useful-with-hotplug_cpu.patch
> x86-64-remove-unused-variable-maxcpus.patch
> move-functions-declarations-to-header-file.patch
> x86_64-during-vm-oom-condition.patch
> i386-during-vm-oom-condition.patch
> x86-64-disable-the-gart-in-shutdown.patch
> x86_84-move-iommu-declaration-from-proto-to-iommuh.patch
> i386-uaccessh-replace-hard-coded-constant-with-appropriate-macro-from-kernelh.patch
> i386-add-cpu_relax-to-cmos_lock.patch
> x86_64-flush_tlb_kernel_range-warning-fix.patch
> x86_64-add-ioapic-nmi-support.patch
> x86_64-change-_map_single-to-static-in-pci_gartc-etc.patch
> x86_64-geode-hw-random-number-generator-depend-on-x86_3.patch
> x86_64-fix-wrong-comment-regarding-set_fixmap.patch
> arch-x86_64-kernel-processc-lower-printk-severity.patch
> nohz-fix-nohz-x86-dyntick-idle-handling.patch
> acpi-move-timer-broadcast-and-pmtimer-access-before-c3-arbiter-shutdown.patch
> clockevents-fix-typo-in-acpi_pmc.patch
> timekeeping-fixup-shadow-variable-argument.patch
> timerc-cleanup-recently-introduced-whitespace-damage.patch
> clockevents-remove-prototypes-of-removed-functions.patch
> clockevents-fix-resume-logic.patch
> clockevents-fix-device-replacement.patch
> tick-management-spread-timer-interrupt.patch
> highres-improve-debug-output.patch
> hrtimer-speedup-hrtimer_enqueue.patch
> pcspkr-use-the-global-pit-lock.patch
> ntp-move-the-cmos-update-code-into-ntpc.patch
> i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
> i386-remove-volatile-in-apicc.patch
> i386-hpet-assumes-boot-cpu-is-0.patch
> i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch
> x86_64-untangle-asm-hpeth-from-asm-timexh.patch
> x86_64-use-generic-cmos-update.patch
> x86_64-remove-dead-code-and-other-janitor-work-in-tscc.patch
> x86_64-fix-apic-typo.patch
> x86_64-convert-to-cleckevents.patch
> acpi-remove-the-useless-ifdef-code.patch
> x86_64-hpet-restore-vread.patch
> x86_64-restore-restore-nohpet-cmdline.patch
> x86_64-block-irq-balancing-for-timer.patch
> x86_64-prep-idle-loop-for-dynticks.patch
> x86_64-enable-high-resolution-timers-and-dynticks.patch
> x86_64-dynticks-disable-hpet_id_legsup-hpets.patch


I'm sceptical about the dynticks code. It just rips out the
x86-64 timing code completely, which needs a lot more review and testing.
Probably not .23

More review: 

> xen-fix-x86-config-dependencies.patch
> x86_64-get-mp_bus_to_node-as-early.patch
> xen-suppress-abs-symbol-warnings-for-unused-reloc-pointers.patch
> xen-cant-support-numa-yet.patch
> x86-fix-iounmaps-use-of-vm_structs-size-field.patch
> arch-x86_64-kernel-aperturec-lower-printk-severity.patch
> arch-x86_64-kernel-e820c-lower-printk-severity.patch
> ich-force-hpet-make-generic-time-capable-of-switching-broadcast-timer.patch
> ich-force-hpet-restructure-hpet-generic-clock-code.patch
> ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable.patch
> ich-force-hpet-late-initialization-of-hpet-after-quirk.patch
> ich-force-hpet-ich5-quirk-to-force-detect-enable.patch
> ich-force-hpet-ich5-fix-a-bug-with-suspend-resume.patch
> ich-force-hpet-add-ich7_0-pciid-to-quirk-list.patch
> geode-basic-infrastructure-support-for-amd-geode-class.patch
> geode-mfgpt-support-for-geode-class-machines.patch
> geode-mfgpt-clock-event-device-support.patch
> i386-x86_64-insert-hpet-firmware-resource-after-pci-enumeration-has-completed.patch
> i386-ioapic-remove-old-irq-balancing-debug-cruft.patch
> i386-deactivate-the-test-for-the-dead-config_debug_page_type.patch

-Andi


From: Ingo Molnar [email blocked] To: Andi Kleen [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 19:42:52 +0200 * Andi Kleen [email blocked] wrote: > > clockevents-fix-typo-in-acpi_pmc.patch > > timekeeping-fixup-shadow-variable-argument.patch > > timerc-cleanup-recently-introduced-whitespace-damage.patch > > clockevents-remove-prototypes-of-removed-functions.patch > > clockevents-fix-resume-logic.patch > > clockevents-fix-device-replacement.patch > > tick-management-spread-timer-interrupt.patch > > highres-improve-debug-output.patch > > hrtimer-speedup-hrtimer_enqueue.patch > > pcspkr-use-the-global-pit-lock.patch > > ntp-move-the-cmos-update-code-into-ntpc.patch > > i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch > > i386-remove-volatile-in-apicc.patch > > i386-hpet-assumes-boot-cpu-is-0.patch > > i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch > > x86_64-untangle-asm-hpeth-from-asm-timexh.patch > > x86_64-use-generic-cmos-update.patch > > x86_64-remove-dead-code-and-other-janitor-work-in-tscc.patch > > x86_64-fix-apic-typo.patch > > x86_64-convert-to-cleckevents.patch > > acpi-remove-the-useless-ifdef-code.patch > > x86_64-hpet-restore-vread.patch > > x86_64-restore-restore-nohpet-cmdline.patch > > x86_64-block-irq-balancing-for-timer.patch > > x86_64-prep-idle-loop-for-dynticks.patch > > x86_64-enable-high-resolution-timers-and-dynticks.patch > > x86_64-dynticks-disable-hpet_id_legsup-hpets.patch > > I'm sceptical about the dynticks code. It just rips out the x86-64 > timing code completely, which needs a lot more review and testing. > Probably not .23 What you just did here is a slap in the face to a lot of contributors who worked hard on this code :( Let me tell you about the history of this project first. Arjan wrote the first version of it a year ago, and it was added to -rt and tested there by many people and went through many iterations and fixes. Chris Wright then created a x86_64 clockevents cleanup and dynticks enabling patchset from it this spring and sent it to lkml three and a half months ago, on March 31: http://lwn.net/Articles/229094/ Thomas, the high resolution timers and clockevents maintainer, immediately picked up Chris' splitup/splitout/cleanup work and fixed and extended it, and sent a first cut to lkml on May 6th: http://lwn.net/Articles/233226/ Thomas then sent an updated version of the x86_64 clockevents cleanup and dynticks code to lkml (on June 10th), for a second round of review: http://lwn.net/Articles/237687/ As Thomas stated it in his submission: " The patch set has been tested in the -hrt and -rt trees for quite a while and the initial problems have been sorted out. Thanks to the folks from the PowerTop project for testing and feedback. " Then on June 16th Thomas sent the third series: http://lwn.net/Articles/238834/ (which too was in -rt and was tested there on numerous machines. It was also added to -mm.) Then on June 23rd Thomas sent the fourth series of the x86_64 clockevents and dynticks code: http://lwn.net/Articles/239620/ We finally have someone (Thomas) with core kernel clue who actually _cares_ about the x86 time code and does not see it as an ugly chore, one who collects the right patches and maintains the -hrt tree and co-maintains the -rt tree and interacts with other contributors. What he did was _hard_ to do but we are making really good progress: http://lkml.org/lkml/2007/7/5/242 " All in all, personally I'm very happy to see Linux making such a huge step forward with tickless and can't wait for this step to be available in all distros and for all architectures... " Yes, touching the time code is a pain because both the hardware and the kernel has skeletons hidden in the closet, but we are mapping them one by one, and we already fixed several kernel skeletons in the process. The code is in -mm and there is no open regression related to this queue of patches at the moment. But what is curiously absent from all this positive and dynamic activity around these patches on lkml? There is not a single email from Andi Kleen, the official maintainer of the x86_64 tree directly reacting to this submission in any way, shape or form. Not a single email from you thanking Arjan, Chris and Thomas for this amount of cleanup to the architecture you are maintaining: 31 files changed, 698 insertions(+), 1367 deletions(-) Not a single email from you reviewing the patchset in any meaningful way. Not a single email from you to indicate that you even did so much as boot into this patchset. What contribution do we have from you instead? A week before the .23 merge window is closed, in the very last possible moment, we finally get your first reaction to this patchset, albeit in the form of three terse sentences: > I'm sceptical about the dynticks code. It just rips out the x86-64 > timing code completely, which needs a lot more review and testing. > Probably not .23 In the past 3+ months there was not a single email from you indicating that you are "doubtful" about this submission, and that you think that it needs "lot more review and testing". You dont offer any alternative, you dont offer any feedback, no review, no testing, no support, just a simple rejection on lkml that prevents this project from going upstream. Yes, maintainers have veto power and often have to make hard decisions, but, and let me stress this properly: Only if they actually act as honest maintainers! Altogether 197 emails on lkml discussed these patches, and you were Cc:-ed to every one of them. Over a dozen kernel developers reviewed it or reacted to the patchset in one way or another. And your only reaction to this is silence and a rejection claiming that it needs "lot more review"? I'm utterly speechless. Ingo
From: Linus Torvalds [email blocked] To: Ingo Molnar [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 14:42:53 -0700 (PDT) On Wed, 11 Jul 2007, Ingo Molnar wrote: > > What you just did here is a slap in the face to a lot of contributors > who worked hard on this code :( Ingo, I'm sorry to say so, but your answer just convinced me that you're wrong, and we MUST NOT take that code. That was *exactly* the same thing you talked about when I refused to take the original timer changes into 2.6.20. You were talking about how lots of people had worked really hard, and how it was really tested. And it damn well was NOT really tested, and 2.6.21 ended up being a horribly painful experience (one of the more painful kernel releases in recent times), and we ended up havign to fix a *lot* of stuff. And you admitted you were wrong at the time. Now you do the *exact* same thing. Here's a big clue: it doesn't matter one _whit_ how much face-slapping you get, or how much effort some programmers have put into the code. It's untested. And no, we are *not* going to do another "rip everything out, and replace it with new code" again. Over my dead body. We're going to do this thing gradually, or not at all. And if somebody feels slighted by the face-slap, and thinks he has already done enough, and isn't interested in doing it gradually, then good riddance. The "not at all" seems like a good idea, and maybe we can re-visit this in a year or two. I'm not going to have another 2.6.21 on my hands. Linus
From: Ingo Molnar [email blocked] To: Linus Torvalds [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Thu, 12 Jul 2007 01:19:12 +0200 * Linus Torvalds [email blocked] wrote: > That was *exactly* the same thing you talked about when I refused to > take the original timer changes into 2.6.20. You were talking about > how lots of people had worked really hard, and how it was really > tested. yes - i was (way too!) upset about it, and your reasoning for the rejection was hard (on us) but fair: you wanted a quiet 2.6.20, and you felt fundamentally uneasy about the patches. > And it damn well was NOT really tested, and 2.6.21 ended up being a > horribly painful experience (one of the more painful kernel releases > in recent times), and we ended up havign to fix a *lot* of stuff. yes. We had 12 -hrt/dynticks merge related regressions between 2.6.21-rc1 and -final, and 4 after final. Here's a quick post-mortem: 12 fixes after -rc1: [PATCH] i386: Fix bogus return value in hpet_next_event() [PATCH] clockevents: remove bad designed sysfs support for now [PATCH] clocksource: Fix thinko in watchdog selection [PATCH] dynticks: fix hrtimer rounding error in next_timer_interrupt [PATCH] i386: add command line option "local_apic_timer_c2_ok" [PATCH] i386: disable local apic timer via command line or dmi quirk [PATCH] i386: clockevents fix breakage on Geode/Cyrix PIT [PATCH] i386: trust the PM-Timer calibration of the local APIC timer [PATCH] clockevents: Fix suspend/resume to disk hangs [PATCH] highres: do not run the TIMER_SOFTIRQ after switching to highres mode [PATCH] hrtimer: prevent overrun DoS in hrtimer_forward() [PATCH] Save/restore periodic tick information over suspend/resume implementations 4 fixes after -final: 2.6.21.1: - 2.6.21.2: [PATCH] clocksource: fix resume logic 2.6.21.3: - 2.6.21.4: - 2.6.21.5: [PATCH] NOHZ: Rate limit the local softirq pending warning output [PATCH] Ignore bogus ACPI info for offline CPUs [PATCH] i386: HPET, check if the counter works 2.6.21.6: - it's all pretty quiet today on the dynticks regressions front. (there are no open regressions in either the upstream i386 code or in the devel patches we are aware of. Forced-HPET in -mm, which is not part of this queue in question [but which is done for dynticks], has one open regression.) The majority of the above bugs were in the infrastructure code. (the worst was the generic resume/suspend one fixed in 2.6.21.2) And sadly, a fair number of the infrastructure bugs we introduced during the frentic clockevents/dynticks rewrites/redesigns we did between .20 and .21. That was a royally stupid mistake for us to do - instead of patiently waiting for the bugs to be shaken out we destabilized the infrastructure. (it was a "lets make this thing so nice that it's impossible to reject" instintic gut reaction.) In the 'weird arch bugs' category, out of the 6 i386 breakages listed above, 'i386 legacy systems' was/is by far the worst offender: 4-5 were on such old (not 64-bit-capable) systems. (this is not really a surprise) While x86_64 certainly has weird crap hardware too, it probably is an order of magnitude fewer than i386 - just due to the sheer volume, time and diversity difference. (On the other hand if there's crap then it will be debugged/tested slower than on 32-bit, which offsets that advantage.) The most prominent bugs were the ones that were in the infrastructure - they affected many machines. (But i'd expect the infrastructure to be pretty robust by now.) The x86_64 hrt/dynticks code makes the x86_64 PIT driver (and hpet too) shared between the two architectures - which is perhaps another difference to the original i386 clockevents merge. We also integrated _all_ feedback we got, and we had the capacity and capability to fix whatever other feedback comes back - it just never came ... until today. But i fully agree with you that the cleanups should be done separately - it's just so hard to actually hack on the old hpet code (and to understand it to begin with) without first cleaning it up a bit so that it does not cause permanent brain damage ;) Ingo
From: Linus Torvalds [email blocked] To: Ingo Molnar [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 16:45:11 -0700 (PDT) On Thu, 12 Jul 2007, Ingo Molnar wrote: > > We also integrated _all_ feedback we got, and we had the capacity and > capability to fix whatever other feedback comes back - it just never > came ... until today. One thing I'll happily talk about is that while 2.6.21 was painful, you and Thomas in particular were both very responsible about the thing, so no, I'm not at all complaining or worried about it in that sense! I just really _really_ wish we could have two fairly stable releases in a row. I think 2.6.22 has the potential to be a pretty good setup, and I'd really like to avoid having another 2.6.21 immediately afterwards. So I'm not worried about integration and getting fixes when things break per se, but I *am* worried that this is an area where we've traditionally had lots of unexpected problems. And hey, maybe this time there will be none. I just still smart from the last time, so I'd prefer it to go more smoothly this time around. Linus
From: Thomas Gleixner [email blocked] To: Linus Torvalds [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Thu, 12 Jul 2007 00:04:48 +0200 Linus, On Wed, 2007-07-11 at 14:42 -0700, Linus Torvalds wrote: > Here's a big clue: it doesn't matter one _whit_ how much face-slapping you > get, or how much effort some programmers have put into the code. It's > untested. And no, we are *not* going to do another "rip everything out, > and replace it with new code" again. > > Over my dead body. > > We're going to do this thing gradually, or not at all. Can you please shed some light on me, how exactly you switch an architecture gradually to clock events. You simply can not convert PIT today and the HPET next week followed by the local APIC in three month. At least not to my knowledge. > And if somebody feels slighted by the face-slap, and thinks he has already > done enough, and isn't interested in doing it gradually, then good > riddance. The "not at all" seems like a good idea, and maybe we can > re-visit this in a year or two. I have no problem to brew this for some more time. I got not repulsed by the 2.6.20 decision, but I have no clue how to communicate with a black hole. tglx
From: Linus Torvalds [email blocked] To: Thomas Gleixner [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 15:20:31 -0700 (PDT) On Thu, 12 Jul 2007, Thomas Gleixner wrote: > > Can you please shed some light on me, how exactly you switch an > architecture gradually to clock events. For example, we can make sure that the code in question that actually touches the hardware stays exactly the same, and then just move the interfaces around - and basically guarantee that _zero_ hardware-specific issues pop up when you switch over, for example. That way there is a gradual change-over. The other approach (which would be nice _too_) is to actually try to convert one clock source at a time. Why is that not an option? Linus
From: Thomas Gleixner [email blocked] To: Linus Torvalds [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Thu, 12 Jul 2007 00:50:19 +0200 Linus, On Wed, 2007-07-11 at 15:20 -0700, Linus Torvalds wrote: > For example, we can make sure that the code in question that actually > touches the hardware stays exactly the same, and then just move the > interfaces around - and basically guarantee that _zero_ hardware-specific > issues pop up when you switch over, for example. > > That way there is a gradual change-over. Ok, I can try to split this down further. > The other approach (which would be nice _too_) is to actually try to > convert one clock source at a time. Why is that not an option? We need to give control to the clock events core code once we convert one clock event device. Having two competing subsystems controlling different devices (e.g. PIT and APIC) is not really desirable. The HPET change, which is the larger part of the conversion set simply because we now share the code with i386, might be split out by disabling HPET in the first step, doing the PIT / APIC conversion and then the HPET one in a separate step. Thanks, tglx
From: Linus Torvalds [email blocked] To: Thomas Gleixner [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 16:07:47 -0700 (PDT) On Thu, 12 Jul 2007, Thomas Gleixner wrote: > > The HPET change, which is the larger part of the conversion set simply > because we now share the code with i386, might be split out by disabling > HPET in the first step, doing the PIT / APIC conversion and then the > HPET one in a separate step. But that misses the point. It means that the commit that actually *changes* the code never actually gets tested on its own Why not just fix up the HPET code so that it can be shared *first*. Without the other conversion? Really - What's so wrong with the hpet.c changes in the *absense* of conversion to clockevents? Those changes seem to be totally independent - just abstracting ou tthe "hpet_get_virt_address()" stuff etc. None of that has anything to do with clockevents, as far as I can see. In other words, you now change a i386-only file, and maybe it breaks subtly on i386 as a result. Wouldn't it be nicer to see that breakage as a separate event? Then, the x86-64 clockevents code will switch over entirely, but now it switches over to something we can say has gotten testing, and we know the switch-over won't break any 32-bit code, because the switch-over literally didn't change anything at all for that case. See? THAT is what I mean by "gradual". Bugs happen, but if we can make _independent_ bugs show up in _independent_ commits, that will make it much easier to figure out what happened. The same is true of a lot of the APIC timer code. Sure, that patch has the actual conversion in it, and you don't have the cross-architecture issues, but more than 50% of the patch seems to be just cleanup that is independent of the actual switch-over, no? Again, if it was done as a "one patch for cleanup, and another patch that actually switches the higher-level interfaces around", then the two mostly independent issues (of "hardware access/initialization" vs "higher-level changes in how it got called") get done as two independent commits. And no, I really probably wouldn't ask for this, but 2.6.21 showed *exactly* this problem. Trivial debugging helps like "git bisect" didn't help at all, because all the problems started when the new code was "activated", not when it was actually brought in. Linus
From: Thomas Gleixner [email blocked] To: Linus Torvalds [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Thu, 12 Jul 2007 01:29:25 +0200 Linus, On Wed, 2007-07-11 at 16:07 -0700, Linus Torvalds wrote: > Why not just fix up the HPET code so that it can be shared *first*. > Without the other conversion? Really - What's so wrong with the hpet.c > changes in the *absense* of conversion to clockevents? Those changes seem > to be totally independent - just abstracting ou tthe > "hpet_get_virt_address()" stuff etc. > > None of that has anything to do with clockevents, as far as I can see. > > In other words, you now change a i386-only file, and maybe it breaks > subtly on i386 as a result. Wouldn't it be nicer to see that breakage as a > separate event? Sure, I meant to do the HPET changes to i386 separate as a preparatory patch. Sharing HPET before the conversion is nasty at best (it involves a ton of ifdeffery at least). > Then, the x86-64 clockevents code will switch over entirely, but now it > switches over to something we can say has gotten testing, and we know the > switch-over won't break any 32-bit code, because the switch-over literally > didn't change anything at all for that case. Well, we know that it works on i386, but once we turn on the x64 switch we have not tested the shared code for x64 yet. I try to find some practicable compromise between the big bang patch and the theoretical gradual optimum. > The same is true of a lot of the APIC timer code. Sure, that patch has the > actual conversion in it, and you don't have the cross-architecture issues, > but more than 50% of the patch seems to be just cleanup that is > independent of the actual switch-over, no? I said before, that I'm going to split them further. tglx
From: Andi Kleen [email blocked] To: Ingo Molnar [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 23:16:38 +0200 Well I spent a lot of time making the x86-64 timing code work well on a variety of machines; working around a wide variety of hardware and platform bugs. I obviously don't agree on your description of its maintenance state. > What contribution do we have from you instead? A week before the .23 I told him my objections privately earlier. Basically i would like to see an actually debuggable step-by-step change, not a rip everything out. If that isn't possible it needs very careful review which just hasn't happened yet. But I'm not convinced even step by step is not possible here. I thought it was clear that rip everything out is rarely a good idea in Linux land? That's really not something I should need to harp on repeatedly. -Andi
From: Andrea Arcangeli [email blocked] To: Andi Kleen [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 23:46:49 +0200 Hi Andi, On Wed, Jul 11, 2007 at 11:16:38PM +0200, Andi Kleen wrote: > I thought it was clear that rip everything out is rarely a good idea > in Linux land? That's really not something I should need to harp on > repeatedly. I'm going to change topic big time because your sentence above perfectly applies to the O(1) scheduler too. It's not like process schedulers are sacred and there shall be only one, while I/O schedulers and packet schedulers are profane and there can be many of them. FWIW IMHO the right way would have been to make the new scheduler pluggable and switchable at runtime, too bad it was ripped off instead. The difficulty of making the scheduler pluggable isn't really enormous, there have been patches floating around to achieve it, some I even deal with them myself once. The only positive side of being forced to CFS I can imagine, is that more testing will make it more stable and more tuned more quickly. But I'm fairly certain Ingo's good enough to achieve without it, perhaps with a few more weeks. Personally I very much like the unfariness of O(1), I'm afraid CFS will overschedule under a certain number of workloads in its attempt to provide a complete fair queieing at all costs, and it won't deal with the X server as nicely as O(1), but I may as well be wrong. The only thing I'm more sure about is that the computational complexity is higher, and that reason alone is a good technical reason to provide both and let the java folks stick with O(1) if they want.
From: Linus Torvalds [email blocked] To: Andrea Arcangeli [email blocked] Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 15:09:28 -0700 (PDT) On Wed, 11 Jul 2007, Andrea Arcangeli wrote: > > I'm going to change topic big time because your sentence above > perfectly applies to the O(1) scheduler too. I disagree to a large degree. We almost never have problems with code you can "think about". Sure, bugs happen, but code that everybody runs the same generally doesn't break. So a CPU scheduler doesn't worry me all that much. CPU schedulers are "easy". What worries me is interfaces to hardware that we know looks different for different people. That means that any testing that one person has done doesn't necessarily translate to anything at *all* on another persons machine. The timer problems we had when merging the stuff in 2.6.21 just scarred me. I'd _really_ hate to have to go through that again. And no, the "gradual" thing where the patch that actually *enables* something isn't very gradual at all, so that's the absolutely worst kind of thing, because then people can "git bisect" to the point where it got enabled and tell us that's where things broke, but that doesn't actually say anything at all about the patch that actually implements the new behaviour. So the "enable" kind of patch is actually the worst of the lot, when it comes to hardware. When it comes to pure software algorithms, and things like schedulers, you'll still obviously have timing issues and tuning, but generally things *work*, which makes it a lot easier to debug and describe. Linus
From: Linus Torvalds [email blocked] To: Valdis.Kletnieks Subject: Re: x86 status was Re: -mm merge plans for 2.6.23 Date: Wed, 11 Jul 2007 15:12:49 -0700 (PDT) On Wed, 11 Jul 2007, [email blocked] wrote: > > Odd, I looked at the patchset fairly closely a number of times, as I was > hand-retrofitting the -rc[1-4] versions onto -rc[1-4]-mm kernels, and it looked > to *me* like it was a nice set of 20 or so step-by-step changes (bisectable > and everything - I got to do that once trying to figure out which one I botched). > Was there something in there that I missed? The patch-set itself actually looks fine, as far as I'm concerned. But it does seem to have that "enable everything in one go" problem. I'd much rather see one time source at a time being converted, and enabled then and there, so that when people report problems and do a bisection, if it was HPET that broke, you get the commit that changed HPET. As it is, looking at that set, it *looks* like you'd get the "ok, now enable it all" as the commit that breaks, which tells you hardly anything, since the commit that _shows_ the behaviour has absolutely nothing to do with the code that actually causes it. But yeah, the patch series per se doesn't look bad. If it wasn't for me being burnt by the last big switch-over for timers, I probably wouldn't mind it at all, personally. Linus



Related Links:

Reply

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <b> <quote> <pre> <hr> <br> <p> <img> <blockquote> <font> <tt> <table> <tr> <i>
  • Lines and paragraphs break automatically.
  • Web page addresses and e-mail addresses turn into links automatically.

More information about formatting options

speck-geostationary