Linus,
Please pull the latest x86-fixes-for-linus git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus
Thanks,
-hpa
------------------>
H. Peter Anvin (1):
x86: enable CONFIG_X86_GENERIC by default
arch/x86/Kconfig.cpu | 19 ++++++++++---------
1 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 2c518fb..46d0acf 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -279,17 +279,18 @@ config GENERIC_CPU
 endchoice
 
 config X86_GENERIC
-	bool "Generic x86 support"
+	bool "Generic x86 support" if EMBEDDED
 	depends on X86_32
+	default y
 	help
-	  Instead of just including optimizations for the selected
-	  x86 variant (e.g. PII, Crusoe or Athlon), include some more
-	  generic optimizations as well. This will make the kernel
-	  perform better on x86 CPUs other than that selected.
-
-	  This is really intended for distributors who need more
-	  generic optimizations.
-
+	  Instead of just including optimizations and workarounds for
+	  the selected x86 variant (e.g. PII, Crusoe or Athlon),
+	  include some more generic optimizations and workarounds as
+	  well. Without this option, the kernel is not guaranteed to
+	  run on anything other than the exact CPU selected.
+
+	  Disable this if you want to run the kernel on a specific CPU
+	  *only* and want maximum optimizations for that CPU.
 endif
 
 config X86_CPU
--
Ok, so after having realized that this seems to be more about a bug with
gcc, I'm really not as convinced any more.

As far as I can tell, there are three issues:

 - "-mtune=core/core2/pentium4/.." is buggy in some gas/gcc versions on
   x86-32, and makes architectural choices. Any actual _released_
   versions? Maybe it's just a current SVN issue?

   Workaround: don't use it. And yes, X86_GENERIC=y will do that,
   although quite frankly that seems to be dubious in itself.

   But quite frankly, it's a gcc bug, and we should see it as such. The
   better workaround may well be "-Wa,-mtune=generic" as you pointed out.

 - We do the CONFIG_P6_NOPL thing ourselves, and we should just stop
   doing that on 32-bit. There simply isn't a good enough reason to do
   so. I already posted the Kconfig.cpu patch to just stop doing it.

 - X86_GENERIC means _other_ things too, like doing a 128-byte cacheline
   just so that it won't suck horribly on P4's even if it's otherwise
   tuned for a good microarchitecture.

And they really do seem to be _separate_ issues. Do we really want to
tie these things together under X86_GENERIC?

		Linus
--
Hmm. The only other thing seems to be X86_INTEL_USERCOPY. Which doesn't
seem to be something we want to force either.

And I have to say, that whole X86_GENERIC -> L1_CACHE_BYTES=128 ->
cache_line_size() -> SLAB/SLUB/SLOB alignment worries me too. Looking at
that, I really don't feel like I want to force 128-byte alignment on
everybody, just because the P4 was a pig in cacheline size.

So NOPL really stands out as being different from the other things that
X86_GENERIC does.

		Linus
--
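To put rough numbers on the alignment worry above, here is a minimal
sketch in C. It is not the SLAB/SLUB/SLOB code: align_up() and
padding_for() are hypothetical helpers that only round an object size up
to a power-of-two cache line size and report how many bytes of padding
that costs.

```c
#include <stddef.h>

/*
 * Round size up to the next multiple of align.
 * Assumption: align is a power of two (64 and 128 both are).
 */
size_t align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Bytes of padding added when an object is aligned to a cache line. */
size_t padding_for(size_t size, size_t align)
{
	return align_up(size, align) - size;
}
```

Under these assumptions a 40-byte object carries 24 bytes of padding
with 64-byte lines and 88 bytes with 128-byte lines, while a 72-byte
object ends up at 128 bytes either way.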
SLAB/SLUB should actually auto detect the cache line at runtime.

Similar feeling here.

-Andi
--
ak@linux.intel.com
--
As far as I can tell, -Wa,-mtune=generic *should* work. It doesn't look
to me as if cc1 will generate the long NOPs.

That one we can do

Well, the argument in favour would be that if you want a kernel that can
cross between different microarchitectures, then you want the "don't
suck horribly on any of them". We can, of course, divide them down
further, but is it useful?

The "ideal" way to do any of this would probably be to have checkboxes
for all the CPUs you want to support and then a drop-down box for the
CPU to optimize for. However, the combinatorics of that would be
horrible, and it would be very unlikely we would avoid bugs.

	-hpa
--
On Mon, 08 Sep 2008 11:22:24 -0700

the ideal case would be "support them all"
the second-most ideal case would be "support all as of <year>"

I suppose a third one for advanced users not distros would be "support
only <vendor>" since that would be the biggest part of code to drop
between models of the same vendor.. not too much to win there.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
Not really. That would include things like the i386, which is a bunch of really nasty stuff. -hpa --
agreed - especially the verify_area() impact makes it a non-starter.

but 486 and higher is certainly quite reasonable, and is still being
tested.

... and _in practice_ 99% of all systems that run Linux today understand
CMOV.

... _and_ in practice 99% of all new Linux systems shipped today are
Core2 or better.

... and so on it goes with this argument. Everyone has a different
target audience and there's no firm limit.

Maybe what makes more sense is to have some sort of time dependency:

   support all x86 CPUs released in the last year
   support all x86 CPUs released in the past 5 years
   support all x86 CPUs released in the past 10 years
   support all x86 CPUs released ever
   [ ... or configure a specific model ]

and people/distributions would use _those_ switches. That means we could
continuously tweak those targets, as systems become obsolete and new
CPUs arrive.

	Ingo
--
cmov, cmpxchg and xadd are the noticeable things.

I think there are realistically three classes:

 - _really_ old, to the point of being totally useless for SMP.

   This is really just 386 and clones. We _need_ a working WP for a
   race-free access_ok(), and we need cmpxchg (and lately xadd). SMP
   cannot really realistically work reasonably (yes, there were SMP
   machines. No, they don't matter), and you'd have to be insane to care
   about this as a vendor even on UP.

   Probably nobody really cares (ie if you have hardware that old, you
   are likely much better off with an older kernel too)

   Smaller pains even on UP: bswap doesn't exist. invlpg doesn't exist.

 - old. pre-cmov. i486 and pentium, and some clones.

   It's workable, but code generation differences are really big enough
   that it's worth having a totally separate architecture option for
   newer CPUs where the kernel simply won't work. And most newer distros
   probably simply don't care, although there may be individual cases
   where this makes sense (embedded places still use pentium clones etc,
   and there are probably a fair amount of individuals that want to
   still use this)

   Other pains: TSC doesn't necessarily exist.

 - "modern 32-bit": PPro and better. Can take CMOV, MMX and TSC for
   granted.

Yes, there are graduations to the above, but reasonably, those three are
I think the "architectural" big versions. The rest should be:

 - pure "tuning" options. A Pentium 4 is different from Core 2 in
   tuning, and the best code sequences can be very very different, but
   the binary should work on both.

 - with *dynamic* choices for the differences that are architecturally
   visible. Ie the whole choice of syscall/sysenter/int80 is dynamic,
   not specified statically at compile time with a config option. So are
   things like the different XMM versions etc.

Hmm? Doesn't that sound like a sane model?

		Linus
--
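The three-class split above can be sketched as a small classifier in C.
This is only an illustration under stated assumptions: the feature bits,
the enum names and classify_cpu() are all hypothetical and do not exist
in the kernel. It takes a working WP plus cmpxchg and xadd as the line
between "really old" and "old", and CMOV as the line between "old" and
"modern 32-bit".

```c
/* Hypothetical feature bits, for illustration only. */
enum cpu_feature {
	FEAT_WP      = 1 << 0,	/* write protect honoured in ring 0 */
	FEAT_CMPXCHG = 1 << 1,
	FEAT_XADD    = 1 << 2,
	FEAT_CMOV    = 1 << 3
};

enum cpu_class {
	CLASS_REALLY_OLD,	/* 386 and clones */
	CLASS_OLD,		/* pre-cmov: i486, pentium, some clones */
	CLASS_MODERN_32		/* PPro and better */
};

enum cpu_class classify_cpu(unsigned int features)
{
	const unsigned int baseline = FEAT_WP | FEAT_CMPXCHG | FEAT_XADD;

	/* Missing any baseline feature puts the CPU in the 386 bucket. */
	if ((features & baseline) != baseline)
		return CLASS_REALLY_OLD;

	/* The baseline without CMOV is the "old" bucket. */
	if (!(features & FEAT_CMOV))
		return CLASS_OLD;

	return CLASS_MODERN_32;
}
```

With this split, a CPU that has the baseline features but no CMOV lands
in CLASS_OLD rather than CLASS_MODERN_32, whatever its clock speed or
release date.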
On Mon, 8 Sep 2008 12:30:02 -0700 (PDT)

I'd lump all cpus that don't have cpuid in this bucket too (eg half the
486es) simply because not having cpuid is painful in pretty much the

again makes sense; question is if it makes sense to take PSE and PAE

it does to me; the only question is if we hit a new bucket with the
various fancy string instructions that are in upcoming models; doing
string/copy operations inlined for those guys will make a fourth bucket.
--
Not really. Detecting CPUID is pretty trivial, and we just initialize

Well, PAE implies PSE. Unfortunately Intel released a series of
Pentium-Ms without PAE support.

We *should* be able to take PSE for granted, but there is Xen damage.

	-hpa
--
VIA C3 (Samuel 2/Ezra, 600 - 1000 MHz?, common on VIA EPIA-*: home theatres etc) can't CMOV. -- Krzysztof Halasa --
On Tue, 09 Sep 2008 01:17:19 +0200

so your cpu does not fall into this bucket...... no big deal.
--
Well, more practically, the C3 simply _isn't_ a "modern 32-bit" one. It would fall into the other category of "pre-PPro, but at least better than i386". Linus --
.. Our firewall here uses a Via C3-600 CPU, and CMOV has never worked on it. But based upon your posting, I have today upgraded the BIOS to the latest (2004) version. Now.. how can I check whether CMOV works or not? It's not listed in /proc/cpuinfo. Thanks --
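One way to ask the CPU directly is the CPUID instruction: leaf 1 reports
CMOV in EDX bit 15, which is the bit behind the "cmov" flag in
/proc/cpuinfo. The sketch below assumes GCC or clang on x86 and uses
__get_cpuid() from <cpuid.h>; edx_has_cmov() and cpu_reports_cmov() are
hypothetical helper names. It only shows what the CPU advertises, not
whether a CMOV instruction actually executes.

```c
#include <stdio.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define CPUID_EDX_CMOV (1u << 15)	/* CPUID leaf 1, EDX bit 15 */

/* Test the CMOV bit in an EDX value returned by CPUID leaf 1. */
int edx_has_cmov(unsigned int edx)
{
	return (edx & CPUID_EDX_CMOV) != 0;
}

/* Returns 1 if CMOV is advertised, 0 if not, -1 if CPUID is unusable. */
int cpu_reports_cmov(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	return edx_has_cmov(edx);
#else
	return -1;	/* not an x86 build */
#endif
}

#ifdef CMOV_DEMO_MAIN
/* Build with -DCMOV_DEMO_MAIN to get a standalone program. */
int main(void)
{
	int r = cpu_reports_cmov();

	printf("CMOV advertised: %s\n",
	       r < 0 ? "unknown" : (r ? "yes" : "no"));
	return 0;
}
#endif
```

Build it without any -march or -mtune option that enables CMOV,
otherwise the checker itself may trap on a CPU that lacks the
instruction.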
..
..
Okay, done. And the binary does indeed have a ton of CMOV instructions.
When running it, this appears immediately:
Illegal instruction
So much for the "BIOS upgrade fixes CMOV microcode" theory.
Cheers
--
We use 3DNow! for bigger memcpy's if the kernel is configured for a K7.
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
--
It doesn't.

I guess I don't care that much, since explicitly asking for some
odd-ball case does indicate that you want a very specific kernel. I
guess that's ok. I'm certainly not violently against it.

Of course, I also suspect that we _could_ fix it so that things like
memcpy really only have two cases:

 - the special inlined "rep movs" thing. Although I'm not actually sure
   gcc even does this, and I don't think we force it any more.

 - If doing a function call, we could just fix things up to be more
   dynamic.

Of course, the fixups for the SMP cases are scary (ie we'd probably have
to first change it to a one-byte "int $3" instruction, then change the
target, and then write the first byte back - and handle any race with
another CPU by fixing up the trap).

but I dunno.

		Linus
--
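For comparison, the simplest form of "more dynamic" is a function
pointer chosen once at startup. The sketch below is that simpler
technique, not the call-site patching described above, and every name in
it (copy_fn, select_copy(), do_copy()) is hypothetical. The "fast"
variant just calls the C library memcpy() as a stand-in for a
CPU-specific routine.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: copy one byte at a time. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Stand-in for a CPU-specific routine (e.g. an MMX or 3DNow! copy). */
static void *copy_fast(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* The variant currently in use; defaults to the safe fallback. */
static copy_fn active_copy = copy_bytewise;

/* Pick the variant once, e.g. after probing CPU features at boot. */
void select_copy(int cpu_has_fast_copy)
{
	active_copy = cpu_has_fast_copy ? copy_fast : copy_bytewise;
}

/* Every caller goes through the pointer, at the cost of an indirect call. */
void *do_copy(void *dst, const void *src, size_t n)
{
	return active_copy(dst, src, n);
}
```

The indirect call on every copy is the price of this approach. Patching
the call target in place avoids that overhead but brings the SMP race
described above.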
That's just *asking* for flame mail if somebody builds a kernel for a
system that's 4 years 9 months old, and he builds a kernel 6 months
later, and it fails to boot because the CPU is now 3 months out and
we've deprecated it...

Quick - what year/month was the CPU you're using now released? No
peeking. ;)

(For the record, I have no *clue* when Intel actually released the Core2
T7200, which is a whole *nother* can of worms - the chip release date
can be quite some time before the system vendor ships, and when the
consumer actually buys it - it's quite possible that we can write
"released in the past 5 years", a user looks at it and says "I bought
this system 4 years 2 months ago", and thinks he's OK, but he's not
because he bought a system released 4 years 9 months ago that used a
chipset released 5 years 6 months ago...
yeah, in terms of precision of the definition it's certainly more
towards the 'vague' end of the spectrum.

OTOH, we do change our defaults slowly but surely to match the hardware.
So this would give a practical definition. If someone _does_ complain
legitimately, it doesn't cost us much to revert a tweak and delay it
some more.

So the idea is to have some sort of independent platform, instead of the
current practice of distros like Debian choosing pretty much random
options.

No strong opinion though. We can cover 90% of the real advantages via
dynamic methods, it's quite rare that we have to make hard .config
choices.

Pretty much the only hardcoded aspect that hurts in practice is the
cache alignment parameter - all the rest is either dynamic already or
insignificant. Ever since distros have discovered
CONFIG_CC_OPTIMIZE_FOR_SIZE=y, even the various compiler optimization
parameters have less of a role.

We just have to wait a year or two for P4's to not matter that much
anymore, then we can do generic kernels with 64 byte alignment and cmov,
that will just work almost everywhere rather optimally.

	Ingo
--
Support all from the last 10 years (ok excluding legacy models that just shipped forever like 486). I think that's quite reasonable to do and worked for a long time. -Andi -- ak@linux.intel.com --
As far as I understood it it's a gas issue, and X86_GENERIC=y would
therefore *not* fix the bug with gcc < 4.2 and affected binutils
since we pass -mtune=i686 for gcc < 4.2 with X86_GENERIC=y.
cu
Adrian
--
Well, for one thing, gcc doesn't actually pass the -mtune= option to gas, it turns out. But yes, "-Wa,-march=generic32" is really the proper fix. -hpa --
If I understand the binutils changelog correctly -march=generic32
support was added one week before the NOP code in question, so all
cu
Adrian
--
