On Sun, Sep 16, 2007 at 09:54:18PM +0100, Mel Gorman wrote:
Sorry for the misunderstanding, I totally agree!
Sure, I completely agree.
This is exactly correct, some memory will be wasted. It'll reach 0
free memory more quickly depending on which kind of applications are
being run.
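
A rough back-of-the-envelope sketch of that waste, assuming the page cache holds every file in whole pages and comparing a 4k base page against a 64k PAGE_SIZE. The workload (1000 files of 5k each) and the function names are hypothetical, chosen only to illustrate the rounding.

```python
KB = 1024

def cache_footprint(file_sizes, page_size):
    """Bytes of page cache used if every file occupies whole pages."""
    # -(-a // b) is integer ceiling division
    return sum(-(-size // page_size) * page_size for size in file_sizes)

def wasted(file_sizes, page_size):
    """Bytes lost to rounding each file up to a page boundary."""
    return cache_footprint(file_sizes, page_size) - sum(file_sizes)

# Hypothetical workload: 1000 small files of 5k each
small_files = [5 * KB] * 1000

waste_4k = wasted(small_files, 4 * KB)    # 3k lost per file
waste_64k = wasted(small_files, 64 * KB)  # 59k lost per file
```

With large files the two numbers converge, which is why the waste depends on the kind of applications being run.
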
Agreed.
Allocating userpages from slab in 4k chunks with a 64k PAGE_SIZE is
really complex indeed. I'm not planning for that in the short term but
it remains a possibility to make the kernel more generic. Perhaps it
could be worth it...
Allocating ptes from slab is fairly simple but I think it would be
better to allocate ptes in PAGE_SIZE (64k) chunks and preallocate the
nearby ptes in the per-task local pagetable tree, to reduce the number
of locks taken and to avoid entering the slab at all. In fact we
could allocate the 4 levels (or anyway more than one level) in one
single alloc_pages(0) and track the leftovers in the mm (or similar).
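
A toy userspace model of that idea, not kernel code: one 64k page is carved into pagetable-sized chunks, and the unused chunks are kept in a per-mm pool so later pagetable allocations need no new page. The 4k table size, the class name MMTablePool and the helper map_one_address are assumptions made for illustration.

```python
PAGE_SIZE = 64 * 1024   # the large PAGE_SIZE discussed above
TABLE_SIZE = 4 * 1024   # assumed size of one pagetable at any level

class MMTablePool:
    """Per-mm pool: carve pagetables out of whole pages, keep the leftovers."""

    def __init__(self):
        self.pages_allocated = 0  # counts alloc_pages(0)-like calls
        self.leftovers = []       # (page_id, offset) chunks not handed out yet

    def alloc_table(self):
        if not self.leftovers:
            # Take one new page and split it into TABLE_SIZE chunks
            page_id = self.pages_allocated
            self.pages_allocated += 1
            self.leftovers = [(page_id, off)
                              for off in range(0, PAGE_SIZE, TABLE_SIZE)]
        return self.leftovers.pop(0)

def map_one_address(pool, levels=4):
    """Allocate one table per level, as a first fault in a new region would."""
    return [pool.alloc_table() for _ in range(levels)]
```

In this model the first fault takes a single page for all 4 levels and leaves 12 chunks behind for the nearby ptes.
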
I'm unsure who reads /proc/buddyinfo (that can change a lot and that
is not very significant information if the vm can defrag well inside
the reclaim code), but it's not much different and it's more about
knowing the real meaning of /proc/meminfo, freeable (unmapped) cache,
anon ram, and free memory.
The idea is that for an mmap over a large xfs file to succeed with
mlockall in effect, those largepages must become available, or it'll
go oom even though there are still 512M free... I'm quite sure
admins will get confused if the oom killer is invoked with lots of
ram still free.
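
A minimal sketch of how that can happen, assuming 4k base pages and a deliberately bad, hypothetical layout where one page in every group of 16 is pinned. Most of the memory is free, yet no naturally aligned 64k block is.

```python
def free_blocks(free_map, order):
    """Count naturally aligned blocks of 2**order base pages that are fully free."""
    size = 1 << order
    return sum(all(free_map[i:i + size])
               for i in range(0, len(free_map) - size + 1, size))

# Hypothetical layout: True means the 4k page is free;
# the first page of every 16-page group is in use.
n_pages = 1024
free_map = [(i % 16) != 0 for i in range(n_pages)]

free_fraction = sum(free_map) / n_pages   # share of memory that is free
order4_blocks = free_blocks(free_map, 4)  # usable 64k (order 4) blocks
```

Here free_fraction is above 90% while order4_blocks is zero, which is the "oom with lots of ram free" situation in miniature.
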
The overcommit feature will also break, just to give one example (so
much for overcommit 2 guaranteeing -ENOMEM retvals instead of oom
killage ;).
Yes, I totally agree. It sounds worthwhile to have a good defrag logic
in the VM. Even allocating the kernel stack in today's kernels should be
able to benefit from your work. It's just comparing a fork() failure
(something that will happen with ulimit -u too and that apps must be
able to deal with) with an I/O failure that worries me a bit. I'm
quite sure a db failing I/O will not recover too nicely. If fork fails
that's most certainly ok... at worst a new client won't be able to
connect and he can retry later. Plus order 1 isn't really a big deal:
the probability of success decreases exponentially with the
order.
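
To put rough numbers on that, here is a toy model that assumes every base page is free independently with the same probability. This is a simplification, since a real buddy allocator and any defrag logic make neighbouring pages far from independent. The function name is hypothetical.

```python
def p_block_free(p_page_free, order):
    """Chance that one given aligned block of 2**order pages is entirely free,
    assuming each page is free independently with probability p_page_free."""
    return p_page_free ** (1 << order)

# With half of the pages free, orders 0..3:
probs = [p_block_free(0.5, order) for order in range(4)]
```

In this model order 1 only squares the single-page probability, while each further order squares it again, so the higher orders are the ones that suffer.
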