I'm halfway between you two on this one. I agree with Christoph in that
it's currently very difficult to trigger a failure scenario and today we
don't have a way of dealing with it. I agree with Nick in that conceivably a
failure scenario does exist somewhere and the careful person (or paranoid if
you prefer) would deal with it pre-emptively. The fact is that no one knows
what a large block workload is going to look like to the allocator so we're
all hand-waving.
Right now, I can't trigger the worst fragmentation failure scenarios, the
ones that cannot be dealt with, but that might change with large blocks.
The worst situation I can think of is a process that continuously dirties
large amounts of data on a large block filesystem while another set of
processes works with large amounts of anonymous data, with no swap space
configured and slub_min_order set somewhere between order-0 and the large
block size. Fragmentation-wise, that's just a kick in the pants and might
produce the failure scenario being looked for.
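A rough userspace sketch of the two halves of that workload, for anyone
who wants to try it. The function names and sizes are made up for
illustration, and it assumes the rest of the setup (large block filesystem
mounted, swap off, slub_min_order on the kernel command line) is done
separately. A real run would put each half in its own process at sizes
close to physical memory.

```python
import mmap

PAGE = mmap.PAGESIZE

def dirty_file(path, size, chunk=1 << 20, passes=1):
    """Hypothetical helper: rewrite `size` bytes of `path` `passes` times
    so the page cache keeps filling with dirty file pages."""
    buf = b"\xaa" * chunk
    written = 0
    with open(path, "wb") as f:
        for _ in range(passes):
            f.seek(0)                      # overwrite in place each pass
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(buf[:n])
                remaining -= n
                written += n
    return written

def touch_anon(size):
    """Hypothetical helper: map anonymous memory and write one byte per
    page so every page is actually allocated. With no swap configured,
    these pages cannot be paged out."""
    m = mmap.mmap(-1, size)                # -1 means an anonymous mapping
    touched = 0
    for off in range(0, size, PAGE):
        m[off] = 1
        touched += 1
    return m, touched
```

Point dirty_file() at a file on the large block filesystem and keep the
mapping returned by touch_anon() alive for the duration of the run.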
If it does fail, I don't think it should be used to beat Christoph with as
such because it was meant to be a #2 solution. What hits it is if the mmap()
change is unacceptable.
Performance figures would be nice. dbench is flaky as hell but can
comparison figures be generated on one filesystem with 4K blocks and one
with 64K? I guess we can do it ourselves too because this should work on
normal machines.
If I had this bus that couldn't go below 50MPH, right...... never mind.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab