Re: [PATCH] remove throttle_vm_writeout()

From: Andrew Morton
Date: Friday, October 5, 2007 - 10:20 am

On Fri, 5 Oct 2007 20:30:28 +0800
Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:


Sure, but we don't have one disk queue per disk per zone!  The queue is
shared by all the zones.  So if writeback from one zone has filled the
queue up, the kernel can't write back data from another zone.

(Well, it can, by blocking in get_request_wait(), but that causes long and
uncontrollable latencies).


Or someone ran fsync(), or pdflush is writing back data because it exceeded
dirty_writeback_centisecs, etc.


Yeah.  In 2.4 and early 2.5, page-reclaim (both direct reclaim and kswapd,
iirc) would throttle by waiting on writeout of a particular page.  This was
a poor design, because writeback against a *particular* page can take
anywhere from one millisecond to thirty seconds to complete, depending upon
where the disk head is and all that stuff.

The critical change I made was to switch the throttling algorithm from
"wait for one page to get written" to "wait for _any_ page to get written".
Because reclaim really doesn't care _which_ page got written: we want to
wake up and start scanning again when _any_ page got written.

That's what congestion_wait() does.

It is pretty crude.  It could be that writeback completed against pages which
aren't in the correct zone, or it could be that some other task went and
allocated the just-cleaned pages before this task can get running and
reclaim them, or it could be that the just-written-back pages weren't
reclaimable after all, etc.

It would take a mind-boggling amount of logic and locking to make all this
100% accurate and the need has never been demonstrated.  So page reclaim
presently should be viewed as a polling algorithm, where the rate of
polling is paced by the rate at which the IO system can retire writes.
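
To make the "wait for _any_ page" idea concrete, here is a rough user-space
analogy in Python.  It is not the kernel's congestion_wait() code; the class
and method names (WritebackWaitQueue, write_completed,
wait_for_any_completion) are made up for illustration.  The waiter sleeps
until any write completes or the timeout expires, then goes back to scanning.

```python
import threading


class WritebackWaitQueue:
    """Hypothetical analogy: sleep until ANY write completes, or time out."""

    def __init__(self):
        self._cond = threading.Condition()
        self._completed = 0  # total writes retired so far

    def write_completed(self):
        # Called by the "IO system" whenever any page finishes writeback.
        with self._cond:
            self._completed += 1
            self._cond.notify_all()

    def wait_for_any_completion(self, timeout):
        # Returns True if some write completed while we slept,
        # False if the timeout expired first.
        with self._cond:
            seen = self._completed
            return self._cond.wait_for(
                lambda: self._completed != seen, timeout
            )
```

A reclaim loop built on this would scan, call wait_for_any_completion() with
a short timeout when it finds nothing reclaimable, and then scan again.  The
waiter never names a particular page, so its wake-up rate follows the rate at
which writes are retired, not the latency of one unlucky page.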


Something like that.

The critical numbers to watch are /proc/vmstat's *scan* and *steal*.  Look:

akpm:/usr/src/25> uptime
 10:08:14 up 10 days, 16:46, 15 users,  load average: 0.02, 0.05, 0.04
akpm:/usr/src/25> grep steal /proc/vmstat
pgsteal_dma 0
pgsteal_dma32 0
pgsteal_normal 0
pgsteal_high 0
pginodesteal 0
kswapd_steal 1218698
kswapd_inodesteal 266847
akpm:/usr/src/25> grep scan /proc/vmstat
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 1246816
pgscan_kswapd_normal 0
pgscan_kswapd_high 0
pgscan_direct_dma 0
pgscan_direct_dma32 448
pgscan_direct_normal 0
pgscan_direct_high 0
slabs_scanned 2881664

Ignore kswapd_inodesteal and slabs_scanned.  We see that this machine has
scanned 1246816+448 pages and has reclaimed (stolen) 1218698 pages.  That's
a reclaim success rate of 97.7%, which is pretty damn good - this machine
is just a lightly-loaded 3GB desktop.

When testing reclaim, it is critical that this ratio be monitored (vmmon.c
from ext3-tools is a vmstat-like interface to /proc/vmstat).  If the
reclaim efficiency falls below, umm, 25% then things are getting into some
trouble.

Actually, 25% is still pretty good.  We scan 4 pages for each reclaimed
page, but the amount of wall time which that takes is vastly less than the
time to write one page, bearing in mind that these things tend to be seeky
as hell.  But still, keeping an eye on the reclaim efficiency is just your
basic starting point for working on page reclaim.
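
The arithmetic above can be sketched in a few lines of Python.  This is only
an illustration, not vmmon.c: the function names are hypothetical, and the
choice of counters (every pgscan_kswapd_* and pgscan_direct_* line for
"scanned", kswapd_steal for "reclaimed") simply mirrors the calculation in
this message.  Counter names differ between kernel versions, so check your
own /proc/vmstat before relying on it.

```python
# Counters copied from the /proc/vmstat output quoted in this message.
SAMPLE_VMSTAT = """\
kswapd_steal 1218698
kswapd_inodesteal 266847
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 1246816
pgscan_kswapd_normal 0
pgscan_kswapd_high 0
pgscan_direct_dma 0
pgscan_direct_dma32 448
pgscan_direct_normal 0
pgscan_direct_high 0
slabs_scanned 2881664
"""


def parse_vmstat(text):
    """Turn 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_efficiency(counters):
    """Return (scanned, reclaimed, ratio); ratio is None if nothing scanned."""
    # Assumption: "scanned" is the sum of the kswapd and direct scan counters.
    scanned = sum(
        value
        for name, value in counters.items()
        if name.startswith(("pgscan_kswapd_", "pgscan_direct_"))
    )
    # Assumption: "reclaimed" is kswapd_steal, as in the message's arithmetic.
    reclaimed = counters.get("kswapd_steal", 0)
    ratio = reclaimed / scanned if scanned else None
    return scanned, reclaimed, ratio


if __name__ == "__main__":
    scanned, reclaimed, ratio = reclaim_efficiency(parse_vmstat(SAMPLE_VMSTAT))
    print(f"scanned={scanned} reclaimed={reclaimed} efficiency={ratio:.1%}")
```

On a live system you would pass the contents of /proc/vmstat to
parse_vmstat() instead of SAMPLE_VMSTAT, and sample it twice to get the ratio
over an interval rather than since boot.  The reciprocal of the ratio is the
number of pages scanned per page reclaimed, so an efficiency of 25% means 4
pages scanned per reclaimed page.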

