From: Fabio Checconi <fchecconi@...>
Subject: [RESEND][RFC] BFQ I/O Scheduler
Date: Apr 1, 11:29 am 2008
[sorry for reposting, wrong subject]
Hi,
we are working to a new I/O scheduler based on CFQ, aiming at
improved predictability and fairness of the service, while maintaining
the high throughput it already provides.
The patchset, too big for lkml posting, is available here:
http://feanor.sssup.it/~fabio/linux/bfq/patches/
The Budget Fair Queueing (BFQ) scheduler turns the CFQ Round-Robin
scheduling policy of time slices into a fair queueing scheduling
of sector budgets. More precisely, each task is assigned a budget
measured in number of sectors instead of amount of time, and budgets
are scheduled using a slightly modified version of WF2Q+. The
budget assigned to each task varies over time as a function of its
behaviour. However, one can set the maximum value of the budget
that BFQ can assign to any task.
The time-based allocation of the disk service in CFQ, while having
the desirable effect of implicitly charging each application for
the seek time it incurs, suffers from unfairness problems also
towards processes making the best possible use of the disk bandwidth.
In fact, even if the same time slice is assigned to two processes,
they may get a different throughput each, as a function of the
positions on the disk of their requests. On the contrary, BFQ can
provide strong guarantees on bandwidth distribution because the
assigned budgets are measured in number of sectors. Moreover, due
to its Round Robin policy, CFQ is characterized by an O(N) worst-case
delay (jitter) in request completion time, where N is the number
of tasks competing for the disk. On the contrary, given the accurate
service distribution of the internal WF2Q+ scheduler, BFQ exhibits
O(1) delay.
We made several tests to measure the aggregate throughput, long-term
bandwidth distribution and single-request completion time guaranteed
by CFQ and BFQ; what we present here was obtained with an outdated
version of the code, we are in the process of collecting data for
the current one (see [1]).
In the first type of tests, to achieve a higher throughput than CFQ
(with the default 100 ms time slice), the maximum budget for BFQ
had to be set to at least 4k sectors. Using the same value for the
maximum budget, in the second type of tests, BFQ guaranteed a maximum
deviation from the desired bandwidth distribution in the order of
3% over all the experiments. On the contrary CFQ exhibited a maximum
deviation of 28% in consequence of the different positions of the
files on the disk.
Slowest task's bw (MB/s) Fastest task's bw (MB/s)
-------------------------------------------------------------------
BFQ (2 files) 9.81 +/- 0.47 9.95 +/- 0.43
CFQ (2 files) 8.61 +/- 0.67 11.92 +/- 0.44
-------------------------------------------------------------------
BFQ (5 files) 4.29 +/- 0.10 4.31 +/- 0.09
CFQ (5 files) 4.01 +/- 0.17 5.24 +/- 0.14
Finally, we set up a VLC video streaming server to stream an
increasing number of movies in presence of disturbing ON/OFF random
file readers. Each test ended when a 1% packet loss was reached
(a packet was deemed as lost if transmitted with a delayed of more
than 1 second). With BFQ it was possible to transmit at most 24
movies in parallel (again with a 4k sectors maximum budget), against
15 movies with CFQ (with a time slice of 20 ms). This is likely
to be a consequence of the higher jitter of CFQ.
Nr. of movies Aggr. bw (MB/s)
---------------------------------------------------------------
BFQ (max_budget=4096) 24.00 +/- 0.00 7.56 +/- 0.87
BFQ (max_budget=16384) 18.70 +/- 9.45 12.78 +/- 5.64
CFQ (slice_sync=20) 14.35 +/- 1.40 12.59 +/- 2.12
More stuff related to BFQ (extended results, the test programs used
and the setup for the tests, a document describing the algorithm in
detail and so on) can be found at:
[1] http://algo.ing.unimo.it/people/paolo/disk_sched/
We would greatly appreciate any sort of feedback from you, comments,
suggestions, corrections and so on. Thank you for your attention.
--
From: Jens Axboe <jens.axboe@...>
Subject: Re: [RESEND][RFC] BFQ I/O Scheduler
Date: Apr 15, 4:22 am 2008
On Tue, Apr 01 2008, Fabio Checconi wrote:
> [sorry for reposting, wrong subject]
>
> Hi,
> we are working to a new I/O scheduler based on CFQ, aiming at
> improved predictability and fairness of the service, while maintaining
> the high throughput it already provides.
>
> The patchset, too big for lkml posting, is available here:
> http://feanor.sssup.it/~fabio/linux/bfq/patches/
>
> The Budget Fair Queueing (BFQ) scheduler turns the CFQ Round-Robin
> scheduling policy of time slices into a fair queueing scheduling
> of sector budgets. More precisely, each task is assigned a budget
> measured in number of sectors instead of amount of time, and budgets
> are scheduled using a slightly modified version of WF2Q+. The
> budget assigned to each task varies over time as a function of its
> behaviour. However, one can set the maximum value of the budget
> that BFQ can assign to any task.
>
> The time-based allocation of the disk service in CFQ, while having
> the desirable effect of implicitly charging each application for
> the seek time it incurs, suffers from unfairness problems also
> towards processes making the best possible use of the disk bandwidth.
> In fact, even if the same time slice is assigned to two processes,
> they may get a different throughput each, as a function of the
> positions on the disk of their requests. On the contrary, BFQ can
> provide strong guarantees on bandwidth distribution because the
> assigned budgets are measured in number of sectors. Moreover, due
> to its Round Robin policy, CFQ is characterized by an O(N) worst-case
> delay (jitter) in request completion time, where N is the number
> of tasks competing for the disk. On the contrary, given the accurate
> service distribution of the internal WF2Q+ scheduler, BFQ exhibits
> O(1) delay.
>
> We made several tests to measure the aggregate throughput, long-term
> bandwidth distribution and single-request completion time guaranteed
> by CFQ and BFQ; what we present here was obtained with an outdated
> version of the code, we are in the process of collecting data for
> the current one (see [1]).
>
> In the first type of tests, to achieve a higher throughput than CFQ
> (with the default 100 ms time slice), the maximum budget for BFQ
> had to be set to at least 4k sectors. Using the same value for the
> maximum budget, in the second type of tests, BFQ guaranteed a maximum
> deviation from the desired bandwidth distribution in the order of
> 3% over all the experiments. On the contrary CFQ exhibited a maximum
> deviation of 28% in consequence of the different positions of the
> files on the disk.
>
> Slowest task's bw (MB/s) Fastest task's bw (MB/s)
> -------------------------------------------------------------------
> BFQ (2 files) 9.81 +/- 0.47 9.95 +/- 0.43
> CFQ (2 files) 8.61 +/- 0.67 11.92 +/- 0.44
> -------------------------------------------------------------------
> BFQ (5 files) 4.29 +/- 0.10 4.31 +/- 0.09
> CFQ (5 files) 4.01 +/- 0.17 5.24 +/- 0.14
>
> Finally, we set up a VLC video streaming server to stream an
> increasing number of movies in presence of disturbing ON/OFF random
> file readers. Each test ended when a 1% packet loss was reached
> (a packet was deemed as lost if transmitted with a delayed of more
> than 1 second). With BFQ it was possible to transmit at most 24
> movies in parallel (again with a 4k sectors maximum budget), against
> 15 movies with CFQ (with a time slice of 20 ms). This is likely
> to be a consequence of the higher jitter of CFQ.
>
> Nr. of movies Aggr. bw (MB/s)
> ---------------------------------------------------------------
> BFQ (max_budget=4096) 24.00 +/- 0.00 7.56 +/- 0.87
> BFQ (max_budget=16384) 18.70 +/- 9.45 12.78 +/- 5.64
> CFQ (slice_sync=20) 14.35 +/- 1.40 12.59 +/- 2.12
>
> More stuff related to BFQ (extended results, the test programs used
> and the setup for the tests, a document describing the algorithm in
> detail and so on) can be found at:
>
> [1] http://algo.ing.unimo.it/people/paolo/disk_sched/
>
> We would greatly appreciate any sort of feedback from you, comments,
> suggestions, corrections and so on. Thank you for your attention.
Fabio, I've merged the scheduler for some testing. Overall the code
looks great, you've done a good job!
I didn't touch patches 2 and 3, but I rewrote #1 somewhat. See the
result here:
http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=973a02c4ea1c324c41e45b69c07...
I'm sure you'll agree with the hlist_sched_*() functions. I also killed
the ->bfq_ioprio_changed modification, what exactly was the purpose of
that?
The code is now in the 'bfq' branch of the block git repo.
--
Jens Axboe
--
From: Fabio Checconi <fchecconi@...>
Subject: Re: [RESEND][RFC] BFQ I/O Scheduler
Date: Apr 15, 5:11 am 2008
> From: Jens Axboe <jens.axboe@oracle.com>
> Date: Tue, Apr 15, 2008 10:22:36AM +0200
>
> On Tue, Apr 01 2008, Fabio Checconi wrote:
...
> > We would greatly appreciate any sort of feedback from you, comments,
> > suggestions, corrections and so on. Thank you for your attention.
>
> Fabio, I've merged the scheduler for some testing. Overall the code
> looks great, you've done a good job!
>
thank you very much :)
> I didn't touch patches 2 and 3, but I rewrote #1 somewhat. See the
> result here:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=973a02c4ea1c324c41e45b69c07...
>
> I'm sure you'll agree with the hlist_sched_*() functions. I also killed
> the ->bfq_ioprio_changed modification, what exactly was the purpose of
> that?
>
of course the hlist_sched_*() functions are much better than what was
in the patch (the idea behind the patch was to not touch at all cfq code).
the ->bfq_ioprio_changed was there to avoid that a process/ioc doing
i/o on multiple devices managed by cfq and bfq would see ioprio
changes only for one of the two schedulers.
cfq_ioc_set_ioprio() (and its bfq counterpart bfq_ioc_set_ioprio())
unconditionally assign 0 to (bfq_)ioprio_changed, so the first
scheduler that sees the ioprio change eats the new priority values.
of course I may be wrong, but I think it (or some better mechanism
to avoid the problem) is necessary.
> The code is now in the 'bfq' branch of the block git repo.
>
of course we'll be glad to help in testing in any way you can find useful.
thank you!
--
From: Jens Axboe <jens.axboe@...>
Subject: Re: [RESEND][RFC] BFQ I/O Scheduler
Date: Apr 15, 8:42 am 2008
On Tue, Apr 15 2008, Fabio Checconi wrote:
> > From: Jens Axboe <jens.axboe@oracle.com>
> > Date: Tue, Apr 15, 2008 10:22:36AM +0200
> >
> > On Tue, Apr 01 2008, Fabio Checconi wrote:
> ...
> > > We would greatly appreciate any sort of feedback from you, comments,
> > > suggestions, corrections and so on. Thank you for your attention.
> >
> > Fabio, I've merged the scheduler for some testing. Overall the code
> > looks great, you've done a good job!
> >
>
> thank you very much :)
>
>
> > I didn't touch patches 2 and 3, but I rewrote #1 somewhat. See the
> > result here:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=973a02c4ea1c324c41e45b69c07...
> >
> > I'm sure you'll agree with the hlist_sched_*() functions. I also killed
> > the ->bfq_ioprio_changed modification, what exactly was the purpose of
> > that?
> >
>
> of course the hlist_sched_*() functions are much better than what was
> in the patch (the idea behind the patch was to not touch at all cfq code).
As long as the changes at that point are straight forward and 'obviously
correct', there's no harm done. Have you thought about merging bfq and
cfq?
> the ->bfq_ioprio_changed was there to avoid that a process/ioc doing
> i/o on multiple devices managed by cfq and bfq would see ioprio
> changes only for one of the two schedulers.
>
> cfq_ioc_set_ioprio() (and its bfq counterpart bfq_ioc_set_ioprio())
> unconditionally assign 0 to (bfq_)ioprio_changed, so the first
> scheduler that sees the ioprio change eats the new priority values.
> of course I may be wrong, but I think it (or some better mechanism
> to avoid the problem) is necessary.
I see. If we can and will merge bfq and cfq, then it's not really an
issue. Otherwise, I'd suggest using bits 0 of ioprio_changed for cfq and
1 for bfq and so on. That avoids adding another int to the io context.
> > The code is now in the 'bfq' branch of the block git repo.
> >
>
> of course we'll be glad to help in testing in any way you can find useful.
I'll push it to the for-akpm branch as well, so it should show up in the
next -mm and get some testing there.
--
Jens Axboe
--
Scheduller flame-war troll time!
Please Con Kolivas fans and trolls, start posting...
This is about the I/O
This is about the I/O scheduler, not the CPU scheduler. This stuff isn't connected to the CFS/SD flamewars.
So?
Since when has such a minor difference mattered to such trolls?
--
Program Intellivision and play Space Patrol!
The only troll here, so far,
The only troll here, so far, appears to be you, actually.
(Yes, the kind of post you made is indeed of the trolling variety, i.e. pointless and utterly worthless in every aspect imaginable. Why I even bother to respond is depressing, but simply due to flaws in my personality. I apologize.)
In other words: *yawn* And please go away.
Explain in layman terms
Explain in laymen terms to a noob what this is, and how it is good for me?
Why would I want this? What does it do for me?
IO schedulers allocate the
IO schedulers allocate the limited IO bandwidth of block devices (disks) to different processes.
Linux has a set of them, including noop, deadline, anticipatory scheduler (as) and CFQ (completely
fair queueing?). Default is CFQ.
The CFQ scheduler also allows to set priorities for IO, which means you can give some processes
a higher priority or lower priority and they then get more or less of the available disk bandwidth.
In fact the IO priority is inherited automatically from the CPU priority (nice level)
BFQ is an enhancement of the CFQ IO scheduler that claims to have a new fairer allocation algorithm.
> Why would I want this?
Typical use case would be either you have some background process you don't want to impact foreground
jobs significantly (e.g. a backup job or similar) or you have some very important job (like a media
server which needs to supply some high bandwidth data stream from disk) that you want to prioritise
over everything else the system does.
But with the current defaults it should really work all transparently already if the CPU
priorities are set correctly.
So how old is CFQ? I find it
So how old is CFQ? I find it interesting how the Linux community of developers gets something into mainline, only to dump it couple of months later.
I find it interesting how
I find it interesting how you have no idea what you are talking about, CFQ is over 4 years old now...
Besides, IF someone found a better way of doing something (for example a better scheduler than CFS, which you probably confused this with), do you really think it should be blocked on basis of "but the old one is only a few months old". If so, why? To make the developer of the original method feel all warm and fuzzy that his code lived for a year?
Nobody is dumping anything
BFQ is included in the block git repo (and thus going to -mm as well) for some testing. That does NOT mean that it will go into mainline at any point in time, just that it is interesting enough to get some wider testing.
Additionally, this is not at all about dumping CFQ or any other existing in-kernel IO scheduler. Even if BFQ is merged - either as a standalone scheduler, or merged with CFQ - it can coexist nicely with the others.
My apologies, I did confuse
My apologies, I did confuse it with CFS. Its been ages since I used anything that runs Linux. Anyway, instead of writing a new /(IO|CPU)/ scheduler everytime a fly takes longer to shit, seems to be a waste of resources. How about having 2 and building on top of existing stuff one piece at a time.
Next time, read the article, then *think*
The article states it is based on CFQ (the one that is in use for 4 years). Besides depending on the workload it makes sense to have different IO schedulers for different block devices.
Before you make strong statements, maybe you should ask yourself if you understand or even give a clue. You act as if you pretend to know more about it than Linux kernel developers, while in fact you are not even running Linux.
'nuf said.
How does it handle RAID-5 resync?
For me, CFQ performance was _extremely_ bad on my FTP server (ftp.linux.cz) during RAID-5 resync. I have 8x 1 TB drives in software RAID-5, and the resync with normal FTP traffic (about 100 Mbit/s daily average, maybe a bit more) took over three days, with the system latency being bad. Switching to the deadline scheduler helped: now the resync takes between 1 and 2 days, depending on the FTP traffic, and the system latency during resync is much better.
Would BFQ treat the RAID5 resync thread as just one of many tasks?
-Yenya
Layer problem
raid/dm and io schedulers are always problematic, since the md and dm personalities sit on top of the io scheduler. BFQ will not treat it differently than CFQ, both should see it as a single task. deadline isn't task oriented, that is probably what helps you out there.
Task orientation
Are you saying that if you run a software RAID, the CFQ gets no information about which task triggered the traffic, because it all funnels through md? If so, does it make sense to augment traffic coming from this layer with some notion of which task it's assigned to?
--
Program Intellivision and play Space Patrol!