CFS Digressions

Submitted by Jeremy on September 14, 2007 - 5:25am.
Linux news

"In the patch you really remove _a_lot_ of stuff," commented Roman Zippel in his reaction to Ingo Molnar's latest updates to the Completely Fair Scheduler. Roman has been consistently critical of Ingo's efforts, asking questions and criticizing Ingo's feedback. He continued, "you also removed a lot of things I tried to get you to explain them to me. On the one hand I could be happy that these things are gone, as they were the major road block to splitting up my own patch. On the other hand it still leaves me somewhat unsatisfied, as I still don't know what that stuff was good for."

Ingo replied to Roman's technical concerns and pointed out that he had been traveling for the recent Kernel Summit, adding, "I bent backwards trying to somehow get you to cooperate with us (and I still haven't given up on that!) - instead of you disparaging CFS and me frequently :-(". Willy Tarreau took a more critical stance, calling Roman's motives into question. He noted that he had been impressed by Roman's original review of the scheduler but grew disappointed as the discussion degenerated: "it's the way you're trying to prove Ingo is a bastard and that you're a victim. But if we just re-read a few pick-ups of your mails since Aug 1st, its getting pretty obvious that you completely made up this situation." Kyle Moffett added, "I get the impression that Ingo re-implemented some ideas that you had because you refused to do so in a way that was acceptable for the upstream kernel. How exactly is this a bad thing?"


From:	Roman Zippel [email blocked]
Subject: Re: [announce] CFS-devel, performance improvements
Date:	Thu, 13 Sep 2007 00:17:42 +0200 (CEST)

Hi,

On Tue, 11 Sep 2007, Ingo Molnar wrote:

> fresh back from the Kernel Summit, Peter Zijlstra and me are pleased to 
> announce the latest iteration of the CFS scheduler development tree. Our 
> main focus has been on simplifications and performance - and as part of 
> that we've also picked up some ideas from Roman Zippel's 'Really Fair 
> Scheduler' patch as well and integrated them into CFS. We'd like to ask 
> people go give these patches a good workout, especially with an eye on 
> any interactivity regressions.

I'm must really say, I'm quite impressed by your efforts to give me as 
little credit as possible.
On the one hand it's of course positive to see so much sudden activity, on 
the other hand I'm not sure how much had happened if I hadn't posted my 
patch, I don't really think it were my complaints about CFS's complexity 
that finally lead to the improvements in this area. I presented the basic 
concepts of my patch already with my first CFS review, but at that time 
you didn't show any interest and instead you were rather quick to simply 
dismiss it. My patch did not add that much new, it's mostly a conceptual 
improvement and describes the math in more detail, but it also 
demonstrated a number of improvements.

> The combo patch against 2.6.23-rc6 can be picked up from:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/devel/
> 
> The sched-devel.git tree can be pulled from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

Am I the only one who can't clone that thing? So I can't go into much 
detail about the individual changes here.
The thing that makes me curious, is that it also includes patches by 
others. It can't be entirely explained with the Kernel Summit, as this is 
not the first time patches appear out of the blue in form of a git tree. 
The funny/sad thing is that at some point Linus complained about Con that 
his development activity happend on a separate mailing list, but there was 
at least a place to go to. CFS's development appears to mostly happen in 
private. Patches may be your primary form of communication, but that isn't 
true for many other people, with patches a lot of intent and motivation 
for a change is lost. I know it's rather tempting to immediately try out 
an idea first, but would it really hurt you so much to formulate an idea 
in a more conventional manner? Are you afraid it might hurt your 
ueberhacker status by occasionally screwing up in public? Patches on the 
other hand have the advantage to more easily cover that up by simply 
posting a fix - it makes it more difficult to understand what's going on.
A more conventional way of communication would give more people a chance 
to participate, they may not understand every detail of the patch, but 
they can try to understand the general concepts and apply them to their 
own situation and eventually come up with some ideas/improvements of their 
own, they would be less dependent on you to come up with a solution to 
their problem. Unless of course that's exactly what you want - unless you 
want to be in full control of the situation and you want to be the hero 
that saves the day.

> There are lots of small performance improvements in form of a 
> finegrained 29-patch series. We have removed a number of features and 
> metrics from CFS that might have been needed but ended up being 
> superfluous - while keeping the things that worked out fine, like 
> sleeper fairness. On 32-bit x86 there's a ~16% speedup (over -rc6) in 
> lmbench (lat_ctx -s 0 2) results:

In the patch you really remove _a_lot_ of stuff. You also removed a lot of 
things I tried to get you to explain them to me. On the one hand I could 
be happy that these things are gone, as they were the major road block to 
splitting up my own patch. On the other hand it still leaves me somewhat 
unsatisfied, as I still don't know what that stuff was good for.
In a more collaborative development model I would have expected that you 
tried to explain these features, which could have resulted in a discussion 
how else things can be implemented or if it's still needed at all. Instead 
of this you now simply decide unilaterally that these things are not 
needed anymore.

BTW the old sleeper fairness logic "that worked out fine" is actually 
completely gone and is now conceptually closer to what I'm already doing 
in my patch (only the amount of sleeper bonus differs).

>                                   (microseconds, lower is better)
>      ------------------------------------------------------------
>         v2.6.22    2.6.23-rc6(CFS)     v2.6.23-rc6-CFS-devel
>      ----------------------------------------------------
>            0.70          0.75                0.65
>            0.62          0.66                0.63
>            0.60          0.72                0.69
>            0.62          0.74                0.61
>            0.69          0.73                0.53
>            0.66          0.73                0.63
>            0.63          0.69                0.61
>            0.63          0.70                0.64
>            0.61          0.76                0.61
>            0.69          0.74                0.63
>      ----------------------------------------------------
>       avg: 0.64          0.72 (+12%)         0.62 (-3%)
> 
> there is a similar speedup on 64-bit x86 as well. We are now a bit 
> faster than the O(1) scheduler was under v2.6.22 - even on 32-bit. The 
> main speedup comes from the avoidance of divisions (or shifts) in the 
> wakeup and context-switch fastpaths.
> 
> there's also a visible reduction in code size:
> 
>    text    data     bss     dec     hex filename
>   13369     228    2036   15633    3d11 sched.o.before  (UP, nodebug)
>   11167     224    1988   13379    3443 sched.o.after   (UP, nodebug)

Well, one could say that you used every little trick in the book to get 
these numbers down. On the other hand at this point it's a little unclear 
whether you maybe removed it a little too much to get there, so the 
significance of these numbers is a bit limited.

> Changes: besides the many micro-optimizations, one of the changes is 
> that se->vruntime (virtual runtime) based scheduling has been introduced 
> gradually, step by step - while keeping the wait_runtime metric working 
> too. (so that the two methods are comparable side by side, in the same 
> scheduler)

I can't quite see that, the wait_runtime metric is relative to fair_clock 
and this is gone without any replacement, in my patch I at least 
calculate these values for the debug output, but in your patch even that 
is simply gone, so I'm not sure what you actually compare "side by side".

> The ->vruntime metric is similar to the ->time_norm metric used by 
> Roman's patch (and both are losely related to the already existing 
> sum_exec_runtime metric in CFS), it's in essence the sum of CPU time 
> executed by a task, in nanoseconds - weighted up or down by their nice 
> level (or kept the same on the default nice 0 level). Besides this basic 
> metric our implementation and math differs from RFS.

At this point it gets really interesting - I'm amazed how much you stress 
the differences. If we take the basic math as I more simply explained it 
in this example http://lkml.org/lkml/2007/9/3/168, you now also make the 
step from the relative wait_runtime value to an absolute virtual time 
value. Basically it's really the same thing, only the resolution differs. 
This means you already reimplemented a key element of my patch, so would 
you please give me at least that much credit?
The rest of the math is indeed different - it's simply missing. What is 
there is IMO not really adequate. I guess you will see the differences, 
once you test a bit more with different nice levels. There's a good reason 
I put that much effort into maintaining a good, but still cheap average, 
it's needed for a good task placement. There is of course more than one 
way to implement this, so you'll have good chances to simply reimplement 
it somewhat differently, but I'd be surprised if it would be something 
completely different.

To make it very clear to everyone else: this is primarely not about 
getting some credit, although that's not completely unimportant of course. 
This is about getting adequate help. I had to push very hard to get any 
kind of acknowledgment with scheduler issues only to be rewarded with 
silence, so that I was occassionally wondering myself, whether I'm just 
hallucinating all this. Only after I provide the prove that further 
improvements are possible is there some activity. But instead of providing 
help (e.g. by answering my questions) Ingo just goes ahead and 
reimplements the damn thing himself and simply throws out all questionable 
items instead of explaining them.
The point is that I have no real interest in any stinking competition, I 
have no interest to be reduced to simply complaining all the time. I'm 
more interested in a cooperation, but that requires better communication 
and an actual exchange of ideas, patches are no real communication, they 
are a supplement and should rather be the end result instead of means to 
get anything started. A collaboration could bundle the individual 
strengths, so that this doesn't degenerate into a contest of who's the 
greatest hacker.
Is this really too much to expect?

bye, Roman
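
The lmbench and code-size figures quoted in the mail above can be re-derived from the raw numbers. The following is only a sketch for checking the arithmetic: the variable names and the percent_change helper are made up for illustration, and the data is copied by hand from the quoted tables.

```python
from statistics import mean

# lat_ctx -s 0 2 latencies in microseconds (lower is better),
# copied from the table in Ingo's announcement as quoted above.
v2_6_22 = [0.70, 0.62, 0.60, 0.62, 0.69, 0.66, 0.63, 0.63, 0.61, 0.69]
rc6_cfs = [0.75, 0.66, 0.72, 0.74, 0.73, 0.73, 0.69, 0.70, 0.76, 0.74]
cfs_devel = [0.65, 0.63, 0.69, 0.61, 0.53, 0.63, 0.61, 0.64, 0.61, 0.63]


def percent_change(new, old):
    """Relative change of new against old, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


base = mean(v2_6_22)
cfs = mean(rc6_cfs)
devel = mean(cfs_devel)

print(f"avg: {base:.3f} {cfs:.3f} {devel:.3f}")
print(f"2.6.23-rc6 (CFS) vs v2.6.22: {percent_change(cfs, base):+.1f}%")
print(f"CFS-devel vs v2.6.22:        {percent_change(devel, base):+.1f}%")
# The "~16% speedup (over -rc6)" compares the two CFS columns with each other.
print(f"-rc6 latency over CFS-devel: {percent_change(cfs, devel):+.1f}%")

# sched.o text size before and after, from the size(1) output quoted above.
text_change = percent_change(11167, 13369)
print(f"sched.o text size:           {text_change:+.1f}%")
```

The column means work out to 0.645, 0.722 and 0.623 microseconds, which agrees with the two-decimal averages and the +12% and -3% figures given in the announcement.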


From:	Ingo Molnar [email blocked]
Subject: Re: [announce] CFS-devel, performance improvements
Date:	Thu, 13 Sep 2007 11:19:34 +0200

* Roman Zippel [email blocked] wrote:

> > The sched-devel.git tree can be pulled from:
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
> 
> Am I the only one who can't clone that thing? [...]

Ah - i have messed up my sched-devel.git script so the git-push went to 
kernel.org but into my home directory :-/ Should work now - let me know 
if it doesnt.

i've also uploaded the patch series in quilt format, to:

  http://people.redhat.com/mingo/cfs-scheduler/devel/patches.tar.gz

> [...] It can't be entirely explained with the Kernel Summit, as this 
> is not the first time patches appear out of the blue in form of a git 
> tree.

i'm not sure what you mean, but i can definitely tell you that there was 
no scheduler hacking at the Kernel Summit. (there's no good wireless in 
the pubs and not enough space for a laptop anyway ;)

The impressive linecount has been mostly achieved by dumb removal:

   sched: remove wait_runtime fields and features
   4 files changed, 14 insertions(+), 161 deletions(-)

   sched: remove wait_runtime limit
   5 files changed, 3 insertions(+), 124 deletions(-)

   sched: remove precise CPU load calculations #2
   1 file changed, 1 insertion(+), 31 deletions(-)

   sched: remove precise CPU load
   3 files changed, 9 insertions(+), 41 deletions(-)

   sched: remove stat_gran
   4 files changed, 15 insertions(+), 50 deletions(-)

Hack time to do them: ~10 minutes apiece. Removing stuff is _easy_ :-)

The rest is finegrained, small changes. One of the harder patches was 
this one:

   commit 28c4b8ed35f0fc7050f186147da9e10b55e1e446
   sched: introduce se->vruntime
   3 files changed, 50 insertions(+), 33 deletions(-)

And i sent you the first variant of that already:

   http://lkml.org/lkml/2007/9/2/76

we needed 2 days after the KS to put it into shape and send it out for 
feedback.

	Ingo

From:	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [announce] CFS-devel, performance improvements
Date:	Thu, 13 Sep 2007 13:35:20 +0200

On Thu, 2007-09-13 at 00:17 +0200, Roman Zippel wrote:

> The rest of the math is indeed different - it's simply missing. What is 
> there is IMO not really adequate. I guess you will see the differences, 
> once you test a bit more with different nice levels.

The rounding error we now still have is accumulative over the long time
but has no real effect. The only effect is that a nice level would be a
little different that it would have been had the division been perfect,
not dissimilar to having a small error in the divisor series to being
with. (note that in order to see this little fuzz you need amazingly
high context switch rates)

We've measured the effect with the strongest nice levels -20 and 19, a
normal loop against two yield loops (this generated 700.000 context
switches per second), and the effect is <1%.

Not something worth fixing IMHO (unless it comes for free).

At that high switching rates the overhead of scheduling itself and
caching causes more skew than this - the small error is totally swamped
by the time lost scheduling.

> There's a good reason 
> I put that much effort into maintaining a good, but still cheap average, 
> it's needed for a good task placement.

While I agree that having this average is nice, your particular
implementation has the problem that it quickly overflows u64 at which
point it becomes a huge problem (a CPU hog could basically lock up your
box when that happens).

I solved the wrap around problem in cfs-devel, and from that base I
_could_ probably maintain the average without overflow problems, but
have yet to try.

> There is of course more than one 
> way to implement this, so you'll have good chances to simply reimplement 
> it somewhat differently, but I'd be surprised if it would be something 
> completely different.

Currently we have 2 approximations in place:

  (leftmost + rightmost) / 2

and

  leftmost + period/2 (where period should match the span of the tree)

neither are perfect but they seem to work quite well.

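
To make Peter's two placement approximations and the u64 wrap-around concern concrete, here is a small sketch that emulates 64-bit unsigned counters in Python. It is not the cfs-devel code: the function names are invented for illustration, and the signed-difference trick used to stay correct across a wrap is an assumption about one common way to handle wrapping counters, not a description of what cfs-devel actually does.

```python
U64 = 1 << 64
MASK = U64 - 1


def signed_delta(a, b):
    """Interpret (a - b) mod 2**64 as a signed 64-bit value."""
    d = (a - b) & MASK
    return d - U64 if d >= (1 << 63) else d


def before(a, b):
    """True if counter value a is earlier than b, even across a wrap."""
    return signed_delta(a, b) < 0


def midpoint(leftmost, rightmost):
    """(leftmost + rightmost) / 2, computed from the span so that the
    intermediate sum cannot overflow or be confused by a wrap."""
    return (leftmost + signed_delta(rightmost, leftmost) // 2) & MASK


def leftmost_plus_half_period(leftmost, period):
    """leftmost + period/2, where period should match the span of the tree."""
    return (leftmost + period // 2) & MASK


# A tree whose keys straddle the wrap point: leftmost sits just below
# 2**64, rightmost has already wrapped around to a small value.
leftmost = MASK - 10
rightmost = (leftmost + 41) & MASK   # wraps to 30

naive = (leftmost + rightmost) // 2  # far away from both keys
print(before(leftmost, rightmost))
print(midpoint(leftmost, rightmost))
print(leftmost_plus_half_period(leftmost, 41))
```

When the period equals the span of the tree, both approximations land on the same value; they only differ when the period and the actual spread of the queued tasks drift apart.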
From:	Ingo Molnar [email blocked]
Subject: Re: [announce] CFS-devel, performance improvements
Date:	Thu, 13 Sep 2007 14:47:38 +0200

* Roman Zippel [email blocked] wrote:

> The rest of the math is indeed different - it's simply missing. What 
> is there is IMO not really adequate. I guess you will see the 
> differences, once you test a bit more with different nice levels.

Roman, i disagree strongly. I did test with different nice levels. Here 
are some hard numbers: the CPU usage table of 40 busy loops started at 
once, all running at a different nice level, from nice -20 to nice +19:

 top - 12:25:07 up 19 min,  2 users,  load average: 40.00, 39.15, 28.35
 Tasks: 172 total,  41 running, 131 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 2455 root       0 -20  1576  248  196 R   20  0.0   3:47.56 loop
 2456 root       1 -19  1576  244  196 R   16  0.0   3:03.96 loop
 2457 root       2 -18  1576  244  196 R   13  0.0   2:24.80 loop
 2458 root       3 -17  1576  248  196 R   10  0.0   1:58.63 loop
 2459 root       4 -16  1576  244  196 R    8  0.0   1:33.04 loop
 2460 root       5 -15  1576  248  196 R    7  0.0   1:14.73 loop
 2461 root       6 -14  1576  248  196 R    5  0.0   0:59.61 loop
 2462 root       7 -13  1576  244  196 R    4  0.0   0:47.95 loop
 2463 root       8 -12  1576  248  196 R    3  0.0   0:38.31 loop
 2464 root       9 -11  1576  244  196 R    3  0.0   0:30.54 loop
 2465 root      10 -10  1576  244  196 R    2  0.0   0:24.47 loop
 2466 root      11  -9  1576  244  196 R    2  0.0   0:19.52 loop
 2467 root      12  -8  1576  248  196 R    1  0.0   0:15.63 loop
 2468 root      13  -7  1576  248  196 R    1  0.0   0:12.56 loop
 2469 root      14  -6  1576  248  196 R    1  0.0   0:10.00 loop
 2470 root      15  -5  1576  244  196 R    1  0.0   0:07.99 loop
 2471 root      16  -4  1576  244  196 R    1  0.0   0:06.40 loop
 2472 root      17  -3  1576  244  196 R    0  0.0   0:05.09 loop
 2473 root      18  -2  1576  244  196 R    0  0.0   0:04.05 loop
 2474 root      19  -1  1576  248  196 R    0  0.0   0:03.26 loop
 2475 root      20   0  1576  244  196 R    0  0.0   0:02.61 loop
 2476 root      21   1  1576  244  196 R    0  0.0   0:02.09 loop
 2477 root      22   2  1576  244  196 R    0  0.0   0:01.67 loop
 2478 root      23   3  1576  244  196 R    0  0.0   0:01.33 loop
 2479 root      24   4  1576  248  196 R    0  0.0   0:01.07 loop
 2480 root      25   5  1576  244  196 R    0  0.0   0:00.84 loop
 2481 root      26   6  1576  248  196 R    0  0.0   0:00.68 loop
 2482 root      27   7  1576  248  196 R    0  0.0   0:00.54 loop
 2483 root      28   8  1576  248  196 R    0  0.0   0:00.43 loop
 2484 root      29   9  1576  248  196 R    0  0.0   0:00.34 loop
 2485 root      30  10  1576  244  196 R    0  0.0   0:00.27 loop
 2486 root      31  11  1576  248  196 R    0  0.0   0:00.21 loop
 2487 root      32  12  1576  244  196 R    0  0.0   0:00.17 loop
 2488 root      33  13  1576  244  196 R    0  0.0   0:00.13 loop
 2489 root      34  14  1576  244  196 R    0  0.0   0:00.10 loop
 2490 root      35  15  1576  244  196 R    0  0.0   0:00.08 loop
 2491 root      36  16  1576  248  196 R    0  0.0   0:00.06 loop
 2493 root      38  18  1576  248  196 R    0  0.0   0:00.03 loop
 2492 root      37  17  1576  244  196 R    0  0.0   0:00.04 loop
 2494 root      39  19  1576  244  196 R    0  0.0   0:00.02 loop

check a few select rows (the ratio of CPU time should be 1.25 at every 
step) and see that CPU time is distributed very exactly. (and the same 
is true for both -rc6 and -rc6-cfs-devel)

So even in this pretty extreme example (who on this planet runs 40 busy 
loops with each loop on exactly one separate nice level, creating a load 
average of 40.0 and expects perfect distribution after just a few 
minutes?) CFS still distributes CPU time perfectly.

When you first raised accuracy issues i have asked you to provide 
specific real-world examples showing any of the "problems" with nice 
levels you implied to repeatedly:

   http://lkml.org/lkml/2007/9/2/38

In the announcement of your "Really Fair Scheduler" patch you used the 
following very strong statement:

   " This model is far more accurate than CFS is [...]"

   http://lkml.org/lkml/2007/8/30/307

but when i stressed you for actual real-world proof of CFS misbehavior, 
you said:

   "[...] they have indeed little effect in the short term, [...] "

   http://lkml.org/lkml/2007/9/2/282

so how can CFS be "far less accurate" (paraphrased) while it has "little 
effect in the short term"?

so to repeat my question: my (and Peter's) claim is that there is no 
real-world significance of much of the complexity you added to avoid 
rounding effects. You do disagree with that, so our follow-up question 
is: what actual real-world significance does it have in your opinion? 
What is the worst-case effect? Do we even care? We have measured it 
every which way and it just does not matter. (but we could easily be 
wrong, so please be specific if you know about something that we 
overlooked.)

Thanks,

	Ingo

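
Ingo's suggestion to "check a few select rows" of the top output can be sketched as follows. The snippet takes the TIME+ values of the five highest-priority loops from the mail above, computes the ratio between neighbouring nice levels, and compares the %CPU column with the share one would expect if every nice step scaled CPU time by exactly 1.25. That idealised geometric weighting and all names in the code are assumptions made for illustration; the only inputs are the numbers quoted in the mail.

```python
def to_seconds(time_plus):
    """Convert top's TIME+ field ("M:SS.ss") to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


# (nice level, TIME+) for the first five rows of the quoted top output.
samples = [
    (-20, "3:47.56"),
    (-19, "3:03.96"),
    (-18, "2:24.80"),
    (-17, "1:58.63"),
    (-16, "1:33.04"),
]

# Ratio of accumulated CPU time between each pair of adjacent nice levels.
ratios = [
    to_seconds(a[1]) / to_seconds(b[1])
    for a, b in zip(samples, samples[1:])
]
for (nice, _), ratio in zip(samples, ratios):
    print(f"nice {nice} / nice {nice + 1}: {ratio:.3f}")

# Expected CPU share for 40 busy loops at nice -20..19, assuming an
# idealised weight of 1.25 ** -nice per task.
STEP = 1.25
weights = {nice: STEP ** -nice for nice in range(-20, 20)}
total = sum(weights.values())
share = {nice: 100.0 * w / total for nice, w in weights.items()}

expected = [round(share[nice]) for nice in range(-20, -12)]
print(expected)
```

Under that assumption the expected shares for nice -20 through -13 round to the same values as the %CPU column in the mail (20, 16, 13, 10, 8, 7, 5, 4), and the measured TIME+ ratios stay within a few percent of 1.25.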
From: Willy Tarreau [email blocked] Subject: Re: [announce] CFS-devel, performance improvements Date: Fri, 14 Sep 2007 01:08:36 +0200 Roman, I've been trying to follow your mails about CFS since your review posted on Aug 1st. Back to that date, I was thinking "cool, an in-depth review by someone who understands schedulers and mathematics very well, we'll quickly have a very solid design". On Aug 10th, I was disappointed to see that you still had not provided the critical information that Ingo had been asking to you for 9 days (cfs-sched-debug output). Your motivations in this work started to become a bit fuzzy to me, since people who behave like this generally do so to get all the lights on them and you really don't need this. Your explanation was kind of "show me yours and only then I'll show you mine". Pretty childish but you finally sent that long-requested information. Since then, I've been noticing your now popular "will I get a response to my questions" stuffed in most of your mails. That was getting very suspicious from someone who can write down mathematics equations to prove his design is right, especially considering the fact that your "question" only relates to what a few lines were supposed to do. Nobody believes that someone as smart as you is still blocked on the same line of code after one month! And if getting CFS fixed wasn't your real motivation... On Thu, Sep 13, 2007 at 12:17:42AM +0200, Roman Zippel wrote: > On Tue, 11 Sep 2007, Ingo Molnar wrote: > > > fresh back from the Kernel Summit, Peter Zijlstra and me are pleased to > > announce the latest iteration of the CFS scheduler development tree. Our > > main focus has been on simplifications and performance - and as part of > > that we've also picked up some ideas from Roman Zippel's 'Really Fair > > Scheduler' patch as well and integrated them into CFS. We'd like to ask > > people go give these patches a good workout, especially with an eye on > > any interactivity regressions. 
> > I'm must really say, I'm quite impressed by your efforts to give me as > little credit as possible. > On the one hand it's of course positive to see so much sudden activity, on > the other hand I'm not sure how much had happened if I hadn't posted my > patch, I don't really think it were my complaints about CFS's complexity > that finally lead to the improvements in this area. I'm now fairly convinced that you're not seeking credits either. There are more credits to your name per line of patch here than there is in your own code in the kernel. That complaint does not stand by itself. In fact, I'm beginning to think that you're like a cat who has found a mouse. Why kill it if you can play with it ? Each of your "will I get a response" are just like a small kick in the mouse's back to make it move. But by dint of doing this, you're slowly pushing the mouse to the door where it risks to escape from you, and you're losing your toy. So right now, I'm sure you really do not want to get any code merged. It's so much fun for you to say "hey, Ingo, respond to me" that you would lose this ability would your code get merged. > I presented the basic > concepts of my patch already with my first CFS review, but at that time > you didn't show any interest and instead you were rather quick to simply > dismiss it. At that time, if my memory serves me, you were complaining about a fairness problem you had with a few programs that you already took days to show the sources. Proposing an alternate design with a bug report generally has no chance to be considered because the developer mostly focuses on the bug report. You should have spent time explaining how your design would work *after* your problems were solved. > My patch did not add that much new, it's mostly a conceptual > improvement and describes the math in more detail - why those details were never explained in pure english when nobody could understand your maths, then ? 
- if you have no problem reading code and translating it to concepts, without any comment around it, then how is it believable that you have a problem understanding 10 lines of code after 1 month ? >, but it also demonstrated a number of improvements. Very likely, reason why Ingo and Peter accepted to take parts of those improvements. But do you realize that your lack of ability to communicate on this list has probably delayed mainline integration of parts of your work, because it was required to get a patch to try to understand your intents ? It's not sci.math here, its linux-kernel, the _development_ mailing list, where the raw material and common language between people is the _code_. Some people do not have the skills required to code their excellent ideas, but they can spend time explaining those to other people. In your case, it was just a guess game. It does not work like this and you know it. I really think that you deliberately slowed all the process down in order to stay on the scene playing this game. > > The sched-devel.git tree can be pulled from: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git > > Am I the only one who can't clone that thing? So I can't go into much > detail about the individual changes here. Even your question here is suspicious: the fact that you wonder whether you're the only one implies you think it could be possible, thus implying something intentionally targetted at you. And no, do not tell me you meant you could have failed your git-clone command, you would have asked differently, such as "sorry, I cannot clone from there right now". You're taking advantage of everything around you to show that there is a deliberate intention not to cooperate. > The thing that makes me curious, is that it also includes patches by > others. It can't be entirely explained with the Kernel Summit, as this is > not the first time patches appear out of the blue in form of a git tree. 
Once again: implied accusation of things being done without you knowing about them. What's wrong with this? Fortunately, Linus does not tell you when he merges a patch by someone different than you. > The funny/sad thing is that at some point Linus complained about Con that > his development activity happend on a separate mailing list, but there was > at least a place to go to. CFS's development appears to mostly happen in > private. Not trying to take Ingo's defense because I too think he tends to show his code when it's well advanced, but it's often required to work with pencil and paper for hours before you suddenly can start. After that, it's true that changes can advance very fast. After all, how many iterations did you send before the patch that Ingo and Peter used ? Only one (or maybe zero, depending on what patch they started with). So you're dishonnest again. > Patches may be your primary form of communication, but that isn't > true for many other people, It's true that most of my family and relatives do not speak this language, but you would find it funny to discover that on this list, it's the most common form of expression. Look at the subjects. Most of them begin with '[PATCH]'. And even code reviews are done in patch form with lines starting with '-' and '+'. I won't tell you further, I know you know it, you were just playing the dumb. > with patches a lot of intent and motivation for a change is lost. Yes, that's true. And I think that you deliberately avoided any comments in your code exactly for this reason: slow down its integration process to play a bit longer here. Would it have been that hard to put comments to indicate people *your* intents ? > I know it's rather tempting to immediately try out > an idea first, but would it really hurt you so much to formulate an idea > in a more conventional manner? This is funny! 
Several people have been asking you to reformulate your ideas that nobody could understand because of your math notation, which you never did (at least not completely, just some parts). The conventional manner _is_ the patch on LKML. > Are you afraid it might hurt your > ueberhacker status by occasionally screwing up in public? Not speaking for Ingo of course, but I'd ask "and you?". Do you feel any particular pride of being able to send formulas nobody understands, and would it hurt your status explaining them to the normal people? (I mean "normal" for this list, you remember, the ones who only communicate in English or Patch). > Patches on the > other hand have the advantage to more easily cover that up by simply > posting a fix - it makes it more difficult to understand what's going on. I don't get it, it cannot hide a history. It happens to me very often to rediff any set of patches and/or launch interdiff to see what changed between multiple versions. On the other hand, if you would send 3 consecutive mails with your magnificient formulas, nobody would notice any change! > A more conventional way of communication would give more people a chance > to participate Exactly, that's what was asked to you! After 15 minutes reading your mail and trying to decipher it, I finally gave up. That's sad because it looked very interesting. I'm all for demonstrable designs instead of empirical ones. > they may not understand every detail of the patch, but > they can try to understand the general concepts and apply them to their > own situation and eventually come up with some ideas/improvements of their > own, they would be less dependent on you to come up with a solution to > their problem. I think that it's what Ingo and Peter did: try to apply their understanding of your concepts to their implementation, without being too much dependant on you to come up with a solution. 
> Unless of course that's exactly what you want - unless you want to be in > full control of the situation And we're at it! You've been controlling the situation pretty well for the last month. People politely entreating you to explain what you considered wrong, how your design worked, etc... Even your mail rate on LKML has doubled since August. You might have been feeling horny! > and you want to be the hero that saves the day. Right now, nobody saves the day. The Linux development process looks like a playground with little kids sending sand into their eyes. Lamentable! > In the patch you really remove _a_lot_ of stuff. You also removed a lot of > things I tried to get you to explain them to me. On the one hand I could > be happy that these things are gone, as they were the major road block to > splitting up my own patch. On the other hand it still leaves me somewhat > unsatisfied, as I still don't know what that stuff was good for. You do not appear sincere. You might have been believing this the first few days, but insisting for ONE MONTH on this part of the code means that you found a flaw in it or you found it did not serve any purpose, and you wanted Ingo to tell you anything about this so that you could reply "bullshit, it does not work". Now I suspect it was simply useless and they finally realized it then removed the code. What would it have cost you to say "It seems to me that this code does nothing" ? You would have got credited for it, since you're asking for that. > In a more collaborative development model I would have expected that you > tried to explain these features, which could have resulted in a discussion > how else things can be implemented or if it's still needed at all. Instead > of this you now simply decide unilaterally that these things are not > needed anymore. You know like me that explaining concepts by mail take *a lot* of time. I even refuse to do this anymore with the people I work with. 
Wasting 4 hours writing down something which goes to the bin in 5 minutes is stupid at best. Better refine the thinking all in our corners, and either meet or discuss the small pieces by mail. > > there's also a visible reduction in code size: > > > > text data bss dec hex filename > > 13369 228 2036 15633 3d11 sched.o.before (UP, nodebug) > > 11167 224 1988 13379 3443 sched.o.after (UP, nodebug) > > Well, one could say that you used every little trick in the book to get > these numbers down. And why is this wrong ? I too spend a lot of time reducing and optimizing code, sometimes even 1 hour to reduce some primitives by a few bytes or cycles on most architectures I can test, and it often pays off. At this stage of the development, its not unreasonable to try to reduce code size, since it is not meant to change a lot. And 15% is not bad at all! > On the other hand at this point it's a little unclear > whether you maybe removed it a little too much to get there, so the > significance of these numbers is a bit limited. That's clearly possible. But how would one say, given the level of outbound filtering you apply to your advices ? > > Changes: besides the many micro-optimizations, one of the changes is > > that se->vruntime (virtual runtime) based scheduling has been introduced > > gradually, step by step - while keeping the wait_runtime metric working > > too. (so that the two methods are comparable side by side, in the same > > scheduler) > > I can't quite see that, the wait_runtime metric is relative to fair_clock > and this is gone without any replacement, in my patch I at least > calculate these values for the debug output, but in your patch even that > is simply gone, so I'm not sure what you actually compare "side by side". Ah, this is where the useful information was hidden. 
In most mails from you, there's often :
  - a ton of crap
  - one complaint
  - a ton of crap
  - a very useful advice
  - a ton of crap

Very easy after that yo ask for responses to your question and to say "I told you 1 month ago...". And don't pretend it's unintentional, I've been playing the same game with some other people for years in other contexts!

> > The ->vruntime metric is similar to the ->time_norm metric used by
> > Roman's patch (and both are losely related to the already existing
> > sum_exec_runtime metric in CFS), it's in essence the sum of CPU time
> > executed by a task, in nanoseconds - weighted up or down by their nice
> > level (or kept the same on the default nice 0 level). Besides this basic
> > metric our implementation and math differs from RFS.
>
> At this point it gets really interesting - I'm amazed how much you stress
> the differences.

Many people would be amazed how much you exagerate the fact that there are differences. Indeed, of those 6 lines, 5 are about similarity, and one is about a different implementation and math. I don't see "how much he stresses the differences".

> If we take the basic math as I more simply explained it
> in this example http://lkml.org/lkml/2007/9/3/168, you now also make the
> step from the relative wait_runtime value to an absolute virtual time
> value. Basically it's really the same thing, only the resolution differs.
> This means you already reimplemented a key element of my patch, so would
> you please give me at least that much credit?

Ah, the episode of the guy having his code counterfeited with no credit. Anyway, since it's your idea, I too think that there should be coments in the code stating this, close to the explanations.

> The rest of the math is indeed different - it's simply missing. What is
> there is IMO not really adequate. I guess you will see the differences,
> once you test a bit more with different nice levels.
There's a good reason > I put that much effort into maintaining a good, but still cheap average, > it's needed for a good task placement. And this reason is ? > There is of course more than one > way to implement this, so you'll have good chances to simply reimplement > it somewhat differently, but I'd be surprised if it would be something > completely different. It would be stupid if they had to reimplement something they did not understand from your work. I would personally feel really desperate if I spent that much time inventing very smart concepts that people did not get right because I was totally unable to explain something with humain-understandable words. > To make it very clear to everyone else: this is primarely not about > getting some credit, although that's not completely unimportant of course. I believe you on this one. > This is about getting adequate help. I don't believe you on this one. Getting help is mostly what Ingo and Peter have been seeking from you and got in small parts with lots of difficulties. You could show everyone here that your brain really needs no help when it comes to play with those algorithms, but it likes to play and often with the same games. > I had to push very hard to get any > kind of acknowledgment with scheduler issues only to be rewarded with > silence, so that I was occassionally wondering myself, whether I'm just > hallucinating all this. Not credible, you should renice the amazing factor in your complaints, it's just a poor theatre play we're assisting to. With slightly less exageration, it might be believable. > Only after I provide the prove that further > improvements are possible is there some activity. That's true. What's unfortunate is that this proof was also the first understandable starting point. > But instead of providing > help (e.g. by answering my questions) Ingo just goes ahead and > reimplements the damn thing himself and simply throws out all questionable > items instead of explaining them. 
Oh, the persecuted guy again with his persistant pain due to the lack of answer to his same question since last month. > The point is that I have no real interest in any stinking competition, I > have no interest to be reduced to simply complaining all the time. False! It's the way you're trying to prove Ingo is a bastard and that you're a victim. But if we just re-read a few pick-ups of your mails since Aug 1st, its getting pretty obvious that you completely made up this situation. And I can only applaud you very high manipulation skills, I'm impressed, because you got me for a long time. But as always when such people are constantly pushing the limits further, they reveal themselves. > I'm more interested in a cooperation, Did I say that I doubt about it now ? > but that requires better communication and an actual exchange of ideas, > patches are no real communication, they are a supplement and should > rather be the end result instead of means to get anything started. On some complex algorithms, you may be right. But a quick and dirty patch has the advantage of showing the ideas and concepts in a way that many people can understand and comment on. > A collaboration could bundle the individual > strengths, so that this doesn't degenerate into a contest of who's the > greatest hacker. > Is this really too much to expect? Well, what are you going to do next? - wait for a no response and say "everybody, look, the weak bastard in front of me refuses the fight" ? - split up your patches and add comments in them so that Ingo and Peter finally understand what you really mean and not only what you're willing to show them ? - open a new thread on LKML detailing your ideas one at a time and proposing others to implement them if you cannot code cleanly ? - anything else? (eg: consult a specialist in schizophrenia?) 
You could at least choose to prove your intent to contribute by rediffing your patch against the last one which tries to imitate it, and commenting the result, then splitting it up in as many parts as you see fit. And to reuse a phrase from your last mail : > Is this really too much to expect? I sincerely hope you'll make everyone benefit from your unequalled skills, and that you will stop playing cat and mouse. It's boring for many people, and counter-productive. > bye, Roman Thanks, Willy
From: Roman Zippel [email blocked]
Subject: Re: [announce] CFS-devel, performance improvements
Date: Wed, 12 Sep 2007 02:42:25 +0200 (CEST)

Hi,

Out of curiousity: will I ever get answers to my questions?

bye, Roman
From: Ingo Molnar [email blocked]
Subject: Re: [announce] CFS-devel, performance improvements
Date: Thu, 13 Sep 2007 09:52:58 +0200

* Roman Zippel [email blocked] wrote:

> Hi,
>
> Out of curiousity: will I ever get answers to my questions?

the last few weeks/months have been pretty hectic - i get more than 50 non-list emails a day so i could easily have missed some. (and to take a line from Linus: my attention span is roughly that of a slightly retarded golden retriever ;)

so it would be helpful if you could please re-state any questions you still have, in context of our latest CFS-devel queue. I tried to answer the error/rounding worries you had - which seemed to be the main theme of your patch.

There are lots of good kernel hackers on lkml who know the new scheduler code pretty well and who might be able to provide an answer even if i dont manage to answer. (Perhaps asking the questions without heavy math will also help more people be able to understand and answer your questions and their practical relevance.)

In any case - if you see packet loss on my side then please resend :) That would be hugely helpful. Thanks,

Ingo
From: Roman Zippel [email blocked]
Subject: Re: [announce] CFS-devel, performance improvements
Date: Thu, 13 Sep 2007 14:35:35 +0200 (CEST)

Hi,

On Thu, 13 Sep 2007, Ingo Molnar wrote:

> > Out of curiousity: will I ever get answers to my questions?
>
> the last few weeks/months have been pretty hectic - i get more than 50
> non-list emails a day so i could easily have missed some.

Well, let's just take the recent "Really Simple Really Fair Scheduler" thread. You had the time to ask me questions about my scheduler, I even explained to you how the sleeping bonus works in my model. At the end I was sort of hoping you would start answering my questions and explaining things how the same things work in CFS - but nothing.

Then you had the time to reimplement the very things you've just asked me about and what do I get credit for - "two cleanups from RFS". And now I get this lame ass excuse for not answering my questions? :-(

bye, Roman
From: Ingo Molnar [email blocked] Subject: Re: [announce] CFS-devel, performance improvements Date: Thu, 13 Sep 2007 16:28:42 +0200 * Roman Zippel [email blocked] wrote: > Then you had the time to reimplement the very things you've just asked > me about and what do I get credit for - "two cleanups from RFS". i'm sorry to say this, but you must be reading some other email list and a different git tree than what i am reading. Firstly, about communications - in the past 3 months i've written you 40 emails regarding CFS - and that's more emails than my wife (or any member of my family) got in that timeframe :-( I just ran a quick script: i sent more CFS related emails to you than to any other person on this planet. I bent backwards trying to somehow get you to cooperate with us (and i still havent given up on that!) - instead of you disparaging CFS and me frequently :-( Secondly, i prominently credited you as early as in the second sentence of our announcement: | fresh back from the Kernel Summit, Peter Zijlstra and me are pleased | to announce the latest iteration of the CFS scheduler development | tree. Our main focus has been on simplifications and performance - | and as part of that we've also picked up some ideas from Roman | Zippel's 'Really Fair Scheduler' patch as well and integrated them | into CFS. We'd like to ask people go give these patches a good | workout, especially with an eye on any interactivity regressions. http://lkml.org/lkml/2007/9/11/395 And you are duly credited in 3 patches: -------------------> Subject: sched: introduce se->vruntime introduce se->vruntime as a sum of weighted delta-exec's, and use that as the key into the tree. the idea to use absolute virtual time as the basic metric of scheduling has been first raised by William Lee Irwin, advanced by Tong Li and first prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset. also see: http://lkml.org/lkml/2007/9/2/76 for a simpler variant of this patch. 
-------------------> Subject: sched: track cfs_rq->curr on !group-scheduling too Noticed by Roman Zippel: use cfs_rq->curr in the !group-scheduling case too. Small micro-optimization and cleanup effect: -------------------> Subject: sched: uninline __enqueue_entity()/__dequeue_entity() suggested by Roman Zippel: uninline __enqueue_entity() and __dequeue_entity(). -------------------> We could not add you as the author, because you unfortunately did not make your changes applicable to CFS. I've asked you _three_ separate times to send a nicely split up series so that we can apply your code: " it's far easier to review and merge stuff if it's nicely split up. " http://lkml.org/lkml/2007/9/2/38 " I also think that the core math changes should be split from the Breshenham optimizations. " http://lkml.org/lkml/2007/9/2/43 " That's also why i've asked for a split-up patch series - it makes it far easier to review and test the code and it makes it far easier to quickly apply the obviously correct bits. " http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg204094.html You never directly replied to these pretty explicit requests, all you did was this side remark 5 days later in one of your patch announcements: " For a split version I'm still waiting for some more explanation about the CFS tuning parameter. " http://lkml.org/lkml/2007/9/7/87 You are an experienced kernel hacker. How you can credibly claim that while you were capable of writing a new scheduler along with a series of 25 complex mathematical equations that few if any lkml readers are able to understand (and which scheduler came in one intermixed patch that added no new comments at all!), and that you are able to maintain the m68k Linux architecture code, but that at the same time some supposed missing explanation from _me_ makes you magically incapable to split up _your own fine code_? This is really beyond me. 
I even gave you the first baby step of the split-up by sending this: http://lkml.org/lkml/2007/9/2/76 And your reaction to this was dismissive: " It simplifies the math too much, the nice level weighting is an essential part of the math and without it one can't really understand the problem I'm trying to solve. " http://lkml.org/lkml/2007/9/3/174 So we advanced this whole issue by trying the vruntime concept in CFS and adding the 2 cleanups from RFS (we couldnt actually use any code from you, due to the way you shaped your patch - but we'd certainly be glad to!). You've seen the earliest iteration of that at: http://lkml.org/lkml/2007/9/2/76 So far you've sent 3 updates of your patch without addressing any of the structural feedback we gave. We virtually begged you to make your code finegrained and applicable - but you did not do that. And please understand, splitting up patches is paramount when cooperating with others: we are not against adding code that makes sense (to the contrary and we do that every day), but it has to be done gradually, in order of utility and impact, so please do send finegrained patches if you wish to contribute. (but plain verbal feedback is useful too - whichever you prefer.) I asked you to send a split-up queue repeatedly and finally we ended up extracting _one_ concept from your patch (which concept was suggested by others months ago already, in the CFS discussions) and two cleanups. You are credited for that in the patches. Please send us your other changes as a finegrained series and if they are applied you are (of course) credited as the author. Does this sound good to you? Ingo
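The patch description quoted in the mail above characterises se->vruntime as "a sum of weighted delta-exec's", scaled up or down by nice level and left unchanged at nice 0. A minimal sketch of that accounting, assuming a reference weight of 1024 for nice 0 and using hypothetical names (`sketch_entity`, `sketch_account`) that are not taken from the CFS source, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference weight for a nice-0 task; illustrative only. */
#define SKETCH_NICE_0_WEIGHT 1024u

/* Hypothetical stand-in for a scheduling entity; not the kernel struct. */
struct sketch_entity {
    uint32_t weight;    /* load weight derived from the nice level */
    uint64_t vruntime;  /* accumulated weighted runtime, in nanoseconds */
};

/*
 * Scale a slice of real execution time by nice0_weight / weight.
 * A nice-0 entity is charged exactly the time it ran; a heavier
 * entity is charged less, a lighter one more.
 */
static uint64_t sketch_weighted_delta(uint64_t delta_exec_ns, uint32_t weight)
{
    assert(weight > 0);
    return delta_exec_ns * SKETCH_NICE_0_WEIGHT / weight;
}

/* Add one weighted execution slice to the entity's virtual runtime. */
static void sketch_account(struct sketch_entity *se, uint64_t delta_exec_ns)
{
    se->vruntime += sketch_weighted_delta(delta_exec_ns, se->weight);
}
```

Under these assumptions, an entity with twice the reference weight accumulates virtual runtime at half the rate, so a tree keyed on the accumulated value would select it twice as often. The sketch ignores overflow handling and everything else the real patches do.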
From: Roman Zippel [email blocked] Subject: Re: [announce] CFS-devel, performance improvements Date: Thu, 13 Sep 2007 18:50:12 +0200 (CEST) Hi, On Thu, 13 Sep 2007, Ingo Molnar wrote: > > Then you had the time to reimplement the very things you've just asked > > me about and what do I get credit for - "two cleanups from RFS". > > i'm sorry to say this, but you must be reading some other email list and > a different git tree than what i am reading. > > Firstly, about communications - in the past 3 months i've written you 40 > emails regarding CFS - and that's more emails than my wife (or any > member of my family) got in that timeframe :-( I just ran a quick > script: i sent more CFS related emails to you than to any other person > on this planet. I bent backwards trying to somehow get you to cooperate > with us (and i still havent given up on that!) - instead of you > disparaging CFS and me frequently :-( > > Secondly, i prominently credited you as early as in the second sentence > of our announcement: > > | fresh back from the Kernel Summit, Peter Zijlstra and me are pleased > | to announce the latest iteration of the CFS scheduler development > | tree. Our main focus has been on simplifications and performance - > | and as part of that we've also picked up some ideas from Roman > | Zippel's 'Really Fair Scheduler' patch as well and integrated them > | into CFS. We'd like to ask people go give these patches a good > | workout, especially with an eye on any interactivity regressions. > > http://lkml.org/lkml/2007/9/11/395 > > And you are duly credited in 3 patches: This needs a little perspective, as I couldn't clone the repository (and you know that), all I had was this announcement, so using the patch descriptions now as defense is unfair by you. In this announcement you make relatively few references how this relates to my work. 
Maybe someone else can show me how to read that announcement differently, but IMO the casual reader is likely to get the impression, that you only picked some minor cleanups from my patch, but it's rather unclear that you already reimplemented key aspects of my patch. Don't blame me for your own ambiguity. > -------------------> > > Subject: sched: introduce se->vruntime > > introduce se->vruntime as a sum of weighted delta-exec's, and use > that as the key into the tree. > > the idea to use absolute virtual time as the basic metric of > scheduling has been first raised by William Lee Irwin, advanced by > Tong Li and first prototyped by Roman Zippel in the "Really Fair > Scheduler" (RFS) patchset. > > also see: > > http://lkml.org/lkml/2007/9/2/76 > > for a simpler variant of this patch. Let's compare this to the relevant part of the announcement: | The ->vruntime metric is similar to the ->time_norm metric used by | Roman's patch (and both are losely related to the already existing | sum_exec_runtime metric in CFS), it's in essence the sum of CPU time | executed by a task, in nanoseconds - weighted up or down by their nice | level (or kept the same on the default nice 0 level). Besides this basic | metric our implementation and math differs from RFS. In the patch you are more explicit about the virtual time aspect, in the announcement you're less clear that it's all based on the same idea and somehow it's important to stress the point that "implementation and math differs", which is not untrue, but your forget to mention that the differences are rather small. > You never directly replied to these pretty explicit requests, all you > did was this side remark 5 days later in one of your patch > announcements: This is ridiculous, I asked you multiple times to explain to me some of the differences relative to CFS as response to the splitup requests. Not once did you react, you didn't even ask what I'd like to know specifically. 
>
>  " For a split version I'm still waiting for some more explanation
>    about the CFS tuning parameter. "
>
>    http://lkml.org/lkml/2007/9/7/87
>
> You are an experienced kernel hacker. How you can credibly claim that
> while you were capable of writing a new scheduler along with a series of
> 25 complex mathematical equations that few if any lkml readers are able
> to understand (and which scheduler came in one intermixed patch that
> added no new comments at all!), and that you are able to maintain the
> m68k Linux architecture code, but that at the same time some supposed
> missing explanation from _me_ makes you magically incapable to split up
> _your own fine code_? This is really beyond me.

I never claimed to understand every detail of CFS, I can _guess_ what _might_ have been intended, but from that it's impossible to know for certain how important they are. Let's take this patch fragment:

-	/*
-	 * Fix up delta_fair with the effect of us running
-	 * during the whole sleep period:
-	 */
-	if (sched_feat(SLEEPER_AVG))
-		delta_fair = div64_likely32((u64)delta_fair * load,
-							load + se->load.weight);
-
-	delta_fair = calc_weighted(delta_fair, se);

You simply remove this logic, without ever explaining what it was really good for. There is no indication how it has been replaced. AFAICT the comment refers to the calc_weighted() call, which is not the problem. I can _guess_ that it was meant to scale the bonus based on cpu load, what I can't guess from this is why this logic was added in first place, _why_ it was neccessary. It had been easy for me to simply remove this as well, but I had preferred to _know_ what the motivation for this logic was, so I can take it into account in my patch.

bye, Roman
From: Kyle Moffett [email blocked] Subject: Re: [announce] CFS-devel, performance improvements Date: Thu, 13 Sep 2007 14:28:13 -0400 On Sep 13, 2007, at 12:50:12, Roman Zippel wrote: > On Thu, 13 Sep 2007, Ingo Molnar wrote: >> And you are duly credited in 3 patches: > > This needs a little perspective, as I couldn't clone the repository > (and you know that), all I had was this announcement, so using the > patch descriptions now as defense is unfair by you. How the hell is that unfair? The fact that nobody could clone the repo for about 24 hours is *totally* *irrelevant* to the whole discussion as it's simply a matter of a technical glitch. His point in referencing patch descriptions is to clear up matters of credit. Ingo has never in this discussion been "out to get you". From the point of view of a sideline observer it's been *you* that has been demanding answers and refusing to answer questions directed at you. The most brilliant mathematician in the world would have nothing to contribute to the Linux scheduler if he couldn't describe, code, and comment his algorithm in detail so that others (even code-monkeys like myself) could grok at least the basic outline and be able to give useful commentary and suggestions. > In this announcement you make relatively few references how this > relates to my work. Maybe someone else can show me how to read > that announcement differently, but IMO the casual reader is likely > to get the impression, that you only picked some minor cleanups > from my patch, but it's rather unclear that you already > reimplemented key aspects of my patch. As a casual reader and reviewer I have yet to actually see you post readable/reviewable patches in this thread. I was basically completely unable to follow the detailed math you go into (even with a math minor) due to your *complete* lack of comments. 
The fact that you renamed files and didn't split up your patch made it useless for actual practical kernel development, its only value was as a comparison point. I did however get the impression that Ingo got something significantly useful out of your code despite the problems, but I still haven't had time to read through his and Peter's patches in detail to understand exactly what it was. From personal inspection of a fair percentage of the changes that Ingo and Peter committed, they certainly appear to be deleting a lot more code than they add. More specifically they appear to describe in detail what they are deleting and why, with the exception of one patch that's missing a changelog entry. So yeah, I get the impression that Ingo re-implemented some ideas that you had because you refused to do so in a way that was acceptable for the upstream kernel. How exactly is this a bad thing? You came up with a great idea that worked and somebody else did the ugly grunt work to get it ready to go upstream! On the other hand, given the "pleasant" attitude that you've showed Ingo during this whole thing I doubt he'd be likely to do it again. >> You never directly replied to these pretty explicit requests, all >> you did was this side remark 5 days later in one of your patch >> announcements: > > This is ridiculous, I asked you multiple times to explain to me > some of the differences relative to CFS as response to the splitup > requests. Not once did you react, you didn't even ask what I'd like > to know specifically. How exactly is Ingo supposed to explain to YOU the differences between his scheduler and your modified one? Completely ignoring the fact that you merged all your changes into a single patch and didn't add a single comment, it's not *his* algorithm that I have trouble understanding. From a relatively basic scan of the source-code and comments I was able to figure out how the algorithm works in general, enough to ask much more specific questions than yours. 
If anything, Ingo should have been asking *you* how your scheduler differed from the one it was based on. > I never claimed to understand every detail of CFS, I can _guess_ > what _might_ have been intended, but from that it's impossible to > know for certain how important they are. Let's take this patch > fragment: Oh come on, you appear to be quite knowledgeable about CPU scheduling and the algorithms involved, surely as such you should have a much easier time with reading the comments and asking specific questions. For example, your below question specifically about the sleep averaging could have been answered in fifteen minutes had you actually *ASKED* that. You'll notice that in fact Peter Zijlstra's email response did come almost exactly 15 minutes after you sent this email, and for a casual reader like me it seems perfectly sufficient; it does depend on you asking specific questions instead of "how does it differ from my hundred-kbyte patch". As for that specific patch, it's very clear that the affected logic is controlled by one of the sched-feature tweaking tools, so you could very easily experiment with it yourself to see what the differences are when the feature is on or off and whether or not it's useful or harmful for your workloads. Such evidence would help indicate which way the scheduler feature should be hard-coded when the tunable is finally removed. Cheers, Kyle Moffett
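The removed fragment that Roman quotes in his mail above multiplies delta_fair by load and divides by load + se->load.weight when the SLEEPER_AVG feature is enabled. A rough sketch of just that arithmetic follows. It uses a hypothetical helper name and plain 64-bit division in place of div64_likely32(), and it does not model calc_weighted() or the feature switch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper reproducing only the arithmetic of the quoted
 * fragment: scale delta_fair by load / (load + se_weight), where load
 * is the runqueue load and se_weight the waking entity's own weight.
 */
static uint64_t sketch_scale_sleeper_delta(uint64_t delta_fair,
                                           uint64_t load,
                                           uint64_t se_weight)
{
    assert(load + se_weight > 0);
    return delta_fair * load / (load + se_weight);
}
```

Read this way, the factor approaches 1 on a heavily loaded runqueue and falls to 0 on an idle one, which fits Roman's guess that the logic scaled the bonus by CPU load. The thread itself does not confirm the original motivation.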




Zippel replies...

September 14, 2007 - 9:47am
Anonymous (not verified)

"All I want is to be taken a bit more seriously, the communication aspect I mentioned is really important. From my perspective Ingo is somewhere up on his pedestal and I have to scream to get any kind of attention."

http://kerneltrap.org/mailarchive/linux-kernel/2007/9/14/259921

Meanwhile, more kernel devs

September 14, 2007 - 3:42pm
Anonymous (not verified)

Meanwhile, more kernel devs who followed this have entered the discussion on lkml. So far all of them were critical of Zippel and some of them even questioned his motives, rather pointedly. That pretty much tells it all. Having read through most of the thread I find it amazing how much patience Molnar has with this kind of crap. I guess that's what makes for a good maintainer.

"all of them were critical

September 17, 2007 - 1:31am
Anonymous (not verified)

"all of them were critical of Zippel"--> false

"some of them even questioned his motives, rather pointedly"--> that was one individual as far as I can see

"That pretty much tells it all." --> you're an idiot

A difficult situation

September 14, 2007 - 11:34am
Anonymous (not verified)

I have been following this discussion for some time, and I think that Roman is being a bit childish. Ingo has been nothing but professional in his correspondence. Maybe... just maybe Roman is annoyed about what might seem like Ingo's ownership (dictatorship?) of scheduler decisions... Well, maybe I'm wrong about that, but I know it would be a lot better for *everyone* if they would just kiss and make up! Roman is clearly a talented hacker and mathematician. Hopefully Roman will stick it out and we will not lose his contributions. I miss Con!

I get that feeling, too.

September 14, 2007 - 1:39pm
Anonymous (not verified)

I get that feeling, too. Roman's throwing out a pile of math, obtuse patches, asking hard-to-answer questions, and basically working very hard at tripping up the scheduler developers. It just screams "challenge". Unfortunately, the tricks he's pulling might work in an academic environment, but in this context he just comes across as pathetic.

I've always considered Ingo as a bit of an asshole, but he's being more patient with this junk than I ever could be.

I've always considered Ingo

September 14, 2007 - 2:19pm
Anonymous (not verified)

I've always considered Ingo as a bit of an asshole,

You have to consider that Ingo is not natively English, he does use surprising language sometimes but I think how patient he is with Roman (and others..) shows that in fact he's a polite gentleman.

Interview

September 14, 2007 - 2:28pm
Anonymous (not verified)

I was just reading this old KernelTrap interview with Ingo, left me with a rather good impression, actually.

He's remarkably consistent, too.

September 14, 2007 - 3:56pm

JA: Did you base the design on any existing scheduler implementations or research papers?

Ingo Molnar: this might sound a bit arrogant, but i have only read (most of the) research papers after writing the scheduler. This i found to be a good approach in the area of Linux - knowing about too many well-researched details can often confuse the real direction we have to take. I like writing new code, and i prefer to approach things from the physics side: take a few elementary rules and build up the 'one correct' solution, no compromises. This might not be as effective as first reading all the available material and then cherry-picking a few ideas and thinking up the remaining things, but it sure gives me lots of fun :-)

[ One thing i always try to ensure: i take a look at all existing kernel patches that were announced on the linux-kernel mailing list in the same area, to make sure there's no duplication of effort or NIH syndrome. Since such kernel mailing-list postings are progress reports of active research, it can be said that i read alot of bleeding-edge research. ]

Emphasis added. Ingo follows the age old LKML mantra: Show me the code. If it's not in a patch on the list, it doesn't exist. I see nothing wrong with this.

--
Program Intellivision and play Space Patrol!

Molnar must go!

September 14, 2007 - 2:56pm
Anonymous (not verified)

Ingo Molnar cannot do collaborative development. He has lost leadership control and alienated some brilliant contributors. Get rid of him already.

Mr. Ballmer, no need to be

September 14, 2007 - 3:05pm
Anonymous (not verified)

Mr. Ballmer, no need to be polite here. Tell us how you really feel about Linux - get it out of your system already! :-D

Molnar must go???!!!

September 14, 2007 - 4:22pm
Anonymous (not verified)

What are you talking about? Ingo "cannot do collaborative development" you said???
Have you any idea how many developers have already contributed patches to CFS?? Let's see...
Peter Zijlstra
Dmitry Adamushko
Matthias Kaehlcke
Suresh Siddha
Mike Galbraith
Ting Yang
Suresh Siddha
Oleg Nesterov
Sven-Thorsten Dietrich
Peter Williams
Josh Triplett
...
...

But I guess you are too blind to see that, eh?... (sigh)

Yeah, but...

September 14, 2007 - 6:37pm

...all the GIT trees come from Ingo. Therefore, he's not collaborative. He just puts those other peoples' names in to minimize their contributions.

*BWAHAHAHAHAAA*

Sorry, I couldn't say that with a straight face. :-)

Ingo's the de facto maintainer, and so yes, he owns the tree, chooses what to integrate and to leave out, and pushes the result. Duh. That's how it's supposed to work. That doesn't imply it happens in a vacuum, no matter how much some peoples' hot air works to suck the oxygen out of the room.

--
Program Intellivision and play Space Patrol!

The Scheduler Mafia

September 15, 2007 - 8:28pm
Anonymous (not verified)

"Have you any idea how many developers have already contributed patches to CFS?"

Molnar has not only stifled the contributions of some bright developers (Con & Roman). He pointlessly and dishonestly reimplemented their ideas and got the code jammed into the kernel, even though he changes it week to week. As for the rest of Molnar's sycophants, if you can't see the Scheduler Mafia for what they are you are obtuse.

Scheduler Mafia?

September 16, 2007 - 12:35am
Flewellyn (not verified)

You're serious, then? You genuinely presume some sort of collusion to enhance Ingo's reputation at the expense of others?

Good heavens, sir! I fear I'm moved to enquire with what substance you fill your pipe to smoke!

Dance dance

September 14, 2007 - 6:29pm
Anonymous (not verified)

Monkey boy!

^ ^'
O

It's about Con

September 14, 2007 - 6:12pm
Anonymous (not verified)

Let's put this in perspective. I think Roman started being aggressive toward Ingo after the Con incident.

Ingo has always been in charge of Linux scheduler. However, despite various attempts from other people to improve scheduler, Ingo was never convinced/willing to do anything real about it, except:
1. The O(1) scheduler
2. The CFS

Both were long overdue and driven by competition: when someone came up with a better idea and implementation, Ingo got motivated and came up with his own solution in a week. I consider this a flaw in the "maintainership" system: if someone is not motivated enough but does not want to lose the title, development stalls.

Ingo is not perfect and does owe Con some explanation (and maybe an apology). But Roman's been childish too.

It's not about Con

September 14, 2007 - 10:43pm
Anonymous (not verified)

Well, I don't believe that. Ingo might not be perfect (who is?) but he doesn't owe an apology to Con.

CFS has nothing to do with Con's SD/RSDL. The only thing they have in common is that they both strive for fairness. And Ingo acknowledged Con's efforts, which proved that fairness was feasible and good for desktop applications.

That's it. CFS's technical concepts are more advanced than SD/RSDL's, and that's why a great many kernel developers went for CFS and not for SD/RSDL. Even Roman. Just google a bit and you shall see. The development of CFS was everything that open source is about: hundreds of mail exchanges between developers (Li Tong N, Tong Li (these names are Chinese to me :), William Lee Irwin III, Balbir Singh, Cedric Le Goater, Nick Piggin, Satyam Sharma, Bill Huey, Arjan van de Ven and Willy Tarreau, to name a few), disagreements, disputes, arguments, propositions, and patches, some of which found their way in while most went to the trash can. Ingo alone, or even with a team around him, could never have reached such a mature implementation of CFS in so little time.

This was a collaborative work. This is open source at its best!

The best technical solution was spontaneously chosen. That is all, and it has nothing to do with Con.

And now that CFS has no rival, development does not stop. Check Srivatsa Vaddagiri's efforts to add group awareness to CFS. [tasks/users/containers]

Ingo sometimes behaves like an arrogant bastard (especially towards Roman and NOT Con) and Roman like a child, but this is their style. And I like them both! The same story happened with the HIGH-RES timers. But Roman in the end contributed significantly. I just hope this will happen again. :D

"This is open source at its best!"

September 14, 2007 - 11:11pm
Anonymous (not verified)

Amen!

Jobs like this are what attract me so much to the Linux world and open source. Transparency and collaboration (in its good and bad times) are what this is all about. A big job done in little time: my congratulations to all these people.

Keep up the _good_ work guys!

> CFS has nothing to do with

September 15, 2007 - 3:16am
Anonymous (not verified)

> CFS has nothing to do with Con's SD/RDSL.

I didn't say it was a clone of SD. But it came as a result of SD having proved fairness is the way to go.

Con did all the pioneer work. Ingo started CFS because SD had come to such a mature stage that O(1) simply lost and he felt threatened. So he had to come up with something better.

> That's it. CFS technical concepts are more advanced than SD/RDSL and
> that's why a big deal of kernel developers went for CFS and not for
> SD/RDSL. Even Roman.

The reason that more people contribute to CFS is Ingo's position as a maintainer. Nobody wants to work on something that has no future. And if you are in competition with Ingo, you'll lose.

CFS is different

September 15, 2007 - 4:51am
Anonymous (not verified)

> Con did all the pioneer work. Ingo started CFS because SD had come to such a mature stage that O(1) simply lost and he felt threatened.

You should read KernelTrap more often!

Such as: CFS outperforms SD in 3D benchmarks.

Or: CFS is faster than the old O(1) scheduler.

> So he had to come up with something better.

He came out with something better than SD. That's the whole point, isn't it? Con had months to join that effort, as many other kernel devs did. Con was ill initially, but later on he (and his supporters) trolled and flamed the CFS discussions, instead of joining the effort. I could not find a single email where Ingo flames Con - can you?

You should follow the

September 15, 2007 - 9:16am
Anonymous (not verified)

You should follow the timeline better. SD was /already/ better tested and /already/ better performing than O(1) when Ingo came up with his own scheduler. Had it gone into mainline, it might have become even better, or not, but you'll never know now.

He came out with something /allegedly/ better than SD, a hack which was steamrolled into the kernel blazingly fast, unlike SD, which, although mature and well tested, wasn't going in for months.

Also I can't find a single email where Con trolls and flames CFS discussions, can you? Why should Con join the effort when his scheduler is more mature?

Very true

September 15, 2007 - 4:32am
Anonymous (not verified)

> I think Roman started being aggressive toward Ingo after the Con incident.

Very true. Read the Willy Tarreau email for a (damning) analysis about the motives of Roman Zippel. (Willy Tarreau maintains the 2.4 kernel)

This whole thing is a storm in a teapot. Roman is a well-known flamer on linux-kernel (read some of the other flamewars he was involved in recently); other kernel devs seem to have learned to avoid him like the plague.

Ingo should have done the same and should have ignored him after the first few emails, instead of writing dozens of polite emails trying to involve the guy.

Roman set himself on an intentional collision course with Ingo, regardless of what Ingo did. He would have complained even louder had Ingo not acted on his feedback.

My Take

September 15, 2007 - 3:19am
Richi (not verified)

IMO, it seems that Roman's interpersonal skills are just okay. I truly don't see any disparaging remarks he makes as being unfounded. Unfortunately, Ingo is MUCH better at sounding nice. He's trying to save face and makes a good point of being the voice of reason and calmness. But that's interpersonal skills.

Technically speaking, however, Roman has a LOT more going for him. I can see that he's not being an asshole, but is getting frustrated that people like Willy are being idiots (see that email he wrote that attacked Roman point-by-point. Even if Roman were wrong, he couldn't be THAT wrong. Yet Willy Tarreau seems to find fault in nearly every other sentence. Who wouldn't get frustrated with that?).

I say this would have gone much better had communications been more real-time and had there been a neutral arbiter calling out what the two hackers are truly saying and what they are not.

Just humans..

September 15, 2007 - 12:25pm
nis (not verified)

Roman has a LOT to offer, and Ingo knows that all right.
But Roman's interpersonal skills are not always okay, and yes, Ingo is sometimes a bloody hypocrite, trying hard to be "polite" and "politically correct" and that kind of junk (although in reality he's an "arrogant bastard"), and Willy Tarreau is sometimes a complete jerk trying to play psychologist and analyse Roman, and Linus is a "heartless bastard"... :D

Personally speaking, I really enjoy all that! I do like all these guys! All these flames just show that the developers _are_not_ a bunch of _boring_professionals_, they are humans! The development is alive and kicking! And we are soon going to have the best possible scheduler in our boxens!!

By the way, check out this thread on CFS: http://lkml.org/lkml/2007/8/24/296
-Linus: "Why the hell can't you just make the code sane"
"So dammit, stop writing these totally bogus "explanations""
"Ingo, I'm not going to pull this kind of antics and crap"

-Ingo: "i was too chicken to pick a single granularity default :-/"

That is "Linux kernel Development", and I love it! :)

I love you too!

September 15, 2007 - 6:37pm
Anonymous Lover (not verified)

Muac!

:D

I'm not qualified to comment

September 19, 2007 - 9:53am
Anonymous (not verified)

I'm not qualified to comment on the algorithms being used, because I haven't studied the code. But I do have to say, having followed this on lkml, I have not been impressed by Molnar's attitude. It does seem to me that there is a degree of him adopting NIH - rubbishing other people and then reimplementing their concepts.

Doesn't excuse other people being rude but you do also get the impression that Torvalds protects Molnar.

That said, in this area I'm just an end user. If it gets the job done then I am not in a position to complain. But I can see why some might be very frustrated in this field.
