"Slow servers, Skipping audio, Jerky video --everyone knows the symptoms of latency. But to know what's really going on in the system, what's causing the latency, and how to fix it... those are difficult questions without good answers right now," began Arjan van de Van, announcing version 0.1 of LatencyTop, "a tool for developers to visualize system latencies." He continued:
"LatencyTOP is a Linux tool for software developers (both kernel and userspace), aimed at identifying where system latency occurs, and what kind of operation/action is causing the latency to happen. By identifying this, developers can then change the code to avoid the worst latency hiccups.
"There are many types and causes of latency, and LatencyTOP focus on type that causes audio skipping and desktop stutters. Specifically, LatencyTOP focuses on the cases where the applications want to run and execute useful code, but there's some resource that's not currently available (and the kernel then blocks the process). This is done both on a system level and on a per process level, so that you can see what's happening to the system, and which process is suffering and/or causing the delays."
"With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies," Arjan van de Ven reported, describing a "mass priority inversion" caused by, "an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is 'special' in that it does a lot of journal work". Finally, he offered a tiny patch to resolve the issue, "the patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x !"
Andrew Morton took a cautious stance, "seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to 'do more seeks to improve one test case'. Surely other testcases will worsen. What are they?" CFQ author Jens Axboe agreed, "It should not be merged as-is, instead I'll provide a function to do this." Ingo Molnar wasn't convinced, "atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this." He added, "if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks??" Andrew calmy replied, "I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them."