I'm still comparing my implementation with your code:
- f is called once for each cpu in the system, correct?
- if at least one cpu is in nohz mode, this loop will be needed for
every grace period.
That means an O(NR_CPUS) loop with disabled local interrupts :-(
Is that correct?
Unfortunately, my solution is even worse:
My rcu_irq_exit() acquires a global spinlock when called on a nohz cpu.
A few cpus in cpu_idle, nohz, executing 50k network interrupts/sec would
cacheline-thrash that spinlock.
I'm considering counting interrupts: if a nohz cpu executes more than a
few interrupts/tick, then add a timer that checks rcu_pending().
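
A rough sketch of the counting idea, as plain userspace C rather than kernel code. Everything named here (struct nohz_irq_state, nohz_irq_should_arm_timer(), the limit of 3, the array size) is made up for illustration; only rcu_pending() comes from the discussion above, and arming the actual timer is left to the caller.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH      4   /* hypothetical, for this sketch only */
#define IRQS_PER_TICK_LIMIT 3   /* "a few"; the value is a guess */

/* Hypothetical per-cpu bookkeeping; a real kernel would use per-cpu data. */
struct nohz_irq_state {
	unsigned long last_tick;   /* tick period the count belongs to */
	unsigned int  irqs;        /* interrupts seen in that period */
	bool          timer_armed; /* timer already requested for this cpu */
};

static struct nohz_irq_state irq_state[NR_CPUS_SKETCH];

/*
 * Would be called from the interrupt exit path of a nohz cpu.
 * Returns true exactly once per cpu, when the caller should arm a timer
 * that checks rcu_pending(). Only the cpu's own entry is read or written.
 */
static bool nohz_irq_should_arm_timer(int cpu, unsigned long now)
{
	struct nohz_irq_state *s = &irq_state[cpu];

	if (s->last_tick != now) {
		/* New tick period: restart the count. */
		s->last_tick = now;
		s->irqs = 0;
	}
	if (++s->irqs <= IRQS_PER_TICK_LIMIT || s->timer_armed)
		return false;

	s->timer_armed = true;
	return true;
}
```

With the limit at 3, the fourth interrupt inside one tick period asks for the timer, and later interrupts on that cpu return false. Clearing timer_armed when the cpu leaves nohz mode is not shown.
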
Perhaps even that wouldn't be enough: I remember that the initial unhandled
irq detection code broke miserably on large SGI systems:
An atomic_inc(&global_var) in the local timer interrupt (i.e.:
NR_CPUS*HZ calls/sec) caused such severe thrashing that the system wouldn't
boot. IIRC that was with 512 cpus.
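
To put a number on that pattern, here is a small userspace C11 sketch. It is not the original detection code: local_timer_tick() and shared_writes_per_sec() are hypothetical names, and the HZ value in the note below is an assumption, since the mail above does not state it.

```c
#include <stdatomic.h>

/* Stand-in for the shared counter; every cpu's tick writes this one word. */
static atomic_ulong global_var;

/* What each cpu's local timer interrupt does in this pattern. */
static void local_timer_tick(void)
{
	atomic_fetch_add(&global_var, 1);
}

/* Atomic writes per second that land on the single shared cache line. */
static unsigned long shared_writes_per_sec(unsigned long nr_cpus,
					   unsigned long hz)
{
	return nr_cpus * hz;
}
```

For 512 cpus and an assumed HZ of 1000, shared_writes_per_sec(512, 1000) gives 512000 atomic increments per second, each of which needs exclusive ownership of the same cache line.
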
--
Manfred