Re: 2.6.20->2.6.21 - networking dies after random time

From: Ingo Molnar
Date: Tuesday, July 24, 2007 - 1:04 pm

* Linus Torvalds <torvalds@linux-foundation.org> wrote:


yeah - it's a totally bad and unacceptable hack (i realized how bad it 
was when i wrote up that comment section ...), i just wanted to see 
which portion of ne2k/lib8390.c is sensitive to whether an irq line is 
masked or not. The patch has no SOB (Signed-off-by) line either.

the current best fix forward is to undo my original change, unless we 
find a better fix for this problem. (Note that the other patches posted 
in this thread are broken too: they only mask the irq but don't reliably 
unmask it.)

here's the current method of handling irqs for Marcin's card:

17:         12   IO-APIC-fasteoi   eth1, eth0

and fasteoi is a really simple sequence: no masking/unmasking by the 
flow handler itself but a NOP at entry and an APIC-EOI at the end. The 
disable/enable irq thing should thus have minimal effect if done within 
an irq handler.
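
as a rough illustration of that sequence, here is a minimal user-space 
sketch. It is a toy model, not the genirq source: the toy_* names, the 
counters and the 'pending' flag are all assumptions made up for the 
example. It only shows that a fasteoi-style flow never touches the mask 
state and always ends with an EOI:

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names are the kernel's. */
struct toy_chip {
	int mask_calls;   /* times the line was masked at the controller */
	int unmask_calls; /* times the line was unmasked */
	int eoi_calls;    /* times an end-of-interrupt was sent */
};

struct toy_irq {
	struct toy_chip chip;
	bool disabled;    /* stands in for IRQ_DISABLED */
	bool pending;     /* assumption: remembers an irq seen while disabled */
	int handled;      /* times the driver's handler actually ran */
};

/* fasteoi-style flow: nothing at entry, EOI at the end, no mask/unmask. */
void toy_handle_fasteoi(struct toy_irq *irq)
{
	if (irq->disabled)
		irq->pending = true;  /* the driver's handler is skipped */
	else
		irq->handled++;       /* the driver's handler would run here */

	irq->chip.eoi_calls++;        /* always acknowledge at the end */
}
```

in this model a disabled line still gets its EOI on every interrupt; 
only the driver's handler is skipped, and the mask counters stay at zero.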

now ne2k does something uncommon: for xmit (which is normally done 
outside of irq handlers) it will disable_irq_nosync()/enable_irq() the 
interrupt. It does it to exclude the handler from _that_ CPU, but due to 
the _nosync it does not exclude it from any other CPUs. So that's a bit 
weird already.

just in case, i've just re-checked all the genirq bits that change 
IRQ_DISABLED (that bit being cleared accidentally would be the only way 
to truly allow an IRQ handler to interrupt the disable_irq_nosync() 
critical section on that CPU) - but i can see no way for that to happen: we 
unconditionally detect and report unbalanced and underflowing 
irq_desc->depth, and the only other place (besides enable_irq()) that 
clears IRQ_DISABLED is __set_irq_handler(), and on x86 that is only used 
during bootup.
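
the depth accounting described above can be sketched in user space as 
follows. This is a simplified model under stated assumptions, not the 
kernel implementation: struct toy_desc, the toy_* functions and the 
'unbalanced' counter (standing in for the kernel's warning) are 
hypothetical names, and locking is left out entirely:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of an irq descriptor. */
struct toy_desc {
	unsigned int depth; /* nesting count of disable calls */
	bool disabled;      /* stands in for IRQ_DISABLED */
	int unbalanced;     /* enable calls that had no matching disable */
};

/* Only the first (outermost) disable sets the disabled bit. */
void toy_disable_irq_nosync(struct toy_desc *d)
{
	if (d->depth++ == 0)
		d->disabled = true;
}

/* Only the last (outermost) enable clears the disabled bit. */
void toy_enable_irq(struct toy_desc *d)
{
	if (d->depth == 0) {
		d->unbalanced++; /* underflow: reported, depth left alone */
		return;
	}
	if (--d->depth == 0)
		d->disabled = false;
}
```

with balanced calls the disabled bit stays set for the whole outermost 
disable/enable pair, and an extra enable is counted instead of pushing 
the depth below zero.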

Marcin, could you try the patch below too? [without having any other 
patch applied.] It basically turns the critical section into an irqs-off 
critical section and thus checks whether your problem is related to that 
particular area of code.

	Ingo

Index: linux/drivers/net/lib8390.c
===================================================================
--- linux.orig/drivers/net/lib8390.c
+++ linux/drivers/net/lib8390.c
@@ -297,9 +297,7 @@ static int ei_start_xmit(struct sk_buff 
 	 *	Slow phase with lock held.
 	 */
 
-	disable_irq_nosync_lockdep_irqsave(dev->irq, &flags);
-
-	spin_lock(&ei_local->page_lock);
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 
 	ei_local->irqlock = 1;
 
@@ -376,8 +374,7 @@ static int ei_start_xmit(struct sk_buff 
 	ei_local->irqlock = 0;
 	ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
-	spin_unlock(&ei_local->page_lock);
-	enable_irq_lockdep_irqrestore(dev->irq, &flags);
+	spin_unlock_irqrestore(&ei_local->page_lock, flags);
 
 	dev_kfree_skb (skb);
 	ei_local->stat.tx_bytes += send_length;