Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264130AbUCZTdl (ORCPT ); Fri, 26 Mar 2004 14:33:41 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264129AbUCZTdl (ORCPT ); Fri, 26 Mar 2004 14:33:41 -0500 Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:42715 "EHLO www.linux.org.uk") by vger.kernel.org with ESMTP id S264130AbUCZTdd (ORCPT ); Fri, 26 Mar 2004 14:33:33 -0500 Message-ID: <4064857E.2050603@pobox.com> Date: Fri, 26 Mar 2004 14:33:18 -0500 From: Jeff Garzik User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andreas Henriksson CC: Denis Vlasenko , netdev@oss.sgi.com, Marcelo Tosatti , Linux Kernel Subject: [PATCH] Re: fealnx oopses References: <200403261214.58127.vda@port.imtp.ilyichevsk.odessa.ua> <20040326192211.GA15319@scream.fjortis.info> In-Reply-To: <20040326192211.GA15319@scream.fjortis.info> Content-Type: multipart/mixed; boundary="------------050800020108060100060100" Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4456 Lines: 145 This is a multi-part message in MIME format. --------------050800020108060100060100 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Andreas Henriksson wrote: > On Fri, Mar 26, 2004 at 12:14:57PM +0200, Denis Vlasenko wrote: > > > >>BTW, my box is indeed slow and low on RAM, this fits. >> > > > I have only been looking at problems with races between the interrupt > handler and the rest of the driver code.. there might be a bunch of > problems with failed memory allocations that hasn't bitten me. > > >>Regarding your patch: I looked in start_tx(). Apart from latent >>bug in commented out part of code: >> next = (struct fealnx *) np->cur_tx_copy.next_desc_logical; >>which must be >> next = (struct fealnx_desc *) np->cur_tx_copy->next_desc_logical; >>I can't see anything racy there. The function just submits more >>tx buffers for the card, it never touch cur_tx or cur_tx->skbuff... > > > Francois Romieu explains the race in a comment to the bug I opened in > the bugzilla. > > http://bugzilla.kernel.org/show_bug.cgi?id=1902#c1 > > The problem is that really_tx_count and similar parts of the private > structure isn't atomically updated and both the interrupt handler and > the start_tx function updates them. > (they are regular integers instead of atomic_t) > > >>If I miss something and it indeed races with interrupt, you >>definitely need to add >> spin_lock(&np->lock); >>... >> spin_unlock(&np->lock); >>around interrupt handler body or at least around tx part >>of it, or else your patch is incomplete (race will still >>be possible on SMP). >> > > > I came to the conclusion that there should be a spinlock in the > interrupt handler yesterday, but it won't effect me at all since I don't > have SMP (nor preempt) so I'll leave it for now anyway. > > >>Anyway, I applied your patch and flooded with UDP again. >>My box did not oops. Unfortunately, it did not oops when >>I reverted back to old, presumably buggy driver. I cannot >>reproduce it anymore with old driver too! Bad. :( > > > I haven't been able to reproduce a kernel panic with my patch eighter. > And I've been transfering Terabytes of traffic during the last weeks (or > maybee it's months, well anyway.. I've done enough testing to say that > the card works "good enough" in this machine atleast). > And I've even tried your udp test.. > > Although I now have the myson/fealnx card in my p3-900 (256Mb) > workstation instead of the old p-166 (40Mb) which served as a gateway before. > It might just be that it's harder to trigger on newer/bigger machines. > Maybee I should power up my p-166 again.. I actually have 2 of these > cards so I can have one in each machine.. :) Well really, somebody needs to port Donald Becker's myson driver to 2.6 APIs... I would like to get rid of fealnx, or somebody needs to spend a decent amount of time fixing it. Does the attached patch fix the issue? Jeff --------------050800020108060100060100 Content-Type: text/plain; name="patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch" ===== drivers/net/fealnx.c 1.34 vs edited ===== --- 1.34/drivers/net/fealnx.c Sun Mar 14 01:54:58 2004 +++ edited/drivers/net/fealnx.c Fri Mar 26 14:31:07 2004 @@ -1303,14 +1303,15 @@ /* for the last tx descriptor */ np->tx_ring[i - 1].next_desc = np->tx_ring_dma; np->tx_ring[i - 1].next_desc_logical = &np->tx_ring[0]; - - return; } static int start_tx(struct sk_buff *skb, struct net_device *dev) { struct netdev_private *np = dev->priv; + unsigned long flags; + + spin_lock_irqsave(&np->lock, flags); np->cur_tx_copy->skbuff = skb; @@ -1377,6 +1378,7 @@ writel(0, dev->base_addr + TXPDR); dev->trans_start = jiffies; + spin_unlock_irqrestore(&np->lock, flags); return 0; } @@ -1423,6 +1425,8 @@ unsigned int num_tx = 0; int handled = 0; + spin_lock(&np->lock); + writel(0, dev->base_addr + IMR); ioaddr = dev->base_addr; @@ -1564,6 +1568,8 @@ dev->name, readl(ioaddr + ISR)); writel(np->imrvalue, ioaddr + IMR); + + spin_unlock(&np->lock); return IRQ_RETVAL(handled); } --------------050800020108060100060100-- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/