Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265922AbTGWKM6 (ORCPT ); Wed, 23 Jul 2003 06:12:58 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265772AbTGWKM6 (ORCPT ); Wed, 23 Jul 2003 06:12:58 -0400 Received: from tudela.mad.ttd.net ([194.179.1.233]:32765 "EHLO tudela.mad.ttd.net") by vger.kernel.org with ESMTP id S264448AbTGWKMu (ORCPT ); Wed, 23 Jul 2003 06:12:50 -0400 Date: Wed, 23 Jul 2003 12:26:35 +0200 (MEST) From: Javier Achirica To: Daniel Ritz cc: Jeff Garzik , linux-kernel , linux-net , Jean Tourrilhes , Mike Kershaw Subject: Re: [PATCH 2.5] fixes for airo.c In-Reply-To: <000d01c350fd$e625c9d0$a61fc682@alcatel.ch> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5983 Lines: 163 You cannot use down() in xmit, as it may be called in interrupt context. I know it slows things down, but that's the only way I figured out of handling a transmission while the card is processing a long command. I thought about the fix and I think it's fixed. The only case the race could happen is if there's some work pending to be scheduled and the queue gets started again (by the interrupt handler), so airo_start_xmit overwrites the priv->xmit data. Now, because of the new flag, the interrupt handler won't wake the queue until the pending packet is sent to the card (or fails) so I don't see how can the race happen (although I didn't see it until you pointed out :-( Javier Achirica On Wed, 23 Jul 2003, Daniel Ritz wrote: > ok, now the braindamaged thing called sourceforge showed the changes, but: > - i don't think the race is fixed. just remove the whole down_trylock() > crap in the xmit altogether and replace it with a single down(). faster, > simpler, not racy...and with the schedule_work you win nothing, you lose > speed > - please don't commit bugfixes and new features in the same changeset... > - the loop-forever fix in transmit_allocate: you should have copied the > comment > changes from my patch too, so the spin-forever-comment goes away... > > i look closer when i'm home, having a real operating system to work on, not > this > winblows box at work now.. > > -daniel > > > Javier Achirica wrote: > > > > Today I updated the CVS and Sourceforge (airo-linux.sf.net) with the > > latest version (1.53) that (I hope) fixes the race problem. If everything > > is fine, I'll commit the changes to the kernel tree. > > > > Javier Achirica > > > > On Mon, 21 Jul 2003, Daniel Ritz wrote: > > > > > On Mon July 21 2003 21:44, Javier Achirica wrote: > > > > > > > > On Mon, 21 Jul 2003, Daniel Ritz wrote: > > > > > > > > > On Mon July 21 2003 13:00, Javier Achirica wrote: > > > > > > > > > > > > Daniel, > > > > > > > > > > > > Thank you for your patch. Some comments about it: > > > > > > > > > > > > - I'd rather fix whatever is broken in the current code than going > back to > > > > > > spinlocks, as they increase latency and reduce concurrency. In any > case, > > > > > > please check your code. I've seen a spinlock in the interrupt > handler that > > > > > > may lock the system. > > > > > > > > > > but we need to protect from interrupts while accessing the card and > waiting for > > > > > completion. semaphores don't protect you from that. > spin_lock_irqsave does. the > > > > > spin_lock in the interrupt handler is there to protect from > interrupts from > > > > > other processors in a SMP system (see Documentation/spinlocks.txt) > and is btw. > > > > > a no-op on UP. and semaphores are quite heavy.... > > > > > > > > Not really. You can still read the received packets from the card (as > > > > you're not issuing any command and are using the other BAP) while a > > > > command is in progress. There are some specific cases in which you > need > > > > to have protection, and that cases are avoided with the down_trylock. > > > > > > > > > > ok, i think i have to look closer...if the card can handle that then we > don't need > > > to irq-protect all the areas i did protect...but i do think that those > down_trylock and > > > then the schedule_work should be replaced by a simple > spinlock_irq_save... > > > > > > i look closer at it tomorrow. > > > you happen to have the tech spec lying aroung? > > > > > > > AFAIK, interrupt serialization is assured by the interrupt handler, so > you > > > > don't need to do that. > > > > > > > > > > - The fix for the transmit code you mention, is about fixing the > returned > > > > > > value in case of error? If not, please explain it to me as I don't > see any > > > > > > other changes. > > > > > > > > > > fixes: > > > > > - return values > > > > > - when to free the skb, when not > > > > > - disabling the queues > > > > > - netif_wake_queue called from the interrupt handler only (and on > the right > > > > > net_device) > > > > > - i think the priv->xmit stuff and then the schedule_work is evil: > > > > > if you return 0 from the dev->hard_start_xmit then the network > layer assumes > > > > > that the packet was kfree_skb()'ed (which does only frees the > packet when the > > > > > refcount drops to zero.) this is the cause for the keventd > killing, for sure! > > > > > > > > > > if you return 0 you already kfree_skb()'ed the packet. and that's > it. > > > > > > > > This is where I have the biggest problems. As I've read in > > > > Documentation/networking/driver.txt, looks like the packet needs to be > > > > freed "soon", but doesn't require to be before returning 0 in > > > > hard_start_xmit. Did I get it wrong? > > > > > > > > > > no, i got it wrong. but still...it's the xmit where the oops comes > from.... > > > > > > wait. isn't there a race in airo_do_xmit? at high xfer rates (when it > oopses) the > > > queue can wake right after it is stopped in the down_trylock section. so > you can > > > happen to loose an skb 'cos the write to priv->xmit is not protected at > all and > > > there should be a check so that only one skb can be queue there. no? > > > (and then the irq-handler can wake the queue too) > > > > > > ok, i think i got it now. i'll do a new patch tomorrow or so that tries: > > > - to fix the transmit not to oops > > > - to avoid disabling the irq's whenever possible > > > - using spinlocks instead of the heavier semaphores ('cos i think if > it's done cleaner > > > than i did it now, it's faster than the semas, and to make hch happy > :) > > > > > > > > > > Thanks for your help, > > > > Javier Achirica > > > > > > > > > > rgds > > > -daniel > > > > > > > > > > > > > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/