Date: Wed, 27 Jan 2010 09:45:31 -0800
From: Stephen Hemminger
To: Michael Breuer
Cc: Jarek Poplawski, David Miller, akpm@linux-foundation.org,
 flyboy@gmail.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
 Michael Chan, Don Fry, Francois Romieu, Matt Carlson
Subject: Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at
 lib/dma-debug.c:902 check_sync)
Message-ID: <20100127094531.53c85aa7@nehalam>
In-Reply-To: <4B60707F.1000608@majjas.com>
References: <20100120094103.GA6225@ff.dom.local> <4B58B217.8030001@majjas.com>
 <20100121204133.GB3085@del.dom.local> <4B59E7EB.3050605@majjas.com>
 <20100122215304.GA3105@del.dom.local> <4B5A2362.6000306@majjas.com>
 <20100122230605.GB3105@del.dom.local> <4B5A33D8.90501@majjas.com>
 <20100122234656.GC3105@del.dom.local> <4B5A39BD.8020305@majjas.com>
 <20100123232133.GA3487@del.dom.local> <4B605D1B.60402@majjas.com>
 <20100127085049.5b5048e9@nehalam> <4B60707F.1000608@majjas.com>
Organization: Vyatta

On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer wrote:

> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> > On Wed, 27 Jan 2010 10:34:51 -0500
> > Michael Breuer wrote:
> >
> >> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> >>
> >>> On Fri, Jan 22, 2010 at 06:50:21PM -0500,
> >>> Michael Breuer wrote:
> >>>
> >>>> When the packets were dropped, there was a different sequence in the
> >>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>>>
> >>> Anyway, I'd be interested if the switch matters here.
> >>>
> >>> Plus one more test: could you try to load sky2 with the parameter:
> >>> "copybreak=1" (the rest as in any recent test, which gave you dmar
> >>> errors; any switch).
> >>>
> >>> Thanks,
> >>> Jarek P.
> >>>
> >> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> >> to confirm that I haven't inadvertently fixed something. However, given
> >> that it might be copybreak-related, I looked at sky2.c again and I'm
> >> wondering about the copybreak max size in sky2_rx_start:
> >>
> >>     size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> >>
> >>     /* Stopping point for hardware truncation */
> >>     thresh = (size - 8) / sizeof(u32);
> >>
> >>     sky2->rx_nfrags = size >> PAGE_SHIFT;
> >>     BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
> >>
> >>     /* Compute residue after pages */
> >>     size -= sky2->rx_nfrags << PAGE_SHIFT;
> >>
> >>     /* Optimize to handle small packets and headers */
> >>     if (size < copybreak)
> >>         size = copybreak;
> >>     if (size < ETH_HLEN)
> >>         size = ETH_HLEN;
> >>
> >> Why would increasing size to copybreak be valid here?
> >>
> >> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> >> correctly, if size is ever less than copybreak it's because there isn't
> >> enough space left for anything larger. If so, wouldn't increasing size
> >> potentially corrupt something? I'd further guess that the resulting
> >> condition manifests sooner (or at least with a more visible effect) when
> >> using DMAR.
> >>
> >> In any event, why "copybreak" as the minimum buffer size?
> >> I'd suggest
> >> that if it isn't possible to allocate at least MTU + overhead that
> >> sky2_rx_start ought to be delayed until there is room.
> >>
> > This code is where driver decides how much data will be received in skb
> > data area and the remaining data spills over into skb frags.
> > Copybreak is the threshold so that packets less than size are copied
> > to a new skb. The code doing the copying there assumes the data is
> > totally contained in the skb (not in frags). The size increase there
> > is to make sure that assumption is always true. I suppose you
> > could do something perverse like setting copybreak really huge
> > and confuse driver, but that is a user error.
> >
> Ok - but I'm wondering under what circumstances size would be <
> copybreak in the first place after computing the residue. If size ends
> up being unreasonably small, is simply increasing the number to whatever
> copybreak is correct? Assuming my testing is correct, then the crash
> I've been experiencing when using dmar (only) seems related to the value
> of copybreak. I don't think the other use (skb reuse) is the issue (but
> hey, I could have missed something). The crash occurs when copybreak is
> the default of 128, didn't happen when I set copybreak to 1.
>

Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
path. Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu properly,
or the value passed to sync_single_for_cpu doesn't account for all the
overhead of padding and/or ether header.
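
On the question of when size can end up below copybreak after the residue
step: the arithmetic quoted from sky2_rx_start can be tried in user space.
The following is only a sketch, not driver code. It assumes 4 KiB pages
(PAGE_SHIFT of 12), ETH_HLEN of 14 and VLAN_HLEN of 4, and the names
roundup_to() and rx_data_size() are hypothetical helpers invented for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: Ethernet and VLAN header lengths, 4 KiB pages. */
#define ETH_HLEN   14
#define VLAN_HLEN  4
#define PAGE_SHIFT 12

/* Hypothetical helper: round x up to the next multiple of y. */
static unsigned roundup_to(unsigned x, unsigned y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Hypothetical helper that repeats the arithmetic quoted from
 * sky2_rx_start: whole pages go to fragments, the remainder is the
 * linear data size, raised to copybreak and then to ETH_HLEN if smaller.
 * The fragment count is returned through *nfrags.
 */
static unsigned rx_data_size(unsigned mtu, unsigned copybreak,
			     unsigned *nfrags)
{
	unsigned size = roundup_to(mtu + ETH_HLEN + VLAN_HLEN, 8);

	assert(nfrags != NULL);

	/* Number of whole pages needed for the frame */
	*nfrags = size >> PAGE_SHIFT;

	/* Residue after pages */
	size -= *nfrags << PAGE_SHIFT;

	/* Same two lower bounds as in the quoted code */
	if (size < copybreak)
		size = copybreak;
	if (size < ETH_HLEN)
		size = ETH_HLEN;

	return size;
}
```

Under those assumptions, an MTU of 1500 gives no page fragments and 1520
bytes of linear data, and an MTU of 9000 gives two fragments and 832 bytes.
The size < copybreak branch is only taken when a jumbo frame lands just past
a page multiple: an MTU of 8200 leaves a residue of 32 bytes, which is raised
to 128 with the default copybreak, and stays at 32 with copybreak=1.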