From: Zoltan Kiss
Date: Tue, 4 Feb 2014 21:32:53 +0000
To: Wei Liu, Zoltan Kiss
CC: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, "David S. Miller", "netdev@vger.kernel.org", Michael Chan, "xen-devel@lists.xenproject.org"
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer
Message-ID: <52F15C85.7050200@citrix.com>
In-Reply-To: <20140131185619.GB27553@zion.uk.xensource.com>

On 31/01/14 18:56, Wei Liu wrote:
> On Thu, Jan 30, 2014 at 07:08:11PM +0000, Zoltan Kiss wrote:
>> Hi,
>>
>> I've experienced some queue timeout problems mentioned in the
>> subject with igb and bnx2 cards. I haven't seen them on other cards
>> so far. I'm using XenServer with a 3.10 Dom0 kernel (however, igb was
>> already updated to the latest version), and there are Windows guests
>> sending data through these cards. I noticed these problems in XenRT
>> test runs, and I know that they usually mean some lost interrupt
>> problem or other hardware error, but in my case they started to
>> appear more often, and they are likely connected to my netback grant
>> mapping patches. These patches cause skbs with huge (~64KB)
>> linear buffers to appear more often.
>> The reason for that is an old problem in the ring protocol:
>> originally the maximum number of slots was linked to MAX_SKB_FRAGS,
>> as every slot ended up as a frag of the skb. When this value was
>> changed, netback had to cope with the situation by coalescing the
>> packets into fewer frags.
>> My patch series takes a different approach: the leftover slots
>> (pages) are assigned to a new skb's frags, and that skb is
>> stashed on the frag_list of the first one. Then, before sending it
>> off to the stack, it calls skb = skb_copy_expand(skb, 0, 0,
>> GFP_ATOMIC | __GFP_NOWARN), which basically creates a new skb and
>> copies all the data into it. As far as I understand, it puts
>> everything into the linear buffer, which can amount to 64KB at most.
>> The original skb is then freed, and the new one is sent to the
>> stack.
>
> Just my two cents, if it is this case, you can try to call
> skb_copy_expand on every SKB netback receives to manually create SKBs
> with ~64KB linear buffer to see how it goes...

I've tried it, and it did break everything in a similar way, so that's
a strong clue that the problem lies here. I've rewritten that part of my
patches to do less modification, based on Malcolm's idea: netback pulls
the first frag into the linear buffer, then moves a frag from the
frag_list skb into the first one. That seems to help, but so far I have
only one relevant test result; I'm waiting for more.
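To make the difference between the two approaches concrete, here is a rough userspace sketch. It is not netback code and it does not call any kernel API: struct model_skb and the model_* helpers are made up for illustration, they track buffer sizes only, and MODEL_MAX_FRAGS = 17 assumes 4 KiB pages. It only shows where the bytes end up: the full copy leaves the whole packet in the linear buffer, while pull-and-move grows the linear buffer by a single frag.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 17u /* assumption: MAX_SKB_FRAGS with 4 KiB pages */

/* Hypothetical stand-in for struct sk_buff: tracks sizes only, no payload. */
struct model_skb {
    size_t linear_len;                /* bytes in the linear buffer */
    size_t nr_frags;                  /* frags currently in use */
    size_t frag_len[MODEL_MAX_FRAGS]; /* bytes held by each frag */
    struct model_skb *frag_list;      /* overflow skb, or NULL */
};

/* Total payload across the head skb and everything chained on frag_list. */
size_t model_total_len(const struct model_skb *skb)
{
    size_t total = 0;
    size_t i;

    for (; skb != NULL; skb = skb->frag_list) {
        assert(skb->nr_frags <= MODEL_MAX_FRAGS);
        total += skb->linear_len;
        for (i = 0; i < skb->nr_frags; i++)
            total += skb->frag_len[i];
    }
    return total;
}

/* Remove frag 0, shift the remaining frags down, and return its length. */
static size_t model_take_first_frag(struct model_skb *skb)
{
    size_t len = skb->frag_len[0];
    size_t i;

    for (i = 1; i < skb->nr_frags; i++)
        skb->frag_len[i - 1] = skb->frag_len[i];
    skb->nr_frags--;
    skb->frag_len[skb->nr_frags] = 0;
    return len;
}

/* Full-copy approach: all payload ends up in one linear buffer. */
void model_copy_to_linear(struct model_skb *skb)
{
    size_t i;

    skb->linear_len = model_total_len(skb);
    for (i = 0; i < skb->nr_frags; i++)
        skb->frag_len[i] = 0;
    skb->nr_frags = 0;
    skb->frag_list = NULL;
}

/*
 * Pull-and-move approach: pull the first frag into the linear buffer,
 * then move one frag from the frag_list skb into the freed slot.
 * Returns 0 on success, -1 if there is nothing to pull or move.
 */
int model_pull_and_move(struct model_skb *skb)
{
    struct model_skb *extra = skb->frag_list;
    size_t moved;

    if (skb->nr_frags == 0 || extra == NULL || extra->nr_frags == 0)
        return -1;

    skb->linear_len += model_take_first_frag(skb); /* pull */
    moved = model_take_first_frag(extra);          /* move */
    skb->frag_len[skb->nr_frags] = moved;
    skb->nr_frags++;

    /* Unlink the overflow skb once it carries no data. */
    if (extra->nr_frags == 0 && extra->linear_len == 0)
        skb->frag_list = extra->frag_list;
    return 0;
}
```

With a head skb holding 17 frags and one leftover frag on frag_list, the first helper produces a linear buffer as large as the whole packet, while the second leaves 17 frags in place and a linear buffer the size of one frag.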
Zoli