Subject: Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Eric Dumazet
To: Heiner Kallweit, Sander Eikelenboom, Realtek linux nic maintainers, Eric Dumazet
Cc: Linus Torvalds, linux-kernel, netdev
Date: Fri, 8 Feb 2019 15:09:24 -0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>> L.S.,
>>>>>>
>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but they don't seem related) under Xen, I hit the nasty splat below,
>>>>>> which I haven't encountered with Linux 4.20.x.
>>>>>>
>>>>>> Unfortunately I haven't got a clear reproducer for this, and bisecting could be nasty due to another (networking-related) kernel bug.
>>>>>>
>>>>>> If you need more info, want me to run a debug patch, etc., please feel free to ask.
>>>>>>
>>>>> Thanks for the report. However, I see no change in the r8169 driver between
>>>>> 4.20 and 5.0 with regard to BQL code. Having said that, the root cause could
>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>
>>>> Hmm, I did some digging and I think:
>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>
>>> You're right. Thought this was added in 4.20 already.
>>> The BQL code pattern I copied from the mlx4 driver, and so far I haven't heard about
>>> this issue from any user of physical hw. And since a lot of mainboards
>>> have onboard Realtek network, I have quite a few testers out there.
>>> Does the issue occur under specific circumstances like very high load?
>>
>> Yep, the box is already quite contended with the Xen VMs, and if I remember correctly it occurred while compiling a kernel
>> on the host.
>>
>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>> as author of the underlying changes.
>>
>> It could also be that the barriers weren't as unneeded as assumed.
>
> The barriers were removed after adding xmit_more handling. Therefore it would be good to
> test also with only
> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
> removed.
>
>> Since we are almost at RC6, I took the liberty of CCing Eric now.
>>
> Sure, thanks.
>
>> BTW, am I correct that these patches are merely optimizations?
>
> Yes
>
>> If so, and provided they revert cleanly, perhaps it should be considered at this point in the RCs
>> to revert them for 5.0 and try again for 5.1?
>>
> Before removing both it would be good to test with only the barrier-removal removed.
>

Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
looks buggy to me, since the skb might have been freed already on another CPU when you call

You could try:

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	dma_addr_t mapping;
 	u32 opts[2], len;
 	bool stop_queue;
+	bool door_bell;
 	int frags;
 
 	if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	/* Force memory writes to complete before releasing descriptor */
 	dma_wmb();
 
+	door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
+
 	txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
 
 	/* Force all memory writes to complete before notifying device */
@@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	if (unlikely(stop_queue))
 		netif_stop_queue(dev);
 
-	if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
+	if (door_bell) {
 		RTL_W8(tp, TxPoll, NPQ);
 		mmiowb();
 	}