Subject: Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
To: Eric Dumazet, Sander Eikelenboom, Realtek linux nic maintainers, Eric Dumazet
Cc: Linus Torvalds, linux-kernel, netdev
References: <6c389fde-4c8d-300b-8c3c-300d6105c30a@eikelenboom.it> <0f605e50-56fe-06b5-9b66-6aed89a608ce@gmail.com> <471e550b-c227-22e6-19fd-5f9abd450e5f@eikelenboom.it> <1265d424-4943-e571-a74b-b1512ebec179@gmail.com> <059e59c6-2264-fd5c-068f-3656e39539c1@eikelenboom.it> <140d0df7-1775-5457-aa03-b21ece250a72@gmail.com>
From: Heiner Kallweit
Date: Sat, 9 Feb 2019 10:02:56 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 09.02.2019 00:09, Eric Dumazet wrote:
>
>
> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>> L.S.,
>>>>>>>
>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top, but they don't seem related) under Xen, I got the nasty splat below,
>>>>>>> that I haven't encountered with Linux 4.20.x.
>>>>>>>
>>>>>>> Unfortunately I haven't got a clear reproducer for this, and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>
>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>
>>>>>> Thanks for the report. However, I see no change in the r8169 driver between
>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that, the root cause could
>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>
>>>>> Hmm, I did some digging and I think:
>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>
>>>> You're right. Thought this was added in 4.20 already.
>>>> I copied the BQL code pattern from the mlx4 driver, and so far I haven't heard about
>>>> this issue from any user of physical hw. And since a lot of mainboards
>>>> have onboard Realtek NICs, I have quite a few testers out there.
>>>> Does the issue occur under specific circumstances like very high load?
>>>
>>> Yep, the box is already quite contended with the Xen VMs, and if I remember correctly it occurred while compiling a kernel
>>> on the host.
>>>
>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>> as author of the underlying changes.
>>>
>>> It could also be that the barriers weren't as unneeded as assumed.
>>
>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>> test also with only
>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>> reverted.
>>
>>> Since we are almost at RC6, I took the liberty to CC Eric now.
>>>
>> Sure, thanks.
>>
>>> BTW, am I correct that these patches are merely optimizations?
>>
>> Yes
>>
>>> If so, and assuming they revert cleanly, perhaps it should be considered at this point in the RCs
>>> to revert them for 5.0 and try again for 5.1?
>>>
>> Before reverting both it would be good to test with only the barrier removal reverted.
>>
>
> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
> looks buggy to me, since the skb might have been freed already on another cpu when you call
>
> You could try :
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>  	dma_addr_t mapping;
>  	u32 opts[2], len;
>  	bool stop_queue;
> +	bool door_bell;
>  	int frags;
>  
>  	if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>  	/* Force memory writes to complete before releasing descriptor */
>  	dma_wmb();
>  
> +	door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
> +
>  	txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>  
>  	/* Force all memory writes to complete before notifying device */
> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>  	if (unlikely(stop_queue))
>  		netif_stop_queue(dev);
>  
> -	if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
> +	if (door_bell) {
>  		RTL_W8(tp, TxPoll, NPQ);
>  		mmiowb();
>  	}
>
Thanks a lot for checking and for the proposed fix.
Sander, can you try with this patch on top of 5.0-rc5 w/o removing the two commits?

Heiner