Date: Wed, 11 Aug 2021 14:23:10 +0100
Message-ID: <87pmuk9ku9.wl-maz@kernel.org>
From: Marc Zyngier
To: Thierry Reding
Cc: Matteo Croce, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, Giuseppe Cavallaro, Alexandre Torgue,
    "David S. Miller", Jakub Kicinski, Palmer Dabbelt, Paul Walmsley,
    Drew Fustini, Emil Renner Berthing, Jon Hunter, Will Deacon
Subject: Re: [PATCH net-next] stmmac: align RX buffers
References: <20210614022504.24458-1-mcroce@linux.microsoft.com> <871r71azjw.wl-maz@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 11 Aug 2021 11:41:59 +0100,
Thierry Reding wrote:
>
> On Tue, Aug 10, 2021 at 08:07:47PM +0100, Marc Zyngier wrote:
> > Hi all,
> >
> > [adding Thierry, Jon and Will to the fun]
> >
> > On Mon, 14 Jun 2021 03:25:04 +0100,
> > Matteo Croce wrote:
> > >
> > > From: Matteo Croce
> > >
> > > On RX an SKB is allocated and the received buffer is copied into it.
> > > But on some architectures, the memcpy() needs the source and destination
> > > buffers to have the same alignment to be efficient.
> > >
> > > This is not our case, because SKB data pointer is misaligned by two bytes
> > > to compensate the ethernet header.
> > >
> > > Align the RX buffer the same way as the SKB one, so the copy is faster.
> > > An iperf3 RX test gives a decent improvement on a RISC-V machine:
> > >
> > > before:
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88   sender
> > > [  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec        receiver
> > >
> > > after:
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0   sender
> > > [  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec        receiver
> > >
> > > And the memcpy() overhead during the RX drops dramatically.
> > >
> > > before:
> > > Overhead  Shared O  Symbol
> > >   43.35%  [kernel]  [k] memcpy
> > >   33.77%  [kernel]  [k] __asm_copy_to_user
> > >    3.64%  [kernel]  [k] sifive_l2_flush64_range
> > >
> > > after:
> > > Overhead  Shared O  Symbol
> > >   45.40%  [kernel]  [k] __asm_copy_to_user
> > >   28.09%  [kernel]  [k] memcpy
> > >    4.27%  [kernel]  [k] sifive_l2_flush64_range
> > >
> > > Signed-off-by: Matteo Croce
> >
> > This patch completely breaks my Jetson TX2 system, composed of 2
> > Nvidia Denver and 4 Cortex-A57, in a very "funny" way.
> >
> > Any significant amount of traffic results in all sorts of corruption
> > (ssh connections get dropped, Debian packages downloaded have the
> > wrong checksums) if any Denver core is involved in any significant way
> > (packet processing, interrupt handling). And it is all triggered by
> > this very change.
> >
> > The only way I have to make it work on a Denver core is to route the
> > interrupt to that particular core and taskset the workload to it. Any
> > other configuration involving a Denver CPU results in some sort of
> > corruption. On their own, the A57s are fine.
> >
> > This smells of memory ordering going really wrong, which this change
> > would expose. I haven't had a chance to dig into the driver yet (it
> > took me long enough to bisect it), but if someone points me at what is
> > supposed to synchronise the DMA when receiving an interrupt, I'll have
> > a look.
>
> One other thing that kind of rings a bell when reading DMA and
> interrupts is a recent report (and attempt to fix this) where upon
> resume from system suspend, the DMA descriptors would get corrupted.
>
> I don't think we ever figured out what exactly the problem was, but
> interestingly the fix for the issue immediately caused things to go
> haywire on... Jetson TX2.

I love this machine... Did this issue occur with the Denver CPUs
disabled?

> I recall looking at this a bit and couldn't find where exactly the DMA
> was being synchronized on suspend/resume, or what the mechanism was to
> ensure that (in transit) packets were not received after the suspension
> of the Ethernet device. Some information about this can be found here:
>
> https://lore.kernel.org/netdev/708edb92-a5df-ecc4-3126-5ab36707e275@nvidia.com/
>
> It's interesting that this happens only on Jetson TX2. Apparently on the
> newer Jetson AGX Xavier this problem does not occur. I think Jon also
> narrowed this down to being related to the IOMMU being enabled on Jetson
> TX2, whereas Jetson AGX Xavier didn't have it enabled. I wasn't able to
> find any notes on whether disabling the IOMMU on Jetson TX2 did anything
> to improve on this, so perhaps that's something worth trying.

Actually, I was running with the SMMU disabled, as I use the upstream
u-boot provided DT. Switching to the kernel one didn't change a thing
(with passthrough or not).

> We have since enabled the IOMMU on Jetson AGX Xavier, and I haven't seen
> any test reports indicating that this is causing issues. So I don't
> think this has anything directly to do with the IOMMU support.

No, it looks more like either ordering or cache management. The fact
that this patch messes with the buffer alignment makes me favour the
latter...

> That said, if these problems are all exclusive to Jetson TX2, or rather
> Tegra186, that could indicate that we're missing something at a more
> fundamental level (maybe some cache maintenance quirk?).

That'd be pretty annoying. Do you know if the Ethernet is a coherent
device on this machine? Or does it need active cache maintenance?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.