Message-ID: <32bfb619bbb3cd6f52f9e5da205673702fed228f.camel@kernel.crashing.org>
Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue
From: Benjamin Herrenschmidt
To: Arnd Bergmann, Dylan Hung
Cc: Jakub Kicinski, Joel Stanley, "David S. Miller",
    "netdev@vger.kernel.org", Linux Kernel Mailing List, Po-Yu Chuang,
    linux-aspeed, OpenBMC Maillist, BMC-SW
Date: Wed, 21 Oct 2020 09:10:02 +1100
References: <20201019073908.32262-1-dylan_hung@aspeedtech.com>
    <20201019120040.3152ea0b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2020-10-20 at 21:49 +0200, Arnd Bergmann wrote:
> On Tue, Oct 20, 2020 at 11:37 AM Dylan Hung wrote:
> > > +1 @first is system memory from dma_alloc_coherent(), right?
> > >
> > > You shouldn't have to do this. Is coherent DMA memory broken on your
> > > platform?
> >
> > It is about the arbitration on the DRAM controller. There are two
> > queues in the dram controller, one is for the CPU access and the
> > other is for the HW engines.
> > When CPU issues a store command, the dram controller just
> > acknowledges cpu's request and pushes the request into the queue.
> > Then CPU triggers the HW MAC engine, the HW engine starts to fetch
> > the DMA memory.
> > But since the cpu's request may still stay in the queue, the HW
> > engine may fetch the wrong data.

Actually, I take back what I said earlier, the above seems to imply this
is more generic.

Dylan, please confirm, does this affect *all* DMA capable devices? If
yes, then it's a really really bad design bug in your chips
unfortunately and the proper fix is indeed to make dma_wmb() do a dummy
read of some sort (what address though? would any dummy non-cacheable
page do?) to force the data out as *all* drivers will potentially be
affected.

I was under the impression that it was a specific timing issue in the
vhub and ethernet parts, but if it's more generic then it needs to be
fixed globally.

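For readers following the thread, the two-queue arbitration Dylan describes can be modeled with a small userspace C sketch. Everything in it (`toy_dram`, `cpu_store`, `cpu_load`, `hw_fetch`, the queue depth) is hypothetical and only mirrors the description above; it is not kernel code and not the real controller. In particular, it assumes a CPU read is served only after earlier CPU stores have been committed, which is what the dummy-read idea relies on and which has not been confirmed for the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRAM_WORDS  8
#define QUEUE_DEPTH 4

/* Toy model only: a DRAM array plus a queue of CPU stores that have been
 * acknowledged but not yet committed. All names are hypothetical. */
struct toy_dram {
    uint32_t mem[DRAM_WORDS];
    struct { size_t addr; uint32_t val; } pending[QUEUE_DEPTH];
    size_t npending;
};

/* Commit every queued CPU store to memory, oldest first. */
void toy_drain(struct toy_dram *d)
{
    for (size_t i = 0; i < d->npending; i++)
        d->mem[d->pending[i].addr] = d->pending[i].val;
    d->npending = 0;
}

/* CPU store: acknowledged immediately, but only queued. */
void cpu_store(struct toy_dram *d, size_t addr, uint32_t val)
{
    assert(addr < DRAM_WORDS);
    if (d->npending == QUEUE_DEPTH)
        toy_drain(d);               /* a full queue has to make room */
    d->pending[d->npending].addr = addr;
    d->pending[d->npending].val = val;
    d->npending++;
}

/* CPU load: ASSUMED to be served only after earlier CPU stores commit.
 * This is the "dummy read" effect discussed in the thread. */
uint32_t cpu_load(struct toy_dram *d, size_t addr)
{
    assert(addr < DRAM_WORDS);
    toy_drain(d);
    return d->mem[addr];
}

/* HW engine fetch: goes through the other queue in this model, so it
 * does not see CPU stores that are still pending. */
uint32_t hw_fetch(const struct toy_dram *d, size_t addr)
{
    assert(addr < DRAM_WORDS);
    return d->mem[addr];
}
```

In this model, a `cpu_store()` followed directly by `hw_fetch()` of the same address returns the old value, while a `cpu_load()` of any address in between makes the new value visible to the engine.
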
> There is still something missing in the explanation: The iowrite32()
> only tells the device that it should check the queue, but not where
> the data is. I would expect the device to either see the correct data
> that was marked valid by the
> 'dma_wmb();first->txdes0 = cpu_to_le32(f_ctl_stat);' operation, or it
> would see the old f_ctl_stat value telling it that the data is not yet
> valid and not look at the rest of the descriptor. In the second case
> you would see the data not getting sent out until the next
> start_xmit(), but the device should not fetch wrong data.
>
> There are two possible scenarios in which your patch would still help:
>
> a) the dma_wmb() does not serialize the stores as seen by DMA the
>    way it is supposed to, so the device can observe the new value of
>    txdes0 before it observes the correct data.
>
> b) The txdes0 field sometimes contains stale data that marks the
>    descriptor as valid before the correct data is written. This field
>    should have been set in ftgmac100_tx_complete_packet() earlier.
>
> If either of the two is the case, then the READ_ONCE() would just
> introduce a long delay before the iowrite32() that makes it more likely
> that the data is there, but the inconsistent state would still be
> observable by the device if it is still working on previous frames.

I think it just gets stuck until we try another packet, i.e., it doesn't
see the new descriptor valid bit. But Dylan can elaborate.

Cheers,
Ben.
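
As an illustration of the ordering argument quoted above (payload first, barrier, then the valid bit), here is a minimal userspace sketch. It is not the ftgmac100 code: `toy_txdes`, `TOY_DES0_OWN`, `toy_publish()` and `toy_device_poll()` are hypothetical names, and a C11 release fence stands in for dma_wmb(). It shows only the intended contract, namely that a device checking the valid bit either gets the complete descriptor or leaves it alone. It does not reproduce the hardware behaviour under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DES0_OWN 0x80000000u    /* hypothetical "descriptor valid" bit */

struct toy_txdes {
    _Atomic uint32_t des0;  /* status word, carries the valid bit */
    uint32_t buf_addr;      /* payload fields the device reads later */
    uint32_t len;
};

/* CPU side: fill the payload, order it, then mark the descriptor valid. */
void toy_publish(struct toy_txdes *d, uint32_t buf_addr, uint32_t len)
{
    /* The descriptor must not already be owned by the device. */
    assert(!(atomic_load_explicit(&d->des0, memory_order_relaxed)
             & TOY_DES0_OWN));

    d->buf_addr = buf_addr;
    d->len = len;

    /* Stand-in for dma_wmb(): payload stores are ordered before the
     * store that sets the valid bit. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->des0, TOY_DES0_OWN, memory_order_relaxed);
}

/* Device side: look at the payload only once the valid bit is observed. */
bool toy_device_poll(struct toy_txdes *d, uint32_t *buf_addr, uint32_t *len)
{
    uint32_t des0 = atomic_load_explicit(&d->des0, memory_order_acquire);

    if (!(des0 & TOY_DES0_OWN))
        return false;   /* not valid yet: nothing is fetched */

    *buf_addr = d->buf_addr;
    *len = d->len;
    return true;
}
```

Polling before `toy_publish()` returns false and touches nothing, which corresponds to the "stuck until the next packet" outcome; polling afterwards returns the payload that was written before the valid bit.
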