Date:    Fri, 18 May 2018 22:55:48 +0100
From:    Russell King - ARM Linux
To:      Vineet Gupta
Cc:      Alexey Brodkin, hch@lst.de, linux-arch@vger.kernel.org,
         linux-xtensa@linux-xtensa.org, monstr@monstr.eu, deanbo422@gmail.com,
         linux-c6x-dev@linux-c6x.org, linux-parisc@vger.kernel.org,
         linux-sh@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
         linux-hexagon@vger.kernel.org, linux-kernel@vger.kernel.org,
         linux-mm@kvack.org, iommu@lists.linux-foundation.org,
         openrisc@lists.librecores.org, green.hu@gmail.com,
         linux-alpha@vger.kernel.org, sparclinux@vger.kernel.org,
         nios2-dev@lists.rocketboards.org, Andrew Morton,
         linux-snps-arc@lists.infradead.org,
         linux-arm-kernel@lists.infradead.org
Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
         dma-mapping: provide a generic dma-noncoherent implementation)
Message-ID: <20180518215548.GH17671@n2100.armlinux.org.uk>
References: <20180511075945.16548-1-hch@lst.de>
         <20180511075945.16548-3-hch@lst.de>
         <5ac5b1e3-9b96-9c7c-4dfe-f65be45ec179@synopsys.com>
         <20180518175004.GF17671@n2100.armlinux.org.uk>

On Fri, May 18, 2018 at 01:35:08PM -0700, Vineet Gupta wrote:
> On 05/18/2018 10:50 AM, Russell King - ARM Linux wrote:
> >On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> >>I never understood the need for this direction. And if memory serves me
> >>right, at that time I was seeing twice the amount of cache flushing !
> >
> >It's necessary.  Take a moment to think carefully about this:
> >
> >        dma_map_single(, dir)
> >
> >        dma_sync_single_for_cpu(, dir)
> >
> >        dma_sync_single_for_device(, dir)
> >
> >        dma_unmap_single(, dir)
>
> As an aside, do these imply a state machine of sorts - does a driver needs
> to always call map_single first ?

Kind-of, but some drivers do omit some of the dma_sync_*() calls.
For example, if a buffer is written to, then mapped with TO_DEVICE,
and then the CPU wishes to write to it, it's fairly common that a
driver omits the dma_sync_single_for_cpu() call.

If you think about the cases I gave and what cache operations happen,
such a scenario practically turns out to be safe.
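To make that concrete, here is a rough sketch - not taken from any real
driver - of the sort of sequence being described: a TO_DEVICE buffer
which the CPU refills between transfers.  send_twice() and trigger_tx()
are made up for illustration; only the dma_* calls are the real API.

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/string.h>

/* Hypothetical helper: however the driver actually starts a transfer. */
void trigger_tx(struct device *dev, dma_addr_t addr, size_t len);

static int send_twice(struct device *dev, void *buf, size_t len)
{
        dma_addr_t handle;

        /* CPU has filled buf; hand it to the device (writeback for TO_DEV). */
        handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, handle))
                return -ENOMEM;

        trigger_tx(dev, handle, len);   /* device DMA-reads the buffer */

        /* ... wait for the first transfer to complete ... */

        /*
         * Take the buffer back for the CPU.  The direction is still
         * DMA_TO_DEVICE: for this combination the arch-level work is
         * "none", which is why some drivers omit this call entirely.
         */
        dma_sync_single_for_cpu(dev, handle, len, DMA_TO_DEVICE);

        memset(buf, 0xff, len);         /* CPU writes new data */

        /* Hand it back to the device (another writeback). */
        dma_sync_single_for_device(dev, handle, len, DMA_TO_DEVICE);
        trigger_tx(dev, handle, len);   /* second transfer */

        dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
        return 0;
}

Note that every call uses DMA_TO_DEVICE: the direction describes the
DMA transfer being done on the buffer, which is the point below.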
> My original point of contention/confusion is the specific combinations of
> API and direction, specifically for_cpu(TO_DEV) and for_device(TO_CPU)

Remember that it is expected that all calls for a mapping use the
same direction argument while that mapping exists.  In other words,
if you call dma_map_single(TO_DEVICE) and then use any of the other
functions, the other functions will also use TO_DEVICE.  The DMA
direction argument describes the direction of the DMA operation
being performed on the buffer, not the direction of the individual
dma_* operation.

What isn't expected at arch level is for drivers to do:

        dma_map_single(TO_DEVICE)
        dma_sync_single_for_cpu(FROM_DEVICE)

or vice versa.

> Semantically what does dma_sync_single_for_cpu(TO_DEV) even imply for a non
> dma coherent arch.
>
> Your tables below have "none" for both, implying it is unlikely to be a real
> combination (for ARM and ARC atleast).

Very little for the cases that I've stated (and as I mentioned above,
some drivers do omit the call in that case.)

> The other case, actually @dir TO_CPU, independent of for_{cpu, device}?
> implies driver intends to touch it after the call, so it would invalidate
> any stray lines, unconditionally (and not just for speculative prefetch
> case).

If you don't have a CPU that speculatively prefetches, and you've
already had to invalidate the cache lines (to avoid write-backs
corrupting DMA'd data), then there's no need for the architecture
to do any work for the for_cpu(TO_CPU) case - the CPU shouldn't be
touching cache lines that are part of the buffer while it is mapped,
which means a non-speculating CPU won't pull in any cache lines
without an explicit access.

Speculating CPUs are different.  The point of speculation is to try
to guess what data the program will want to access ahead of the
program flow, which causes the CPU to prefetch data into the cache.
The point in the program flow at which this happens is not
predictable by the programmer.

This means that if you try to read from the DMA buffer after the DMA
operation has completed, without invalidating the cache between the
DMA completing and the CPU reading, you have no guarantee that you're
reading the data that the DMA operation has written.  The cache may
have loaded itself with data before the DMA operation completed, and
the CPU may see that stale data.

The difference between the two is that for non-speculating CPUs,
caches work according to explicit accesses by the program, and the
program is stalled while the data is fetched from external memory.
Speculating CPUs try to predict ahead of time what data the program
will require in the future, and attempt to load that data into the
caches _before_ the program requires it - which means that the
program suffers fewer stalls.

> >In the case of a DMA-incoherent architecture, the operations done at each
> >stage depend on the direction argument:
> >
> >           map          for_cpu       for_device    unmap
> >TO_DEV     writeback    none          writeback     none
> >TO_CPU     invalidate   invalidate*   invalidate    invalidate*
> >BIDIR      writeback    invalidate    writeback     invalidate
> >
> >* - only necessary if the CPU speculatively prefetches.
> >
> >The multiple invalidations for the TO_CPU case handles different
> >conditions that can result in data corruption, and for some CPUs, all
> >four are necessary.
>
> Can you please explain in some more detail, TO_CPU row, why invalidate is
> conditional sometimes.

See above - I hope my explanation above is sufficient.
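To put that table into code: below is a minimal sketch of how a
non-coherent architecture might wire it up in the two hooks this
series adds.  The prototypes are as I recall them from the patches,
and cache_wb_range()/cache_inv_range() are made-up stand-ins for
whatever writeback/invalidate primitives the architecture really has.

#include <linux/device.h>
#include <linux/dma-direction.h>
#include <linux/types.h>

/* Stand-ins for the architecture's real cache maintenance primitives. */
void cache_wb_range(phys_addr_t start, size_t size);
void cache_inv_range(phys_addr_t start, size_t size);

/* CPU -> device handover: backs the "map" and "for_device" columns. */
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
                              size_t size, enum dma_data_direction dir)
{
        switch (dir) {
        case DMA_TO_DEVICE:
        case DMA_BIDIRECTIONAL:
                cache_wb_range(paddr, size);    /* push dirty lines to RAM */
                break;
        case DMA_FROM_DEVICE:
                cache_inv_range(paddr, size);   /* stop write-backs corrupting DMA'd data */
                break;
        default:
                break;
        }
}

/* Device -> CPU handover: backs the "for_cpu" and "unmap" columns. */
void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
                           size_t size, enum dma_data_direction dir)
{
        switch (dir) {
        case DMA_FROM_DEVICE:           /* the "*" case: only needed if the CPU speculates */
        case DMA_BIDIRECTIONAL:
                cache_inv_range(paddr, size);   /* drop speculatively fetched lines */
                break;
        default:
                break;                  /* TO_DEVICE: "none" in the table */
        }
}

The empty TO_DEVICE arm in arch_sync_dma_for_cpu() is the "none" entry
in the table, and the FROM_DEVICE invalidate there exists purely to
throw away lines a speculating CPU may have pulled in while the DMA
was in flight.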
-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up