Date: Tue, 2 Apr 2019 14:03:46 +0100
From: Will Deacon
To: "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, mingo@kernel.org,
    stern@rowland.harvard.edu, andrea.parri@amarulasolutions.com,
    peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
    dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
    akiyks@gmail.com, Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann,
    Palmer Dabbelt, Daniel Lustig, Linus Torvalds, "Maciej W. Rozycki",
    Mikulas Patocka
Subject: Re: [PATCH tip/core/rcu 04/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
Message-ID: <20190402130346.GA14559@fuggles.cambridge.arm.com>
References: <20190326234114.GA23843@linux.ibm.com> <20190326234133.24962-4-paulmck@linux.ibm.com>
In-Reply-To: <20190326234133.24962-4-paulmck@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 26, 2019 at 04:41:16PM -0700, Paul E. McKenney wrote:
> From: Will Deacon
>
> The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague,
> x86-centric, out-of-date, incomplete and demonstrably incorrect in places.
> This is largely because I/O ordering is a horrible can of worms, but also
> because the document has stagnated as our understanding has evolved.
>
> Attempt to address some of that, by rewriting the section based on
> recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll
> find a way to formalise this stuff, but for now let's at least try to
> make the English easier to understand.
>
> Cc: "Paul E. McKenney"
> Cc: Benjamin Herrenschmidt
> Cc: Michael Ellerman
> Cc: Arnd Bergmann
> Cc: Peter Zijlstra
> Cc: Andrea Parri
> Cc: Palmer Dabbelt
> Cc: Daniel Lustig
> Cc: David Howells
> Cc: Alan Stern
> Cc: Linus Torvalds
> Cc: "Maciej W. Rozycki"
> Cc: Mikulas Patocka
> Signed-off-by: Will Deacon
> Signed-off-by: Paul E. McKenney
> ---
>  Documentation/memory-barriers.txt | 115 ++++++++++++++++++------------
>  1 file changed, 70 insertions(+), 45 deletions(-)

If somebody could provide an Ack on this patch, I'd really appreciate it,
please.
Whilst the portable ordering guarantees that I've documented are fairly
conservative, I do think that this change is a big improvement and gives
you what you need if you're writing a portable device driver for a new
piece of hardware. I'm tackling the removal of MMIOWB as a separate series.

I think Paul now requires an Ack before he'll send a patch to mainline,
hence the grovelling.

Cheers,

Will

> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1c22b21ae922..158947ae78c2 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -2599,72 +2599,97 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
>  KERNEL I/O BARRIER EFFECTS
>  ==========================
>
> -When accessing I/O memory, drivers should use the appropriate accessor
> -functions:
> +Interfacing with peripherals via I/O accesses is deeply architecture and device
> +specific. Therefore, drivers which are inherently non-portable may rely on
> +specific behaviours of their target systems in order to achieve synchronization
> +in the most lightweight manner possible. For drivers intending to be portable
> +between multiple architectures and bus implementations, the kernel offers a
> +series of accessor functions that provide various degrees of ordering
> +guarantees:
>
> - (*) inX(), outX():
> + (*) readX(), writeX():
>
> -     These are intended to talk to I/O space rather than memory space, but
> -     that's primarily a CPU-specific concept. The i386 and x86_64 processors
> -     do indeed have special I/O space access cycles and instructions, but many
> -     CPUs don't have such a concept.
> +     The readX() and writeX() MMIO accessors take a pointer to the peripheral
> +     being accessed as an __iomem * parameter. For pointers mapped with the
> +     default I/O attributes (e.g. those returned by ioremap()), then the
> +     ordering guarantees are as follows:
>
> -     The PCI bus, amongst others, defines an I/O space concept which - on such
> -     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
> -     space. However, it may also be mapped as a virtual I/O space in the CPU's
> -     memory map, particularly on those CPUs that don't support alternate I/O
> -     spaces.
> +     1. All readX() and writeX() accesses to the same peripheral are ordered
> +        with respect to each other. For example, this ensures that MMIO register
> +        writes by the CPU to a particular device will arrive in program order.
>
> -     Accesses to this space may be fully synchronous (as on i386), but
> -     intermediary bridges (such as the PCI host bridge) may not fully honour
> -     that.
> +     2. A writeX() by the CPU to the peripheral will first wait for the
> +        completion of all prior CPU writes to memory. For example, this ensures
> +        that writes by the CPU to an outbound DMA buffer allocated by
> +        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
> +        to its MMIO control register to trigger the transfer.
>
> -     They are guaranteed to be fully ordered with respect to each other.
> +     3. A readX() by the CPU from the peripheral will complete before any
> +        subsequent CPU reads from memory can begin. For example, this ensures
> +        that reads by the CPU from an incoming DMA buffer allocated by
> +        dma_alloc_coherent() will not see stale data after reading from the DMA
> +        engine's MMIO status register to establish that the DMA transfer has
> +        completed.
>
> -     They are not guaranteed to be fully ordered with respect to other types of
> -     memory and I/O operation.
> +     4. A readX() by the CPU from the peripheral will complete before any
> +        subsequent delay() loop can begin execution. For example, this ensures
> +        that two MMIO register writes by the CPU to a peripheral will arrive at
> +        least 1us apart if the first write is immediately read back with readX()
> +        and udelay(1) is called prior to the second writeX().
>
> - (*) readX(), writeX():
> +     __iomem pointers obtained with non-default attributes (e.g. those returned
> +     by ioremap_wc()) are unlikely to provide many of these guarantees.
>
> -     Whether these are guaranteed to be fully ordered and uncombined with
> -     respect to each other on the issuing CPU depends on the characteristics
> -     defined for the memory window through which they're accessing. On later
> -     i386 architecture machines, for example, this is controlled by way of the
> -     MTRR registers.
> + (*) readX_relaxed(), writeX_relaxed():
>
> -     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
> -     provided they're not accessing a prefetchable device.
> +     These are similar to readX() and writeX(), but provide weaker memory
> +     ordering guarantees. Specifically, they do not guarantee ordering with
> +     respect to normal memory accesses or delay() loops (i.e. bullets 2-4 above)
> +     but they are still guaranteed to be ordered with respect to other accesses
> +     to the same peripheral when operating on __iomem pointers mapped with the
> +     default I/O attributes.
>
> -     However, intermediary hardware (such as a PCI bridge) may indulge in
> -     deferral if it so wishes; to flush a store, a load from the same location
> -     is preferred[*], but a load from the same device or from configuration
> -     space should suffice for PCI.
> + (*) readsX(), writesX():
>
> -     [*] NOTE! attempting to load from the same location as was written to may
> -     cause a malfunction - consider the 16550 Rx/Tx serial registers for
> -     example.
> +     The readsX() and writesX() MMIO accessors are designed for accessing
> +     register-based, memory-mapped FIFOs residing on peripherals that are not
> +     capable of performing DMA. Consequently, they provide only the ordering
> +     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
>
> -     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
> -     force stores to be ordered.
> + (*) inX(), outX():
>
> -     Please refer to the PCI specification for more information on interactions
> -     between PCI transactions.
> +     The inX() and outX() accessors are intended to access legacy port-mapped
> +     I/O peripherals, which may require special instructions on some
> +     architectures (notably x86). The port number of the peripheral being
> +     accessed is passed as an argument.
>
> - (*) readX_relaxed(), writeX_relaxed()
> +     Since many CPU architectures ultimately access these peripherals via an
> +     internal virtual memory mapping, the portable ordering guarantees provided
> +     by inX() and outX() are the same as those provided by readX() and writeX()
> +     respectively when accessing a mapping with the default I/O attributes.
>
> -     These are similar to readX() and writeX(), but provide weaker memory
> -     ordering guarantees. Specifically, they do not guarantee ordering with
> -     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
> -     ordering with respect to LOCK or UNLOCK operations. If the latter is
> -     required, an mmiowb() barrier can be used. Note that relaxed accesses to
> -     the same peripheral are guaranteed to be ordered with respect to each
> -     other.
> +     Device drivers may expect outX() to emit a non-posted write transaction
> +     that waits for a completion response from the I/O peripheral before
> +     returning. This is not guaranteed by all architectures and is therefore
> +     not part of the portable ordering semantics.
> +
> + (*) insX(), outsX():
> +
> +     As above, the insX() and outsX() accessors provide the same ordering
> +     guarantees as readsX() and writesX() respectively when accessing a mapping
> +     with the default I/O attributes.
>
>   (*) ioreadX(), iowriteX()
>
>       These will perform appropriately for the type of access they're actually
>       doing, be it inX()/outX() or readX()/writeX().
>
> +All of these accessors assume that the underlying peripheral is little-endian,
> +and will therefore perform byte-swapping operations on big-endian architectures.
> +
> +Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
> +operations is a dangerous sport which may require the use of mmiowb(). See the
> +subsection "Acquires vs I/O accesses" for more information.
>
>  ========================================
>  ASSUMED MINIMUM EXECUTION ORDERING MODEL
> --
> 2.17.1
>
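
Guarantees 2 and 3 in the readX()/writeX() list above (DMA buffer writes are
visible before the doorbell write; the status read completes before the
buffer is read back) can be sketched roughly as below. This is only a
userspace model, not kernel code: struct fake_dev, mock_writel(),
mock_readl(), dma_echo() and the register layout are all hypothetical names
made up for illustration, the C11 fences merely stand in for whatever
barriers an architecture's writel()/readl() would use, and the "device" is
simulated synchronously inside mock_writel(). A real driver would use
writel()/readl() on an __iomem pointer and buffers from dma_alloc_coherent().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register layout of a made-up DMA engine. */
enum { REG_CTRL = 0, REG_STATUS = 1, NUM_REGS = 2 };
#define CTRL_START  0x1u
#define STATUS_DONE 0x1u
#define BUF_LEN     16

struct fake_dev {
	volatile uint32_t regs[NUM_REGS]; /* stands in for MMIO registers */
	const uint8_t *tx;                /* stands in for the outbound DMA buffer */
	uint8_t *rx;                      /* stands in for the inbound DMA buffer */
};

/* Hypothetical stand-in for writel(): prior memory writes are ordered
 * before the register write (models guarantee 2). */
static void mock_writel(struct fake_dev *dev, uint32_t val, int reg)
{
	atomic_thread_fence(memory_order_release);
	dev->regs[reg] = val;

	/* Simulated device: on the doorbell write, "DMA" tx into rx
	 * and raise the completion bit in the status register. */
	if (reg == REG_CTRL && (val & CTRL_START)) {
		memcpy(dev->rx, dev->tx, BUF_LEN);
		dev->regs[REG_STATUS] = STATUS_DONE;
	}
}

/* Hypothetical stand-in for readl(): the register read is ordered
 * before later memory reads (models guarantee 3). */
static uint32_t mock_readl(struct fake_dev *dev, int reg)
{
	uint32_t val = dev->regs[reg];

	atomic_thread_fence(memory_order_acquire);
	return val;
}

/* Driver-side sequence: fill buffer, ring doorbell, check status, read back.
 * Returns 0 on success, -1 if the device did not report completion. */
static int dma_echo(const uint8_t *msg, uint8_t *out)
{
	static uint8_t tx[BUF_LEN], rx[BUF_LEN];
	struct fake_dev dev = { .regs = { 0, 0 }, .tx = tx, .rx = rx };

	memcpy(tx, msg, BUF_LEN);                /* plain writes to the DMA buffer */
	mock_writel(&dev, CTRL_START, REG_CTRL); /* doorbell: ordered after them */

	if (!(mock_readl(&dev, REG_STATUS) & STATUS_DONE))
		return -1;

	memcpy(out, rx, BUF_LEN);                /* plain reads: ordered after status */
	return 0;
}
```

The point of the sketch is the ordering of the four steps in dma_echo(); with
the _relaxed() accessors the same sequence would need explicit barriers
between the buffer accesses and the register accesses.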