From: "Paul E. McKenney"
To: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, mingo@kernel.org
Cc: stern@rowland.harvard.edu, andrea.parri@amarulasolutions.com,
 will.deacon@arm.com, peterz@infradead.org, boqun.feng@gmail.com,
 npiggin@gmail.com, dhowells@redhat.com, j.alglave@ucl.ac.uk,
 luc.maranget@inria.fr, akiyks@gmail.com, "Paul E. McKenney",
 Benjamin Herrenschmidt, Michael Ellerman, Arnd Bergmann, Palmer Dabbelt,
 Daniel Lustig, Linus Torvalds, "Maciej W. Rozycki", Mikulas Patocka
Subject: [PATCH tip/core/rcu 04/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
Date: Tue, 26 Mar 2019 16:41:16 -0700
Message-Id: <20190326234133.24962-4-paulmck@linux.ibm.com>
In-Reply-To: <20190326234114.GA23843@linux.ibm.com>
References: <20190326234114.GA23843@linux.ibm.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Will Deacon

The "KERNEL I/O BARRIER EFFECTS" section of memory-barriers.txt is vague,
x86-centric, out-of-date, incomplete and demonstrably incorrect in places.
This is largely because I/O ordering is a horrible can of worms, but also
because the document has stagnated as our understanding has evolved.

Attempt to address some of that, by rewriting the section based on
recent(-ish) discussions with Arnd, BenH and others. Maybe one day we'll
find a way to formalise this stuff, but for now let's at least try to
make the English easier to understand.

Cc: "Paul E. McKenney"
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Arnd Bergmann
Cc: Peter Zijlstra
Cc: Andrea Parri
Cc: Palmer Dabbelt
Cc: Daniel Lustig
Cc: David Howells
Cc: Alan Stern
Cc: Linus Torvalds
Cc: "Maciej W. Rozycki"
Cc: Mikulas Patocka
Signed-off-by: Will Deacon
Signed-off-by: Paul E. McKenney
---
 Documentation/memory-barriers.txt | 115 ++++++++++++++++++------------
 1 file changed, 70 insertions(+), 45 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1c22b21ae922..158947ae78c2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2599,72 +2599,97 @@ likely, then interrupt-disabling locks should be used to guarantee ordering.
 KERNEL I/O BARRIER EFFECTS
 ==========================
 
-When accessing I/O memory, drivers should use the appropriate accessor
-functions:
+Interfacing with peripherals via I/O accesses is deeply architecture and device
+specific. Therefore, drivers which are inherently non-portable may rely on
+specific behaviours of their target systems in order to achieve synchronization
+in the most lightweight manner possible. For drivers intending to be portable
+between multiple architectures and bus implementations, the kernel offers a
+series of accessor functions that provide various degrees of ordering
+guarantees:
 
- (*) inX(), outX():
+ (*) readX(), writeX():
 
-     These are intended to talk to I/O space rather than memory space, but
-     that's primarily a CPU-specific concept. The i386 and x86_64 processors
-     do indeed have special I/O space access cycles and instructions, but many
-     CPUs don't have such a concept.
+     The readX() and writeX() MMIO accessors take a pointer to the peripheral
+     being accessed as an __iomem * parameter. For pointers mapped with the
+     default I/O attributes (e.g. those returned by ioremap()), the
+     ordering guarantees are as follows:
 
-     The PCI bus, amongst others, defines an I/O space concept which - on such
-     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
-     space. However, it may also be mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate I/O
-     spaces.
+     1. All readX() and writeX() accesses to the same peripheral are ordered
+        with respect to each other. For example, this ensures that MMIO register
+        writes by the CPU to a particular device will arrive in program order.
 
-     Accesses to this space may be fully synchronous (as on i386), but
-     intermediary bridges (such as the PCI host bridge) may not fully honour
-     that.
+     2. A writeX() by the CPU to the peripheral will first wait for the
+        completion of all prior CPU writes to memory. For example, this ensures
+        that writes by the CPU to an outbound DMA buffer allocated by
+        dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
+        to its MMIO control register to trigger the transfer.
 
-     They are guaranteed to be fully ordered with respect to each other.
+     3. A readX() by the CPU from the peripheral will complete before any
+        subsequent CPU reads from memory can begin. For example, this ensures
+        that reads by the CPU from an incoming DMA buffer allocated by
+        dma_alloc_coherent() will not see stale data after reading from the DMA
+        engine's MMIO status register to establish that the DMA transfer has
+        completed.
 
-     They are not guaranteed to be fully ordered with respect to other types of
-     memory and I/O operation.
+     4. A readX() by the CPU from the peripheral will complete before any
+        subsequent delay() loop can begin execution. For example, this ensures
+        that two MMIO register writes by the CPU to a peripheral will arrive at
+        least 1us apart if the first write is immediately read back with readX()
+        and udelay(1) is called prior to the second writeX().
 
- (*) readX(), writeX():
+     __iomem pointers obtained with non-default attributes (e.g. those returned
+     by ioremap_wc()) are unlikely to provide many of these guarantees.
 
-     Whether these are guaranteed to be fully ordered and uncombined with
-     respect to each other on the issuing CPU depends on the characteristics
-     defined for the memory window through which they're accessing. On later
-     i386 architecture machines, for example, this is controlled by way of the
-     MTRR registers.
+ (*) readX_relaxed(), writeX_relaxed():
 
-     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
-     provided they're not accessing a prefetchable device.
+     These are similar to readX() and writeX(), but provide weaker memory
+     ordering guarantees. Specifically, they do not guarantee ordering with
+     respect to normal memory accesses or delay() loops (i.e. bullets 2-4
+     above), but they are still guaranteed to be ordered with respect to other
+     accesses to the same peripheral when operating on __iomem pointers mapped
+     with the default I/O attributes.
 
-     However, intermediary hardware (such as a PCI bridge) may indulge in
-     deferral if it so wishes; to flush a store, a load from the same location
-     is preferred[*], but a load from the same device or from configuration
-     space should suffice for PCI.
+ (*) readsX(), writesX():
 
-     [*] NOTE! attempting to load from the same location as was written to may
-         cause a malfunction - consider the 16550 Rx/Tx serial registers for
-         example.
+     The readsX() and writesX() MMIO accessors are designed for accessing
+     register-based, memory-mapped FIFOs residing on peripherals that are not
+     capable of performing DMA. Consequently, they provide only the ordering
+     guarantees of readX_relaxed() and writeX_relaxed(), as documented above.
 
-     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
-     force stores to be ordered.
+ (*) inX(), outX():
 
-     Please refer to the PCI specification for more information on interactions
-     between PCI transactions.
+     The inX() and outX() accessors are intended to access legacy port-mapped
+     I/O peripherals, which may require special instructions on some
+     architectures (notably x86). The port number of the peripheral being
+     accessed is passed as an argument.
 
- (*) readX_relaxed(), writeX_relaxed()
+     Since many CPU architectures ultimately access these peripherals via an
+     internal virtual memory mapping, the portable ordering guarantees provided
+     by inX() and outX() are the same as those provided by readX() and writeX()
+     respectively when accessing a mapping with the default I/O attributes.
 
-     These are similar to readX() and writeX(), but provide weaker memory
-     ordering guarantees. Specifically, they do not guarantee ordering with
-     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
-     ordering with respect to LOCK or UNLOCK operations. If the latter is
-     required, an mmiowb() barrier can be used. Note that relaxed accesses to
-     the same peripheral are guaranteed to be ordered with respect to each
-     other.
+     Device drivers may expect outX() to emit a non-posted write transaction
+     that waits for a completion response from the I/O peripheral before
+     returning. This is not guaranteed by all architectures and is therefore
+     not part of the portable ordering semantics.
+
+ (*) insX(), outsX():
+
+     As above, the insX() and outsX() accessors provide the same ordering
+     guarantees as readsX() and writesX() respectively when accessing a mapping
+     with the default I/O attributes.
 
  (*) ioreadX(), iowriteX()
 
      These will perform appropriately for the type of access they're actually
      doing, be it inX()/outX() or readX()/writeX().
 
+All of these accessors assume that the underlying peripheral is little-endian,
+and will therefore perform byte-swapping operations on big-endian architectures.
+
+Composing I/O ordering barriers with SMP ordering barriers and LOCK/UNLOCK
+operations is a dangerous sport which may require the use of mmiowb(). See the
+subsection "Acquires vs I/O accesses" for more information.
 
 ========================================
 ASSUMED MINIMUM EXECUTION ORDERING MODEL
-- 
2.17.1