From: Michael Ellerman
To: Will Deacon, Linus Torvalds
Cc: linux-arch, Linux List Kernel Mailing, "Paul E. McKenney",
	Benjamin Herrenschmidt, Arnd Bergmann, Peter Zijlstra, Andrea Parri,
	Daniel Lustig, David Howells, Alan Stern, Tony Luck, paulus@samba.org
Subject: Re: [RFC PATCH] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section
In-Reply-To: <20190219161334.GA28803@fuggles.cambridge.arm.com>
References: <20190211172948.3322-1-will.deacon@arm.com>
	<20190213172047.GH6346@brain-police>
	<20190218165007.GC16713@fuggles.cambridge.arm.com>
	<20190219161334.GA28803@fuggles.cambridge.arm.com>
Date: Thu, 21 Feb 2019 17:22:03 +1100
Message-ID: <87k1htwtis.fsf@concordia.ellerman.id.au>

Will Deacon writes:
> [+more ppc folks]
>
> On Mon, Feb 18, 2019 at 04:50:12PM +0000, Will Deacon wrote:
>> On Wed, Feb 13, 2019 at 10:27:09AM -0800, Linus Torvalds wrote:
>> > Note that even if mmiowb() is expensive (and I don't think that's
>> > actually even the case on ia64), you can - and probably should - do
>> > what PowerPC does.
>> >
>> > Doing an IO barrier on PowerPC is insanely expensive, but they solve
>> > that by simply tracking the whole "have I done any IO" state manually.
>> > It's not even that expensive, it just uses a percpu flag.
>> >
>> > (Admittedly, PowerPC makes it less obvious that it's a percpu variable
>> > because it's actually in the special "paca" region that is like a
>> > hyper-local percpu area.)
>
> [...]
>
>> > But we *could* first just do the mmiowb() unconditionally in the ia64
>> > unlocking code, and then see if anybody notices?
>>
>> I'll hack this up as a starting point. We can always try to be clever
>> later on if it's deemed necessary.
>
> Ok, so I started hacking this up in core code with the percpu flag (since
> riscv apparently needs it), but I've now realised that I don't understand
> how the PowerPC trick works after all.
> Consider the following:
>
>	spin_lock(&foo);	// io_sync = 0
>	outb(42, port);		// io_sync = 1
>	spin_lock(&bar);	// io_sync = 0
>	...
>	spin_unlock(&bar);
>	spin_unlock(&foo);
>
> The inner lock could even happen in an irq afaict, but we'll end up
> skipping the mmiowb()/sync because the io_sync flag is unconditionally
> cleared by spin_lock(). Fixing this is complicated by the fact that I/O
> writes can be performed in preemptible context with no locks held, so we
> can end up spuriously setting the io_sync flag for arbitrary CPUs, hence
> the desire to clear it in spin_lock().
>
> If the paca entry was more than a byte, we could probably track that a
> spinlock is held and then avoid clearing the flag prematurely, but I
> have a feeling that I'm missing something. Anybody know how this is
> supposed to work?

I don't think you're missing anything :/

Having two flags like you suggest could work. Or you could just make the
flag into a nesting counter.

Or do you just remove the clearing from spin_lock()? That gets you:

	spin_lock(&foo);
	outb(42, port);		// io_sync = 1
	spin_lock(&bar);
	...
	spin_unlock(&bar);	// mb(); io_sync = 0
	spin_unlock(&foo);

And the I/O-outside-of-the-lock case:

	outb(42, port);		// io_sync = 1
	spin_lock(&bar);
	...
	spin_unlock(&bar);	// mb(); io_sync = 0

Extra barriers are not ideal, but the odd spurious mb() might be
preferable to doing another compare-and-branch or increment in every
spin_lock()?

cheers