Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp701177ybt; Fri, 10 Jul 2020 10:08:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxrSI6R6zhux6KlPXkAI5ZZKpePUHMBJNYE0ZY9YszdoKLGNXDZYymNlPMXv1LwYJaIuEmF X-Received: by 2002:a17:906:6ad8:: with SMTP id q24mr55360307ejs.223.1594400886518; Fri, 10 Jul 2020 10:08:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594400886; cv=none; d=google.com; s=arc-20160816; b=njGY5ZgBLhWvr3Ef4aVYZ+FL6nmc5l9f9pYcl6/YkQzt13r+h7onaQZLAjyENh1ws9 HHRwtITeiFgOoYrSrgSnpSdhKSRSsRD7EAsV843WHhHfNhb+eenADRw1HuES1w+p4LQG UVgDkuCRBY6s/scKD3jpyTkPozP/V9wNzHVBFLqPLdzGd2h8fjnp4s+u4WxJMhhxakBr oDhLoLw3Nq4hNcvhx+ajjLNLAwX4KC+cvNjUYaqOshHDw+1F7k2Lw7VEpk555n8D5aM+ 0Qe+E/+LCnIm6omO8cRtdnSVDL8ds0s9pepSyuXfKVCGpf2jqfYQuTC+XC5vXU5wTMnQ pWjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=vt0OAqoeIMOK+iZ0+Cu+J0wZP1NlEtqEEmPrctNvOxA=; b=D+MB4qcagzaH9IZADsxkaGTc745KG0DQ0s7fAAMEysxTRsCFKV+rHGpolQFqp5mu5H 8xmJzCbsuOU2wCqvwo3Y+NISXbc7Lh9cXXAtCjlbPDuua1M3YPQ1L0qmKXW0T65C0Jp8 unX/f5ug/ai0/BZAmSZ2ElqMrBPVVoG07o+zfB35hkL07Q8Rok2ZTPyM6J49u2AsDQdH AOWxnQKjYmWL1sUR6JvzcHxOo8nJKW03cRFH70jiOlDOVMHoXtxpf4xnLPcOcqTXk6Fv sN8XNcWOttCA71RjTRKW4Ss40ZHD24C6x7y3jfliNwo72KxREDzgE8n3t+MuREUwBxzt 11EA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=1tRe4LZ6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j13si4175863eds.407.2020.07.10.10.07.43; Fri, 10 Jul 2020 10:08:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=1tRe4LZ6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728439AbgGJRFJ (ORCPT + 99 others); Fri, 10 Jul 2020 13:05:09 -0400 Received: from mail.kernel.org ([198.145.29.99]:39574 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728146AbgGJRFJ (ORCPT ); Fri, 10 Jul 2020 13:05:09 -0400 Received: from mail-wr1-f43.google.com (mail-wr1-f43.google.com [209.85.221.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id ED984207F9 for ; Fri, 10 Jul 2020 17:05:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1594400708; bh=3S7IewpNxJBD3L+ZfZ/E6af35AY9UGg9jqOelSrpKo4=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=1tRe4LZ68tVZGxEGQ7XeZ+yUcJKPuc4ZhIcHwk2QL6cG3SdjNkD2PO5p5HsegEww3 gmZfDPpe4jdgPlAuSo+u1XmlaVBSKvuVGiOOZM6SQIII2HSnN7c89x9nK4D4K4r+6q zcxAm6d3uRrL58fmcsOUp32tlZPRex6+m3Qe/Vmk= Received: by mail-wr1-f43.google.com with SMTP id z15so6685606wrl.8 for ; Fri, 10 Jul 2020 10:05:07 -0700 (PDT) X-Gm-Message-State: AOAM531kx3FU0wZfom8BD+OJuZfhO+wLtW1uCN8WrLTID8t3gEwPs7zM 966np6/FXELQxW7MvzBI2iuytQEICj2i7kYTzM8V7Q== X-Received: by 2002:adf:e482:: with SMTP id i2mr67761053wrm.75.1594400706520; Fri, 10 Jul 2020 10:05:06 -0700 (PDT) MIME-Version: 1.0 References: <20200710015646.2020871-1-npiggin@gmail.com> <20200710015646.2020871-5-npiggin@gmail.com> In-Reply-To: <20200710015646.2020871-5-npiggin@gmail.com> From: Andy Lutomirski Date: Fri, 10 Jul 2020 10:04:54 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode To: Nicholas Piggin Cc: linux-arch , X86 ML , Mathieu Desnoyers , Arnd Bergmann , Peter Zijlstra , LKML , linuxppc-dev , Linux-MM , Anton Blanchard Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin wrote: > > And get rid of the generic sync_core_before_usermode facility. > > This helper is the wrong way around I think. The idea that membarrier > state requires a core sync before returning to user is the easy one > that does not need hiding behind membarrier calls. The gap in core > synchronization due to x86's sysret/sysexit and lazy tlb mode, is the > tricky detail that is better put in x86 lazy tlb code. > > Consider if an arch did not synchronize core in switch_mm either, then > membarrier_mm_sync_core_before_usermode would be in the wrong place > but arch specific mmu context functions would still be the right place. > There is also a exit_lazy_tlb case that is not covered by this call, which > could be a bugs (kthread use mm the membarrier process's mm then context > switch back to the process without switching mm or lazy mm switch). > > This makes lazy tlb code a bit more modular. The mm-switching and TLB-management has often had the regrettable property that it gets wired up in a way that seems to work at the time but doesn't have clear semantics, and I'm a bit concerned that this patch is in that category. If I'm understanding right, you're trying to enforce the property that exiting lazy TLB mode will promise to sync the core eventually. But this has all kinds of odd properties: - Why is exit_lazy_tlb() getting called at all in the relevant cases? When is it permissible to call it? I look at your new code and see: > +/* > + * Ensure that a core serializing instruction is issued before returning > + * to user-mode, if a SYNC_CORE was requested. x86 implements return to > + * user-space through sysexit, sysrel, and sysretq, which are not core > + * serializing. > + * > + * See the membarrier comment in finish_task_switch as to why this is done > + * in exit_lazy_tlb. > + */ > +#define exit_lazy_tlb exit_lazy_tlb > +static inline void exit_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) > +{ > + /* Switching mm is serializing with write_cr3 */ > + if (tsk->mm != mm) > + return; And my brain says WTF? Surely you meant something like if (WARN_ON_ONCE(tsk->mm != mm)) { /* egads, what even happened? how do we try to recover well enough to get a crashed logged at least? */ } So this needs actual documentation, preferably in comments near the function, of what the preconditions are and what this mm parameter is. Once that's done, then we could consider whether it's appropriate to have this function promise to sync the core under some conditions. - This whole structure seems to rely on the idea that switching mm syncs something. I periodically ask chip vendor for non-serializing mm switches. Specifically, in my dream world, we have totally separate user and kernel page tables. Changing out the user tables doesn't serialize or even create a fence. Instead it creates the minimum required pipeline hazard such that user memory access and switches to usermode will make sure they access memory through the correct page tables. I haven't convinced a chip vendor yet, but there are quite a few hundreds of cycles to be saved here. With this in mind, I see the fencing aspects of the TLB handling code as somewhat of an accident. I'm fine with documenting them and using them to optimize other paths, but I think it should be explicit. For example: /* Also does a full barrier? (Or a sync_core()-style barrier.) However, if you rely on this, you must document it in a comment where you call this function. *? void switch_mm_irqs_off() { } This is kind of like how we strongly encourage anyone using smp_?mb() to document what they are fencing against. Also, as it stands, I can easily see in_irq() ceasing to promise to serialize. There are older kernels for which it does not promise to serialize. And I have plans to make it stop serializing in the nearish future. --Andy