Date: Fri, 04 Dec 2020 17:06:31 +1000
From: Nicholas Piggin
Subject: Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code
To: Andy Lutomirski
Cc: Anton Blanchard, Arnd Bergmann, Catalin Marinas, Dave Hansen,
 Jann Horn, linux-arch, LKML, Linux-MM, linuxppc-dev,
 Mathieu Desnoyers, Nadav Amit, Rik van Riel, Will Deacon, X86 ML
References: <203d39d11562575fd8bd6a094d97a3a332d8b265.1607059162.git.luto@kernel.org>
In-Reply-To: <203d39d11562575fd8bd6a094d97a3a332d8b265.1607059162.git.luto@kernel.org>
Message-Id: <1607064851.hub15e677x.astroid@bobo.none>
X-Mailing-List: linux-kernel@vger.kernel.org

Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm:
> The core scheduler isn't a great place for
> membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't
> actually know whether we are lazy. With the old code, if a CPU is
> running a membarrier-registered task, goes idle, gets unlazied via a TLB
> shootdown IPI, and switches back to the membarrier-registered task, it
> will do an unnecessary core sync.
> 
> Conveniently, x86 is the only architecture that does anything in this
> hook, so we can just move the code.

This should go on top of my series that adds the exit_lazy_mm call and
switches x86 over, at least.

> XXX: there are some comments in switch_mm_irqs_off() that seem to be
> trying to document what barriers are expected, and it's not clear to me
> that these barriers are actually present in all paths through the
> code. So I think this change makes the code more comprehensible and
> has no effect on the code's correctness, but I'm not at all convinced
> that the code is correct.
> 
> Signed-off-by: Andy Lutomirski
> ---
>  arch/x86/mm/tlb.c   | 17 ++++++++++++++++-
>  kernel/sched/core.c | 14 +++++++-------
>  2 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 3338a1feccf9..23df035b80e8 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -8,6 +8,7 @@
>  #include
>  #include
>  #include
> +#include
> 
>  #include
>  #include
> @@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  		 * from one thread in a process to another thread in the same
>  		 * process. No TLB flush required.
>  		 */
> +
> +		// XXX: why is this okay wrt membarrier?
>  		if (!was_lazy)
>  			return;
> 
> @@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  		smp_mb();
>  		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
>  		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> -				next_tlb_gen)
> +				next_tlb_gen) {
> +			/*
> +			 * We're reactivating an mm, and membarrier might
> +			 * need to serialize. Tell membarrier.
> +			 */
> +
> +			// XXX: I can't understand the logic in
> +			// membarrier_mm_sync_core_before_usermode(). What's
> +			// the mm check for?

Writing CR3 is serializing, apparently. Another x86ism that gets
commented and moved into arch/x86 with my patch.

> +			membarrier_mm_sync_core_before_usermode(next);
>  			return;
> +		}
> 
>  		/*
>  		 * TLB contents went out of date while we were in lazy
>  		 * mode. Fall through to the TLB switching code below.
> +		 * No need for an explicit membarrier invocation -- the CR3
> +		 * write will serialize.
>  		 */
>  		new_asid = prev_asid;
>  		need_flush = true;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d95dc3f4644..6c4b76147166 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3619,22 +3619,22 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	kcov_finish_switch(current);
> 
>  	fire_sched_in_preempt_notifiers(current);
> +
>  	/*
>  	 * When switching through a kernel thread, the loop in
>  	 * membarrier_{private,global}_expedited() may have observed that
>  	 * kernel thread and not issued an IPI. It is therefore possible to
>  	 * schedule between user->kernel->user threads without passing though
>  	 * switch_mm(). Membarrier requires a barrier after storing to
> -	 * rq->curr, before returning to userspace, so provide them here:
> +	 * rq->curr, before returning to userspace, and mmdrop() provides
> +	 * this barrier.
>  	 *
> -	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> -	 *   provided by mmdrop(),
> -	 * - a sync_core for SYNC_CORE.
> +	 * XXX: I don't think mmdrop() actually does this. There's no
> +	 * smp_mb__before/after_atomic() in there.

mmdrop definitely does provide a full barrier.

>  	 */
> -	if (mm) {
> -		membarrier_mm_sync_core_before_usermode(mm);
> +	if (mm)
>  		mmdrop(mm);
> -	}
> +
>  	if (unlikely(prev_state == TASK_DEAD)) {
>  		if (prev->sched_class->task_dead)
>  			prev->sched_class->task_dead(prev);
> --
> 2.28.0
> 
> 
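
For reference, a rough sketch of what mmdrop() looks like at this point
(paraphrased from include/linux/sched/mm.h around v5.10; the exact comment
wording may differ). atomic_dec_and_test() is a value-returning atomic RMW,
which Documentation/atomic_t.txt documents as fully ordered, i.e. it implies
a full memory barrier on both sides, so no separate
smp_mb__before/after_atomic() is needed:

	static inline void mmdrop(struct mm_struct *mm)
	{
		/*
		 * The implicit full barrier implied by atomic_dec_and_test()
		 * is required by the membarrier system call before returning
		 * to user-space, after storing to rq->curr.
		 */
		if (unlikely(atomic_dec_and_test(&mm->mm_count)))
			__mmdrop(mm);
	}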