From: Yair Podemsky
To: linux@armlinux.org.uk, mpe@ellerman.id.au, npiggin@gmail.com,
    christophe.leroy@csgroup.eu, hca@linux.ibm.com, gor@linux.ibm.com,
    agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com,
    davem@davemloft.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
    will@kernel.org, aneesh.kumar@linux.ibm.com, akpm@linux-foundation.org,
    peterz@infradead.org, arnd@arndb.de, keescook@chromium.org,
    paulmck@kernel.org, jpoimboe@kernel.org, samitolvanen@google.com,
    frederic@kernel.org, ardb@kernel.org, juerg.haefliger@canonical.com,
    rmk+kernel@armlinux.org.uk, geert+renesas@glider.be, tony@atomide.com,
    linus.walleij@linaro.org, sebastian.reichel@collabora.com,
    nick.hawkins@hpe.com, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
    linux-s390@vger.kernel.org, sparclinux@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-mm@kvack.org, mtosatti@redhat.com,
    vschneid@redhat.com, dhildenb@redhat.com
Cc: ypodemsk@redhat.com, alougovs@redhat.com
Subject: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode
Date: Tue, 4 Apr 2023 16:42:24 +0300
Message-Id: <20230404134224.137038-4-ypodemsk@redhat.com>
In-Reply-To: <20230404134224.137038-1-ypodemsk@redhat.com>
References: <20230404134224.137038-1-ypodemsk@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The tlb_remove_table_smp_sync IPI is used to ensure the outdated tlb page
is not currently being accessed and can be cleared.
This occurs once all CPUs have left the lockless gup code section.
If they reenter the page table walk, the pointers will be to the new
pages.
Therefore the IPI is only needed for CPUs in kernel mode.
By preventing the IPI from being sent to CPUs not in kernel mode,
latencies are reduced.

Race condition considerations:
The context state check is vulnerable to race conditions between the
moment the context state is read and the moment the IPI is sent (or not).

The scenarios are as follows.
case 1:
CPU-A                                             CPU-B

                                                  state == CONTEXT_KERNEL
int state = atomic_read(&ct->state);
                                                  Kernel-exit:
                                                  state == CONTEXT_USER
if ((state & CT_STATE_MASK) == CONTEXT_KERNEL)

In this case, the IPI will be sent to CPU-B even though it is no longer
in the kernel. The consequence is an unnecessary IPI being handled by
CPU-B, which increases its latency.
This would have been the case every time without this patch.

case 2:
CPU-A                                             CPU-B

modify pagetables
tlb_flush (memory barrier)
                                                  state == CONTEXT_USER
int state = atomic_read(&ct->state);
                                                  Kernel-enter:
                                                  state == CONTEXT_KERNEL
                                                  READ(pagetable values)
if ((state & CT_STATE_MASK) == CONTEXT_USER)

In this case, the IPI will not be sent to CPU-B even though it has
returned to the kernel and is even reading the pagetable.
However, since CPU-B entered the pagetable walk after the modification,
it is reading the new, safe values.

The only case when this IPI is truly necessary is when CPU-B has entered
the lockless gup code section before the pagetable modifications and
has yet to exit it, in which case it is still in the kernel.

Signed-off-by: Yair Podemsky
---
 mm/mmu_gather.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 5ea9be6fb87c..731d955e152d 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -191,6 +192,20 @@ static void tlb_remove_table_smp_sync(void *arg)
 	/* Simply deliver the interrupt */
 }
 
+
+#ifdef CONFIG_CONTEXT_TRACKING
+static bool cpu_in_kernel(int cpu, void *info)
+{
+	struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
+	int state = atomic_read(&ct->state);
+	/* will return true only for cpus in kernel space */
+	return (state & CT_STATE_MASK) == CONTEXT_KERNEL;
+}
+#define CONTEXT_PREDICATE cpu_in_kernel
+#else
+#define CONTEXT_PREDICATE NULL
+#endif /* CONFIG_CONTEXT_TRACKING */
+
 #ifdef CONFIG_ARCH_HAS_CPUMASK_BITS
 #define REMOVE_TABLE_IPI_MASK mm_cpumask(mm)
 #else
@@ -206,8 +221,8 @@ void tlb_remove_table_sync_one(struct mm_struct *mm)
 	 * It is however sufficient for software page-table walkers that rely on
 	 * IRQ disabling.
 	 */
-	on_each_cpu_mask(REMOVE_TABLE_IPI_MASK, tlb_remove_table_smp_sync,
-			NULL, true);
+	on_each_cpu_cond_mask(CONTEXT_PREDICATE, tlb_remove_table_smp_sync,
+			NULL, true, REMOVE_TABLE_IPI_MASK);
 }
 
 static void tlb_remove_table_rcu(struct rcu_head *head)
-- 
2.31.1