Date: Tue, 1 Dec 2020 21:27:58 +0000
From: Will Deacon
To: Andy Lutomirski
Cc: Catalin Marinas, Heiko Carstens, Vasily Gorbik, Christian Borntraeger, Dave Hansen, Nicholas Piggin, LKML, X86 ML, Mathieu Desnoyers, Arnd Bergmann, Peter Zijlstra, linux-arch, linuxppc-dev, Linux-MM, Anton Blanchard
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Message-ID:
  <20201201212758.GA28300@willie-the-truck>
References: <20201128160141.1003903-1-npiggin@gmail.com>
  <20201128160141.1003903-7-npiggin@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> other arch folk: there's some background here:
>
> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
>
> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski wrote:
> >
> > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote:
> > >
> > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
> > > >
> > > > On big systems, the mm refcount can become highly contended when doing
> > > > a lot of context switching with threaded applications (particularly
> > > > switching between the idle thread and an application thread).
> > > >
> > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > any remaining lazy ones.
> > > >
> > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > a big problem with this scheme (the powerpc implementation generated
> > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > There are a number of strategies that could be employed to reduce IPIs
> > > > if they turn out to be a problem for some workload.
> > >
> > > I'm still wondering whether we can do even better.
> > >
> >
> > Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> > the TLB. On x86, this will shoot down all lazies as long as even a
> > single pagetable was freed. (Or at least it will if we don't have a
> > serious bug, but the code seems okay.
> > We'll hit pmd_free_tlb, which sets tlb->freed_tables, which will
> > trigger the IPI.) So, on architectures like x86, the shootdown
> > approach should be free. The only way it ought to have any excess
> > IPIs is if we have CPUs in mm_cpumask() that don't need IPI to free
> > pagetables, which could happen on paravirt.
>
> Indeed, on x86, we do this:
>
> [ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
> [ 11.559905] tlb_finish_mmu+0x10e/0x1a0
> [ 11.561068] exit_mmap+0xc8/0x1a0
> [ 11.561932] mmput+0x29/0xd0
> [ 11.562688] do_exit+0x316/0xa90
> [ 11.563588] do_group_exit+0x34/0xb0
> [ 11.564476] __x64_sys_exit_group+0xf/0x10
> [ 11.565512] do_syscall_64+0x34/0x50
>
> and we have info->freed_tables set.
>
> What are the architectures that have large systems like?
>
> x86: we already zap lazies, so it should cost basically nothing to do
> a little loop at the end of __mmput() to make sure that no lazies are
> left. If we care about paravirt performance, we could implement one
> of the optimizations I mentioned above to fix up the refcounts instead
> of sending an IPI to any remaining lazies.
>
> arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> remote flushes, so any lazy mm references will still exist after
> exit_mmap(). (arm64 uses lazy TLB, right?) So this is kind of like
> the x86 paravirt case. Are there large enough arm64 systems that any
> of this matters?

Yes, there are large arm64 systems where performance of TLB invalidation
matters, but they're either niche (supercomputers) or not readily
available (NUMA boxes).

But anyway, we blow away the TLB for everybody in tlb_finish_mmu() after
freeing the page-tables. We have an optimisation to avoid flushing if
we're just unmapping leaf entries when the mm is going away, but we don't
have a choice once we get to actually reclaiming the page-tables.
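To make the distinction above concrete (an IPI-driven teardown flush that evicts lazy users as a side effect, versus a hardware-broadcast invalidation that leaves them in place), here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: every name in it (`struct sim_mm`, `struct sim_cpu`, `sim_ipi_flush()`, `sim_broadcast_flush()`, `sim_shoot_lazies()`, `sim_setup_lazies()`) is hypothetical, a "lazy" CPU is modelled simply as one running a kernel thread on a borrowed mm, and locking, memory ordering and refcounting are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 8

/* Hypothetical stand-in for struct mm_struct. */
struct sim_mm {
	unsigned int cpumask;	/* bit n set: CPU n has used this mm (like mm_cpumask()) */
};

/* Hypothetical per-CPU state. */
struct sim_cpu {
	struct sim_mm *active_mm;	/* page tables currently loaded */
	bool lazy;			/* kernel thread running on a borrowed mm */
};

static struct sim_mm sim_init_mm;
static struct sim_cpu sim_cpus[SIM_NR_CPUS];

/* Models an IPI handler: a lazy user of @mm switches to init_mm. */
static void sim_ipi_handler(int cpu, struct sim_mm *mm)
{
	if (sim_cpus[cpu].active_mm == mm && sim_cpus[cpu].lazy)
		sim_cpus[cpu].active_mm = &sim_init_mm;
}

/* x86-like teardown flush with freed page tables: every CPU in the mask
 * is interrupted, so lazy users are evicted as a side effect. */
static void sim_ipi_flush(struct sim_mm *mm)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (mm->cpumask & (1u << cpu))
			sim_ipi_handler(cpu, mm);
}

/* arm64-like broadcast invalidation: TLB entries are dropped by hardware,
 * but no code runs on remote CPUs, so their active_mm is untouched. */
static void sim_broadcast_flush(struct sim_mm *mm)
{
	(void)mm;
}

/* Final pass before freeing @mm: IPI only the CPUs still using it and
 * return how many IPIs that took. Reading remote active_mm like this is
 * a simplification; a real implementation has to deal with races. */
static int sim_shoot_lazies(struct sim_mm *mm)
{
	int ipis = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if ((mm->cpumask & (1u << cpu)) && sim_cpus[cpu].active_mm == mm) {
			sim_ipi_handler(cpu, mm);
			ipis++;
		}
	}
	return ipis;
}

/* Scenario: CPUs 0-2 have used @mm; CPUs 1 and 2 still hold it lazily. */
static void sim_setup_lazies(struct sim_mm *mm)
{
	mm->cpumask = 0x7;
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		sim_cpus[cpu].active_mm = &sim_init_mm;
		sim_cpus[cpu].lazy = false;
	}
	sim_cpus[1].active_mm = mm;
	sim_cpus[1].lazy = true;
	sim_cpus[2].active_mm = mm;
	sim_cpus[2].lazy = true;
}
```

In this model, after `sim_setup_lazies(&mm)` and an IPI-style `sim_ipi_flush(&mm)`, the follow-up `sim_shoot_lazies(&mm)` finds nothing left to do (0 IPIs), which mirrors the "should be free" argument for x86. After a broadcast-style `sim_broadcast_flush(&mm)` instead, it still has to interrupt both lazy CPUs (2 IPIs).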

One thing I probably should mention, though, is that we don't maintain
mm_cpumask() because we're not able to benefit from it and the atomic
update is a waste of time.

Will
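A shootdown that only visits the CPUs in mm_cpumask() has nothing to go on when the architecture leaves that mask unmaintained. The userspace model below sketches that consequence. It is hypothetical throughout: `struct sim_mm`, `sim_active_mm[]`, `sim_shootdown_targets()`, `sim_shoot_lazies()` and `sim_remaining_users()` are invented for illustration, and the "scan every CPU" fallback is an assumption made for the sketch, not something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 8

/* Hypothetical stand-in for struct mm_struct. */
struct sim_mm {
	unsigned int cpumask;	/* stays 0 if the arch does not maintain it */
};

static struct sim_mm sim_init_mm;
static struct sim_mm *sim_active_mm[SIM_NR_CPUS];	/* per-CPU active_mm */

/* Which CPUs a shootdown would visit. */
static unsigned int sim_shootdown_targets(const struct sim_mm *mm,
					  bool arch_maintains_mask)
{
	if (arch_maintains_mask)
		return mm->cpumask;
	/* Hypothetical fallback: with no usable mask, visit every CPU. */
	return (1u << SIM_NR_CPUS) - 1;
}

/* Switch every visited CPU still using @mm to init_mm and return the
 * number of CPUs visited (a proxy for the IPI cost). */
static int sim_shoot_lazies(struct sim_mm *mm, bool arch_maintains_mask)
{
	unsigned int targets = sim_shootdown_targets(mm, arch_maintains_mask);
	int visited = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!(targets & (1u << cpu)))
			continue;
		visited++;
		if (sim_active_mm[cpu] == mm)
			sim_active_mm[cpu] = &sim_init_mm;
	}
	return visited;
}

/* How many CPUs still reference @mm. */
static int sim_remaining_users(const struct sim_mm *mm)
{
	int n = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (sim_active_mm[cpu] == mm)
			n++;
	return n;
}
```

If CPUs 3 and 5 hold the mm lazily but the mask was never updated, trusting the mask (`arch_maintains_mask == true`) visits no CPUs and leaves both references behind. The fallback removes them but touches every CPU, which is exactly the cost that a maintained mask would avoid. In this model that is the trade-off against the per-switch atomic update described above.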