From: Andy Lutomirski
Date: Wed, 2 Dec 2020 21:05:30 -0800
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Nicholas Piggin
Cc: Christian Borntraeger, Catalin Marinas, Dave Hansen, Vasily Gorbik, Heiko Carstens, Andy Lutomirski, Will Deacon, Anton Blanchard, Arnd Bergmann, linux-arch, LKML, Linux-MM, linuxppc-dev, Mathieu Desnoyers, Peter Zijlstra, X86 ML

> On Dec 1, 2020, at 7:47 PM, Nicholas Piggin wrote:
>
> Excerpts from Andy Lutomirski's message of December 1, 2020 4:31 am:
>> other arch folk: there's some background here:
>>
>> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
>>
>>> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski wrote:
>>>
>>> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote:
>>>>
>>>> On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
>>>>>
>>>>> On big systems, the mm refcount can become highly contended when doing
>>>>> a lot of context switching with threaded applications (particularly
>>>>> switching between the idle thread and an application thread).
>>>>>
>>>>> Abandoning lazy tlb slows switching down quite a bit in the important
>>>>> user->idle->user cases, so instead implement a non-refcounted scheme
>>>>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>>>>> any remaining lazy ones.
>>>>>
>>>>> Shootdown IPIs are some concern, but they have not been observed to be
>>>>> a big problem with this scheme (the powerpc implementation generated
>>>>> 314 additional interrupts on a 144 CPU system during a kernel compile).
>>>>> There are a number of strategies that could be employed to reduce IPIs
>>>>> if they turn out to be a problem for some workload.
>>>>
>>>> I'm still wondering whether we can do even better.
>>>>
>>>
>>> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
>>> the TLB. On x86, this will shoot down all lazies as long as even a
>>> single pagetable was freed. (Or at least it will if we don't have a
>>> serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
>>> sets tlb->freed_tables, which will trigger the IPI.) So, on
>>> architectures like x86, the shootdown approach should be free. The
>>> only way it ought to have any excess IPIs is if we have CPUs in
>>> mm_cpumask() that don't need IPI to free pagetables, which could
>>> happen on paravirt.
>>
>> Indeed, on x86, we do this:
>>
>> [   11.558844] flush_tlb_mm_range.cold+0x18/0x1d
>> [   11.559905] tlb_finish_mmu+0x10e/0x1a0
>> [   11.561068] exit_mmap+0xc8/0x1a0
>> [   11.561932] mmput+0x29/0xd0
>> [   11.562688] do_exit+0x316/0xa90
>> [   11.563588] do_group_exit+0x34/0xb0
>> [   11.564476] __x64_sys_exit_group+0xf/0x10
>> [   11.565512] do_syscall_64+0x34/0x50
>>
>> and we have info->freed_tables set.
>>
>> What are the architectures that have large systems like?
>>
>> x86: we already zap lazies, so it should cost basically nothing to do
>
> This is not zapping lazies, this is freeing the user page tables.
>
> "lazy mm" is where a switch to a kernel thread takes on the
> previous mm for its kernel mapping rather than switch to init_mm.

The intent of the code is to flush the TLB after freeing user page
tables, but, on bare metal, lazies get zapped as a side effect.

Anyway, I'm going to send out a mockup of an alternative approach shortly.
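
For readers who want the moving parts of the thread spelled out, here is a
toy user-space model of the "shoot lazies" idea: a kernel thread lazily keeps
the previous user mm loaded, and at teardown every CPU in the mm's cpumask is
interrupted and any lazy user is moved to init_mm. It is only a sketch under
stated assumptions. It is not kernel code and not the patch itself; the names
MM, CPU, switch_to_user, switch_to_kthread and shoot_lazies are hypothetical,
and one "IPI" is simply counted per CPU in the mask.

```python
class MM:
    """Toy stand-in for an address space (hypothetical, not struct mm_struct)."""
    def __init__(self, name):
        self.name = name
        self.cpumask = set()  # IDs of CPUs that may still have this mm loaded


INIT_MM = MM("init_mm")


class CPU:
    """Toy CPU that tracks which mm is loaded and whether the use is lazy."""
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.active_mm = INIT_MM
        self.lazy = False  # True while a kernel thread borrows a user mm


def switch_to_user(cpu, mm):
    # A user thread runs: load its mm and record this CPU in the mask.
    cpu.active_mm = mm
    cpu.lazy = False
    mm.cpumask.add(cpu.cpu_id)


def switch_to_kthread(cpu):
    # Lazy TLB: keep the previous mm loaded instead of switching to init_mm.
    if cpu.active_mm is not INIT_MM:
        cpu.lazy = True


def shoot_lazies(cpus, mm):
    # Assumed teardown step: interrupt every CPU in the mask and move any
    # CPU that is lazily using mm over to init_mm. Returns simulated IPIs.
    ipis = 0
    for cpu_id in sorted(mm.cpumask):
        ipis += 1
        cpu = cpus[cpu_id]
        if cpu.lazy and cpu.active_mm is mm:
            cpu.active_mm = INIT_MM
            cpu.lazy = False
    mm.cpumask.clear()
    return ipis


cpus = [CPU(i) for i in range(4)]
app = MM("app")
switch_to_user(cpus[0], app)
switch_to_user(cpus[1], app)
switch_to_kthread(cpus[0])  # both CPUs go idle while still holding app
switch_to_kthread(cpus[1])
ipis = shoot_lazies(cpus, app)
print(ipis, [c.active_mm.name for c in cpus])
# prints: 2 ['init_mm', 'init_mm', 'init_mm', 'init_mm']
```

In this model only CPUs 0 and 1 are interrupted, because they are the only
ones in the mask. That mirrors the point made above for bare-metal x86: if the
TLB flush after freeing page tables already interrupts those same CPUs, the
lazy references go away as a side effect of that flush.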