Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
From: Andy Lutomirski
Date: Tue, 17 Jul 2018 12:27:53 -1000
To: Rik van Riel
Cc: Andy Lutomirski, LKML, X86 ML, Mike Galbraith, kernel-team, Ingo Molnar, Dave Hansen
Message-Id: <67F32577-24D8-4E9F-ADB1-927B3AC18B5A@amacapital.net>
References: <20180716190337.26133-1-riel@surriel.com> <20180716190337.26133-5-riel@surriel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jul 17, 2018, at 12:05 PM, Rik van Riel wrote:
>
>
>
>> On Jul 17, 2018, at 5:29 PM, Andy Lutomirski wrote:
>>
>> On Tue, Jul 17, 2018 at 1:16 PM, Rik van Riel wrote:
>>> Can I skip both the cr4 and ldt switches when the TLB contents
>>> are no longer valid and got reloaded?
>>>
>>> If the TLB contents are still valid, either because we never went
>>> into lazy TLB mode, or because no invalidates happened while
>>> we were lazy, we immediately return.
>>>
>>> The cr4 and ldt reloads only happen if the TLB was invalidated
>>> while we were in lazy TLB mode.
>>
>> Yes, since the only events that would change the LDT or the required
>> CR4 value will unconditionally broadcast to every CPU in mm_cpumask
>> regardless of whether they're lazy. The interesting case is that you
>> go lazy, you miss an invalidation IPI because you were lazy, then you
>> go unlazy, notice the tlb_gen change, and flush. If this happens, you
>> know that you only missed a page table update and not an LDT update or
>> a CR4 update, because the latter would have sent the IPI even though
>> you were lazy. So you should skip the CR4 and LDT updates.
>>
>> I suppose a different approach would be to fix the issue below and to
>> try to track when the LDT actually needs reloading. But that latter
>> part seems a bit complicated for minimal gain.
>>
>> (Do you believe me? If not, please argue back!)
>>
> I believe you :)
>
>>>> Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
>>>> fix that part or should I?
>>>
>>> I would be happy to send in a patch for this, and one for
>>> the above optimization you pointed out.
>>>
>>
>> Yes please!
>>
> There is a third optimization left to do. Currently every time
> we switch into lazy tlb mode, we take a refcount on the mm,
> even when switching from one kernel thread to another, or
> when repeatedly switching between the same mm and kernel
> threads.
>
> We could keep that refcount (on a per cpu basis) from the time
> we first switch to that mm in lazy tlb mode, to when we switch
> the CPU to a different mm.
>
> That would allow us to not bounce the cache line with the
> mm_struct reference count on every lazy TLB context switch.
>
> Does that seem like a reasonable optimization?

Are you referring to the core sched code that deals with mm_count and
active_mm? If so, last time I looked at it, I convinced myself that it
was totally useless, at least on x86. I think my reasoning was that,
when mm_users went to zero, we already waited for RCU before tearing
down page tables.

Things may have changed, but I strongly suspect that it should be
possible for at least x86 to opt out of mm_count and maybe even
active_mm entirely. If nothing else, you’re shooting the mm out of CR3
on all CPUs whenever the pagetables get freed, and more or less the
same logic should be sufficient so that, whenever mm_users hits zero,
we can synchronously or via RCU callback kill the mm entirely.

Want to take a look at that?

>
> Am I overlooking anything?
>
> I'll try to get all three optimizations working, and will run them
> through some testing here before posting upstream.
>
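
For readers following the quoted argument about skipping the CR4 and LDT reloads, here is a rough user-space sketch of the reasoning. It is not kernel code. Every name in it (toy_mm, toy_cpu, toy_invalidate, toy_change_ctx, toy_unlazy, ctx_gen) is made up for illustration, and a single "ctx_gen" counter stands in for the LDT and CR4 state. The model assumes, as described in the thread, that page table invalidations skip lazy CPUs while LDT/CR4 changes are broadcast to every CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; these types and names are hypothetical, not kernel APIs. */
struct toy_mm {
    uint64_t tlb_gen;  /* bumped on every page table invalidation */
    uint64_t ctx_gen;  /* stand-in for LDT/CR4 state, bumped on change */
};

struct toy_cpu {
    bool lazy;                /* CPU is in lazy TLB mode for this mm */
    uint64_t seen_tlb_gen;    /* last tlb_gen this CPU flushed up to */
    uint64_t loaded_ctx_gen;  /* ctx_gen this CPU last loaded */
    unsigned tlb_flushes;
    unsigned ctx_reloads;
};

/* Page table change: lazy CPUs get no IPI and catch up when they unlazy. */
static void toy_invalidate(struct toy_mm *mm, struct toy_cpu *cpus, int n)
{
    mm->tlb_gen++;
    for (int i = 0; i < n; i++) {
        if (cpus[i].lazy)
            continue;
        cpus[i].seen_tlb_gen = mm->tlb_gen;
        cpus[i].tlb_flushes++;
    }
}

/* LDT/CR4 change: broadcast to every CPU, lazy or not. */
static void toy_change_ctx(struct toy_mm *mm, struct toy_cpu *cpus, int n)
{
    mm->ctx_gen++;
    for (int i = 0; i < n; i++) {
        cpus[i].loaded_ctx_gen = mm->ctx_gen;
        cpus[i].ctx_reloads++;
    }
}

/* Leaving lazy mode: flush only if tlb_gen moved; never reload ctx. */
static void toy_unlazy(const struct toy_mm *mm, struct toy_cpu *cpu)
{
    cpu->lazy = false;
    if (cpu->seen_tlb_gen != mm->tlb_gen) {
        cpu->seen_tlb_gen = mm->tlb_gen;
        cpu->tlb_flushes++;
    }
    /* The broadcast in toy_change_ctx() already kept this CPU current. */
    assert(cpu->loaded_ctx_gen == mm->ctx_gen);
}
```

In this model a CPU that was lazy across both kinds of change ends up with exactly one TLB flush (done in toy_unlazy) and one context reload (done by the broadcast), so reloading the context again on the unlazy path would be redundant. The real kernel has more state and more paths than this, so treat it only as an illustration of the argument.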