Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
From: Nadav Amit
Date: Sun, 30 Oct 2022 12:34:51 -0700
To: Linus Torvalds
Cc: Peter Zijlstra, Jann Horn, John Hubbard, X86 ML, Matthew Wilcox, Andrew Morton, kernel list, Linux-MM, Andrea Arcangeli, "Kirill A . Shutemov", jroedel@suse.de, ubizjak@gmail.com, Alistair Popple
Message-Id: <44A8D373-24CA-4777-AFC8-DB48F0DC4FAE@gmail.com>

On Oct 30, 2022, at 11:19 AM, Linus Torvalds wrote:

> And page_remove_rmap() could *almost* be called later, but it does
> have code that also depends on the page table lock, although it looks
> like realistically that's just because it "knows" that means that
> preemption is disabled, so it uses non-atomic statistics update.
>
> I say "knows" in quotes, because that's what the comment says, but it
> turns out that __mod_node_page_state() has to deal with CONFIG_RT
> anyway and does that
>
>         preempt_disable_nested();
>         ...
>         preempt_enable_nested();
>
> thing.
>
> And then it wants to see the vma, although that's actually only to see
> if it's 'mlock'ed, so we could just squirrel that away.
>
> So we *could* move page_remove_rmap() later into the TLB flush region,
> but then we would have lost the page table lock anyway, so then
> folio_mkclean() can come in regardless.
>
> So that doesn't even help.

Well, if you combine it with the per-page-table stale TLB detection
mechanism that I proposed, I think this could work.

Reminder (feel free to skip): you would have a per-mm “completed
TLB-generation” in addition to the current one, which would be renamed
to “pending TLB-generation”. Whenever you update the page-tables in a
manner that might require a TLB flush, you would increase the “pending
TLB-generation” and save the pending TLB-generation in the page-table’s
page-struct. All of that is done once under the page-table lock. When
you finish a TLB flush, you update the “completed TLB-generation”. Then,
in page_vma_mkclean_one(), you would check whether the page-table’s
TLB-generation is greater than the completed TLB-generation, which would
indicate that TLB entries for PTEs in this table might be stale. In that
case you would just flush the TLB.

[ Of course, you could instead just flush if mm_tlb_flush_pending(), but
nobody likes this mechanism: it has a very coarse granularity and can
therefore lead to many unnecessary TLB flushes. ]

Indeed, there could be some overhead in extreme cases, since the cache
line holding the mm's TLB-generation is already highly contended in such
cases. But I think it is worth it to have simple logic that makes it
possible to reason about correctness.

My intuition is that although you appear to be right that we can just
mark this case as an “extreme case nobody cares about”, it might have,
now or in the future, other implications that are hard to predict and
prevent.
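
The pending/completed TLB-generation scheme described in the reminder can be modeled in userspace C. This is a minimal sketch, not the proposed patch: struct mm_gen, struct pt_gen and every function name below are hypothetical, C11 atomics stand in for kernel primitives, page-table locking is left to the caller, and it assumes that a finished flush for generation N also covers every lower generation (a real design would have to justify that for concurrent flushers).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-mm state; these are not real kernel fields. */
struct mm_gen {
	_Atomic uint64_t pending_gen;   /* bumped for each flush-requiring change */
	_Atomic uint64_t completed_gen; /* highest generation known to be flushed */
};

/* Hypothetical stand-in for a field in the page table's struct page. */
struct pt_gen {
	uint64_t gen; /* pending generation recorded at the last such change */
};

void mm_gen_init(struct mm_gen *mm)
{
	atomic_init(&mm->pending_gen, 0);
	atomic_init(&mm->completed_gen, 0);
}

/*
 * Caller is assumed to hold this page table's lock. Bump the mm-wide
 * pending generation once and record it in the page table; the caller
 * must later flush up to the returned generation.
 */
uint64_t pt_mark_changed(struct mm_gen *mm, struct pt_gen *pt)
{
	uint64_t gen = atomic_fetch_add(&mm->pending_gen, 1) + 1;

	pt->gen = gen;
	return gen;
}

/*
 * A TLB flush covering generation 'gen' has finished. completed_gen only
 * ever moves forward. ASSUMPTION: finishing 'gen' implies every lower
 * generation has been flushed as well.
 */
void tlb_flush_done(struct mm_gen *mm, uint64_t gen)
{
	uint64_t cur = atomic_load(&mm->completed_gen);

	/* On CAS failure, cur is reloaded; retry only while still behind. */
	while (cur < gen &&
	       !atomic_compare_exchange_weak(&mm->completed_gen, &cur, gen))
		;
}

/*
 * Fine-grained check for a page_vma_mkclean_one()-like path: TLB entries
 * for PTEs in this particular table may still be stale.
 */
bool pt_may_have_stale_tlb(struct mm_gen *mm, const struct pt_gen *pt)
{
	return pt->gen > atomic_load(&mm->completed_gen);
}

/*
 * Coarse, mm-wide check in the spirit of mm_tlb_flush_pending() (modeled
 * here with the same counters, not the kernel's implementation): any
 * outstanding flush anywhere in the mm forces a flush.
 */
bool mm_any_flush_pending(struct mm_gen *mm)
{
	return atomic_load(&mm->pending_gen) > atomic_load(&mm->completed_gen);
}
```

In this model, if table A is changed and flushed (generation 1) while table B is changed but not yet flushed (generation 2), pt_may_have_stale_tlb() reports A as clean and B as possibly stale, whereas the coarse mm-wide check would force a flush for both. That difference is the granularity argument made above.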