Date: Tue, 26 Mar 2024 17:48:52 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte
To: David Hildenbrand, Mark Rutland, Catalin Marinas, Will Deacon, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Andrew Morton, Muchun Song
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240215121756.2734131-1-ryan.roberts@arm.com>
 <20240215121756.2734131-4-ryan.roberts@arm.com>
From: Ryan Roberts

On 26/03/2024 17:38, David Hildenbrand wrote:
> On 26.03.24 18:27, Ryan Roberts wrote:
>> On 26/03/2024 17:02, David Hildenbrand wrote:
>>> On 15.02.24 13:17, Ryan Roberts wrote:
>>>> Let's convert handle_pte_fault()'s use of ptep_get_lockless() to
>>>> ptep_get_lockless_norecency() to save orig_pte.
>>>>
>>>> There are a number of places that follow this model:
>>>>
>>>>       orig_pte = ptep_get_lockless(ptep)
>>>>       ...
>>>>       if (!pte_same(orig_pte, ptep_get(ptep)))
>>>>               // RACE!
>>>>       ...
>>>>
>>>> So we need to be careful to convert all of those to use
>>>> pte_same_norecency() so that the access and dirty bits are excluded
>>>> from the comparison.
>>>>
>>>> Additionally there are a couple of places that genuinely rely on the
>>>> access and dirty bits of orig_pte, but with some careful refactoring,
>>>> we can use ptep_get() once we are holding the lock to achieve
>>>> equivalent logic.
>>>
>>> We really should document that changed behavior somewhere where it can
>>> be easily found: that orig_pte might have incomplete/stale
>>> accessed/dirty information.
>>
>> I could add it to the orig_pte definition in `struct vm_fault`?
>>
>>>
>>>
>>>> @@ -5343,7 +5356,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>>>                           vmf->address, &vmf->ptl);
>>>>           if (unlikely(!vmf->pte))
>>>>               return 0;
>>>> -        vmf->orig_pte = ptep_get_lockless(vmf->pte);
>>>> +        vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte);
>>>>           vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
>>>>
>>>>           if (pte_none(vmf->orig_pte)) {
>>>> @@ -5363,7 +5376,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>>>
>>>>       spin_lock(vmf->ptl);
>>>>       entry = vmf->orig_pte;
>>>> -    if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>>> +    if (unlikely(!pte_same_norecency(ptep_get(vmf->pte), entry))) {
>>>>           update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
>>>>           goto unlock;
>>>
>>> I was wondering about the following:
>>>
>>> Assume the PTE is not dirty.
>>>
>>> Thread 1 does
>>
>> Sorry, not sure what threads have to do with this? How is the vmf shared
>> between threads? What have I misunderstood...
>
> Assume we have HW that does not have HW-managed access/dirty bits, i.e.
> one that ends up using the generic ptep_set_access_flags(). Access/dirty
> bits are always updated under the PT lock.
>
> Then imagine two threads. One is in the fault path here; the other
> performs some other magic that sets the PTE dirty under the PTL.
>
>>
>>>
>>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
>>> /* not dirty */
>>>
>>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */

Ahh, this comment about thread 2 is not referring to the code immediately
below it. It all makes much more sense now. :)

>>>
>>> spin_lock(vmf->ptl);
>>> entry = vmf->orig_pte;
>>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>>     ...
>>> }
>>> ...
>>> entry = pte_mkyoung(entry);
>>
>> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.
>
> No, that is just thread 1 seeing "oh, nothing to do" and then going ahead
> and unconditionally doing that in handle_pte_fault().
>
>>
>>> if (ptep_set_access_flags(vmf->vma, ...))
>>> ...
>>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>
>>>
>>> The generic ptep_set_access_flags() will do another pte_same() check,
>>> realize "hey, there was a change, let's update the PTE!", and do
>>>
>>> set_pte_at(vma->vm_mm, address, ptep, entry);
>>
>> This is called from the generic ptep_set_access_flags() in your example,
>> right?
>
> Yes.
>
>>>
>>> would overwrite the dirty bit set by thread 2.
>>
>> I'm not really sure what you are getting at... Is your concern that there
>> is a race where the page could become dirty in the meantime and the dirty
>> bit now gets lost? I think that's why arm64 overrides
>> ptep_set_access_flags(); since the HW can update access/dirty, we have to
>> deal with the races.
>
> My concern is that your patch can in subtle ways lead to us losing PTE
> dirty bits on architectures that don't have the HW-managed dirty bit.
> They do exist ;)

But I think the example you give can already happen today? Thread 1 reads
orig_pte = ptep_get_lockless(). That's already racy: if thread 2 sets the
PTE dirty just after the get, then thread 1 is going to set the PTE back to
(a modified version of) orig_pte. Isn't it already broken?

>
> Arm64 should be fine in that regard.
>

There is plenty of arm64 HW that doesn't do HW access/dirty updates, but our
ptep_set_access_flags() can always deal with a racing update, even if that
update originates from SW.

Why do I have the feeling you're about to explain (very patiently) exactly
why I'm wrong?... :)