Received: by 2002:a05:6358:700f:b0:131:369:b2a3 with SMTP id 15csp1016341rwo; Wed, 2 Aug 2023 07:35:13 -0700 (PDT) X-Google-Smtp-Source: APBJJlHL1oLvSIPa0tyWepPozc6Y1RDoYW08OVl8oy4cgvIdtbtjVlF0t/GbM5jmezkgfS/UYm8y X-Received: by 2002:a05:6402:508:b0:50b:c630:a956 with SMTP id m8-20020a056402050800b0050bc630a956mr4872864edv.17.1690986913392; Wed, 02 Aug 2023 07:35:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690986913; cv=none; d=google.com; s=arc-20160816; b=wcgCNisa6VRMdFQOt3eoThxuXAEdpk6bgSdfDbbXe4soYlMLKQ3uUzT7hVY/KasAbP TfNUtusEy9rLCGC4BdxAnhLonuje8tvCTpIT3QDQyF94JjeZ3Braz400bTZ3hdgZQvaW x/WUMD56w93No+ZgLxHNvAqxukkk2IRXMECHHBr/8rMSqkDrxCKb/lTaNJhm0Ik+uNtt riilLMaaNNuvZ05G3mKuGff3B92erwq+bD2+co5O1gVoVwXp5pHKvSgMqb4Z4PuG8s1O D9yauS2ADH8BepYLITo76CUwNiLLTlTUzeEE8c+er1SPWDGOOI2pYRPuezYOZYORlmqk t0+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:to:subject:user-agent:mime-version:date:message-id; bh=NLFZ9OxzYmZf7EMZFL3Nk3QWys46mqbh/xgtaiZn8R0=; fh=X+QCkwZKajD0hblP3KZt1RL39ugCYz8v7zjHXzrtA0I=; b=jOvB5+cvWOzGHGGFEqHs6Jxf6FUvjFeVSe96zUWDFTGxDl+gZTc/AOhiv/l6dB+fwh cvexWBpjVifkg7DLAo2D0j35GmdfnCwRl1kAcCoZwow/GemNS4GFnPsaMPMYJm24APPW +xjJ1gCe30cy1rj87gmwq0ZTqoZ4HmXAUs667SmQ9bJkZdtbOSXVFR31NHvZQd6rIGit sRy6OpfN+RPCtib65oaw5q1d5uvnyN7mgAIRWGu7utAMjEYBDIoih/acThN2a/wigdbw ZugHIAQ6Dai8buWtFfQ6v2LXOXC0NFexRXS7DaMnjpL/oeoF4TFzygHGZIKA7fWfN7rW denA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q26-20020aa7d45a000000b00522582183f0si10091996edr.581.2023.08.02.07.34.48; Wed, 02 Aug 2023 07:35:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234813AbjHBOJJ (ORCPT + 99 others); Wed, 2 Aug 2023 10:09:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234847AbjHBOIr (ORCPT ); Wed, 2 Aug 2023 10:08:47 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2500E273C for ; Wed, 2 Aug 2023 07:08:18 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A7015113E; Wed, 2 Aug 2023 07:09:01 -0700 (PDT) Received: from [10.57.77.90] (unknown [10.57.77.90]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 07F303F5A1; Wed, 2 Aug 2023 07:08:16 -0700 (PDT) Message-ID: <8dcf002a-088e-32de-7868-5dc5ca6b1206@arm.com> Date: Wed, 2 Aug 2023 15:08:15 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH 1/3] mm: add functions folio_in_range() and folio_within_vma() To: "Yin, Fengwei" , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, shy828301@gmail.com, hughd@google.com References: <20230728070929.2487065-1-fengwei.yin@intel.com> <20230728070929.2487065-2-fengwei.yin@intel.com> <55c9e3f7-099d-6f57-32da-1f318a9688a0@arm.com> <65a36b41-d69e-4072-cfd2-253ed6e4e040@arm.com> <286cbca6-ab5e-ad06-ea2a-89ea08ee53d4@intel.com> From: Ryan Roberts In-Reply-To: <286cbca6-ab5e-ad06-ea2a-89ea08ee53d4@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/08/2023 14:46, Yin, Fengwei wrote: > > > On 8/2/2023 9:09 PM, Ryan Roberts wrote: >> On 02/08/2023 13:50, Yin, Fengwei wrote: >>> >>> >>> On 8/2/2023 7:14 PM, Ryan Roberts wrote: >>>> On 28/07/2023 08:09, Yin Fengwei wrote: >>>>> It will be used to check whether the folio is mapped to specific >>>>> VMA and whether the mapping address of folio is in the range. >>>>> >>>>> Also a helper function folio_within_vma() to check whether folio >>>>> is in the range of vma based on folio_in_range(). >>>>> >>>>> Signed-off-by: Yin Fengwei >>>>> --- >>>>> mm/internal.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 69 insertions(+) >>>>> >>>>> diff --git a/mm/internal.h b/mm/internal.h >>>>> index 5a03bc4782a2..63de32154a48 100644 >>>>> --- a/mm/internal.h >>>>> +++ b/mm/internal.h >>>>> @@ -585,6 +585,75 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma, >>>>> bool write, int *locked); >>>>> extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags, >>>>> unsigned long bytes); >>>>> + >>>>> +/* >>>>> + * Check whether the folio is in specific range >>>>> + * >>>>> + * First, check whether the folio is in the range of vma. >>>>> + * Then, check whether the folio is mapped to the range of [start, end]. >>>>> + * In the end, check whether the folio is fully mapped to the range. >>>>> + * >>>>> + * @pte page table pointer will be checked whether the large folio >>>>> + * is fully mapped to. Currently, if mremap in the middle of >>>>> + * large folio, the large folio could be mapped to to different >>>>> + * VMA and address check can't identify this situation. >>>>> + */ >>>>> +static inline bool >>>>> +folio_in_range(struct folio *folio, struct vm_area_struct *vma, >>>>> + unsigned long start, unsigned long end, pte_t *pte) >>>> >>>> This api seems a bit redundant to me. Wouldn't it be better to remove the vma >>>> parameter and instead fix up the start/end addresses in folio_within_vma()? >>> My understanding is it's necessary. As for madvise, we need to check whether >>> the folio is both in the range of VMA and also in the range of [start, end). >> >> But in folio_within_vma() you pass start as vma->vm_start and end as >> vma->vm_end. And in this function, you narrow start/end to be completely >> contained in vma. So surely there is only really one start/end you are >> interested in? Just seems a bit odd to me. > madvise() will call filio_in_range() with VMA and real range [start, end) passed > from user space. > >> >>> >>>> >>>>> +{ >>>>> + pte_t ptent; >>>>> + unsigned long i, nr = folio_nr_pages(folio); >>>>> + pgoff_t pgoff, addr; >>>>> + unsigned long vma_pglen = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; >>>>> + >>>>> + VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio); >>>>> + >>>>> + if (start < vma->vm_start) >>>>> + start = vma->vm_start; >>>>> + if (end > vma->vm_end) >>>>> + end = vma->vm_end; >>>>> + >>>>> + pgoff = folio_pgoff(folio); >>>>> + /* if folio start address is not in vma range */ >>>>> + if (pgoff < vma->vm_pgoff || pgoff > vma->vm_pgoff + vma_pglen) >>>>> + return false; >>>>> + >>>>> + addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); >>>>> + if (addr < start || end - addr < folio_size(folio)) >>>>> + return false; >>>>> + >>>>> + /* not necessary to check pte for none large folio */ >>>>> + if (!folio_test_large(folio)) >>>>> + return true; >>>>> + >>>>> + if (!pte) >>>>> + return false; >>>>> + >>>>> + /* check whether parameter pte is associated with folio */ >>>>> + ptent = ptep_get(pte); >>>>> + if (pte_none(ptent) || !pte_present(ptent) || >>>>> + pte_pfn(ptent) - folio_pfn(folio) >= nr) >>>>> + return false; >>>>> + >>>>> + pte -= pte_pfn(ptent) - folio_pfn(folio); >>>>> + for (i = 0; i < nr; i++, pte++) { >>>>> + ptent = ptep_get(pte); >>>>> + >>>>> + if (pte_none(ptent) || !pte_present(ptent) || >>>>> + pte_pfn(ptent) - folio_pfn(folio) >= nr) >>>>> + return false; >>>>> + } >>>> >>>> I don't think I see anything to ensure you don't wander off the end (or start) >>>> of the pgtable? If the folio is mremapped so that it straddles multiple tables >>>> (or is bigger than a single table?) then I think pte can become invalid? Perhaps >>>> you intended start/end to always be within the same pgtable, but that is not >>>> guarranteed in the case that folio_within_vma() is making the call. >>> If pte is invalid for any reason (pass wrong parameter, not fully mapped etc), this >>> function just return false in page table entry check phase. >> >> Sorry I don't think this covers the issue I'm describing. If you have a >> pte-mapped THP that gets mremapped to straddle 2 pte tables, don't you have a >> problem? >> >> example for 4K base page set up: >> >> folio_nr_pages = 512 >> first page of folio mapped at vaddr = 2M - 4K = 0x1FF000 >> >> If you then call this function with the pte pointer for the second page in the >> folio, which is mapped at address 0x200000, that pte is pointing to the first >> pte entry in the table pointed to by the second pmd entry. The pte pointer can >> be legitimately manipulated to point to any entry within that table, >> corrsponding to vaddrs [0x200000, 0x400000). But you will end up subtracting 1 >> from the pointer, intending that it now points to the pte entry that represents >> vaddr 0x1FF000. But actually it has fallen off the front of the table into some >> other arbitrary memory in the linear map. 0x1FF000 is represented in a different >> table, pointed to by the first pmd entry. > Yes. This can be an issue as hold the second page table lock can't prevent the first > part unmapped. Let me add another check vaddr align to folio_size in next version. Locking is a problem but its not the only problem. The 2 tables are almost certainly not contiguous in virtual memory. So once you have moved the pointer to before the start of the second table, then you are pointing to arbitrary memory. > > Regards > Yin, Fengwei > >> >> >>> >>>> >>>> Also I want to check that this function is definitely always called under the >>>> PTL for the table that pte belongs to? >>> Yes. I should spell it out. Thanks. >>> >>> >>> Regards >>> Yin, Fengwei >>> >>>> >>>>> + >>>>> + return true; >>>>> +} >>>>> + >>>>> +static inline bool >>>>> +folio_within_vma(struct folio *folio, struct vm_area_struct *vma, pte_t *pte) >>>>> +{ >>>>> + return folio_in_range(folio, vma, vma->vm_start, vma->vm_end, pte); >>>>> +} >>>>> + >>>>> /* >>>>> * mlock_vma_folio() and munlock_vma_folio(): >>>>> * should be called with vma's mmap_lock held for read or write, >>>> >>