Received: by 2002:a05:6358:700f:b0:131:369:b2a3 with SMTP id 15csp951702rwo; Wed, 2 Aug 2023 06:44:53 -0700 (PDT) X-Google-Smtp-Source: APBJJlE+tVyLmoYjp10tdX71eE7+ITf9aV+7/Fw8ySxvKeXog62M8ahsaqsShQDrLqq9WuaKf5wl X-Received: by 2002:a05:6a20:3241:b0:138:58b:3259 with SMTP id hm1-20020a056a20324100b00138058b3259mr14399383pzc.35.1690983892841; Wed, 02 Aug 2023 06:44:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690983892; cv=none; d=google.com; s=arc-20160816; b=K7wucdBCNenpo5yu85+S+CkS7d+/b+e9V4YlAEPLKkCg9y3aX0ZO9Y58QUGNvmbUn5 erIJnxdamXIAa4b2sF0VSdNS3onGHAcvSzGjITtFqZ2OR9QixSq4w7D6cze+E7Y90iNX V4QCg28HRPWJvzZSMbd80F3zwwgJE7kRl+61VtNN3qHy/1UVb9utrD4ESNW61O/31jfV x4P6AEkBa6zwcLWKCi86IlR4vKTVEKRs71gd+LoXhItNnZQn1DTHi4T2iHZfcuv6qw6i am85r9TPgzYCEnHKONe0j77lGlvpjkzOlzDDaSou0Srj+n42/NiBgFVAFnGg/zgry3bp Y15Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:to:subject:user-agent:mime-version:date:message-id; bh=chGJ//b+GCc8oDmSemOIxTvVtcLXa1K4Jeyg0tJV3ck=; fh=X+QCkwZKajD0hblP3KZt1RL39ugCYz8v7zjHXzrtA0I=; b=rG3mfl7iffJLPk9z0NU1jnDN3mkQGnTQu1nQ7FaerhhgT1OCIjJB0krLAQwqWF0bHh al2eqrEYe7z184n/cS+zPd/3BxRSQq5G7alpfW8xL0Ku7yTWlqoEebOlsnf5jTMir9Ji lXu0cqmI7jtjxke1IAzXYJj++MxZXtkAGXqExFmyeiQFWU5EQHbaPBhRLloa6dgxaWtf Itlu0FUaCQ8UpRD89ArG3Iabv+7dPNGZwR2FsZborYxtKUEFF7+OUqo90oddw7bOvUUC zz5Ho9GLqcaUpbx9XQvNXS+z50RTM3wYXAySS5jC3xqW7DH4loUA5hf6kPPQJ0AHA329 s5TA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u70-20020a638549000000b0055c77028579si1291817pgd.747.2023.08.02.06.44.40; Wed, 02 Aug 2023 06:44:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233423AbjHBNJl (ORCPT + 99 others); Wed, 2 Aug 2023 09:09:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232731AbjHBNJi (ORCPT ); Wed, 2 Aug 2023 09:09:38 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 406A22695 for ; Wed, 2 Aug 2023 06:09:37 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 43E3C1474; Wed, 2 Aug 2023 06:10:20 -0700 (PDT) Received: from [10.57.77.90] (unknown [10.57.77.90]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 725363F5A1; Wed, 2 Aug 2023 06:09:35 -0700 (PDT) Message-ID: <65a36b41-d69e-4072-cfd2-253ed6e4e040@arm.com> Date: Wed, 2 Aug 2023 14:09:33 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH 1/3] mm: add functions folio_in_range() and folio_within_vma() To: "Yin, Fengwei" , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, shy828301@gmail.com, hughd@google.com References: <20230728070929.2487065-1-fengwei.yin@intel.com> <20230728070929.2487065-2-fengwei.yin@intel.com> <55c9e3f7-099d-6f57-32da-1f318a9688a0@arm.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/08/2023 13:50, Yin, Fengwei wrote: > > > On 8/2/2023 7:14 PM, Ryan Roberts wrote: >> On 28/07/2023 08:09, Yin Fengwei wrote: >>> It will be used to check whether the folio is mapped to specific >>> VMA and whether the mapping address of folio is in the range. >>> >>> Also a helper function folio_within_vma() to check whether folio >>> is in the range of vma based on folio_in_range(). >>> >>> Signed-off-by: Yin Fengwei >>> --- >>> mm/internal.h | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 69 insertions(+) >>> >>> diff --git a/mm/internal.h b/mm/internal.h >>> index 5a03bc4782a2..63de32154a48 100644 >>> --- a/mm/internal.h >>> +++ b/mm/internal.h >>> @@ -585,6 +585,75 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma, >>> bool write, int *locked); >>> extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags, >>> unsigned long bytes); >>> + >>> +/* >>> + * Check whether the folio is in specific range >>> + * >>> + * First, check whether the folio is in the range of vma. >>> + * Then, check whether the folio is mapped to the range of [start, end]. >>> + * In the end, check whether the folio is fully mapped to the range. >>> + * >>> + * @pte page table pointer will be checked whether the large folio >>> + * is fully mapped to. Currently, if mremap in the middle of >>> + * large folio, the large folio could be mapped to to different >>> + * VMA and address check can't identify this situation. >>> + */ >>> +static inline bool >>> +folio_in_range(struct folio *folio, struct vm_area_struct *vma, >>> + unsigned long start, unsigned long end, pte_t *pte) >> >> This api seems a bit redundant to me. Wouldn't it be better to remove the vma >> parameter and instead fix up the start/end addresses in folio_within_vma()? > My understanding is it's necessary. As for madvise, we need to check whether > the folio is both in the range of VMA and also in the range of [start, end). But in folio_within_vma() you pass start as vma->vm_start and end as vma->vm_end. And in this function, you narrow start/end to be completely contained in vma. So surely there is only really one start/end you are interested in? Just seems a bit odd to me. > >> >>> +{ >>> + pte_t ptent; >>> + unsigned long i, nr = folio_nr_pages(folio); >>> + pgoff_t pgoff, addr; >>> + unsigned long vma_pglen = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; >>> + >>> + VM_WARN_ON_FOLIO(folio_test_ksm(folio), folio); >>> + >>> + if (start < vma->vm_start) >>> + start = vma->vm_start; >>> + if (end > vma->vm_end) >>> + end = vma->vm_end; >>> + >>> + pgoff = folio_pgoff(folio); >>> + /* if folio start address is not in vma range */ >>> + if (pgoff < vma->vm_pgoff || pgoff > vma->vm_pgoff + vma_pglen) >>> + return false; >>> + >>> + addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); >>> + if (addr < start || end - addr < folio_size(folio)) >>> + return false; >>> + >>> + /* not necessary to check pte for none large folio */ >>> + if (!folio_test_large(folio)) >>> + return true; >>> + >>> + if (!pte) >>> + return false; >>> + >>> + /* check whether parameter pte is associated with folio */ >>> + ptent = ptep_get(pte); >>> + if (pte_none(ptent) || !pte_present(ptent) || >>> + pte_pfn(ptent) - folio_pfn(folio) >= nr) >>> + return false; >>> + >>> + pte -= pte_pfn(ptent) - folio_pfn(folio); >>> + for (i = 0; i < nr; i++, pte++) { >>> + ptent = ptep_get(pte); >>> + >>> + if (pte_none(ptent) || !pte_present(ptent) || >>> + pte_pfn(ptent) - folio_pfn(folio) >= nr) >>> + return false; >>> + } >> >> I don't think I see anything to ensure you don't wander off the end (or start) >> of the pgtable? If the folio is mremapped so that it straddles multiple tables >> (or is bigger than a single table?) then I think pte can become invalid? Perhaps >> you intended start/end to always be within the same pgtable, but that is not >> guarranteed in the case that folio_within_vma() is making the call. > If pte is invalid for any reason (pass wrong parameter, not fully mapped etc), this > function just return false in page table entry check phase. Sorry I don't think this covers the issue I'm describing. If you have a pte-mapped THP that gets mremapped to straddle 2 pte tables, don't you have a problem? example for 4K base page set up: folio_nr_pages = 512 first page of folio mapped at vaddr = 2M - 4K = 0x1FF000 If you then call this function with the pte pointer for the second page in the folio, which is mapped at address 0x200000, that pte is pointing to the first pte entry in the table pointed to by the second pmd entry. The pte pointer can be legitimately manipulated to point to any entry within that table, corrsponding to vaddrs [0x200000, 0x400000). But you will end up subtracting 1 from the pointer, intending that it now points to the pte entry that represents vaddr 0x1FF000. But actually it has fallen off the front of the table into some other arbitrary memory in the linear map. 0x1FF000 is represented in a different table, pointed to by the first pmd entry. > >> >> Also I want to check that this function is definitely always called under the >> PTL for the table that pte belongs to? > Yes. I should spell it out. Thanks. > > > Regards > Yin, Fengwei > >> >>> + >>> + return true; >>> +} >>> + >>> +static inline bool >>> +folio_within_vma(struct folio *folio, struct vm_area_struct *vma, pte_t *pte) >>> +{ >>> + return folio_in_range(folio, vma, vma->vm_start, vma->vm_end, pte); >>> +} >>> + >>> /* >>> * mlock_vma_folio() and munlock_vma_folio(): >>> * should be called with vma's mmap_lock held for read or write, >>