Message-ID: <6aa25e2a-a6b6-4ab7-8300-053ca3c0d748@arm.com>
Date: Tue, 23 Apr 2024 12:03:28 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/5] mm: memory: extend finish_fault() to support large folio
From: Ryan Roberts
To: Baolin Wang, akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com, 21cnbao@gmail.com, ying.huang@intel.com, shy828301@gmail.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <358aefb1858b63164894d7d8504f3dae0b495366.1713755580.git.baolin.wang@linux.alibaba.com>
References: <358aefb1858b63164894d7d8504f3dae0b495366.1713755580.git.baolin.wang@linux.alibaba.com>

On 22/04/2024 08:02, Baolin Wang wrote:
> Add large folio mapping establishment support for finish_fault() as a preparation,
> to support multi-size THP allocation of anonymous shared pages in the following
> patches.
>
> Signed-off-by: Baolin Wang
> ---
>  mm/memory.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b6fa5146b260..094a76730776 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4766,7 +4766,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct page *page;
> +	struct folio *folio;
>  	vm_fault_t ret;
> +	int nr_pages, i;
> +	unsigned long addr;
>
>  	/* Did we COW the page? */
>  	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
> @@ -4797,22 +4800,30 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  		return VM_FAULT_OOM;
>  	}
>
> +	folio = page_folio(page);
> +	nr_pages = folio_nr_pages(folio);
> +	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);

I'm not sure this is safe. IIUC, finish_fault() is called for any file-backed
mapping. So you could have a situation where part of a (regular) file is mapped
in the process, faults and hits in the pagecache. But the folio returned by the
pagecache is bigger than the portion that the process has mapped. So you now
end up mapping beyond the VMA limits?

In the pagecache case, you also can't assume that the folio is naturally
aligned in virtual address space.
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -					vmf->address, &vmf->ptl);
> +					addr, &vmf->ptl);
>  	if (!vmf->pte)
>  		return VM_FAULT_NOPAGE;
>
>  	/* Re-check under ptl */
> -	if (likely(!vmf_pte_changed(vmf))) {
> -		struct folio *folio = page_folio(page);
> -
> -		set_pte_range(vmf, folio, page, 1, vmf->address);
> -		ret = 0;
> -	} else {
> +	if (nr_pages == 1 && vmf_pte_changed(vmf)) {
>  		update_mmu_tlb(vma, vmf->address, vmf->pte);
>  		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
> +	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {

I think you have grabbed this from do_anonymous_page()? But I'm not sure it
works in the same way here as it does there. For the anon case, if userfaultfd
is armed, alloc_anon_folio() will only ever allocate order-0. So we end up in
the vmf_pte_changed() path, which will allow overwriting a uffd entry. But
here, there is nothing stopping nr_pages being greater than 1 when there could
be a uffd entry present, and you will fail due to the pte_range_none() check.
(see pte_marker_handle_uffd_wp()).

Thanks,
Ryan

> +		for (i = 0; i < nr_pages; i++)
> +			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
> +		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
>  	}
>
> +	set_pte_range(vmf, folio, &folio->page, nr_pages, addr);
> +	ret = 0;
> +
> +unlock:
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	return ret;
>  }