Subject: Re: [PATCH 2/2] mm: adds NOSIGBUS extension for out-of-band shmem read
To: Hugh Dickins
Cc: Linus Torvalds, Simon Ser, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
References: <1622589753-9206-1-git-send-email-mlin@kernel.org> <1622589753-9206-3-git-send-email-mlin@kernel.org> <79a27014-5450-1345-9eea-12fc9ae25777@kernel.org>
From:
Ming Lin
Date: Thu, 3 Jun 2021 12:57:01 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/2/2021 5:46 PM, Hugh Dickins wrote:
>
> It's do_anonymous_page()'s business to map in the zero page on
> read fault (see "my_zero_pfn(vmf->address)" in there), or fill
> a freshly allocated page with zeroes on write fault - and now
> you're sticking to MAP_PRIVATE, write faults in VM_WRITE areas
> are okay for VM_NOSIGBUS.
>
> Ideally you can simply call do_anonymous_page() from __do_fault()
> in the VM_FAULT_SIGBUS on VM_NOSIGBUS case. That's what to start
> from anyway: but look to see if there's state to be adjusted to
> achieve that; and it won't be surprising if somewhere down in
> do_anonymous_page() or something it calls, there's a BUG on it
> being called when vma->vm_file is set, or something like that.
> May need some tweaking.

do_anonymous_page() works nicely for both read and write faults.
I didn't hit any BUG() in my testing. But I'm still struggling with
how to make "punch hole" remove the mapping of the zero page.
Here is the hack I have now.
diff --git a/mm/memory.c b/mm/memory.c
index 46ecda5..6b5a897 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1241,7 +1241,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
+			if (unlikely(details) && page && !(vma->vm_flags & VM_NOSIGBUS)) {
 				/*
 				 * unmap_shared_mapping_pages() wants to
 				 * invalidate cache without truncating:

The other parts of the patch follow:

----
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e9d67bc..af9e277 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 # define VM_UFFD_MINOR	VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 
+#define VM_NOSIGBUS	VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index b2cbae9..c966b08 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
 	       _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
 	       arch_calc_vm_flag_bits(flags);
 }

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..a2a5333 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,7 @@
 #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
 #define MAP_SYNC		0x080000	/* perform synchronous page faults for the mapping */
 #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
 #define MAP_UNINITIALIZED	0x4000000	/* For anonymous mmap, memory could be
 						 * uninitialized */

diff --git a/mm/memory.c b/mm/memory.c
index eff2a47..46ecda5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 	}
 
 	ret = vma->vm_ops->fault(vmf);
+	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
+		/*
+		 * For MAP_NOSIGBUS mapping, map in the zero page on read fault
+		 * or fill a freshly allocated page with zeroes on write fault.
+		 */
+		ret = do_anonymous_page(vmf);
+		if (!ret)
+			ret = VM_FAULT_NOPAGE;
+		return ret;
+	}
+
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
 			    VM_FAULT_DONE_COW)))
 		return ret;

diff --git a/mm/mmap.c b/mm/mmap.c
index 096bba4..74fb49a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (!len)
 		return -EINVAL;
 
+	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mappings */
+	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
+		return -EINVAL;
+
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
 	 *