Date: Mon, 24 Apr 2023 07:51:39 +0100
From: Lorenzo Stoakes
To: Mika Penttilä
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
        Jason Gunthorpe, Jens Axboe, Matthew Wilcox, Dennis Dalessandro,
        Leon Romanovsky, Christian Benvenuti, Nelson Escobar,
        Bernard Metzler, Peter Zijlstra, Ingo Molnar,
        Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
        Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Bjorn Topel,
        Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
        "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
        Christian Brauner, Richard Cochran, Alexei Starovoitov,
        Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
        linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org,
        netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH] mm/gup: disallow GUP writing to file-backed mappings by default
References: <4b599782-3512-a177-c5b5-c562a22886c7@redhat.com>
In-Reply-To: <4b599782-3512-a177-c5b5-c562a22886c7@redhat.com>

On Mon, Apr 24, 2023 at 06:41:38AM +0300, Mika Penttilä wrote:
>
> Hi,
>
>
> On 22.4.2023 16.37, Lorenzo Stoakes wrote:
> > It isn't safe to write to file-backed mappings as GUP does not ensure that
> > the semantics associated with such a write are performed correctly, for
> > instance filesystems which rely upon write-notify will not be correctly
> > notified.
> >
> > There are exceptions to this - shmem and hugetlb mappings are (in effect)
> > anonymous mappings by other names, so we do permit this operation in these
> > cases.
> >
> > In addition, if no pinning takes place (neither FOLL_GET nor FOLL_PIN is
> > specified and neither flag gets implicitly set) then no writing can occur,
> > so we do not perform the check in this instance.
> >
> > This is an important exception, as populate_vma_page_range() invokes
> > __get_user_pages() in this way (and thus so does __mm_populate(), used by
> > MAP_POPULATE mmap() and mlock() invocations).
> >
> > There are GUP users within the kernel that do nevertheless rely upon this
> > behaviour, so we introduce the FOLL_ALLOW_BROKEN_FILE_MAPPING flag to
> > explicitly permit this kind of GUP access.
> >
> > This is required in order to not break userspace in instances where the
> > uAPI might permit file-mapped addresses - a number of RDMA users require
> > this for instance, as do the process_vm_[read/write]v() system calls,
> > /proc/$pid/mem, ptrace and SDT uprobes. Each of these callers has been
> > updated to use this flag.
> >
> > Making this change is an important step towards a more reliable GUP, and
> > explicitly indicates which callers might encounter issues moving forward.
> >
> > Suggested-by: Jason Gunthorpe
> > Signed-off-by: Lorenzo Stoakes
> > ---
> >  drivers/infiniband/hw/qib/qib_user_pages.c |  3 +-
> >  drivers/infiniband/hw/usnic/usnic_uiom.c   |  2 +-
> >  drivers/infiniband/sw/siw/siw_mem.c        |  3 +-
> >  fs/proc/base.c                             |  3 +-
> >  include/linux/mm_types.h                   |  8 +++++
> >  kernel/events/uprobes.c                    |  3 +-
> >  mm/gup.c                                   | 36 +++++++++++++++++++++-
> >  mm/memory.c                                |  3 +-
> >  mm/process_vm_access.c                     |  2 +-
> >  net/xdp/xdp_umem.c                         |  2 +-
> >  10 files changed, 56 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> > index f693bc753b6b..b9019dad8008 100644
> > --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> > +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> > @@ -110,7 +110,8 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> >  	for (got = 0; got < num_pages; got += ret) {
> >  		ret = pin_user_pages(start_page + got * PAGE_SIZE,
> >  				     num_pages - got,
> > -				     FOLL_LONGTERM | FOLL_WRITE,
> > +				     FOLL_LONGTERM | FOLL_WRITE |
> > +				     FOLL_ALLOW_BROKEN_FILE_MAPPING,
> >  				     p + got, NULL);
> >  		if (ret < 0) {
> >  			mmap_read_unlock(current->mm);
> > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> > index 2a5cac2658ec..33cf79b248a9 100644
> > --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> > +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> > @@ -85,7 +85,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
> >  			       int dmasync, struct usnic_uiom_reg *uiomr)
> >  {
> >  	struct list_head *chunk_list = &uiomr->chunk_list;
> > -	unsigned int gup_flags = FOLL_LONGTERM;
> > +	unsigned int gup_flags = FOLL_LONGTERM | FOLL_ALLOW_BROKEN_FILE_MAPPING;
> >  	struct page **page_list;
> >  	struct scatterlist *sg;
> >  	struct usnic_uiom_chunk *chunk;
> > diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
> > index f51ab2ccf151..bc3e8c0898e5 100644
> > --- a/drivers/infiniband/sw/siw/siw_mem.c
> > +++ b/drivers/infiniband/sw/siw/siw_mem.c
> > @@ -368,7 +368,8 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
> >  	struct mm_struct *mm_s;
> >  	u64 first_page_va;
> >  	unsigned long mlock_limit;
> > -	unsigned int foll_flags = FOLL_LONGTERM;
> > +	unsigned int foll_flags =
> > +		FOLL_LONGTERM | FOLL_ALLOW_BROKEN_FILE_MAPPING;
> >  	int num_pages, num_chunks, i, rv = 0;
> >  	if (!can_do_mlock())
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 96a6a08c8235..3e3f5ea9849f 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -855,7 +855,8 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
> >  	if (!mmget_not_zero(mm))
> >  		goto free;
> > -	flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
> > +	flags = FOLL_FORCE | FOLL_ALLOW_BROKEN_FILE_MAPPING |
> > +		(write ? FOLL_WRITE : 0);
> >  	while (count > 0) {
> >  		size_t this_len = min_t(size_t, count, PAGE_SIZE);
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 3fc9e680f174..e76637b4c78f 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -1185,6 +1185,14 @@ enum {
> >  	FOLL_PCI_P2PDMA = 1 << 10,
> >  	/* allow interrupts from generic signals */
> >  	FOLL_INTERRUPTIBLE = 1 << 11,
> > +	/*
> > +	 * By default we disallow write access to known broken file-backed
> > +	 * memory mappings (i.e. anything other than hugetlb/shmem
> > +	 * mappings). Some code may rely upon being able to access this
> > +	 * regardless for legacy reasons, thus we provide a flag to indicate
> > +	 * this.
> > +	 */
> > +	FOLL_ALLOW_BROKEN_FILE_MAPPING = 1 << 12,
> >  	/* See also internal only FOLL flags in mm/internal.h */
> >  };
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 59887c69d54c..ec330d3b0218 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -373,7 +373,8 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
> >  		return -EINVAL;
> >  	ret = get_user_pages_remote(mm, vaddr, 1,
> > -			FOLL_WRITE, &page, &vma, NULL);
> > +			FOLL_WRITE | FOLL_ALLOW_BROKEN_FILE_MAPPING,
> > +			&page, &vma, NULL);
> >  	if (unlikely(ret <= 0)) {
> >  		/*
> >  		 * We are asking for 1 page. If get_user_pages_remote() fails,
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 1f72a717232b..68d5570c0bae 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -959,16 +959,46 @@ static int faultin_page(struct vm_area_struct *vma,
> >  	return 0;
> >  }
> > +/*
> > + * Writing to file-backed mappings using GUP is a fundamentally broken operation
> > + * as kernel write access to GUP mappings may not adhere to the semantics
> > + * expected by a file system.
> > + *
> > + * In most instances we disallow this broken behaviour, however there are some
> > + * exceptions to this enforced here.
> > + */
> > +static inline bool can_write_file_mapping(struct vm_area_struct *vma,
> > +					  unsigned long gup_flags)
> > +{
> > +	struct file *file = vma->vm_file;
> > +
> > +	/* If we aren't pinning then no problematic write can occur. */
> > +	if (!(gup_flags & (FOLL_GET | FOLL_PIN)))
> > +		return true;
> > +
> > +	/* Special mappings should pose no problem. */
> > +	if (!file)
> > +		return true;
> > +
> > +	/* Has the caller explicitly indicated this case is acceptable? */
> > +	if (gup_flags & FOLL_ALLOW_BROKEN_FILE_MAPPING)
> > +		return true;
> > +
> > +	/* shmem and hugetlb mappings do not have problematic semantics. */
> > +	return vma_is_shmem(vma) || is_file_hugepages(file);
> > +}
> > +
> >  static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> >  {
> >  	vm_flags_t vm_flags = vma->vm_flags;
> >  	int write = (gup_flags & FOLL_WRITE);
> >  	int foreign = (gup_flags & FOLL_REMOTE);
> > +	bool vma_anon = vma_is_anonymous(vma);
> >  	if (vm_flags & (VM_IO | VM_PFNMAP))
> >  		return -EFAULT;
> > -	if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
> > +	if ((gup_flags & FOLL_ANON) && !vma_anon)
> >  		return -EFAULT;
> >  	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
> > @@ -978,6 +1008,10 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> >  		return -EFAULT;
> >  	if (write) {
> > +		if (!vma_anon &&
> > +		    WARN_ON_ONCE(!can_write_file_mapping(vma, gup_flags)))
> > +			return -EFAULT;
> > +
> >  		if (!(vm_flags & VM_WRITE)) {
> >  			if (!(gup_flags & FOLL_FORCE))
> >  				return -EFAULT;
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 146bb94764f8..e3d535991548 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5683,7 +5683,8 @@ int access_process_vm(struct task_struct *tsk, unsigned long addr,
> >  	if (!mm)
> >  		return 0;
> > -	ret = __access_remote_vm(mm, addr, buf, len, gup_flags);
> > +	ret = __access_remote_vm(mm, addr, buf, len,
> > +				 gup_flags | FOLL_ALLOW_BROKEN_FILE_MAPPING);
> >  	mmput(mm);
> > diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> > index 78dfaf9e8990..ef126c08e89c 100644
> > --- a/mm/process_vm_access.c
> > +++ b/mm/process_vm_access.c
> > @@ -81,7 +81,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
> >  	ssize_t rc = 0;
> >  	unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
> >  		/ sizeof(struct pages *);
> > -	unsigned int flags = 0;
> > +	unsigned int flags = FOLL_ALLOW_BROKEN_FILE_MAPPING;
> >  	/* Work out address and page range required */
> >  	if (len == 0)
> > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > index 02207e852d79..b93cfcaccb0d 100644
> > --- a/net/xdp/xdp_umem.c
> > +++ b/net/xdp/xdp_umem.c
> > @@ -93,7 +93,7 @@ void xdp_put_umem(struct xdp_umem *umem, bool defer_cleanup)
> >  static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
> >  {
> > -	unsigned int gup_flags = FOLL_WRITE;
> > +	unsigned int gup_flags = FOLL_WRITE | FOLL_ALLOW_BROKEN_FILE_MAPPING;
> >  	long npgs;
> >  	int err;
>
> Not sure about this in general, but it seems at least ptrace
> (ptrace_access_vm()) is broken here..

Ah thanks, that was an oversight - ptrace_access_vm() uses __access_remote_vm()
rather than access_process_vm(). I had carefully examined both (and all other
GUP callers), but in supplying the flag to the latter I, in typically
squeezy-brained human fashion, forgot to also do so for the former. Will respin
accordingly.

>
>
> --Mika
>
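For reference, the respin described above would presumably boil down to
something like the following for the ptrace path. This is a rough sketch only:
the body of ptrace_access_vm() below is paraphrased from kernel/ptrace.c rather
than taken from the posted patch, and may not match the tree exactly; the only
substantive point is OR-ing FOLL_ALLOW_BROKEN_FILE_MAPPING into the flags
passed to __access_remote_vm().

/*
 * Sketch only (not part of the posted patch): ptrace_access_vm() calls
 * __access_remote_vm() directly rather than going via access_process_vm(),
 * so it has to opt in to the legacy file-mapping behaviour itself.
 */
int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
                     void *buf, int len, unsigned int gup_flags)
{
        struct mm_struct *mm;
        int ret;

        mm = mm_access(tsk, PTRACE_MODE_ATTACH_FSCREDS | PTRACE_MODE_NOAUDIT);
        if (IS_ERR_OR_NULL(mm))
                return 0;

        /* Explicitly permit the legacy file-backed GUP write behaviour. */
        ret = __access_remote_vm(mm, addr, buf, len,
                                 gup_flags | FOLL_ALLOW_BROKEN_FILE_MAPPING);
        mmput(mm);

        return ret;
}

With the flag supplied here as well, ptrace writes to file-backed mappings
would keep working and would not trip the new WARN_ON_ONCE() in
check_vma_flags().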