From: Axel Rasmussen
Date: Tue, 9 Mar 2021 11:57:26 -0800
Subject: Re: [PATCH v2 1/5] userfaultfd: support minor fault handling for shmem
In-Reply-To: <04697A35-AEC7-43F1-8462-1CD39648544A@nvidia.com>
References: <20210302000133.272579-1-axelrasmussen@google.com>
 <20210302000133.272579-2-axelrasmussen@google.com>
 <04697A35-AEC7-43F1-8462-1CD39648544A@nvidia.com>
To: Zi Yan
Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
 Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Rapoport, Peter Xu,
 Shaohua Li, Shuah Khan, Wang Qing, LKML, linux-fsdevel@vger.kernel.org,
 Linux MM, linux-kselftest@vger.kernel.org, Brian Geffon, Cannon Matthews,
 "Dr. David Alan Gilbert", David Rientjes, Michel Lespinasse,
 Mina Almasry, Oliver Upton
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 9, 2021 at 11:52 AM Zi Yan wrote:
>
> On 1 Mar 2021, at 19:01, Axel Rasmussen wrote:
>
> > Modify the userfaultfd register API to allow registering shmem VMAs in
> > minor mode. Modify the shmem mcopy implementation to support
> > UFFDIO_CONTINUE in order to resolve such faults.
> >
> > Combine the shmem mcopy handler functions into a single
> > shmem_mcopy_atomic_pte, which takes a mode parameter. This matches how
> > the hugetlbfs implementation is structured, and lets us remove a good
> > chunk of boilerplate.
> >
> > Signed-off-by: Axel Rasmussen
> > ---
> >  fs/userfaultfd.c                 |  6 +--
> >  include/linux/shmem_fs.h         | 26 ++++-----
> >  include/uapi/linux/userfaultfd.h |  4 +-
> >  mm/memory.c                      |  8 +--
> >  mm/shmem.c                       | 92 +++++++++++++++-----------------
> >  mm/userfaultfd.c                 | 27 +++++-----
> >  6 files changed, 79 insertions(+), 84 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 14f92285d04f..9f3b8684cf3c 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
> >          }
> >
> >          if (vm_flags & VM_UFFD_MINOR) {
> > -                /* FIXME: Add minor fault interception for shmem. */
> > -                if (!is_vm_hugetlb_page(vma))
> > +                if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
> >                          return false;
> >          }
> >
> > @@ -1941,7 +1940,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> >          /* report all available features and ioctls to userland */
> >          uffdio_api.features = UFFD_API_FEATURES;
> >  #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
> > -        uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
> > +        uffdio_api.features &=
> > +                ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
> >  #endif
> >          uffdio_api.ioctls = UFFD_API_IOCTLS;
> >          ret = -EFAULT;
> >
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index d82b6f396588..f0919c3722e7 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -9,6 +9,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  /* inode in-kernel data */
> >
> > @@ -122,21 +123,16 @@ static inline bool shmem_file(struct file *file)
> >  extern bool shmem_charge(struct inode *inode, long pages);
> >  extern void shmem_uncharge(struct inode *inode, long pages);
> >
> > +#ifdef CONFIG_USERFAULTFD
> >  #ifdef CONFIG_SHMEM
> > -extern int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> > -                                  struct vm_area_struct *dst_vma,
> > -                                  unsigned long dst_addr,
> > -                                  unsigned long src_addr,
> > -                                  struct page **pagep);
> > -extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
> > -                                    pmd_t *dst_pmd,
> > -                                    struct vm_area_struct *dst_vma,
> > -                                    unsigned long dst_addr);
> > -#else
> > -#define shmem_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
> > -                               src_addr, pagep) ({ BUG(); 0; })
> > -#define shmem_mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, \
> > -                                 dst_addr) ({ BUG(); 0; })
> > -#endif
> > +int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> > +                           struct vm_area_struct *dst_vma,
> > +                           unsigned long dst_addr, unsigned long src_addr,
> > +                           enum mcopy_atomic_mode mode, struct page **pagep);
> > +#else /* !CONFIG_SHMEM */
> > +#define shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
> > +                               src_addr, mode, pagep) ({ BUG(); 0; })
> > +#endif /* CONFIG_SHMEM */
> > +#endif /* CONFIG_USERFAULTFD */
> >
> >  #endif
> >
> > diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> > index bafbeb1a2624..47d9790d863d 100644
> > --- a/include/uapi/linux/userfaultfd.h
> > +++ b/include/uapi/linux/userfaultfd.h
> > @@ -31,7 +31,8 @@
> >                             UFFD_FEATURE_MISSING_SHMEM |     \
> >                             UFFD_FEATURE_SIGBUS |            \
> >                             UFFD_FEATURE_THREAD_ID |         \
> > -                           UFFD_FEATURE_MINOR_HUGETLBFS)
> > +                           UFFD_FEATURE_MINOR_HUGETLBFS |   \
> > +                           UFFD_FEATURE_MINOR_SHMEM)
> >  #define UFFD_API_IOCTLS                         \
> >          ((__u64)1 << _UFFDIO_REGISTER |         \
> >           (__u64)1 << _UFFDIO_UNREGISTER |       \
> > @@ -196,6 +197,7 @@ struct uffdio_api {
> >  #define UFFD_FEATURE_SIGBUS             (1<<7)
> >  #define UFFD_FEATURE_THREAD_ID          (1<<8)
> >  #define UFFD_FEATURE_MINOR_HUGETLBFS    (1<<9)
> > +#define UFFD_FEATURE_MINOR_SHMEM        (1<<10)
> >          __u64 features;
> >
> >          __u64 ioctls;
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index c8e357627318..a1e5ff55027e 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3929,9 +3929,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
> >           * something).
> >           */
> >          if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
> > -                ret = do_fault_around(vmf);
> > -                if (ret)
> > -                        return ret;
> > +                if (likely(!userfaultfd_minor(vmf->vma))) {
> > +                        ret = do_fault_around(vmf);
> > +                        if (ret)
> > +                                return ret;
> > +                }
> >          }
> >
> >          ret = __do_fault(vmf);
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index b2db4ed0fbc7..6f81259fabb3 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -77,7 +77,6 @@ static struct vfsmount *shm_mnt;
> >  #include
> >  #include
> >  #include
> > -#include
> >  #include
> >  #include
> >
> > @@ -1785,8 +1784,8 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
> >   * vm. If we swap it in we mark it dirty since we also free the swap
> >   * entry since a page cannot live in both the swap and page cache.
> >   *
> > - * vmf and fault_type are only supplied by shmem_fault:
> > - * otherwise they are NULL.
> > + * vma, vmf, and fault_type are only supplied by shmem_fault: otherwise they
> > + * are NULL.
> >   */
> >  static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> >                  struct page **pagep, enum sgp_type sgp, gfp_t gfp,
> > @@ -1830,6 +1829,12 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> >                  return error;
> >          }
> >
> > +        if (page && vma && userfaultfd_minor(vma)) {
> > +                unlock_page(page);
> > +                *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
> > +                return 0;
> > +        }
> > +
> >          if (page)
> >                  hindex = page->index;
> >          if (page && sgp == SGP_WRITE)
> > @@ -2354,14 +2359,12 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
> >          return inode;
> >  }
> >
> > -static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> > -                                  pmd_t *dst_pmd,
> > -                                  struct vm_area_struct *dst_vma,
> > -                                  unsigned long dst_addr,
> > -                                  unsigned long src_addr,
> > -                                  bool zeropage,
> > -                                  struct page **pagep)
> > +int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> > +                           struct vm_area_struct *dst_vma,
> > +                           unsigned long dst_addr, unsigned long src_addr,
> > +                           enum mcopy_atomic_mode mode, struct page **pagep)
> >  {
> > +        bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> >          struct inode *inode = file_inode(dst_vma->vm_file);
> >          struct shmem_inode_info *info = SHMEM_I(inode);
> >          struct address_space *mapping = inode->i_mapping;
> > @@ -2378,12 +2381,17 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> >          if (!shmem_inode_acct_block(inode, 1))
> >                  goto out;
> >
> > -        if (!*pagep) {
> > +        if (is_continue) {
> > +                ret = -EFAULT;
> > +                page = find_lock_page(mapping, pgoff);
> > +                if (!page)
> > +                        goto out_unacct_blocks;
> > +        } else if (!*pagep) {
> >                  page = shmem_alloc_page(gfp, info, pgoff);
> >                  if (!page)
> >                          goto out_unacct_blocks;
> >
> > -                if (!zeropage) {        /* mcopy_atomic */
> > +                if (mode == MCOPY_ATOMIC_NORMAL) {        /* mcopy_atomic */
> >                          page_kaddr = kmap_atomic(page);
> >                          ret = copy_from_user(page_kaddr,
> >                                               (const void __user *)src_addr,
>
> Hi Axel,
>
> shmem_mcopy_atomic_pte is not guarded by CONFIG_USERFAULTFD, thus it is
> causing compilation errors due to the use of enum mcopy_atomic_mode mode,
> when CONFIG_USERFAULTFD is not set.

Ah, my apologies, I guarded it in the header but forgot to do so in
shmem.c. I'll send an updated patch today.

>
> —
> Best Regards,
> Yan Zi