Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp2622519pxb; Tue, 13 Apr 2021 06:27:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzyyW5bKa3hdSomBUUcQ8Bqo0R+q8+YRpmSMjws5q4uC1zjk0/XDeXO1+pTI9k1VzRAgdeH X-Received: by 2002:a50:ed17:: with SMTP id j23mr33460868eds.260.1618320463181; Tue, 13 Apr 2021 06:27:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618320463; cv=none; d=google.com; s=arc-20160816; b=JtN3YB2ReKXoQvnARuOuzq9mk+05kru0tX0njuVm8YVy0tIjjP2gcoe0rlfHc0VvPX 7ib5RXwqNt9QRzvRMZ80QyI9ou+NN//b/UCy7mBRdGsp7yZgQLWRz01FU1AaPZ+8uSYk upO4H/gqzmURgtyAaosXhVbIVBJHkF1iq5Xn6iWAIVkCil/FvjAE/jxJ+0ygvV0HeG7h BBxtF/lBGC6HSJy+F3QqfpYc1kExbLa2iEbIPUev5PswEqWcUAAD3LY3qAgSQsWR1fsX fKKjtQtPBNO2d7kZ4Nb5iBweaQVnU09K2Av+jLluzWjTfkP0qeC8d/Z4G37DM6yaEIcP 4yeA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=Bx8fq9RMVJeYgjbVUnNNsjq4NPDKlddkrDot/R9W3U0=; b=STjI6hKThay6SXiF9AbSW9O8Q+i4Fd0UQn6i4YbIZcV+arfmPwFCPZuZfnKOeTThrf j7+T9jNIAakId1NekVxWxOhlGfWSv2TllSgpHEvsvfCFCnhnCxa1G1tfT2SFHrJfXrRK 10hKls6cXfZOuKgLjCwpWzWTTzEt5B8/012MhlWJFCQaGXGh2Md7BVOu+2ARziuCRqA+ lWNknMF64JaFu3dZwZP6IlqHtVTt+GzbtgBF/PMYkOSVQkcllDl34LU78jA3Kln7vMox y4EasnKeADxxbnpymPIQiU8ofNFMcHKj4KGnnkE5A+F/aAY6XAipw/XyWcbqmdjXI8EX mPGg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=qKzEmUjC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ka24si9598524ejc.64.2021.04.13.06.27.19; Tue, 13 Apr 2021 06:27:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=qKzEmUjC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229946AbhDMElU (ORCPT + 99 others); Tue, 13 Apr 2021 00:41:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229695AbhDMElT (ORCPT ); Tue, 13 Apr 2021 00:41:19 -0400 Received: from mail-io1-xd2e.google.com (mail-io1-xd2e.google.com [IPv6:2607:f8b0:4864:20::d2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40C3AC061574 for ; Mon, 12 Apr 2021 21:41:00 -0700 (PDT) Received: by mail-io1-xd2e.google.com with SMTP id b10so15782876iot.4 for ; Mon, 12 Apr 2021 21:41:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Bx8fq9RMVJeYgjbVUnNNsjq4NPDKlddkrDot/R9W3U0=; b=qKzEmUjCHxF9v7Gk82y4AynFWSLYAJC6SJRUorlaJEsI9Xns8UJNnx8nRAVPggQx0s LWJzTdXinMJNElG07Ww6Bnew9ELqWfMsI6yYw3jmvxBi5cXn2u2EMnnr9aZsXkwP6mTZ gaRf9nEztlW+Cn3CNcpGrJ9+gDcMngs9Nng8vE4gCYcpyqqhqI3V7U28TkEo9FmwJFwc jktzgI39xnAJx6LnHhRef1VUg9LKH28vrRXbzDYq1lnlVfxFGHaTaKlFx2Ah0D1uih1n c7zD8YQ/ssMJ61nT9aM1ExgPBPmLRWtwpTGBl1pxSvDXa96TffWdKP483bmZZ7+Bc1sT vXMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Bx8fq9RMVJeYgjbVUnNNsjq4NPDKlddkrDot/R9W3U0=; b=Eerh6OzzWd/yBC5TgqKoYn93GG2nzTJ0x6X7KRDll4INLEG/m4xp9gNxVcrb/9QnEq 7zdjTc5I4JdWfXYr9i1yr+f38fE+1Ytmo0Y+k3OWtVIaek3lDldaJ6UTVrod7I0EsqDH JUiMkRN8DKJI2scbygpVYupSn+I+WbkKfgFcnai7gy28WNXbTIZm7B4IkuUh32yoLdHW aNNbL/rL0xXifYauUkbxEL29X91JkNzS1UI8fDmAGnSkF9LDgNozm3brJ16pXu1Lw75D mnsTwdnYXPI2BsIxAYHorg21I508v7vEgGzYrufKAnzOSCsLRIUwahxpXR/CSpiMJxdT Q2nA== X-Gm-Message-State: AOAM530a7dbMYoUltmweRzPp6UYWFJjEfhd2IA8jUC0uhFq175VFG+M8 1KD+garXrSOwG0qLz4DWMjiNnUFtL1GWA6dVacPGyA== X-Received: by 2002:a05:6602:2dce:: with SMTP id l14mr6765893iow.23.1618288859391; Mon, 12 Apr 2021 21:40:59 -0700 (PDT) MIME-Version: 1.0 References: <20210408234327.624367-1-axelrasmussen@google.com> <20210408234327.624367-5-axelrasmussen@google.com> <20210412231736.GA1002612@xz-x1> In-Reply-To: <20210412231736.GA1002612@xz-x1> From: Axel Rasmussen Date: Mon, 12 Apr 2021 21:40:22 -0700 Message-ID: Subject: Re: [PATCH 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem To: Peter Xu Cc: Alexander Viro , Andrea Arcangeli , Andrew Morton , Daniel Colascione , Hugh Dickins , Jerome Glisse , Joe Perches , Lokesh Gidra , Mike Kravetz , Mike Rapoport , Shaohua Li , Shuah Khan , Stephen Rothwell , Wang Qing , linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML , linux-kselftest@vger.kernel.org, Linux MM , Brian Geffon , "Dr . David Alan Gilbert" , Mina Almasry , Oliver Upton Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 12, 2021 at 4:17 PM Peter Xu wrote: > > On Thu, Apr 08, 2021 at 04:43:22PM -0700, Axel Rasmussen wrote: > > +/* > > + * Install PTEs, to map dst_addr (within dst_vma) to page. > > + * > > + * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed), > > + * whether or not dst_vma is VM_SHARED. It also handles the more general > > + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file > > + * backed, or not). > > + * > > + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by > > + * shmem_mcopy_atomic_pte instead. > > + */ > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd, > > + struct vm_area_struct *dst_vma, > > + unsigned long dst_addr, struct page *page, > > + bool newly_allocated, bool wp_copy) > > +{ > > + int ret; > > + pte_t _dst_pte, *dst_pte; > > + int writable; > > + bool vm_shared = dst_vma->vm_flags & VM_SHARED; > > + spinlock_t *ptl; > > + struct inode *inode; > > + pgoff_t offset, max_off; > > + > > + _dst_pte = mk_pte(page, dst_vma->vm_page_prot); > > + writable = dst_vma->vm_flags & VM_WRITE; > > + /* For private, non-anon we need CoW (don't write to page cache!) */ > > + if (!vma_is_anonymous(dst_vma) && !vm_shared) > > + writable = 0; > > + > > + if (writable || vma_is_anonymous(dst_vma)) > > + _dst_pte = pte_mkdirty(_dst_pte); > > + if (writable) { > > + if (wp_copy) > > + _dst_pte = pte_mkuffd_wp(_dst_pte); > > + else > > + _dst_pte = pte_mkwrite(_dst_pte); > > + } else if (vm_shared) { > > + /* > > + * Since we didn't pte_mkdirty(), mark the page dirty or it > > + * could be freed from under us. We could do this > > + * unconditionally, but doing it only if !writable is faster. > > + */ > > + set_page_dirty(page); > > + } > > + > > + dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); > > + > > + if (vma_is_shmem(dst_vma)) { > > + /* The shmem MAP_PRIVATE case requires checking the i_size */ > > When you start to use this function in the last patch it'll be needed too even > if MAP_SHARED? > > How about directly state the reason of doing this ("serialize against truncate > with the PT lock") instead of commenting about "who will need it"? > > > + inode = dst_vma->vm_file->f_inode; > > + offset = linear_page_index(dst_vma, dst_addr); > > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); > > + ret = -EFAULT; > > + if (unlikely(offset >= max_off)) > > + goto out_unlock; > > + } > > [...] > > > +/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */ > > +static int mcontinue_atomic_pte(struct mm_struct *dst_mm, > > + pmd_t *dst_pmd, > > + struct vm_area_struct *dst_vma, > > + unsigned long dst_addr, > > + bool wp_copy) > > +{ > > + struct inode *inode = file_inode(dst_vma->vm_file); > > + pgoff_t pgoff = linear_page_index(dst_vma, dst_addr); > > + struct page *page; > > + int ret; > > + > > + ret = shmem_getpage(inode, pgoff, &page, SGP_READ); > > SGP_READ looks right, as we don't want page allocation. However I noticed > there's very slight difference when the page was just fallocated: > > /* fallocated page? */ > if (page && !PageUptodate(page)) { > if (sgp != SGP_READ) > goto clear; > unlock_page(page); > put_page(page); > page = NULL; > hindex = index; > } > > I think it won't happen for your case since the page should be uptodate already > (the other thread should check and modify the page before CONTINUE), but still > raise this up, since if the page was allocated it smells better to still > install the fallocated page (do we need to clear the page and SetUptodate)? Sorry for the somewhat rambling thought process: My first thought is, I don't really know what PageUptodate means for shmem pages. If I understand correctly, normally we say PageUptodate() if the in memory data is more recent or equivalent to the on-disk data. But, shmem pages are entirely in memory - they are file backed in name only, in some sense. fallocate() does all sorts of things so the comment to me seems a bit ambiguous, but it seems the implication is that we're worried specifically about the case where the shmem page was recently allocated with fallocate(mode=0)? In that case, do we use !PageUptodate() to denote that the page has been allocated, but its contents are undefined? I suppose that would make sense, as the action "goto clear;" generally memset()-s the page to zero it, and then calls SetPageUptodate(). Okay so let's say the following sequence of events happens: 1. Userspace calls fallocate(mode=0) to allocate some shmem pages. 2. Another thread, via a UFFD-registered mapping, manages to trigger a minor fault on one such page, while we still have !PageUptodate(). (I'm not 100% sure this can happen, but let's say it can.) 3. UFFD handler thread gets the minor fault event, and for whatever (buggy?) reason does nothing - it doesn't modify the page, it just calls CONTINUE. I think if we get to this point, zeroing the page, returning it, and setting up the PTEs seems somewhat reasonable to me. I suppose alternatively we could notice that this happened and return an error to the caller? I'm hesitant to mess with the behavior of shmem_getpage_gfp() to make such a thing happen though. I do think if we're going to set up the PTEs instead of returning an error, we definitely do need to clear and SetPageUptodate() the page first. In conclusion, I think this behavior is correct. > > -- > Peter Xu >