From: Jann Horn
Date: Wed, 20 Sep 2023 18:11:19 +0200
Subject: Re: [PATCH 2/3] userfaultfd: UFFDIO_REMAP uABI
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
    shuah@kernel.org, aarcange@redhat.com, lokeshgidra@google.com,
    peterx@redhat.com, david@redhat.com, hughd@google.com, mhocko@suse.com,
    axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org,
    Liam.Howlett@oracle.com, zhangpeng362@huawei.com, bgeffon@google.com,
    kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kernel-team@android.com

On Wed, Sep 20, 2023 at 3:49 AM Suren Baghdasaryan wrote:
> On Tue, Sep 19, 2023 at 4:51 PM Jann Horn wrote:
> > On Wed, Sep 20, 2023 at 1:08 AM Suren Baghdasaryan wrote:
> > > On Thu, Sep 14, 2023 at 7:28 PM Jann Horn wrote:
> > > > On Thu, Sep 14, 2023 at 5:26 PM Suren Baghdasaryan wrote:
> > > > > From: Andrea Arcangeli
> > > > >
> > > > > This implements the uABI of UFFDIO_REMAP.
> > > > >
> > > > > Notably one mode bitflag is also forwarded (and in turn known) by the
> > > > > lowlevel remap_pages method.
> > [...]
> > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > [...]
> > > > > +int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> > > > > +                         struct mm_struct *src_mm,
> > > > > +                         pmd_t *dst_pmd, pmd_t *src_pmd,
> > > > > +                         pmd_t dst_pmdval,
> > > > > +                         struct vm_area_struct *dst_vma,
> > > > > +                         struct vm_area_struct *src_vma,
> > > > > +                         unsigned long dst_addr,
> > > > > +                         unsigned long src_addr)
> > > > > +{
> > > > > +        pmd_t _dst_pmd, src_pmdval;
> > > > > +        struct page *src_page;
> > > > > +        struct anon_vma *src_anon_vma, *dst_anon_vma;
> > > > > +        spinlock_t *src_ptl, *dst_ptl;
> > > > > +        pgtable_t pgtable;
> > > > > +        struct mmu_notifier_range range;
> > > > > +
> > > > > +        src_pmdval = *src_pmd;
> > > > > +        src_ptl = pmd_lockptr(src_mm, src_pmd);
> > > > > +
> > > > > +        BUG_ON(!pmd_trans_huge(src_pmdval));
> > > > > +        BUG_ON(!pmd_none(dst_pmdval));
> > > >
> > > > Why can we assert that pmd_none(dst_pmdval) is true here? Can we not
> > > > have concurrent faults (or userfaultfd operations) populating that
> > > > PMD?
> > >
> > > IIUC dst_pmdval is a copy of the value from dst_pmd, so that local
> > > copy should not change even if some concurrent operation changes
> > > dst_pmd. We can assert that it's pmd_none because we checked for that
> > > before calling remap_pages_huge_pmd. Later on we check if dst_pmd
> > > changed from under us (see pmd_same(*dst_pmd, dst_pmdval) check) and
> > > retry if that happened.
> >
> > Oh, right, I don't know what I was thinking when I typed that.
> >
> > But now I wonder about the check directly above that: What does this
> > code do for swap PMDs? It looks like that might splat on the
> > BUG_ON(!pmd_trans_huge(src_pmdval)).
> > All we've checked on the path to here is that the virtual memory area
> > is aligned, that the destination PMD is empty, and that
> > pmd_trans_huge_lock() succeeded; but pmd_trans_huge_lock() explicitly
> > permits swap PMDs (which is the swapped-out version of transhuge PMDs):
> >
> > static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
> >                 struct vm_area_struct *vma)
> > {
> >         if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
> >                 return __pmd_trans_huge_lock(pmd, vma);
> >         else
> >                 return NULL;
> > }
>
> Yeah... Ok, I think I'm missing a check for pmd_trans_huge(*src_pmd)
> after we lock it with pmd_trans_huge_lock(src_pmd, src_vma). And we
> can remove the above BUG_ON(). Would that address your concern?

Sounds good. It'll end up splitting huge swap entries, but I guess the
extra code for moving huge swap entries might not be worth it.
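For illustration only, here is a rough userspace model in C of the two checks discussed in this thread. Everything in it is hypothetical: model_pmd_t, model_trans_huge_lock(), model_remap_huge_pmd() and the flag-based "locks" are stand-ins, not kernel APIs, and the locks are plain flags so the control flow can be exercised single-threaded. It is meant to show (a) that asserting on the local dst_pmdval snapshot is fine as long as the live entry is revalidated under the lock, in the spirit of the pmd_same() recheck, and (b) that a successful pmd_trans_huge_lock() still needs an explicit pmd_trans_huge() check, because swap PMDs pass it too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded model of the PMD checks discussed above.
 * None of these names are kernel APIs; the real pmd_t, pmd_trans_huge(),
 * is_swap_pmd(), pmd_same() and page-table locks behave differently.
 */
enum model_pmd_kind { PMD_NONE, PMD_TABLE, PMD_TRANS_HUGE, PMD_SWAP, PMD_DEVMAP };

typedef struct {
	enum model_pmd_kind kind;
	unsigned long pfn;	/* stands in for the rest of the entry bits */
} model_pmd_t;

struct model_pmd_slot {
	model_pmd_t entry;
	bool locked;		/* stands in for the PMD's page-table lock */
};

static void model_lock(struct model_pmd_slot *s)
{
	assert(!s->locked);	/* catch double locking in the model */
	s->locked = true;
}

static void model_unlock(struct model_pmd_slot *s)
{
	assert(s->locked);
	s->locked = false;
}

static bool model_pmd_same(model_pmd_t a, model_pmd_t b)
{
	return a.kind == b.kind && a.pfn == b.pfn;
}

/* Same entry condition as the quoted pmd_trans_huge_lock(): swap and
 * devmap PMDs are accepted, not only present trans-huge ones. */
static bool model_trans_huge_lock(struct model_pmd_slot *s)
{
	enum model_pmd_kind k = s->entry.kind;

	if (k != PMD_SWAP && k != PMD_TRANS_HUGE && k != PMD_DEVMAP)
		return false;
	model_lock(s);
	return true;
}

/*
 * Move a huge PMD from src to dst. dst_snapshot is a copy of dst->entry
 * read earlier without the lock and expected to be empty. Returns 0 on
 * success or -EAGAIN when the caller should retry or fall back to a
 * non-huge path. Assumes src and dst are distinct slots with distinct locks.
 */
static int model_remap_huge_pmd(struct model_pmd_slot *src,
				struct model_pmd_slot *dst,
				model_pmd_t dst_snapshot)
{
	assert(src != dst);
	/* Asserting on the local copy is safe: it cannot change under us. */
	assert(dst_snapshot.kind == PMD_NONE);

	if (!model_trans_huge_lock(src))
		return -EAGAIN;
	/* Locking succeeded, but the entry may be a swap or devmap PMD, so
	 * check explicitly instead of asserting it is trans-huge. */
	if (src->entry.kind != PMD_TRANS_HUGE) {
		model_unlock(src);
		return -EAGAIN;
	}

	model_lock(dst);
	/* The live dst entry may have been populated since the snapshot. */
	if (!model_pmd_same(dst->entry, dst_snapshot)) {
		model_unlock(dst);
		model_unlock(src);
		return -EAGAIN;
	}
	dst->entry = src->entry;
	src->entry = (model_pmd_t){ PMD_NONE, 0 };
	model_unlock(dst);
	model_unlock(src);
	return 0;
}
```

In this model a swapped-out huge entry makes model_remap_huge_pmd() return -EAGAIN instead of tripping an assertion, leaving the (hypothetical) caller to take its non-huge path. That mirrors the trade-off accepted above: huge swap entries get split rather than moved as a unit, in exchange for less code.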