From: Axel Rasmussen
Date: Wed, 22 Sep 2021 15:29:42 -0700
Subject: Re: [PATCH 1/3] userfaultfd/selftests: fix feature support detection
To: Peter Xu
Cc: Andrew Morton, Shuah Khan, Linux MM, Linuxkselftest, LKML, James Houghton
References: <20210921163323.944352-1-axelrasmussen@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 22, 2021 at 2:52 PM Peter Xu wrote:
>
> On Wed, Sep 22, 2021 at 01:54:53PM -0700, Axel Rasmussen wrote:
> > On Wed, Sep 22, 2021 at 10:33 AM Peter Xu wrote:
> > >
> > > Hello, Axel,
> > >
> > > On Wed, Sep 22, 2021 at 10:04:03AM -0700, Axel Rasmussen wrote:
> > > > Thanks for discussing the design, Peter. I have some ideas which might
> > > > make for a nicer v2; I'll massage the code a bit and see what I can
> > > > come up with.
> > >
> > > Sure thing. Note again that as I don't have a strong opinion on that, feel
> > > free to keep it. However if you provide v2, I'll read it.
> > >
> > > [off-topic below]
> > >
> > > Another thing I have probably forgotten but need your confirmation on is: when
> > > you worked on uffd minor mode, did you explicitly disable thp, or is it allowed?
> >
> > I gave a more detailed answer in the other thread, but: currently it
> > is allowed, but this was a bug / oversight on my part. :) THP collapse
> > can break the guarantees minor fault registration is trying to
> > provide.
>
> I've replied there:
>
> https://lore.kernel.org/linux-mm/YUueOUfoamxOvEyO@t490s/
>
> We can try to keep the discussion unified there regarding this.
>
> > But there's another scenario: what if the collapse happened well
> > before registration happened?
>
> Maybe yes, but my understanding of the current uffd-minor scenario tells me
> that this is fine too. Meanwhile I actually have another idea regarding minor
> mode, please continue reading.
>
> Firstly, let me try to recap how minor mode is used in your production
> systems: I believe there should be two processes A and B; if A is the main
> process, B could be the migration process. B migrates pages in the background,
> while A so far should have been stopped and never run. When we want to start
> A, we should register A with uffd-minor upon the whole range (note: I think so
> far A does not have any pgtable mapped within the uffd-minor range). Then any
> page access of A should kick B and ask "whether it is the latest page": if yes
> then UFFDIO_CONTINUE, if no then B modifies the page, plus UFFDIO_CONTINUE
> afterwards. Am I right above?
>
> So if that's the case, then A should have no page table at all.
>
> Then, is that a problem if the shmem file that A maps contains huge thps? I
> think no - because UFFDIO_CONTINUE will only install small pages.
>
> Let me know if I'm understanding it right above; I'll be happy to be corrected.

Right, except that our use case is even more similar to QEMU: the code
doing UFFDIO_CONTINUE / demand paging, and the code running the vCPUs,
are in the same process (same mm) - just different threads.
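
To make the fault-handling decision in the recap above concrete, here is a
toy userspace model in Python. It is only a sketch of the "is it the latest
page?" check followed by a continue step. It does not call the userfaultfd
API, and every name in it (Page, DemandPager, handle_minor_fault, the
version field, the 4096-byte page size) is a hypothetical stand-in chosen
for illustration.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed small-page size, for illustration only


@dataclass
class Page:
    data: bytes
    version: int  # hypothetical freshness marker


@dataclass
class DemandPager:
    """Toy model of the thread that answers minor faults (not the kernel API)."""

    source: dict  # page index -> Page: the authoritative, latest contents
    cache: dict   # page index -> Page: what the shared file currently holds
    mapped: set = field(default_factory=set)  # pages installed for the faulting side
    log: list = field(default_factory=list)   # (page index, action) history

    def handle_minor_fault(self, addr: int) -> str:
        index = addr // PAGE_SIZE
        current = self.cache[index]  # minor fault: the page is already in the cache
        latest = self.source[index]
        if current.version != latest.version:
            # Stale contents: bring the cached page up to date first.
            self.cache[index] = Page(latest.data, latest.version)
            action = "update+continue"
        else:
            action = "continue"
        # Stands in for UFFDIO_CONTINUE: map the existing cached page.
        self.mapped.add(index)
        self.log.append((index, action))
        return action


pager = DemandPager(
    source={0: Page(b"new", 2), 1: Page(b"same", 1)},
    cache={0: Page(b"old", 1), 1: Page(b"same", 1)},
)
pager.handle_minor_fault(0)              # stale page: updated, then continued
pager.handle_minor_fault(PAGE_SIZE + 8)  # up-to-date page: continued as is
```

In this model both paths end with the same install step; the only
difference is whether the page contents are refreshed first.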
>
> Actually besides this scenario, I'm also thinking of another scenario of using
> minor fault in a single process - that's mostly what QEMU is doing right now,
> as QEMU has the vcpu threads and migration thread sharing a single mm/pgtable.
> So I think it'll be great to have a new madvise(MADV_ZAP) which will tear down
> all the file-backed memory pgtables of a specific range. I think it'll suit
> the minor fault use case perfectly, and it can be used for other things
> too. Let me know what you think about this idea, and whether that'll help in
> your case too (e.g., if you worry a current process A mapped huge shmem thp
> somewhere, we can use madvise(MADV_ZAP) to drop it).

Yes, this would be convenient for our implementation too. :) There are
workarounds if the feature doesn't exist, but it would be nice to
have.

It's also useful for memory poisoning, I think, if the host decides
some page(s) are "bad" and wants to intercept any future guest
accesses to those page(s).

>
> > I *think* the existing code deals with THPs correctly in that case, but then
> > again I don't think our selftest really covers this case, and it's not
> > something I've tested in production either (to work around the other bug, we
> > currently MADV_NOHUGEPAGE the area until after VM demand paging completes,
> > and the UFFD registration is removed), so I am not super confident this is
> > the case.
>
> In all cases, enhancing the test program is always welcome.
>
> Thanks,
>
> --
> Peter Xu
>