From: David Stevens
Date: Tue, 7 Feb 2023 13:14:04 +0900
Subject: Re: [PATCH v2] mm/khugepaged: skip shmem with userfaultfd
To: Matthew Wilcox
Cc: Peter Xu, linux-mm@kvack.org, Andrew Morton, Kirill A. Shutemov,
 Yang Shi, David Hildenbrand, Hugh Dickins, linux-kernel@vger.kernel.org

On Tue, Feb 7, 2023 at 11:29 AM Matthew Wilcox wrote:
>
> On Tue, Feb 07, 2023 at 10:37:06AM +0900, David Stevens wrote:
> > On Tue, Feb 7, 2023 at 6:50 AM Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 03:52:19PM -0500, Peter Xu wrote:
> > > > The problem is khugepaged will release pgtable lock during collapsing, so
> > > > AFAICT there can be a race where some other thread tries to insert pages
> > > > into page cache in parallel with khugepaged right after khugepaged released
> > > > the page cache lock.
> > > >
> > > > For example, it seems to me new page cache can be inserted when khugepaged
> > > > is copying small page content to the new hpage.
> >
> > This particular race can't happen with either patch, since the missing
> > page cache entries are filled when we create the multi-index entry for
> > hpage.
>
> Can too.
>
>         for (index = start; index < end; index++) {
>                 ...
>                 if (xa_is_value(page) || !PageUptodate(page)) {
>                         xas_unlock_irq(&xas);
>                         /* swap in or instantiate fallocated page */
>                         if (shmem_get_folio(mapping->host, index,
>                                         &folio, SGP_NOALLOC)) {
>                                 result = SCAN_FAIL;
>                                 goto xa_unlocked;
>                         }
>                 ...
>
> So we start the iteration, and then a page fault happens in one of
> the indices we've already examined, but we don't have the page on
> the list. It's a nice wide race too, because we're bringing the
> page in from swap.

Ah, I misunderstood your objection as referring specifically to the
copy_highpage loop at the end of collapse_file, rather than to the
collapse process as a whole. Yes, there can definitely be faults that
fill in pages while the overall collapse is in progress.

However, my patch re-checks nr_none after acquiring the page cache
lock for the final time, right before creating the hpage multi-index
entry. Since present pages are locked, they can't be truncated after
we've looked at them. That means that if nr_none still matches, then
no page fault populated a missing page behind our back. If we do
detect any filled-in pages, we can just roll back the collapse to
avoid any problems.
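
To sketch the idea (a simplified illustration rather than the literal
patch code; recount_missing_entries() is a made-up stand-in for
re-walking the range under the xarray lock, the rollback label stands
for the patch's unwind path, and the scan result name is illustrative):

        /* After taking xas_lock_irq(&xas) for the final time: */
        if (nr_none != recount_missing_entries(&xas, start, end)) {
                /*
                 * A fault raced with the collapse and instantiated a
                 * page in a hole we had already accounted. Undo the
                 * collapse rather than silently discarding that page.
                 */
                result = SCAN_PAGE_FILLED;  /* illustrative scan result */
                goto rollback;
        }

        /*
         * nr_none still matches and every present page is locked, so
         * nothing in [start, end) changed behind our back. It is now
         * safe to install the multi-index entry for hpage.
         */
        xas_set_order(&xas, start, HPAGE_PMD_ORDER);
        xas_store(&xas, hpage);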

-David