MIME-Version: 1.0
References: <20220611084731.55155-1-linmiaohe@huawei.com> <20220611084731.55155-3-linmiaohe@huawei.com>
 <CAHbLzkpc8ag7MkY_D17U1B7SjZFO2Bss8rVVj-scMOC8ttqxEg@mail.gmail.com> <c5b8f0c3-de35-9f0e-a3a8-6e132ac398cc@huawei.com>
In-Reply-To: <c5b8f0c3-de35-9f0e-a3a8-6e132ac398cc@huawei.com>
From:   Yang Shi <shy828301@gmail.com>
Date:   Thu, 16 Jun 2022 08:46:39 -0700
Message-ID: <CAHbLzkqUgTS0La43PaAXEL81UGR4Z7_YDOCYiMa-KcX=CCe9AA@mail.gmail.com>
Subject: Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when
 VM_FAULT_RETRY occurs
To:     Miaohe Lin <linmiaohe@huawei.com>
Cc:     Andrew Morton <akpm@linux-foundation.org>,
        Andrea Arcangeli <aarcange@redhat.com>,
        Matthew Wilcox <willy@infradead.org>,
        Vlastimil Babka <vbabka@suse.cz>,
        David Howells <dhowells@redhat.com>, NeilBrown <neilb@suse.de>,
        Alistair Popple <apopple@nvidia.com>,
        David Hildenbrand <david@redhat.com>,
        Suren Baghdasaryan <surenb@google.com>,
        Peter Xu <peterx@redhat.com>, Linux MM <linux-mm@kvack.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Wed, Jun 15, 2022 at 11:40 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/6/16 1:49, Yang Shi wrote:
> > On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> When do_swap_page() returns VM_FAULT_RETRY, we do not retry here, so the
> >> swap entry remains in the page table, which results in a later failure.
> >> Stop swapping in pages in this case to save CPU cycles.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/khugepaged.c | 19 ++++++++-----------
> >>  1 file changed, 8 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index 73570dfffcec..a8adb2d1e9c6 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >>                 swapped_in++;
> >>                 ret = do_swap_page(&vmf);
> >>
> >> -               /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> >> +               /*
> >> +                * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> >> +                * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> >> +                * we do not retry here and swap entry will remain in pagetable
> >> +                * resulting in later failure.
> >
> > Yeah, it makes sense.
> >
> >> +                */
> >>                 if (ret & VM_FAULT_RETRY) {
> >>                         mmap_read_lock(mm);
> >
> > As a further optimization, you should not need to retake mmap_lock here.
> > You could return a different value, or pass in *locked and set it to
> > false, then check it in the caller to skip the unlock.
>
> Could we just leave mmap_lock unlocked whenever __collapse_huge_page_swapin()
> fails, since the caller always does mmap_read_unlock() when it returns false,
> and add a comment documenting this behavior? That looks like the simpler way
> to me.

Yeah, that sounds better.
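
To make the convention concrete, here is a rough user-space sketch of "return false with the lock already released". It is not kernel code: struct mock_mm, the mock_* helpers and the FAULT_* constants are made up for illustration and only model whether the lock is held, assuming a RETRY result means the lock was dropped by the fault handler.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fault results; not the kernel's values. */
enum { FAULT_OK = 0, FAULT_RETRY = 1, FAULT_ERROR = 2 };

struct mock_mm {
	int read_locked;	/* 1 while the mock mmap_lock is held for read */
};

static void mock_read_unlock(struct mock_mm *mm)
{
	mm->read_locked = 0;
}

/* Mock of do_swap_page(): on FAULT_RETRY the lock has already been dropped. */
static int mock_do_swap_page(struct mock_mm *mm, int outcome)
{
	if (outcome == FAULT_RETRY)
		mm->read_locked = 0;
	return outcome;
}

/*
 * Called with the lock held. Returns true with the lock still held,
 * or false with the lock released, whatever the failure reason was.
 */
static bool mock_swapin(struct mock_mm *mm, const int *outcomes, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = mock_do_swap_page(mm, outcomes[i]);

		if (ret == FAULT_RETRY)
			return false;	/* lock already dropped, no relock */
		if (ret == FAULT_ERROR) {
			mock_read_unlock(mm);	/* keep the rule uniform */
			return false;
		}
	}
	return true;
}

/* Caller: takes the lock and no longer unlocks on the failure path. */
static bool mock_collapse(struct mock_mm *mm, const int *outcomes, int n)
{
	mm->read_locked = 1;
	if (!mock_swapin(mm, outcomes, n))
		return false;	/* lock is already released here */
	mock_read_unlock(mm);
	return true;
}
```

The point of the sketch is only that every failure path leaves the lock in the same state, so the caller has a single rule to follow and the relock in the RETRY path goes away.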

>
> >
> >> -                       if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> >> -                               /* vma is no longer available, don't continue to swapin */
> >> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >> -                               return false;
> >> -                       }
> >> -                       /* check if the pmd is still valid */
> >> -                       if (mm_find_pmd(mm, haddr) != pmd) {
> >> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >> -                               return false;
> >> -                       }
> >> +                       trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >> +                       return false;
> >>                 }
> >>                 if (ret & VM_FAULT_ERROR) {
> >>                         trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >
> > And I think "swapped_in++" needs to be moved after error handling.
>
> Do you mean doing "swapped_in++" only after a page has been swapped in successfully?

Yes.
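
Roughly like the following user-space sketch, where the counter is bumped only once a swap-in has passed the error checks. The enum mock_fault values and mock_swapin_count() are hypothetical names for illustration, not anything in mm/khugepaged.c.

```c
#include <stdbool.h>

/* Hypothetical per-page outcomes standing in for do_swap_page() results. */
enum mock_fault { MOCK_OK, MOCK_RETRY, MOCK_ERROR };

/*
 * Walk the outcomes in order and stop at the first failure.
 * *swapped_in ends up counting only the pages that were swapped in
 * successfully, because the increment comes after the error handling.
 */
static bool mock_swapin_count(const enum mock_fault *outcomes, int n,
			      int *swapped_in)
{
	*swapped_in = 0;
	for (int i = 0; i < n; i++) {
		if (outcomes[i] != MOCK_OK)
			return false;	/* failed page is not counted */
		(*swapped_in)++;
	}
	return true;
}
```

With the increment placed before the check, the failing page would be counted as well, so the value passed to the tracepoint would be one too high on the failure paths.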

>
> Thanks!
>
> >
> >> --
> >> 2.23.0