2023-09-18 23:32:14

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages_range()

When queue_pages_range() encounters an unmovable page, it terminates
its page walk. This walk, among other things, locks the VMAs in the range.
This termination might result in some VMAs being left unlock after
queue_pages_range() completes. Since do_mbind() continues to operate on
these VMAs despite the failure from queue_pages_range(), it will encounter
an unlocked VMA.
This mbind() behavior has been modified several times before and might
need some changes to either finish the page walk even in the presence
of unmovable pages or to error out immediately after the failure to
queue_pages_range(). However that requires more discussions, so to
fix the immediate issue, explicitly lock the VMAs in the range if
queue_pages_range() failed. The added condition does not save much
but is added for documentation purposes to understand when this extra
locking is needed.

Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/mempolicy.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..cbc584e9b6ca 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ /* If queue_pages_range failed then not all VMAs might be locked */
+ if (ret)
+ vma_start_write(vma);
err = mbind_range(&vmi, vma, &prev, start, end, new);
if (err)
break;
--
2.42.0.459.ge4e396fd5e-goog


2023-09-19 05:55:40

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages_range()

On Mon, 18 Sep 2023, Suren Baghdasaryan wrote:

> When queue_pages_range() encounters an unmovable page, it terminates
> its page walk. This walk, among other things, locks the VMAs in the range.
> This termination might result in some VMAs being left unlock after
> queue_pages_range() completes. Since do_mbind() continues to operate on
> these VMAs despite the failure from queue_pages_range(), it will encounter
> an unlocked VMA.
> This mbind() behavior has been modified several times before and might
> need some changes to either finish the page walk even in the presence
> of unmovable pages or to error out immediately after the failure to
> queue_pages_range(). However that requires more discussions, so to
> fix the immediate issue, explicitly lock the VMAs in the range if
> queue_pages_range() failed. The added condition does not save much
> but is added for documentation purposes to understand when this extra
> locking is needed.
>
> Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Acked-by: Hugh Dickins <[email protected]>

> ---
> mm/mempolicy.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 42b5567e3773..cbc584e9b6ca 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> vma_iter_init(&vmi, mm, start);
> prev = vma_prev(&vmi);
> for_each_vma_range(vmi, vma, end) {
> + /* If queue_pages_range failed then not all VMAs might be locked */
> + if (ret)
> + vma_start_write(vma);
> err = mbind_range(&vmi, vma, &prev, start, end, new);
> if (err)
> break;
> --
> 2.42.0.459.ge4e396fd5e-goog

2023-09-19 15:11:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages_range()

On Mon 18-09-23 14:16:08, Suren Baghdasaryan wrote:
> When queue_pages_range() encounters an unmovable page, it terminates
> its page walk. This walk, among other things, locks the VMAs in the range.
> This termination might result in some VMAs being left unlock after
> queue_pages_range() completes. Since do_mbind() continues to operate on
> these VMAs despite the failure from queue_pages_range(), it will encounter
> an unlocked VMA.
> This mbind() behavior has been modified several times before and might
> need some changes to either finish the page walk even in the presence
> of unmovable pages or to error out immediately after the failure to
> queue_pages_range(). However that requires more discussions, so to
> fix the immediate issue, explicitly lock the VMAs in the range if
> queue_pages_range() failed. The added condition does not save much
> but is added for documentation purposes to understand when this extra
> locking is needed.

The semantic of the walk in this case is really clear as mud. I was
trying to reconstruct the whole picture and it really hurts... Then I
found http://lkml.kernel.org/r/CAHbLzkrmTaqBRmHVdE2kyW57Uoghqd_E+jAXC9cB5ofkhL-uvw@mail.gmail.com
and that helped a lot. Let's keep it a reference at least in the email
thread here for future.

> Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Suren Baghdasaryan <[email protected]>

I cannot say I like the patch (it looks like a potential double locking
unless you realize this lock is special) but considering this might be just
temporal I do not mind.

Acked-by: Michal Hocko <[email protected]>

Thanks!

> ---
> mm/mempolicy.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 42b5567e3773..cbc584e9b6ca 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> vma_iter_init(&vmi, mm, start);
> prev = vma_prev(&vmi);
> for_each_vma_range(vmi, vma, end) {
> + /* If queue_pages_range failed then not all VMAs might be locked */
> + if (ret)
> + vma_start_write(vma);
> err = mbind_range(&vmi, vma, &prev, start, end, new);
> if (err)
> break;
> --
> 2.42.0.459.ge4e396fd5e-goog

--
Michal Hocko
SUSE Labs

2023-09-19 21:10:08

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages_range()

On Tue, Sep 19, 2023 at 1:53 AM Michal Hocko <[email protected]> wrote:
>
> On Mon 18-09-23 14:16:08, Suren Baghdasaryan wrote:
> > When queue_pages_range() encounters an unmovable page, it terminates
> > its page walk. This walk, among other things, locks the VMAs in the range.
> > This termination might result in some VMAs being left unlock after
> > queue_pages_range() completes. Since do_mbind() continues to operate on
> > these VMAs despite the failure from queue_pages_range(), it will encounter
> > an unlocked VMA.
> > This mbind() behavior has been modified several times before and might
> > need some changes to either finish the page walk even in the presence
> > of unmovable pages or to error out immediately after the failure to
> > queue_pages_range(). However that requires more discussions, so to
> > fix the immediate issue, explicitly lock the VMAs in the range if
> > queue_pages_range() failed. The added condition does not save much
> > but is added for documentation purposes to understand when this extra
> > locking is needed.
>
> The semantic of the walk in this case is really clear as mud. I was
> trying to reconstruct the whole picture and it really hurts... Then I
> found http://lkml.kernel.org/r/CAHbLzkrmTaqBRmHVdE2kyW57Uoghqd_E+jAXC9cB5ofkhL-uvw@mail.gmail.com
> and that helped a lot. Let's keep it a reference at least in the email
> thread here for future.

FYI, I'm working on a fix for the regression mentioned in that series,
and Hugh has some clean up and enhancement for that too.

>
> > Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> > Reported-by: [email protected]
> > Closes: https://lore.kernel.org/all/[email protected]/
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> I cannot say I like the patch (it looks like a potential double locking
> unless you realize this lock is special) but considering this might be just
> temporal I do not mind.
>
> Acked-by: Michal Hocko <[email protected]>
>
> Thanks!
>
> > ---
> > mm/mempolicy.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 42b5567e3773..cbc584e9b6ca 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> > vma_iter_init(&vmi, mm, start);
> > prev = vma_prev(&vmi);
> > for_each_vma_range(vmi, vma, end) {
> > + /* If queue_pages_range failed then not all VMAs might be locked */
> > + if (ret)
> > + vma_start_write(vma);
> > err = mbind_range(&vmi, vma, &prev, start, end, new);
> > if (err)
> > break;
> > --
> > 2.42.0.459.ge4e396fd5e-goog
>
> --
> Michal Hocko
> SUSE Labs