2021-11-19 07:01:15

by Muchun Song

[permalink] [raw]
Subject: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock

The mmap lock is supposed to be a contended lock sometimes, scheduling
to other task with holding mmap read lock does not seems to be a wise
choice. So move it to the front of mmap_read_trylock(). Although
mmap_read_lock() implies a might_sleep(), I think redundant check is
not a problem since this task is about to sleep and it is not a hot
path.

Signed-off-by: Muchun Song <[email protected]>
---
arch/x86/mm/fault.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4bfed53e210e..22fd1dfafa3d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1323,6 +1323,8 @@ void do_user_addr_fault(struct pt_regs *regs,
}
#endif

+ might_sleep();
+
/*
* Kernel-mode access to the user address space should only occur
* on well-defined single instructions listed in the exception
@@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
}
retry:
mmap_read_lock(mm);
- } else {
- /*
- * The above down_read_trylock() might have succeeded in
- * which case we'll have missed the might_sleep() from
- * down_read():
- */
- might_sleep();
}

vma = find_vma(mm, address);
--
2.11.0



2021-11-19 15:04:17

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock

On 11/18/21 10:58 PM, Muchun Song wrote:
> The mmap lock is supposed to be a contended lock sometimes, scheduling
> to other task with holding mmap read lock does not seems to be a wise
> choice. So move it to the front of mmap_read_trylock(). Although
> mmap_read_lock() implies a might_sleep(), I think redundant check is
> not a problem since this task is about to sleep and it is not a hot
> path.

This justification doesn't really do it for me. There are a billion
ways to sleep in the fault path while the mmap lock is held. Adding one
more cond_resched() doesn't seem like it would do much.

In other words, I don't think there's a performance justification here.

That said...

> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 4bfed53e210e..22fd1dfafa3d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1323,6 +1323,8 @@ void do_user_addr_fault(struct pt_regs *regs,
> }
> #endif
>
> + might_sleep();
> +
> /*
> * Kernel-mode access to the user address space should only occur
> * on well-defined single instructions listed in the exception
> @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
> }
> retry:
> mmap_read_lock(mm);
> - } else {
> - /*
> - * The above down_read_trylock() might have succeeded in
> - * which case we'll have missed the might_sleep() from
> - * down_read():
> - */
> - might_sleep();
> }
>
> vma = find_vma(mm, address);

The comment is stale, which isn't great. The might_sleep() is already
in the fast path. So, moving it up above makes a lot of sense just in
terms of simplicity.

The might_sleep() does need a comment, though, about what it's _doing_
there.

So, right idea, good result, but for the wrong reasons.

If you want to revise the justification and add a comment, I think this
is something we can take.

2021-11-22 07:00:36

by Muchun Song

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock

On Fri, Nov 19, 2021 at 11:04 PM Dave Hansen <[email protected]> wrote:
>
> On 11/18/21 10:58 PM, Muchun Song wrote:
> > The mmap lock is supposed to be a contended lock sometimes, scheduling
> > to other task with holding mmap read lock does not seems to be a wise
> > choice. So move it to the front of mmap_read_trylock(). Although
> > mmap_read_lock() implies a might_sleep(), I think redundant check is
> > not a problem since this task is about to sleep and it is not a hot
> > path.
>
> This justification doesn't really do it for me. There are a billion
> ways to sleep in the fault path while the mmap lock is held. Adding one
> more cond_resched() doesn't seem like it would do much.

I agree with you that there are lots of ways to sleep in the
fault path. Just try my best to not sleep with mmap lock.

>
> In other words, I don't think there's a performance justification here.
>
> That said...
>
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 4bfed53e210e..22fd1dfafa3d 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1323,6 +1323,8 @@ void do_user_addr_fault(struct pt_regs *regs,
> > }
> > #endif
> >
> > + might_sleep();
> > +
> > /*
> > * Kernel-mode access to the user address space should only occur
> > * on well-defined single instructions listed in the exception
> > @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
> > }
> > retry:
> > mmap_read_lock(mm);
> > - } else {
> > - /*
> > - * The above down_read_trylock() might have succeeded in
> > - * which case we'll have missed the might_sleep() from
> > - * down_read():
> > - */
> > - might_sleep();
> > }
> >
> > vma = find_vma(mm, address);
>
> The comment is stale, which isn't great. The might_sleep() is already
> in the fast path. So, moving it up above makes a lot of sense just in
> terms of simplicity.

Without this patch, I didn't see the might_sleep() in the fast path. What
am I missing here?

Thanks.

>
> The might_sleep() does need a comment, though, about what it's _doing_
> there.
>
> So, right idea, good result, but for the wrong reasons.
>
> If you want to revise the justification and add a comment, I think this
> is something we can take.

2021-11-25 10:49:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock

Muchun, Dave!

On Mon, Nov 22 2021 at 14:59, Muchun Song wrote:
> On Fri, Nov 19, 2021 at 11:04 PM Dave Hansen <[email protected]> wrote:
>> >
>> > + might_sleep();
>> > +
>> > /*
>> > * Kernel-mode access to the user address space should only occur
>> > * on well-defined single instructions listed in the exception
>> > @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
>> > }
>> > retry:
>> > mmap_read_lock(mm);
>> > - } else {
>> > - /*
>> > - * The above down_read_trylock() might have succeeded in
>> > - * which case we'll have missed the might_sleep() from
>> > - * down_read():
>> > - */
>> > - might_sleep();
>> > }
>> >
>> > vma = find_vma(mm, address);
>>
>> The comment is stale, which isn't great. The might_sleep() is already
>> in the fast path. So, moving it up above makes a lot of sense just in
>> terms of simplicity.

I don't think so. The point is:

if (unlikely(!mmap_read_trylock(mm))) {
if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
/*
* Fault from code in kernel from
* which we do not expect faults.
*/
bad_area_nosemaphore(regs, error_code, address);
return;
}

Moving it up will make the might_sleep() splat more important than an
unexpected fault when the unexpected fault happens in e.g. a preemption
disabled region. That's wrong because the important information in this
case is not the might sleep splat. The important information is the
fault itself.

But moving it up is even more wrong for spurious faults which are
correctly handled in that case via:

bad_area_nosemaphore()
__bad_area_nosemaphore()
kernelmode_fixup_or_oops()
handle(AMD erratum #91)
is_prefetch()

So if such a spurious fault happens in a condition which would trigger
the might_sleep() splat then moving might_sleep() before the trylock()
will cause false positives. So, no. It's going to stay where it is.

> Without this patch, I didn't see the might_sleep() in the fast path. What
> am I missing here?

I have no idea what you are doing. If the trylock() succeeds and the
fault happened in e.g. a preemption disabled region then the
might_sleep() in the else path will trigger no matter what.

Thanks,

tglx

2021-11-26 05:57:05

by Muchun Song

[permalink] [raw]
Subject: Re: [External] Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock

On Thu, Nov 25, 2021 at 6:45 PM Thomas Gleixner <[email protected]> wrote:
>
> Muchun, Dave!
>
> On Mon, Nov 22 2021 at 14:59, Muchun Song wrote:
> > On Fri, Nov 19, 2021 at 11:04 PM Dave Hansen <[email protected]> wrote:
> >> >
> >> > + might_sleep();
> >> > +
> >> > /*
> >> > * Kernel-mode access to the user address space should only occur
> >> > * on well-defined single instructions listed in the exception
> >> > @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
> >> > }
> >> > retry:
> >> > mmap_read_lock(mm);
> >> > - } else {
> >> > - /*
> >> > - * The above down_read_trylock() might have succeeded in
> >> > - * which case we'll have missed the might_sleep() from
> >> > - * down_read():
> >> > - */
> >> > - might_sleep();
> >> > }
> >> >
> >> > vma = find_vma(mm, address);
> >>
> >> The comment is stale, which isn't great. The might_sleep() is already
> >> in the fast path. So, moving it up above makes a lot of sense just in
> >> terms of simplicity.
>
> I don't think so. The point is:
>
> if (unlikely(!mmap_read_trylock(mm))) {
> if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
> /*
> * Fault from code in kernel from
> * which we do not expect faults.
> */
> bad_area_nosemaphore(regs, error_code, address);
> return;
> }
>
> Moving it up will make the might_sleep() splat more important than an
> unexpected fault when the unexpected fault happens in e.g. a preemption
> disabled region. That's wrong because the important information in this
> case is not the might sleep splat. The important information is the
> fault itself.
>
> But moving it up is even more wrong for spurious faults which are
> correctly handled in that case via:
>
> bad_area_nosemaphore()
> __bad_area_nosemaphore()
> kernelmode_fixup_or_oops()
> handle(AMD erratum #91)
> is_prefetch()
>
> So if such a spurious fault happens in a condition which would trigger
> the might_sleep() splat then moving might_sleep() before the trylock()
> will cause false positives. So, no. It's going to stay where it is.

Got it. I didn't realize those cases. Thank you for your patient
explanation.


>
> > Without this patch, I didn't see the might_sleep() in the fast path. What
> > am I missing here?
>
> I have no idea what you are doing. If the trylock() succeeds and the
> fault happened in e.g. a preemption disabled region then the
> might_sleep() in the else path will trigger no matter what.
>
> Thanks,
>
> tglx