2018-11-24 08:18:25

by Julien Thierry

[permalink] [raw]
Subject: Sleeping in user_access section

Hi,

I made an attempt at implementing the
user_access_begin()/user_access_end() macros along with the
get/put_user_unsafe() for arm64 by toggling the status of PAN (more or
less similar to x86's STAC/CTAC).

With a small mistake in my patch, we realized that directly calling
function that could reschedule while in a user_access section could lead to:

- scheduling another task keeping the user_access status enabled despite
the task never calling user_access_begin()

- when re-scheduling the task that was mid user_access section,
user_access would be disabled and the task would fault on the next
get/put_user_unsafe.


This is because __switch_to does not alter the user_access status when
switching from next to prev (at least on arm64 we currently don't, and
by looking at the x86 code I don't think this is done either).


From my understanding, this is not an issue when the task in
user_access mode gets scheduled out/in as a result of an interrupt as
PAN and EFLAGS.AC get saved/restore on exception entry/exit (at least I
know it is the case for PAN, I am less sure for the x86 side).


So, the question is, should __switch_to take care of the user_access
status when scheduling new tasks? Or should there be a restriction about
scheduling out a task with user_access mode enabled and maybe add a
warning if we can detect this?

(Or did we miss something and this is not an issue on x86?)

Thanks,

--
Julien Thierry


2018-11-24 08:23:06

by H. Peter Anvin

[permalink] [raw]
Subject: Re: Sleeping in user_access section

On November 23, 2018 1:27:02 AM PST, Julien Thierry <[email protected]> wrote:
>Hi,
>
>I made an attempt at implementing the
>user_access_begin()/user_access_end() macros along with the
>get/put_user_unsafe() for arm64 by toggling the status of PAN (more or
>less similar to x86's STAC/CTAC).
>
>With a small mistake in my patch, we realized that directly calling
>function that could reschedule while in a user_access section could
>lead to:
>
>- scheduling another task keeping the user_access status enabled
>despite
>the task never calling user_access_begin()
>
>- when re-scheduling the task that was mid user_access section,
>user_access would be disabled and the task would fault on the next
>get/put_user_unsafe.
>
>
>This is because __switch_to does not alter the user_access status when
>switching from next to prev (at least on arm64 we currently don't, and
>by looking at the x86 code I don't think this is done either).
>
>
> From my understanding, this is not an issue when the task in
>user_access mode gets scheduled out/in as a result of an interrupt as
>PAN and EFLAGS.AC get saved/restore on exception entry/exit (at least I
>
>know it is the case for PAN, I am less sure for the x86 side).
>
>
>So, the question is, should __switch_to take care of the user_access
>status when scheduling new tasks? Or should there be a restriction
>about
>scheduling out a task with user_access mode enabled and maybe add a
>warning if we can detect this?
>
>(Or did we miss something and this is not an issue on x86?)
>
>Thanks,

You should never call a sleeping function from a user_access section. It is intended for very limited regions.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-11-24 08:24:47

by Julien Thierry

[permalink] [raw]
Subject: Re: Sleeping in user_access section



On 23/11/18 09:57, [email protected] wrote:
> On November 23, 2018 1:27:02 AM PST, Julien Thierry <[email protected]> wrote:
>> Hi,
>>
>> I made an attempt at implementing the
>> user_access_begin()/user_access_end() macros along with the
>> get/put_user_unsafe() for arm64 by toggling the status of PAN (more or
>> less similar to x86's STAC/CTAC).
>>
>> With a small mistake in my patch, we realized that directly calling
>> function that could reschedule while in a user_access section could
>> lead to:
>>
>> - scheduling another task keeping the user_access status enabled
>> despite
>> the task never calling user_access_begin()
>>
>> - when re-scheduling the task that was mid user_access section,
>> user_access would be disabled and the task would fault on the next
>> get/put_user_unsafe.
>>
>>
>> This is because __switch_to does not alter the user_access status when
>> switching from next to prev (at least on arm64 we currently don't, and
>> by looking at the x86 code I don't think this is done either).
>>
>>
>> From my understanding, this is not an issue when the task in
>> user_access mode gets scheduled out/in as a result of an interrupt as
>> PAN and EFLAGS.AC get saved/restore on exception entry/exit (at least I
>>
>> know it is the case for PAN, I am less sure for the x86 side).
>>
>>
>> So, the question is, should __switch_to take care of the user_access
>> status when scheduling new tasks? Or should there be a restriction
>> about
>> scheduling out a task with user_access mode enabled and maybe add a
>> warning if we can detect this?
>>
>> (Or did we miss something and this is not an issue on x86?)
>>
>> Thanks,
>
> You should never call a sleeping function from a user_access section. It is intended for very limited regions.
>

Thanks for the clarification.

Would it be worth documenting this somewhere? And add a check to detect
such issues?

Also, those limited regions can be interrupted and preempted, but I
guess you could consider the interrupted region being split into
separate user_access regions, before and after the interrupt. Should it
be stated that an exception/interrupt constitutes implicit
user_access_end()/begin() when taken from/returning to a user_access region?

Thanks,

--
Julien Thierry

2018-11-24 08:26:01

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: Sleeping in user_access section

On Fri, Nov 23, 2018 at 01:57:12AM -0800, [email protected] wrote:
> You should never call a sleeping function from a user_access section.
> It is intended for very limited regions.

So, what happens if the "unsafe" user access causes a page fault that
ends up sleeping?

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-11-24 08:30:03

by Julien Thierry

[permalink] [raw]
Subject: Re: Sleeping in user_access section



On 23/11/18 10:50, Russell King - ARM Linux wrote:
> On Fri, Nov 23, 2018 at 01:57:12AM -0800, [email protected] wrote:
>> You should never call a sleeping function from a user_access section.
>> It is intended for very limited regions.
>
> So, what happens if the "unsafe" user access causes a page fault that
> ends up sleeping?
>

Thanks for pointing that out.

On the arm64 side, PAN state is saved in spsr and (if PAN feature is
enabled in SCTLR) PAN bit gets set (disabling the user accesses). For
software PAN we follow the same behaviour on exception entry. So upon
exception we implicitly exit user_access mode and then re-enter it when
returning from the exception.

On x86, the EFLAGS.AC bit is also saved upon exception and I think it is
cleared upon exception entry so there is implicit exit from the
user_access mode when taking exception/interrupt.

This however is just how those two architectures happen to behave and
doesn't seem to be specified as part of the user_access API...

Which is why I'd like to clarify the semantics of user_access region wrt
sleeping functions.

Thanks,

--
Julien Thierry

2018-11-24 08:34:33

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: Sleeping in user_access section

On Fri, Nov 23, 2018 at 11:57:31AM +0000, Julien Thierry wrote:
>
>
> On 23/11/18 10:50, Russell King - ARM Linux wrote:
> >On Fri, Nov 23, 2018 at 01:57:12AM -0800, [email protected] wrote:
> >>You should never call a sleeping function from a user_access section.
> >>It is intended for very limited regions.
> >
> >So, what happens if the "unsafe" user access causes a page fault that
> >ends up sleeping?
> >
>
> Thanks for pointing that out.
>
> On the arm64 side, PAN state is saved in spsr and (if PAN feature is enabled
> in SCTLR) PAN bit gets set (disabling the user accesses). For software PAN
> we follow the same behaviour on exception entry. So upon exception we
> implicitly exit user_access mode and then re-enter it when returning from
> the exception.
>
> On x86, the EFLAGS.AC bit is also saved upon exception and I think it is
> cleared upon exception entry so there is implicit exit from the user_access
> mode when taking exception/interrupt.
>
> This however is just how those two architectures happen to behave and
> doesn't seem to be specified as part of the user_access API...
>
> Which is why I'd like to clarify the semantics of user_access region wrt
> sleeping functions.

Explicitly calling a sleeping function may not be recommended, but
it's a rather fundamental fact of the Linux kernel that if we want
to access userspace, we must be either prepared to sleep, or fail
the access and return an error.

Looking at kernel/exit.c and kernel/compat.c, I see no sign of any
retry mechanism in the current call sites, so any failed user access
would return an error to userspace - which is not the behaviour that
userspace would expect.

It's possible that, when user_access_begin() et.al. are implemented,
access_ok() also comes with the requirement to make sure that the
userspace pages are accessible, but even _that_ would be racy,
because there's no way to pin the pages, so another thread/CPU could
page those user pages back out... leading to a fault.

So, realistically, with how user_access_begin()..user_access_end()
is currently being used, an architecture fundamentally has to be
prepared for threads to sleep within that section of code through
the action of the page fault handling. I can't see any other
realistic possibility here.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-11-27 19:30:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: Sleeping in user_access section

On 11/23/18 3:57 AM, Julien Thierry wrote:
>
> On x86, the EFLAGS.AC bit is also saved upon exception and I think it is
> cleared upon exception entry so there is implicit exit from the
> user_access mode when taking exception/interrupt.
>

No, it is restored, not cleared.

In summary: on exceptions, user_access regions are suspended, and on
return the user_access status is resumed.

However, explicitly calling sleeping functions is not supported.

-hpa