2021-06-22 18:52:24

by Nadav Amit

Subject: Re: Do we need to unrevert "fs: do not prefault sys_write() user buffer pages"?



> On Jun 22, 2021, at 11:36 AM, Matthew Wilcox wrote:
>
> On Tue, Jun 22, 2021 at 11:28:30AM -0700, Linus Torvalds wrote:
>> On Tue, Jun 22, 2021 at 11:23 AM Matthew Wilcox wrote:
>>>
>>> It wouldn't be _that_ bad necessarily. filemap_fault:
>>
>> It's not actually the mm code that is the biggest problem. We
>> obviously already have readahead support.
>>
>> It's the *fault* side.
>>
>> In particular, since the fault would return without actually filling
>> in the page table entry (because the page isn't ready yet, and you
>> cannot expose it to other threads!), you also have to jump over the
>> instruction that caused this all.
>
> Oh, I was assuming that it'd be a function call like
> get_user_pages_fast(), not an instruction that was specially marked to
> be jumped over. Gag reflex diminishing now?

Just a reminder of the alternative (from the RFC that I mentioned before):
a vDSO exception table entry for a memory accessing function in the
vDSO. It then behaves as a sort of MADV_WILLNEED for the faulting
page if an exception is triggered. Unlike MADV_WILLNEED it maps the
page if no IO is needed. It can return through a register whether
the page was present or not.
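
For readers who want a feel for these semantics, here is a rough user-space approximation built from mincore() and madvise(MADV_WILLNEED). It is only a sketch under stated assumptions, not the vDSO mechanism from the RFC: the helper name page_present_or_prefetch() is hypothetical, it costs two system calls, it is inherently racy, and unlike the proposal it does not map the page when no IO is needed.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() and madvise() on glibc */
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or vDSO interface).
 * Returns 1 if the page holding addr is resident, 0 if it was not
 * resident (readahead has then been requested), and -1 on error.
 */
static int page_present_or_prefetch(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    unsigned char vec = 0;
    uintptr_t start;

    assert(addr != NULL);
    if (page_size <= 0)
        return -1;

    /* mincore() requires a page-aligned start address. */
    start = (uintptr_t)addr & ~((uintptr_t)page_size - 1);

    if (mincore((void *)start, (size_t)page_size, &vec) != 0)
        return -1;
    if (vec & 1)
        return 1;               /* resident: the access should not need IO */

    /* Not resident: ask the kernel to start bringing the page in. */
    if (madvise((void *)start, (size_t)page_size, MADV_WILLNEED) != 0)
        return -1;
    return 0;
}
```

A caller would check the return value and, on 0, go do other work before retrying the access, which is the role the register return value plays in the vDSO variant.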

I once implemented (another) alternative, in which the ELF had a section
with an exception-table (holding all the “Async-#PF” instructions),
which described where to skip to if a #PF occurs, but this solution
seemed too heavy-weight/intrusive.
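
To make the exception-table idea concrete, the sketch below simulates the lookup in plain C: a table sorted by instruction address maps each instruction that is allowed to fault asynchronously to the address where execution should resume. All names (apf_extable_entry, apf_search_extable, apf_resume_ip) and the table layout are assumptions for illustration only; they are not taken from the RFC or from the kernel's own exception-table code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; the layout is an assumption. */
struct apf_extable_entry {
    uintptr_t insn;   /* instruction allowed to fault asynchronously */
    uintptr_t fixup;  /* where to resume if the page is not ready */
};

/* Binary search over a table sorted by insn; NULL if ip is not listed. */
static const struct apf_extable_entry *
apf_search_extable(const struct apf_extable_entry *tbl, size_t n, uintptr_t ip)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].insn == ip)
            return &tbl[mid];
        if (tbl[mid].insn < ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/*
 * Decide where the faulting thread resumes. page_ready is nonzero when
 * the fault can be resolved without IO. *present reports whether the
 * page is mapped when the thread resumes.
 */
static uintptr_t apf_resume_ip(const struct apf_extable_entry *tbl, size_t n,
                               uintptr_t fault_ip, int page_ready, int *present)
{
    const struct apf_extable_entry *e = apf_search_extable(tbl, n, fault_ip);

    assert(present != NULL);
    if (page_ready || e == NULL) {
        /* Ordinary fault: the page gets mapped (waiting for IO if the
         * instruction is not listed) and the instruction is retried. */
        *present = 1;
        return fault_ip;
    }
    /* Listed instruction and page not ready: skip ahead, do not block. */
    *present = 0;
    return e->fixup;
}
```

Instructions that are not in the table keep the usual synchronous behavior, so only code that opts in ever observes a skipped access.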


2021-06-22 18:58:56

by Linus Torvalds

Subject: Re: Do we need to unrevert "fs: do not prefault sys_write() user buffer pages"?

On Tue, Jun 22, 2021 at 11:51 AM Nadav Amit wrote:
> Just a reminder of the alternative (from the RFC that I mentioned before):
> a vDSO exception table entry for a memory accessing function in the
> vDSO. It then behaves as a sort of MADV_WILLNEED for the faulting
> page if an exception is triggered. Unlike MADV_WILLNEED it maps the
> page if no IO is needed. It can return through a register whether
> the page was present or not.

Yeah, that looks like a user-space equivalent.

And thanks to the vdso, it doesn't need to support all architectures,
unlike a kernel model (but yes, a kernel model could then have a
fallback to the non-prefetching synchronous case instead, so I guess
we could just do one architecture at a time).

Linus