Hi, Peter and Eric.
I am not expert about futex.
I am sorry if this is dumb question.
If we use private futex, get_futex_key don't call get_user_pages_fast
which pins page at page table.
Then, get_futex_value_locked calls __cpy_from_user_inatomic with
pagefault_disable.
Who make sure the user page is mapped at app's page table ?
--
Kinds regards,
Minchan Kim
Hmm, It seems even shared futex, too.
After calling get_user_pages_fast, get_futex_key calls unlock, put_page, too.
Then futex_wait calls get_futex_value_locked.
Wouldn't kernel reclaim the page between get_fuex_key and
get_futex_value_locked ?
How do we make sure this race condition ?
On Fri, Mar 27, 2009 at 11:12 AM, Minchan Kim <[email protected]> wrote:
> Hi, Peter and Eric.
>
> I am not expert about futex.
> I am sorry if this is dumb question.
>
> If we use private futex, get_futex_key don't call get_user_pages_fast
> which pins page at page table.
> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> pagefault_disable.
>
> Who make sure the user page is mapped at app's page table ?
>
> --
> Kinds regards,
> Minchan Kim
>
--
Kinds regards,
Minchan Kim
Minchan Kim a écrit :
> Hi, Peter and Eric.
>
> I am not expert about futex.
> I am sorry if this is dumb question.
>
> If we use private futex, get_futex_key don't call get_user_pages_fast
> which pins page at page table.
> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> pagefault_disable.
>
> Who make sure the user page is mapped at app's page table ?
>
Nothing makes sure user page is mapped, as we dont have to (for private futexes
at least, since the 'key' is a combination of the futex virtual address (not
depending on the underlying physical page) and the task mm (sort of a static
offset per task)
If no page is mapped, a normal error should be returned to user, since
access to futex location will trigger a fault.
Thanks for kind explanation.
On Fri, Mar 27, 2009 at 1:56 PM, Eric Dumazet <[email protected]> wrote:
> Minchan Kim a écrit :
>> Hi, Peter and Eric.
>>
>> I am not expert about futex.
>> I am sorry if this is dumb question.
>>
>> If we use private futex, get_futex_key don't call get_user_pages_fast
>> which pins page at page table.
>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>> pagefault_disable.
>>
>> Who make sure the user page is mapped at app's page table ?
>>
>
> Nothing makes sure user page is mapped, as we dont have to (for private futexes
> at least, since the 'key' is a combination of the futex virtual address (not
> depending on the underlying physical page) and the task mm (sort of a static
> offset per task)
> If no page is mapped, a normal error should be returned to user, since
> access to futex location will trigger a fault.
>
I mean as follows.
It seems even shared futex case.
After calling get_user_pages_fast, get_futex_key calls unlock_page and
put_page, too. Then futex_wait calls get_futex_value_locked.
Generally, current page->count is one and nolocked.
I think kernel reclaimer can reclaim the page.
Wouldn't kernel reclaim the page between get_fuex_key and
get_futex_value_locked ?
If kernel reclaimed the page, __copy_from_user_inatomic can happens
page fault although pagefault_disable is on.
How do we make sure this race condition ?
Do I miss something ?
--
Kinds regards,
Minchan Kim
Minchan Kim a écrit :
> Thanks for kind explanation.
>
> On Fri, Mar 27, 2009 at 1:56 PM, Eric Dumazet <[email protected]> wrote:
>> Minchan Kim a écrit :
>>> Hi, Peter and Eric.
>>>
>>> I am not expert about futex.
>>> I am sorry if this is dumb question.
>>>
>>> If we use private futex, get_futex_key don't call get_user_pages_fast
>>> which pins page at page table.
>>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>>> pagefault_disable.
>>>
>>> Who make sure the user page is mapped at app's page table ?
>>>
>> Nothing makes sure user page is mapped, as we dont have to (for private futexes
>> at least, since the 'key' is a combination of the futex virtual address (not
>> depending on the underlying physical page) and the task mm (sort of a static
>> offset per task)
>> If no page is mapped, a normal error should be returned to user, since
>> access to futex location will trigger a fault.
>>
>
> I mean as follows.
> It seems even shared futex case.
>
> After calling get_user_pages_fast, get_futex_key calls unlock_page and
> put_page, too. Then futex_wait calls get_futex_value_locked.
>
> Generally, current page->count is one and nolocked.
> I think kernel reclaimer can reclaim the page.
>
> Wouldn't kernel reclaim the page between get_fuex_key and
> get_futex_value_locked ?
> If kernel reclaimed the page, __copy_from_user_inatomic can happens
> page fault although pagefault_disable is on.
>
> How do we make sure this race condition ?
> Do I miss something ?
>
Hmmm, so your question is not about PRIVATE futexes, but shared ones.
I guess if page is no more present, its not a problem since
get_futex_value_locked() returns an error. We then take a slow
path, calling get_user() and retrying whole futex logic.
However, comment at line 1213 is misleading I guess, since
we dont hold mmap semaphore anymore ?
* for shared futexes, we hold the mmap semaphore, so the mapping
* cannot have changed since we looked it up in get_futex_key.
*/
ret = get_futex_value_locked(&uval, uaddr);
So if page was un-mapped by another thread, and re-mapped to another physical
page, then this thread might sleep on 'kernel futex' not anymore reachable...
User error, as it is not supposed to happen in a sane program, undefined
result...
On Fri, Mar 27, 2009 at 2:50 PM, Eric Dumazet <[email protected]> wrote:
> Minchan Kim a écrit :
>> Thanks for kind explanation.
>>
>> On Fri, Mar 27, 2009 at 1:56 PM, Eric Dumazet <[email protected]> wrote:
>>> Minchan Kim a écrit :
>>>> Hi, Peter and Eric.
>>>>
>>>> I am not expert about futex.
>>>> I am sorry if this is dumb question.
>>>>
>>>> If we use private futex, get_futex_key don't call get_user_pages_fast
>>>> which pins page at page table.
>>>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>>>> pagefault_disable.
>>>>
>>>> Who make sure the user page is mapped at app's page table ?
>>>>
>>> Nothing makes sure user page is mapped, as we dont have to (for private futexes
>>> at least, since the 'key' is a combination of the futex virtual address (not
>>> depending on the underlying physical page) and the task mm (sort of a static
>>> offset per task)
>>> If no page is mapped, a normal error should be returned to user, since
>>> access to futex location will trigger a fault.
>>>
>>
>> I mean as follows.
>> It seems even shared futex case.
>>
>> After calling get_user_pages_fast, get_futex_key calls unlock_page and
>> put_page, too. Then futex_wait calls get_futex_value_locked.
>>
>> Generally, current page->count is one and nolocked.
>> I think kernel reclaimer can reclaim the page.
>>
>> Wouldn't kernel reclaim the page between get_fuex_key and
>> get_futex_value_locked ?
>> If kernel reclaimed the page, __copy_from_user_inatomic can happens
>> page fault although pagefault_disable is on.
>>
>> How do we make sure this race condition ?
>> Do I miss something ?
>>
>
> Hmmm, so your question is not about PRIVATE futexes, but shared ones.
>
> I guess if page is no more present, its not a problem since
> get_futex_value_locked() returns an error. We then take a slow
> path, calling get_user() and retrying whole futex logic.
Indeed.
I misunderstood about __copy_from_user_inatomic.
It never sleep.
> However, comment at line 1213 is misleading I guess, since
> we dont hold mmap semaphore anymore ?
>
> * for shared futexes, we hold the mmap semaphore, so the mapping
> * cannot have changed since we looked it up in get_futex_key.
> */
> ret = get_futex_value_locked(&uval, uaddr);
>
> So if page was un-mapped by another thread, and re-mapped to another physical
> page, then this thread might sleep on 'kernel futex' not anymore reachable...
>
> User error, as it is not supposed to happen in a sane program, undefined
> result...
Yes. How about removing confusing comments ?
Thanks for great explanation :)
--
Kinds regards,
Minchan Kim
On Fri, 2009-03-27 at 11:12 +0900, Minchan Kim wrote:
> Hi, Peter and Eric.
>
> I am not expert about futex.
> I am sorry if this is dumb question.
>
> If we use private futex, get_futex_key don't call get_user_pages_fast
> which pins page at page table.
But also drops that page ref at the end of get_futex_key(). The whole
and only purpose of using get_user_pages_fast() is to get at the mapping
data without having to obtain the mmap_sem.
> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> pagefault_disable.
>
> Who make sure the user page is mapped at app's page table ?
Nobody, all uses of get_futex_value_locked() have to deal with it
returning -EFAULT.
Most of this is legacy btw, from when futex ops were done under the
mmap_sem. Back then we couldn't fault because that would cause mmap_sem
recursion. Howver, now that we don't hold mmap_sem anymore we could use
a faulting user access like get_user().
Darren has been working on patches to clean that up, some of those are
already merged in the -tip tree.
HTH
Hi, Perter.
Thanks for joining this thread.
My concern is page reclaimer can reclaim the user page which have
futex between get_fuex_key and get_futex_value_locked.
On Fri, Mar 27, 2009 at 5:49 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2009-03-27 at 11:12 +0900, Minchan Kim wrote:
>> Hi, Peter and Eric.
>>
>> I am not expert about futex.
>> I am sorry if this is dumb question.
>>
>> If we use private futex, get_futex_key don't call get_user_pages_fast
>> which pins page at page table.
>
> But also drops that page ref at the end of get_futex_key(). The whole
> and only purpose of using get_user_pages_fast() is to get at the mapping
> data without having to obtain the mmap_sem.
Thanks.
I understand. It's not only private futex.
>
>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>> pagefault_disable.
>>
>> Who make sure the user page is mapped at app's page table ?
>
> Nobody, all uses of get_futex_value_locked() have to deal with it
> returning -EFAULT.
Does It mean that __copy_from_user_inatomic in get_futex_value_locked
would be failed rather than sleep?
In fact, I don't make sure _copy_from_user_inatomic function's meaning.
As far as I understand, It never sleep. It just can be failed in case
of user page isn't mapped. Is right ?
Otherwise, it can be scheduled with pagefault_disable which increments
preempt_count. It is a atomic bug.
If my assume is right, it can be failed rather than sleep.
At this case, other architecture implements __copy_from_user_inatomic
with __copy_from_user which can be scheduled. It also can be bug.
Hmm, Now I am confusing.
> Most of this is legacy btw, from when futex ops were done under the
> mmap_sem. Back then we couldn't fault because that would cause mmap_sem
> recursion. Howver, now that we don't hold mmap_sem anymore we could use
> a faulting user access like get_user().
> Darren has been working on patches to clean that up, some of those are
> already merged in the -tip tree.
Thanks for good information.
It will be very desirable way to enhance kernel performance.
> HTH
>
--
Kinds regards,
Minchan Kim
On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:
> >> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> >> pagefault_disable.
> >>
> >> Who make sure the user page is mapped at app's page table ?
> >
> > Nobody, all uses of get_futex_value_locked() have to deal with it
> > returning -EFAULT.
>
> Does It mean that __copy_from_user_inatomic in get_futex_value_locked
> would be failed rather than sleep?
Correct.
> In fact, I don't make sure _copy_from_user_inatomic function's meaning.
> As far as I understand, It never sleep. It just can be failed in case
> of user page isn't mapped. Is right ?
Correct.
> Otherwise, it can be scheduled with pagefault_disable which increments
> preempt_count. It is a atomic bug.
> If my assume is right, it can be failed rather than sleep.
> At this case, other architecture implements __copy_from_user_inatomic
> with __copy_from_user which can be scheduled. It also can be bug.
>
> Hmm, Now I am confusing.
Confused I guess ;-)
The trick is in the in_atomic() check in the pagefault handler and the
fixup section of the copy routines.
#define __copy_user(to, from, size) \
do { \
int __d0, __d1, __d2; \
__asm__ __volatile__( \
" cmp $7,%0\n" \
" jbe 1f\n" \
" movl %1,%0\n" \
" negl %0\n" \
" andl $7,%0\n" \
" subl %0,%3\n" \
"4: rep; movsb\n" \
" movl %3,%0\n" \
" shrl $2,%0\n" \
" andl $3,%3\n" \
" .align 2,0x90\n" \
"0: rep; movsl\n" \
" movl %3,%0\n" \
"1: rep; movsb\n" \
"2:\n" \
".section .fixup,\"ax\"\n" \
"5: addl %3,%0\n" \
" jmp 2b\n" \
"3: lea 0(%3,%0,4),%0\n" \
" jmp 2b\n" \
".previous\n" \
".section __ex_table,\"a\"\n" \
" .align 4\n" \
" .long 4b,5b\n" \
" .long 0b,3b\n" \
" .long 1b,2b\n" \
".previous" \
: "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2) \
: "3"(size), "0"(size), "1"(to), "2"(from) \
: "memory"); \
} while (0)
see that __ex_table section, it tells the fault handler where to
continue in case of an atomic fault.
> > Most of this is legacy btw, from when futex ops were done under the
> > mmap_sem. Back then we couldn't fault because that would cause mmap_sem
> > recursion. Howver, now that we don't hold mmap_sem anymore we could use
> > a faulting user access like get_user().
> > Darren has been working on patches to clean that up, some of those are
> > already merged in the -tip tree.
>
> Thanks for good information.
> It will be very desirable way to enhance kernel performance.
I doubt it'll make a measurable difference, if you need to fault
performance sucks anyway. If you don't, the current code is just as
fast.
On Fri, Mar 27, 2009 at 8:14 PM, Peter Zijlstra <[email protected]> wrote:> On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:>>> >> Then, get_futex_value_locked calls __cpy_from_user_inatomic with>> >> pagefault_disable.>> >>>> >> Who make sure the user page is mapped at app's page table ?>> >>> > Nobody, all uses of get_futex_value_locked() have to deal with it>> > returning -EFAULT.>>>> Does It mean that __copy_from_user_inatomic in get_futex_value_locked>> would be failed rather than sleep?>> Correct.>>> In fact, I don't make sure _copy_from_user_inatomic function's meaning.>> As far as I understand, It never sleep. It just can be failed in case>> of user page isn't mapped. Is right ?>> Correct.>>> Otherwise, it can be scheduled with pagefault_disable which increments>> preempt_count. It is a atomic bug.>> If my assume is right, it can be failed rather than sleep.>> At this case, other architecture implements __copy_from_user_inatomic>> with __copy_from_user which can be scheduled. It also can be bug.>>>> Hmm, Now I am confusing.>> Confused I guess ;-)> The trick is in the in_atomic() check in the pagefault handler and the> fixup section of the copy routines.
Whew~, There was good hidden trick.I will dive into this assembly.I always thanks for your kindness. :)
> #define __copy_user(to, from, size) \> do { \> int __d0, __d1, __d2; \> __asm__ __volatile__( \> " cmp $7,%0\n" \> " jbe 1f\n" \> " movl %1,%0\n" \> " negl %0\n" \> " andl $7,%0\n" \> " subl %0,%3\n" \> "4: rep; movsb\n" \> " movl %3,%0\n" \> " shrl $2,%0\n" \> " andl $3,%3\n" \> " .align 2,0x90\n" \> "0: rep; movsl\n" \> " movl %3,%0\n" \> "1: rep; movsb\n" \> "2:\n" \> ".section .fixup,\"ax\"\n" \> "5: addl %3,%0\n" \> " jmp 2b\n" \> "3: lea 0(%3,%0,4),%0\n" \> " jmp 2b\n" \> ".previous\n" \> ".section __ex_table,\"a\"\n" \> " .align 4\n" \> " .long 4b,5b\n" \> " .long 0b,3b\n" \> " .long 1b,2b\n" \> ".previous" \> : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2) \> : "3"(size), "0"(size), "1"(to), "2"(from) \> : "memory"); \> } while (0)>> see that __ex_table section, it tells the fault handler where to> continue in case of an atomic fault.>>> > Most of this is legacy btw, from when futex ops were done under the>> > mmap_sem. Back then we couldn't fault because that would cause mmap_sem>> > recursion. Howver, now that we don't hold mmap_sem anymore we could use>> > a faulting user access like get_user().>> > Darren has been working on patches to clean that up, some of those are>> > already merged in the -tip tree.>>>> Thanks for good information.>> It will be very desirable way to enhance kernel performance.>> I doubt it'll make a measurable difference, if you need to fault> performance sucks anyway. If you don't, the current code is just as> fast.>
-- Kinds regards,Minchan Kim????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
Minchan Kim wrote:
> On Fri, Mar 27, 2009 at 8:14 PM, Peter Zijlstra <[email protected]> wrote:
>> On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:
>>
>>>>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>>>>> pagefault_disable.
>>>>>
>>>>> Who make sure the user page is mapped at app's page table ?
>>>> Nobody, all uses of get_futex_value_locked() have to deal with it
>>>> returning -EFAULT.
>>> Does It mean that __copy_from_user_inatomic in get_futex_value_locked
>>> would be failed rather than sleep?
>> Correct.
>>
>>> In fact, I don't make sure _copy_from_user_inatomic function's meaning.
>>> As far as I understand, It never sleep. It just can be failed in case
>>> of user page isn't mapped. Is right ?
>> Correct.
>>
>>> Otherwise, it can be scheduled with pagefault_disable which increments
>>> preempt_count. It is a atomic bug.
>>> If my assume is right, it can be failed rather than sleep.
>>> At this case, other architecture implements __copy_from_user_inatomic
>>> with __copy_from_user which can be scheduled. It also can be bug.
>>>
>>> Hmm, Now I am confusing.
>> Confused I guess ;-)
>> The trick is in the in_atomic() check in the pagefault handler and the
>> fixup section of the copy routines.
>
> Whew~, There was good hidden trick.
> I will dive into this assembly.
> I always thanks for your kindness. :)
>
>> #define __copy_user(to, from, size) \
>> do { \
>> int __d0, __d1, __d2; \
>> __asm__ __volatile__( \
>> " cmp $7,%0\n" \
>> " jbe 1f\n" \
>> " movl %1,%0\n" \
>> " negl %0\n" \
>> " andl $7,%0\n" \
>> " subl %0,%3\n" \
>> "4: rep; movsb\n" \
>> " movl %3,%0\n" \
>> " shrl $2,%0\n" \
>> " andl $3,%3\n" \
>> " .align 2,0x90\n" \
>> "0: rep; movsl\n" \
>> " movl %3,%0\n" \
>> "1: rep; movsb\n" \
>> "2:\n" \
>> ".section .fixup,\"ax\"\n" \
>> "5: addl %3,%0\n" \
>> " jmp 2b\n" \
>> "3: lea 0(%3,%0,4),%0\n" \
>> " jmp 2b\n" \
>> ".previous\n" \
>> ".section __ex_table,\"a\"\n" \
>> " .align 4\n" \
>> " .long 4b,5b\n" \
>> " .long 0b,3b\n" \
>> " .long 1b,2b\n" \
>> ".previous" \
>> : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2) \
>> : "3"(size), "0"(size), "1"(to), "2"(from) \
>> : "memory"); \
>> } while (0)
>>
>> see that __ex_table section, it tells the fault handler where to
>> continue in case of an atomic fault.
>>
>>>> Most of this is legacy btw, from when futex ops were done under the
>>>> mmap_sem. Back then we couldn't fault because that would cause mmap_sem
>>>> recursion. Howver, now that we don't hold mmap_sem anymore we could use
>>>> a faulting user access like get_user().
>>>> Darren has been working on patches to clean that up, some of those are
>>>> already merged in the -tip tree.
I'm a little late to the party I guess. Minchan, a lot of the fault
logic has been cleaned up in the tip tree, core/futexes branch. The
removes a lot of the legacy complication from the faulting paths.
However, the get_futex_key code remains the same if I remember correctly.
>>> Thanks for good information.
>>> It will be very desirable way to enhance kernel performance.
>> I doubt it'll make a measurable difference, if you need to fault
>> performance sucks anyway. If you don't, the current code is just as
>> fast.
>>
Agreed. If you are suffering performance hits from excessive paging,
consider locking your memory.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team