2018-03-11 17:00:07

by 焦晓冬

[permalink] [raw]
Subject: resend, finish_wait with list_empty_careful maybe fragile or buggy

Sorry, this is a resend because the previous one was messed
up by my editor and hard to be read.

void finish_wait(
struct wait_queue_head *wq_head,
struct wait_queue_entry *wq_entry)
{
....
-> if (!list_empty_careful(&wq_entry->entry)) {
-> spin_lock_irqsave(&wq_head->lock, flags);
-> list_del_init(&wq_entry->entry);
-> spin_unlock_irqrestore(&wq_head->lock, flags);
-> }
}

After careful look into the stop/wakeup code, I found
the above code fragile or even buggy. This code was
introduced at least 14 years ago and seems fragile or
buggy now after years of study on SMP synchronization
by us.

I understand this code are being used a lot and no bug
seems to emerge. But, as I'll explain, it depends on a lot
of unreliable implementation details.

Firstly,

static inline int list_empty_careful(const struct list_head *head)
{
struct list_head *next = head->next;
return (next == head) && (next == head->prev);
}

note that the read of head->next & head->prev is not
protected by READ_ONCE. So the read is free to be
optimized out entirely. Luckily, this optimization is hard
for compilers now, since all other accesses are out of
finish_wait. And still, GCC won't spit aligned accesses
into multiple instructions so it is atomic so far.

Secondly,

if ( ! list_empty_careful(&wq_entry->entry) ) {
<remove entry with spinning-lock>
}
<ends stack frame of the function calling finish_wait>
<overwrites wq_entry with another frame>

and

__wake_up_common() -->
<read wq_entry->func> -->
<read wq_entry->flags> -->
autoremove_wake_function() -->
<remove wq_entry->entry from wait queue> -->

are not properly ordered for SMP so that <read wq_entry->flags>
may be reordered after <remove wq_entry->entry from wait queue>
since no dependency or memory barrier forbids it. This may cause
<overwrites wq_entry with another frame> on one CPU takes place
before <read wq_entry->flags> on another CPU and cause
<read wq_entry->flags> to return bad value.

This behavior is not reported may thank to:
- quite a few code is using autoremove_wake_function
- CPU pipeline is not as deep as to make this emerge

Best regards,
Patrick Trol


2018-03-12 01:53:22

by 焦晓冬

[permalink] [raw]
Subject: Re: resend, finish_wait with list_empty_careful maybe fragile or buggy

> Sorry, this is a resend because the previous one was messed
> up by my editor and hard to be read.
>
> void finish_wait(
> struct wait_queue_head *wq_head,
> struct wait_queue_entry *wq_entry)
> {
> ....
> -> if (!list_empty_careful(&wq_entry->entry)) {
> -> spin_lock_irqsave(&wq_head->lock, flags);
> -> list_del_init(&wq_entry->entry);
> -> spin_unlock_irqrestore(&wq_head->lock, flags);
> -> }
> }
>
> After careful look into the stop/wakeup code, I found
> the above code fragile or even buggy. This code was
> introduced at least 14 years ago and seems fragile or
> buggy now after years of study on SMP synchronization
> by us.
>
> I understand this code are being used a lot and no bug
> seems to emerge. But, as I'll explain, it depends on a lot
> of unreliable implementation details.
>
> Firstly,
>
> static inline int list_empty_careful(const struct list_head *head)
> {
> struct list_head *next = head->next;
> return (next == head) && (next == head->prev);
> }
>
> note that the read of head->next & head->prev is not
> protected by READ_ONCE. So the read is free to be
> optimized out entirely. Luckily, this optimization is hard
> for compilers now, since all other accesses are out of
> finish_wait. And still, GCC won't spit aligned accesses
> into multiple instructions so it is atomic so far.
>
> Secondly,
>
> if ( ! list_empty_careful(&wq_entry->entry) ) {
> <remove entry with spinning-lock>
> }
> <ends stack frame of the function calling finish_wait>
> <overwrites wq_entry with another frame>
>
> and
>
> __wake_up_common() -->
> <read wq_entry->func> -->
> <read wq_entry->flags> -->
> autoremove_wake_function() -->
> <remove wq_entry->entry from wait queue> -->
>
> are not properly ordered for SMP so that <read wq_entry->flags>
> may be reordered after <remove wq_entry->entry from wait queue>
> since no dependency or memory barrier forbids it. This may cause
> <overwrites wq_entry with another frame> on one CPU takes place
> before <read wq_entry->flags> on another CPU and cause
> <read wq_entry->flags> to return bad value.
>
> This behavior is not reported may thank to:
> - few code is using autoremove_wake_function
> - CPU pipeline is not as deep as to make this emerge
>

Hi, mingo, peterz

Would you please have a look at it if it won't waste you much time.
Thanks a lot.

CC scheduler maintainers

Best regards,
Patrick Trol

2018-03-12 14:29:44

by 焦晓冬

[permalink] [raw]
Subject: Re: resend, finish_wait with list_empty_careful maybe fragile or buggy

+ CC Boqun
in case you are interested in this topic

Best Regards,
Trol

> Sorry, this is a resend because the previous one was messed
> up by my editor and hard to be read.
>
> void finish_wait(struct wait_queue_head *wq_head,
> struct wait_queue_entry *wq_entry)
> {
> ....
> -> if (!list_empty_careful(&wq_entry->entry)) {
> -> spin_lock_irqsave(&wq_head->lock, flags);
> -> list_del_init(&wq_entry->entry);
> -> spin_unlock_irqrestore(&wq_head->lock, flags);
> -> }
> }
>
> After careful look into the stop/wakeup code, I found the above
> code fragile or even buggy. This code was introduced at least 14 years
> ago and seems fragile or buggy now after years of study on SMP
> synchronization by us.
>
> I understand this code are being used a lot and no bug seems to emerge.
> But, as I'll explain, it depends on a lot of unreliable implementation details.
>
> Firstly,
>
> static inline int list_empty_careful(const struct list_head *head)
> {
> struct list_head *next = head->next;
> return (next == head) && (next == head->prev);
> }
>
> note that the read of head->next & head->prev is not protected by
> READ_ONCE. So the read is free to be optimized out entirely.
> Luckily, this optimization is hard for compilers now, since all other accesses
> are out of finish_wait. And still, GCC won't spit aligned accesses into multiple
> instructions so it is atomic so far.
>
> Secondly,
>
> if ( ! list_empty_careful(&wq_entry->entry) ) {
> [remove entry with spinning-lock]
> }
> [ends stack frame of the function calling finish_wait]
> [overwrites wq_entry with another frame]
>
> and
>
> __wake_up_common() -->
> [read wq_entry->func] -->
> [read wq_entry->flags] -->
> autoremove_wake_function() -->
> [remove wq_entry->entry from wait queue] -->
>
> are not properly ordered for SMP so that [read wq_entry->flags]
> may be reordered after [remove wq_entry->entry from wait queue]
> since no dependency or memory barrier forbids it. This may cause
> [overwrites wq_entry with another frame] on one CPU takes place
> before [read wq_entry->flags] on another CPU and cause
> [read wq_entry->flags] to return bad value.
>
> This behavior is not reported may thank to:
> - few code is using autoremove_wake_function
> - CPU pipeline is not as deep as to make this emerge
>
> Best regards,
> Trol