2013-05-21 09:18:05

by Roman Gushchin

Subject: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

Hi, all!

This is a fix for a problem described here:
https://lkml.org/lkml/2013/4/16/371 .
---

Some network functions (udp4_lib_lookup2(), for instance) use the
hlist_nulls_for_each_entry_rcu macro in a way that assumes the loop can
be restarted. In that case, it is strictly necessary to reread the
head->first value from memory before each rescan.
Without additional hints, gcc caches this value in a register. If a
cached node is then moved to another chain during the scan, we can loop
forever, getting wrong nulls values and restarting the loop endlessly.

Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Boris Zhmurov <[email protected]>
---
include/linux/rculist_nulls.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..efd51bf 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
}
}

-#define hlist_nulls_first_rcu(head) \
- (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+#define hlist_nulls_first_rcu(head) \
+ (*((struct hlist_nulls_node __rcu __force **) \
+ &((volatile typeof(*head) *)head)->first))

#define hlist_nulls_next_rcu(node) \
(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
--
1.8.1.2


2013-05-21 10:42:00

by David Laight

Subject: RE: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

> Some network functions (udp4_lib_lookup2(), for instance) use the
> hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting
> of a loop. In this case, it is strictly necessary to reread the head->first
> value from the memory before each scan.
> Without additional hints, gcc caches this value in a register. In this case,
> if a cached node is moved to another chain during the scan, we can loop
> forever getting wrong nulls values and restarting the loop uninterruptedly.

Hmmm... if either inet_ehashfn() or next_pseudo_random32() is
called, gcc must reread it anyway.
I'm surprised gcc is generating separate code for all the conditional
loop endings. So why is it caching head->first?
The 'list empty' check might be short-circuited - but that would only
be relevant after a rescan.
I suspect something else is going on.

I'd also have thought that this code needs to scan the entire
hash list. If things are moved under its feet this won't happen.
If it can end up on a different list (because a node got moved),
it is also possible for a later update to move it back.
In that case it would end up on the correct list.
...
> -#define hlist_nulls_first_rcu(head) \
> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +#define hlist_nulls_first_rcu(head) \
> + (*((struct hlist_nulls_node __rcu __force **) \
> + &((volatile typeof(*head) *)head)->first))

I'd have thought it would be better to change hlist_nulls_first_rcu().

David


2013-05-21 11:55:14

by Roman Gushchin

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 21.05.2013 14:40, David Laight wrote:
>> Some network functions (udp4_lib_lookup2(), for instance) use the
>> hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting
>> of a loop. In this case, it is strictly necessary to reread the head->first
>> value from the memory before each scan.
>> Without additional hints, gcc caches this value in a register. In this case,
>> if a cached node is moved to another chain during the scan, we can loop
>> forever getting wrong nulls values and restarting the loop uninterruptedly.
>
> Hmmm.... if either inet_ehashfn() or next_pseudo_random32() is
> called gcc must reread it anyway.
> I'm surprised gcc is generating separate code for all the conditional
> loop endings. So why is it caching head->first.
> The 'list empty' might be short-circuited - but that would only
> be relevant after a rescan.
> I suspect something else is going on.

What do you mean?

> I'd also have thought that this code needs to scan the entire
> hash list. If things are moved under its feet this won't happen.
> If it can end up on a different list (because a node got moved)
> it is also possible for a later node to move it back.
> In that case it would end up on the correct list

Things are always moved to the head of the list, so it's not a problem.

> ...
>> -#define hlist_nulls_first_rcu(head) \
>> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>> +#define hlist_nulls_first_rcu(head) \
>> + (*((struct hlist_nulls_node __rcu __force **) \
>> + &((volatile typeof(*head) *)head)->first))
>
> I'd have thought it would be better to change hlist_nulls_first_rcu().

That's exactly what I'm suggesting. Maybe I'm missing something? Please clarify.

Regards,
Roman

2013-05-21 12:09:14

by Paul E. McKenney

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
> Hi, all!
>
> This is a fix for a problem described here:
> https://lkml.org/lkml/2013/4/16/371 .
> ---
>
> Some network functions (udp4_lib_lookup2(), for instance) use the
> hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting
> of a loop. In this case, it is strictly necessary to reread the head->first
> value from the memory before each scan.
> Without additional hints, gcc caches this value in a register. In this case,
> if a cached node is moved to another chain during the scan, we can loop
> forever getting wrong nulls values and restarting the loop uninterruptedly.
>
> Signed-off-by: Roman Gushchin <[email protected]>
> Reported-by: Boris Zhmurov <[email protected]>
> ---
> include/linux/rculist_nulls.h | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..efd51bf 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct
> hlist_nulls_node *n)
> }
> }
>
> -#define hlist_nulls_first_rcu(head) \
> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +#define hlist_nulls_first_rcu(head) \
> + (*((struct hlist_nulls_node __rcu __force **) \
> + &((volatile typeof(*head) *)head)->first))

Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?

Thanx, Paul

> #define hlist_nulls_next_rcu(node) \
> (*((struct hlist_nulls_node __rcu __force **)&(node)->next))
> --
> 1.8.1.2
>

2013-05-21 12:53:07

by Roman Gushchin

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 21.05.2013 16:09, Paul E. McKenney wrote:
> On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
>> Hi, all!
>>
>> This is a fix for a problem described here:
>> https://lkml.org/lkml/2013/4/16/371 .
>> ---
>>
>> Some network functions (udp4_lib_lookup2(), for instance) use the
>> hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting
>> of a loop. In this case, it is strictly necessary to reread the head->first
>> value from the memory before each scan.
>> Without additional hints, gcc caches this value in a register. In this case,
>> if a cached node is moved to another chain during the scan, we can loop
>> forever getting wrong nulls values and restarting the loop uninterruptedly.
>>
>> Signed-off-by: Roman Gushchin <[email protected]>
>> Reported-by: Boris Zhmurov <[email protected]>
>> ---
>> include/linux/rculist_nulls.h | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 2ae1371..efd51bf 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct
>> hlist_nulls_node *n)
>> }
>> }
>>
>> -#define hlist_nulls_first_rcu(head) \
>> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>> +#define hlist_nulls_first_rcu(head) \
>> + (*((struct hlist_nulls_node __rcu __force **) \
>> + &((volatile typeof(*head) *)head)->first))
>
> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?

That would be nice, but it would require keeping the old variant too
(for use in hlist_nulls_add_head_rcu(), as the rcu_assign_pointer()
argument). Do you think that's better?

Regards,
Roman

2013-05-21 12:59:08

by Paul E. McKenney

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, May 21, 2013 at 04:46:54PM +0400, Roman Gushchin wrote:
> On 21.05.2013 16:09, Paul E. McKenney wrote:
> >On Tue, May 21, 2013 at 01:05:48PM +0400, Roman Gushchin wrote:
> >>Hi, all!
> >>
> >>This is a fix for a problem described here:
> >>https://lkml.org/lkml/2013/4/16/371 .
> >>---
> >>
> >>Some network functions (udp4_lib_lookup2(), for instance) use the
> >>hlist_nulls_for_each_entry_rcu macro in a way that assumes restarting
> >>of a loop. In this case, it is strictly necessary to reread the head->first
> >>value from the memory before each scan.
> >>Without additional hints, gcc caches this value in a register. In this case,
> >>if a cached node is moved to another chain during the scan, we can loop
> >>forever getting wrong nulls values and restarting the loop uninterruptedly.
> >>
> >>Signed-off-by: Roman Gushchin <[email protected]>
> >>Reported-by: Boris Zhmurov <[email protected]>
> >>---
> >> include/linux/rculist_nulls.h | 5 +++--
> >> 1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> >>index 2ae1371..efd51bf 100644
> >>--- a/include/linux/rculist_nulls.h
> >>+++ b/include/linux/rculist_nulls.h
> >>@@ -37,8 +37,9 @@ static inline void hlist_nulls_del_init_rcu(struct
> >>hlist_nulls_node *n)
> >> }
> >> }
> >>
> >>-#define hlist_nulls_first_rcu(head) \
> >>- (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> >>+#define hlist_nulls_first_rcu(head) \
> >>+ (*((struct hlist_nulls_node __rcu __force **) \
> >>+ &((volatile typeof(*head) *)head)->first))
> >
> >Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
>
> It will be nice, but will require to keep the old variant too (for
> using in hlist_nulls_add_head_rcu() as in rcu_assign_pointer()
> argument). Do you think, it's better?

Both ACCESS_ONCE() and rcu_dereference_raw() can be used by updaters
as well as readers, so yes, I do think that it is better. Better to
keep the encapsulation rather than having to search for lots of volatile
casts should this idiom ever need to change.

Thanx, Paul

2013-05-21 13:37:34

by Eric Dumazet

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
> >
> > -#define hlist_nulls_first_rcu(head) \
> > - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> > +#define hlist_nulls_first_rcu(head) \
> > + (*((struct hlist_nulls_node __rcu __force **) \
> > + &((volatile typeof(*head) *)head)->first))
>
> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
>

+1

Very good catch, Roman!

Thanks!

2013-05-21 13:43:11

by David Laight

Subject: RE: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

> >> -#define hlist_nulls_first_rcu(head) \
> >> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> >> +#define hlist_nulls_first_rcu(head) \
> >> + (*((struct hlist_nulls_node __rcu __force **) \
> >> + &((volatile typeof(*head) *)head)->first))
> >
> > I'd have thought it would be better to change hlist_nulls_first_rcu().
>
> It's exactly what I suggest. May be I miss something? Please, clarify.

Gah - chasing through too many defines in too many header files!
Actually making the 'head' structure have volatile pointers
might be better.
Possibly in struct hlist_nulls_node itself.
There are toooo many casts for sanity :-)

David


2013-05-21 13:44:48

by Eric Dumazet

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:

> >
> > -#define hlist_nulls_first_rcu(head) \
> > - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> > +#define hlist_nulls_first_rcu(head) \
> > + (*((struct hlist_nulls_node __rcu __force **) \
> > + &((volatile typeof(*head) *)head)->first))
>
> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?

More exactly, we have:

#define list_entry_rcu(ptr, type, member) \
({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
})

#define list_for_each_entry_rcu(pos, head, member) \
for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
<< and >>

#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

We need to change hlist_nulls_for_each_entry_rcu() to use the same construct,
so that the rcu_dereference_raw() is performed at the right place.

2013-05-21 14:47:46

by Roman Gushchin

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 21.05.2013 17:44, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
>
>>>
>>> -#define hlist_nulls_first_rcu(head) \
>>> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>>> +#define hlist_nulls_first_rcu(head) \
>>> + (*((struct hlist_nulls_node __rcu __force **) \
>>> + &((volatile typeof(*head) *)head)->first))
>>
>> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
>
> More exactly we have :
>
> #define list_entry_rcu(ptr, type, member) \
> ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
> container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
> })
>
> #define list_for_each_entry_rcu(pos, head, member) \
> for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> &pos->member != (head); \
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> << and >>
>
> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
> for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> (!is_a_nulls(pos)) && \
> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
> We need to change hlist_nulls_for_each_entry_rcu() to use same construct,
> so that the rcu_dereference_raw() is performed at the right place.

No.

This code has the same mistake: it is rcu_dereference_raw(head->first),
so there is nothing that prevents gcc from storing the (head->first)
value in a register.

Let's look at the assembly (3.4.46, net/ipv4/udp.c, udp4_lib_lookup2()):

(0): here we get the head address from the stack
(1): here we read head->first
(2): this is the place where we jump (by mistake) in the original
version from (3), (4) and (5). But (4) first does the same load as (1),
so it's not a problem. In the patched version we always jump to (1).

1) original version:
0000000000002840 <udp4_lib_lookup2.isra.35>:
2840: 41 57 push %r15
2842: 41 89 d7 mov %edx,%r15d
2845: 41 56 push %r14
2847: 41 55 push %r13
2849: 41 54 push %r12
284b: 55 push %rbp
284c: 48 89 fd mov %rdi,%rbp
284f: 53 push %rbx
2850: 48 83 ec 28 sub $0x28,%rsp
(0) 2854: 48 8b 44 24 60 mov 0x60(%rsp),%rax
2859: 44 8b 64 24 68 mov 0x68(%rsp),%r12d
(1) 285e: 4c 8b 28 mov (%rax),%r13
(2) 2861: 31 ff xor %edi,%edi
2863: 41 f6 c5 01 test $0x1,%r13b
2867: 4d 89 ea mov %r13,%r10
286a: bb ff ff ff ff mov $0xffffffff,%ebx
286f: 74 32 je 28a3 <udp4_lib_lookup2.isra.35+0x63>
2871: e9 20 02 00 00 jmpq 2a96
<udp4_lib_lookup2.isra.35+0x256>
...
2940: 49 d1 ea shr %r10
2943: 4d 39 e2 cmp %r12,%r10
(3) 2946: 0f 85 15 ff ff ff jne 2861 <udp4_lib_lookup2.isra.35+0x21>
294c: 48 85 ff test %rdi,%rdi
294f: 74 27 je 2978
<udp4_lib_lookup2.isra.35+0x138>
2951: 41 89 db mov %ebx,%r11d
2954: 48 8d 5f 4c lea 0x4c(%rdi),%rbx
2958: 41 ba 02 00 00 00 mov $0x2,%r10d
295e: 66 90 xchg %ax,%ax
2960: 45 8d 6a 01 lea 0x1(%r10),%r13d
2964: 44 89 d0 mov %r10d,%eax
2967: f0 44 0f b1 2b lock cmpxchg %r13d,(%rbx)
296c: 41 39 c2 cmp %eax,%r10d
296f: 74 36 je 29a7
<udp4_lib_lookup2.isra.35+0x167>
2971: 85 c0 test %eax,%eax
2973: 41 89 c2 mov %eax,%r10d
2976: 75 e8 jne 2960
<udp4_lib_lookup2.isra.35+0x120>
2978: 31 ff xor %edi,%edi
297a: 48 83 c4 28 add $0x28,%rsp
297e: 48 89 f8 mov %rdi,%rax
2981: 5b pop %rbx
2982: 5d pop %rbp
2983: 41 5c pop %r12
2985: 41 5d pop %r13
2987: 41 5e pop %r14
2989: 41 5f pop %r15
298b: c3 retq
298c: 0f 1f 40 00 nopl 0x0(%rax)
2990: 4d 8b b2 58 02 00 00 mov 0x258(%r10),%r14
2997: 41 f6 46 6e 10 testb $0x10,0x6e(%r14)
299c: 0f 85 de fe ff ff jne 2880 <udp4_lib_lookup2.isra.35+0x40>
29a2: e9 17 ff ff ff jmpq 28be <udp4_lib_lookup2.isra.35+0x7e>
29a7: 48 3b 6f 30 cmp 0x30(%rdi),%rbp
29ab: 74 43 je 29f0
<udp4_lib_lookup2.isra.35+0x1b0>
29ad: b8 ff ff ff ff mov $0xffffffff,%eax
29b2: 41 39 c3 cmp %eax,%r11d
29b5: 7e c3 jle 297a
<udp4_lib_lookup2.isra.35+0x13a>
29b7: 89 4c 24 10 mov %ecx,0x10(%rsp)
29bb: 89 74 24 18 mov %esi,0x18(%rsp)
29bf: 44 89 44 24 08 mov %r8d,0x8(%rsp)
29c4: 44 89 0c 24 mov %r9d,(%rsp)
29c8: e8 c3 dc ff ff callq 690 <sock_put>
29cd: 48 8b 44 24 60 mov 0x60(%rsp),%rax
29d2: 8b 4c 24 10 mov 0x10(%rsp),%ecx
29d6: 8b 74 24 18 mov 0x18(%rsp),%esi
29da: 44 8b 44 24 08 mov 0x8(%rsp),%r8d
29df: 44 8b 0c 24 mov (%rsp),%r9d
29e3: 4c 8b 28 mov (%rax),%r13
(4) 29e6: e9 76 fe ff ff jmpq 2861 <udp4_lib_lookup2.isra.35+0x21>
29eb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
29f0: 44 0f b7 57 0c movzwl 0xc(%rdi),%r10d
29f5: 66 41 83 fa 0a cmp $0xa,%r10w
29fa: 74 7f je 2a7b
<udp4_lib_lookup2.isra.35+0x23b>
29fc: 3b 4f 04 cmp 0x4(%rdi),%ecx
29ff: b8 ff ff ff ff mov $0xffffffff,%eax
2a04: 75 ac jne 29b2
<udp4_lib_lookup2.isra.35+0x172>
2a06: 0f b7 9f 7a 02 00 00 movzwl 0x27a(%rdi),%ebx
2a0d: 41 39 d8 cmp %ebx,%r8d
2a10: 75 a0 jne 29b2
<udp4_lib_lookup2.isra.35+0x172>
2a12: 8b 1f mov (%rdi),%ebx
2a14: 66 41 83 fa 02 cmp $0x2,%r10w
2a19: 41 0f 94 c2 sete %r10b
2a1d: 45 0f b6 d2 movzbl %r10b,%r10d
2a21: 85 db test %ebx,%ebx
2a23: 44 89 d0 mov %r10d,%eax
2a26: 74 0d je 2a35
<udp4_lib_lookup2.isra.35+0x1f5>
2a28: 39 de cmp %ebx,%esi
2a2a: b8 ff ff ff ff mov $0xffffffff,%eax
2a2f: 75 81 jne 29b2
<udp4_lib_lookup2.isra.35+0x172>
2a31: 41 8d 42 02 lea 0x2(%r10),%eax
2a35: 44 0f b7 97 78 02 00 movzwl 0x278(%rdi),%r10d
2a3c: 00
2a3d: 66 45 85 d2 test %r10w,%r10w
2a41: 74 0d je 2a50
<udp4_lib_lookup2.isra.35+0x210>
2a43: 66 45 39 d7 cmp %r10w,%r15w
2a47: 0f 85 60 ff ff ff jne 29ad
<udp4_lib_lookup2.isra.35+0x16d>
2a4d: 83 c0 02 add $0x2,%eax
2a50: 44 8b 57 10 mov 0x10(%rdi),%r10d
2a54: 45 85 d2 test %r10d,%r10d
2a57: 0f 84 55 ff ff ff je 29b2
<udp4_lib_lookup2.isra.35+0x172>
2a5d: 8d 58 02 lea 0x2(%rax),%ebx
2a60: 45 39 d1 cmp %r10d,%r9d
2a63: b8 ff ff ff ff mov $0xffffffff,%eax
2a68: 0f 44 c3 cmove %ebx,%eax
2a6b: e9 42 ff ff ff jmpq 29b2
<udp4_lib_lookup2.isra.35+0x172>
2a70: 41 bb ff ff ff ff mov $0xffffffff,%r11d
2a76: e9 05 fe ff ff jmpq 2880 <udp4_lib_lookup2.isra.35+0x40>
2a7b: 48 8b 9f 70 02 00 00 mov 0x270(%rdi),%rbx
2a82: b8 ff ff ff ff mov $0xffffffff,%eax
2a87: f6 43 6e 10 testb $0x10,0x6e(%rbx)
2a8b: 0f 85 21 ff ff ff jne 29b2
<udp4_lib_lookup2.isra.35+0x172>
2a91: e9 66 ff ff ff jmpq 29fc
<udp4_lib_lookup2.isra.35+0x1bc>
2a96: 4c 89 e8 mov %r13,%rax
2a99: 48 d1 e8 shr %rax
2a9c: 4c 39 e0 cmp %r12,%rax
(5) 2a9f: 0f 85 bc fd ff ff jne 2861 <udp4_lib_lookup2.isra.35+0x21>
2aa5: e9 ce fe ff ff jmpq 2978
<udp4_lib_lookup2.isra.35+0x138>
2aaa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

2) patched version:
0000000000002840 <udp4_lib_lookup2>:
2840: 41 57 push %r15
2842: 41 89 d7 mov %edx,%r15d
2845: 41 56 push %r14
2847: 41 55 push %r13
2849: 41 54 push %r12
284b: 55 push %rbp
284c: 48 89 fd mov %rdi,%rbp
284f: 53 push %rbx
2850: 48 83 ec 28 sub $0x28,%rsp
2854: 44 8b 6c 24 68 mov 0x68(%rsp),%r13d
(0) 2859: 4c 8b 64 24 60 mov 0x60(%rsp),%r12
(1) 285e: 4d 8b 14 24 mov (%r12),%r10
(2) 2862: 31 ff xor %edi,%edi
2864: bb ff ff ff ff mov $0xffffffff,%ebx
2869: 41 f6 c2 01 test $0x1,%r10b
286d: 74 2c je 289b <udp4_lib_lookup2+0x5b>
286f: e9 0a 02 00 00 jmpq 2a7e <udp4_lib_lookup2+0x23e>
...
2930: 49 d1 ea shr %r10
2933: 4d 39 d5 cmp %r10,%r13
(3) 2936: 0f 85 22 ff ff ff jne 285e <udp4_lib_lookup2+0x1e>
293c: 48 85 ff test %rdi,%rdi
293f: 74 27 je 2968 <udp4_lib_lookup2+0x128>
2941: 41 89 db mov %ebx,%r11d
2944: 48 8d 5f 4c lea 0x4c(%rdi),%rbx
2948: 41 ba 02 00 00 00 mov $0x2,%r10d
294e: 66 90 xchg %ax,%ax
2950: 45 8d 72 01 lea 0x1(%r10),%r14d
2954: 44 89 d0 mov %r10d,%eax
2957: f0 44 0f b1 33 lock cmpxchg %r14d,(%rbx)
295c: 41 39 c2 cmp %eax,%r10d
295f: 74 36 je 2997 <udp4_lib_lookup2+0x157>
2961: 85 c0 test %eax,%eax
2963: 41 89 c2 mov %eax,%r10d
2966: 75 e8 jne 2950 <udp4_lib_lookup2+0x110>
2968: 31 ff xor %edi,%edi
296a: 48 83 c4 28 add $0x28,%rsp
296e: 48 89 f8 mov %rdi,%rax
2971: 5b pop %rbx
2972: 5d pop %rbp
2973: 41 5c pop %r12
2975: 41 5d pop %r13
2977: 41 5e pop %r14
2979: 41 5f pop %r15
297b: c3 retq
297c: 0f 1f 40 00 nopl 0x0(%rax)
2980: 4d 8b b2 58 02 00 00 mov 0x258(%r10),%r14
2987: 41 f6 46 6e 10 testb $0x10,0x6e(%r14)
298c: 0f 85 e6 fe ff ff jne 2878 <udp4_lib_lookup2+0x38>
2992: e9 1f ff ff ff jmpq 28b6 <udp4_lib_lookup2+0x76>
2997: 48 3b 6f 30 cmp 0x30(%rdi),%rbp
299b: 74 3b je 29d8 <udp4_lib_lookup2+0x198>
299d: b8 ff ff ff ff mov $0xffffffff,%eax
29a2: 41 39 c3 cmp %eax,%r11d
29a5: 7e c3 jle 296a <udp4_lib_lookup2+0x12a>
29a7: 89 4c 24 10 mov %ecx,0x10(%rsp)
29ab: 89 74 24 18 mov %esi,0x18(%rsp)
29af: 44 89 44 24 08 mov %r8d,0x8(%rsp)
29b4: 44 89 0c 24 mov %r9d,(%rsp)
29b8: e8 d3 dc ff ff callq 690 <sock_put>
29bd: 8b 4c 24 10 mov 0x10(%rsp),%ecx
29c1: 8b 74 24 18 mov 0x18(%rsp),%esi
29c5: 44 8b 44 24 08 mov 0x8(%rsp),%r8d
29ca: 44 8b 0c 24 mov (%rsp),%r9d
(4) 29ce: e9 8b fe ff ff jmpq 285e <udp4_lib_lookup2+0x1e>

29d3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
29d8: 44 0f b7 57 0c movzwl 0xc(%rdi),%r10d
29dd: 66 41 83 fa 0a cmp $0xa,%r10w
29e2: 74 7f je 2a63 <udp4_lib_lookup2+0x223>
29e4: 3b 4f 04 cmp 0x4(%rdi),%ecx
29e7: b8 ff ff ff ff mov $0xffffffff,%eax
29ec: 75 b4 jne 29a2 <udp4_lib_lookup2+0x162>
29ee: 0f b7 9f 7a 02 00 00 movzwl 0x27a(%rdi),%ebx
29f5: 41 39 d8 cmp %ebx,%r8d
29f8: 75 a8 jne 29a2 <udp4_lib_lookup2+0x162>
29fa: 8b 1f mov (%rdi),%ebx
29fc: 66 41 83 fa 02 cmp $0x2,%r10w
2a01: 41 0f 94 c2 sete %r10b
2a05: 45 0f b6 d2 movzbl %r10b,%r10d
2a09: 85 db test %ebx,%ebx
2a0b: 44 89 d0 mov %r10d,%eax
2a0e: 74 0d je 2a1d <udp4_lib_lookup2+0x1dd>
2a10: 39 de cmp %ebx,%esi
2a12: b8 ff ff ff ff mov $0xffffffff,%eax
2a17: 75 89 jne 29a2 <udp4_lib_lookup2+0x162>
2a19: 41 8d 42 02 lea 0x2(%r10),%eax
2a1d: 44 0f b7 97 78 02 00 movzwl 0x278(%rdi),%r10d
2a24: 00
2a25: 66 45 85 d2 test %r10w,%r10w
2a29: 74 0d je 2a38 <udp4_lib_lookup2+0x1f8>
2a2b: 66 45 39 d7 cmp %r10w,%r15w
2a2f: 0f 85 68 ff ff ff jne 299d <udp4_lib_lookup2+0x15d>
2a35: 83 c0 02 add $0x2,%eax
2a38: 44 8b 57 10 mov 0x10(%rdi),%r10d
2a3c: 45 85 d2 test %r10d,%r10d
2a3f: 0f 84 5d ff ff ff je 29a2 <udp4_lib_lookup2+0x162>
2a45: 8d 58 02 lea 0x2(%rax),%ebx
2a48: 45 39 d1 cmp %r10d,%r9d
2a4b: b8 ff ff ff ff mov $0xffffffff,%eax
2a50: 0f 44 c3 cmove %ebx,%eax
2a53: e9 4a ff ff ff jmpq 29a2 <udp4_lib_lookup2+0x162>
2a58: 41 bb ff ff ff ff mov $0xffffffff,%r11d
2a5e: e9 15 fe ff ff jmpq 2878 <udp4_lib_lookup2+0x38>
2a63: 48 8b 9f 70 02 00 00 mov 0x270(%rdi),%rbx
2a6a: b8 ff ff ff ff mov $0xffffffff,%eax
2a6f: f6 43 6e 10 testb $0x10,0x6e(%rbx)
2a73: 0f 85 29 ff ff ff jne 29a2 <udp4_lib_lookup2+0x162>
2a79: e9 66 ff ff ff jmpq 29e4 <udp4_lib_lookup2+0x1a4>
2a7e: 49 d1 ea shr %r10
2a81: 4d 39 d5 cmp %r10,%r13
(5) 2a84: 0f 85 d4 fd ff ff jne 285e <udp4_lib_lookup2+0x1e>
2a8a: e9 d9 fe ff ff jmpq 2968 <udp4_lib_lookup2+0x128>
2a8f: 90 nop


2013-05-21 15:16:26

by Eric Dumazet

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
> On 21.05.2013 17:44, Eric Dumazet wrote:
> > On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
> >
> >>>
> >>> -#define hlist_nulls_first_rcu(head) \
> >>> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> >>> +#define hlist_nulls_first_rcu(head) \
> >>> + (*((struct hlist_nulls_node __rcu __force **) \
> >>> + &((volatile typeof(*head) *)head)->first))
> >>
> >> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
> >
> > More exactly we have :
> >
> > #define list_entry_rcu(ptr, type, member) \
> > ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
> > container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
> > })
> >
> > #define list_for_each_entry_rcu(pos, head, member) \
> > for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > &pos->member != (head); \
> > pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> > << and >>
> >
> > #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
> > for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> > (!is_a_nulls(pos)) && \
> > ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> > pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> >
> > We need to change hlist_nulls_for_each_entry_rcu() to use same construct,
> > so that the rcu_dereference_raw() is performed at the right place.
>
> No.
>
> This code has the same mistake: it is rcu_dereference_raw(head->first),
> so there is nothing that prevents gcc to store the (head->first) value
> in a register.

Please read again what I wrote; you misunderstood.

hlist_nulls_for_each_entry_rcu() should use the same construct as
list_for_each_entry_rcu(), and not use rcu_dereference_raw().

Is that clear, or do you want me to send the patch?






2013-05-21 15:38:29

by Eric Dumazet

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:

> This code has the same mistake: it is rcu_dereference_raw(head->first),
> so there is nothing that prevents gcc to store the (head->first) value
> in a register.

If other rcu accessors have the same problem, a more complete patch is
needed.

Thanks


2013-05-21 15:51:20

by Roman Gushchin

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 21.05.2013 19:16, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
>> On 21.05.2013 17:44, Eric Dumazet wrote:
>>> On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote:
>>>
>>>>>
>>>>> -#define hlist_nulls_first_rcu(head) \
>>>>> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>>>>> +#define hlist_nulls_first_rcu(head) \
>>>>> + (*((struct hlist_nulls_node __rcu __force **) \
>>>>> + &((volatile typeof(*head) *)head)->first))
>>>>
>>>> Why not use ACCESS_ONCE() or (better) rcu_dereference_raw() here?
>>>
>>> More exactly we have :
>>>
>>> #define list_entry_rcu(ptr, type, member) \
>>> ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
>>> container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
>>> })
>>>
>>> #define list_for_each_entry_rcu(pos, head, member) \
>>> for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
>>> &pos->member != (head); \
>>> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>> << and >>
>>>
>>> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
>>> for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>>> (!is_a_nulls(pos)) && \
>>> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>>> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>>>
>>> We need to change hlist_nulls_for_each_entry_rcu() to use same construct,
>>> so that the rcu_dereference_raw() is performed at the right place.
>>
>> No.
>>
>> This code has the same mistake: it is rcu_dereference_raw(head->first),
>> so there is nothing that prevents gcc to store the (head->first) value
>> in a register.
>
> Please read again what I wrote, you misundertood.
>
> hlist_nulls_for_each_entry_rcu() should use same construct than
> list_for_each_entry_rcu(), and not use rcu_dereference_raw()
>
> Is that clear, or do you want me to send the patch ?

If you think that will solve the problem, please send a patch; I think
you are wrong here.
If you only think it will look better, I agree.

Regards,
Roman

2013-05-21 15:51:47

by Roman Gushchin

Subject: Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 21.05.2013 19:38, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote:
>
>> This code has the same mistake: it is rcu_dereference_raw(head->first),
>> so there is nothing that prevents gcc to store the (head->first) value
>> in a register.
>
> If other rcu accessors have the same problem, a more complete patch is
> needed.

Ok, I'll try to provide a complete patch ASAP.

>
> Thanks

2013-05-21 18:12:37

by Roman Gushchin

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

> If other rcu accessors have the same problem, a more complete patch is
> needed.

[PATCH] rcu: fix a race in rcu lists traverse macros

Some network functions (udp4_lib_lookup2(), for instance) use the
RCU list traversal macros (hlist_nulls_for_each_entry_rcu, for instance)
in a way that assumes the loop can be restarted. In that case, it is
strictly necessary to reread the head->first value from memory before
each rescan.
Without additional hints, gcc caches this value in a register. If a
cached node is then moved to another chain during the scan, we can loop
forever, getting wrong nulls values and restarting the loop endlessly.

Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Boris Zhmurov <[email protected]>
---
include/linux/compiler.h | 6 ++++++
include/linux/rculist.h | 6 ++++--
include/linux/rculist_nulls.h | 3 ++-
3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..0e05d7c 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
*/
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/*
+ * Prevent the compiler from optimizing accesses to a structure member.
+ */
+#define GET_STRUCT_FIELD_VOLATILE(struct_ptr, field) \
+ (((volatile typeof(*struct_ptr) *)struct_ptr)->field)
+
/* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8089e35..6c3eea7 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define list_for_each_entry_rcu(pos, head, member) \
- for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+ for (pos = list_entry_rcu(GET_STRUCT_FIELD_VOLATILE(head, next), \
+ typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -348,7 +349,8 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
/*
* return the first or the next element in an RCU protected hlist
*/
-#define hlist_first_rcu(head) (*((struct hlist_node __rcu **)(&(head)->first)))
+#define hlist_first_rcu(head) (*((struct hlist_node __rcu **) \
+ (&GET_STRUCT_FIELD_VOLATILE(head, first))))
#define hlist_next_rcu(node) (*((struct hlist_node __rcu **)(&(node)->next)))
#define hlist_pprev_rcu(node) (*((struct hlist_node __rcu **)((node)->pprev)))

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..b04fd0a 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -38,7 +38,8 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
}

#define hlist_nulls_first_rcu(head) \
- (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+ (*((struct hlist_nulls_node __rcu __force **) \
+ &(GET_STRUCT_FIELD_VOLATILE(head, first))))

#define hlist_nulls_next_rcu(node) \
(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
--
1.8.1.2

2013-05-22 02:01:27

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-21 at 22:12 +0400, Roman Gushchin wrote:
> > If other rcu accessors have the same problem, a more complete patch is
> > needed.
>
> [PATCH] rcu: fix a race in rcu lists traverse macros
>
> Some network functions (udp4_lib_lookup2(), for instance) use the
> rcu lists traverse macros (hlist_nulls_for_each_entry_rcu, for instance)
> in a way that assumes restarting of a loop. In this case, it is strictly
> necessary to reread the head->first value from the memory before each scan.
> Without additional hints, gcc caches this value in a register. In this case,
> if a cached node is moved to another chain during the scan, we can loop
> forever getting wrong nulls values and restarting the loop uninterruptedly.
>
> Signed-off-by: Roman Gushchin <[email protected]>
> Reported-by: Boris Zhmurov <[email protected]>
> ---
> include/linux/compiler.h | 6 ++++++
> include/linux/rculist.h | 6 ++++--
> include/linux/rculist_nulls.h | 3 ++-
> 3 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 92669cd..0e05d7c 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
> */
> #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
>
> +/*
> + * Prevent the compiler from optimizing accesses to structure member.
> + */
> +#define GET_STRUCT_FIELD_VOLATILE(struct_ptr, field) \
> + (((volatile typeof(*struct_ptr) *)struct_ptr)->field)
> +
> /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
> #ifdef CONFIG_KPROBES
> # define __kprobes __attribute__((__section__(".kprobes.text")))
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 8089e35..6c3eea7 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
> * as long as the traversal is guarded by rcu_read_lock().
> */
> #define list_for_each_entry_rcu(pos, head, member) \
> - for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> + for (pos = list_entry_rcu(GET_STRUCT_FIELD_VOLATILE(head, next), \
> + typeof(*pos), member); \
> &pos->member != (head); \
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> @@ -348,7 +349,8 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
> /*
> * return the first or the next element in an RCU protected hlist
> */
> -#define hlist_first_rcu(head) (*((struct hlist_node __rcu **)(&(head)->first)))
> +#define hlist_first_rcu(head) (*((struct hlist_node __rcu **) \
> + (&GET_STRUCT_FIELD_VOLATILE(head, first))))
> #define hlist_next_rcu(node) (*((struct hlist_node __rcu **)(&(node)->next)))
> #define hlist_pprev_rcu(node) (*((struct hlist_node __rcu **)((node)->pprev)))
>
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..b04fd0a 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -38,7 +38,8 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
> }
>
> #define hlist_nulls_first_rcu(head) \
> - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> + (*((struct hlist_nulls_node __rcu __force **) \
> + &(GET_STRUCT_FIELD_VOLATILE(head, first))))
>
> #define hlist_nulls_next_rcu(node) \
> (*((struct hlist_nulls_node __rcu __force **)&(node)->next))

Please use ACCESS_ONCE(), which is the standard way to deal with this,
and remove the rcu_dereference_raw() in
hlist_nulls_for_each_entry_rcu()

something like : (for the nulls part only)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..1d485d3 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -40,6 +40,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
#define hlist_nulls_first_rcu(head) \
(*((struct hlist_nulls_node __rcu __force **)&(head)->first))

+#define hlist_nulls_first_rcu_once(head) \
+ ACCESS_ONCE(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+
#define hlist_nulls_next_rcu(node) \
(*((struct hlist_nulls_node __rcu __force **)&(node)->next))

@@ -107,7 +110,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
*
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
+ for (pos = hlist_nulls_first_rcu_once(head); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))


2013-05-22 05:49:29

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-21 at 19:01 -0700, Eric Dumazet wrote:

> Please use ACCESS_ONCE(), which is the standard way to deal with this,
> and remove the rcu_dereference_raw() in
> hlist_nulls_for_each_entry_rcu()
>
> something like : (for the nulls part only)

Thinking a bit more about this, I am a bit worried by other uses of
ACCESS_ONCE(ptr->field)

The intent of rcu_dereference(ptr->field) is to reload ptr->field from
memory every time.

Do we really need a

#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

and change all rcu_dereference(ptr->field) occurrences?

I'm probably missing something obvious.

Anyway, the following patch also seems to solve the problem; I need a sleep to understand why.


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0bf5d39..10b5c54 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -410,7 +410,7 @@ static inline int compute_score2(struct sock *sk, struct net *net,
static struct sock *udp4_lib_lookup2(struct net *net,
__be32 saddr, __be16 sport,
__be32 daddr, unsigned int hnum, int dif,
- struct udp_hslot *hslot2, unsigned int slot2)
+ struct hlist_nulls_head *head, unsigned int slot2)
{
struct sock *sk, *result;
struct hlist_nulls_node *node;
@@ -420,7 +420,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
begin:
result = NULL;
badness = 0;
- udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+ udp_portaddr_for_each_entry_rcu(sk, node, head) {
score = compute_score2(sk, net, saddr, sport,
daddr, hnum, dif);
if (score > badness) {
@@ -483,7 +483,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,

result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, slot2);
+ &hslot2->head, slot2);
if (!result) {
hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
slot2 = hash2 & udptable->mask;
@@ -493,7 +493,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,

result = udp4_lib_lookup2(net, saddr, sport,
htonl(INADDR_ANY), hnum, dif,
- hslot2, slot2);
+ &hslot2->head, slot2);
}
rcu_read_unlock();
return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 42923b1..b5bc667 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -200,7 +200,7 @@ static inline int compute_score2(struct sock *sk, struct net *net,
static struct sock *udp6_lib_lookup2(struct net *net,
const struct in6_addr *saddr, __be16 sport,
const struct in6_addr *daddr, unsigned int hnum, int dif,
- struct udp_hslot *hslot2, unsigned int slot2)
+ struct hlist_nulls_head *head, unsigned int slot2)
{
struct sock *sk, *result;
struct hlist_nulls_node *node;
@@ -210,7 +210,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
begin:
result = NULL;
badness = -1;
- udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+ udp_portaddr_for_each_entry_rcu(sk, node, head) {
score = compute_score2(sk, net, saddr, sport,
daddr, hnum, dif);
if (score > badness) {
@@ -274,7 +274,7 @@ struct sock *__udp6_lib_lookup(struct net *net,

result = udp6_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, slot2);
+ &hslot2->head, slot2);
if (!result) {
hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum);
slot2 = hash2 & udptable->mask;
@@ -284,7 +284,7 @@ struct sock *__udp6_lib_lookup(struct net *net,

result = udp6_lib_lookup2(net, saddr, sport,
&in6addr_any, hnum, dif,
- hslot2, slot2);
+ &hslot2->head, slot2);
}
rcu_read_unlock();
return result;



2013-05-22 10:25:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, May 21, 2013 at 07:01:20PM -0700, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 22:12 +0400, Roman Gushchin wrote:
> > > If other rcu accessors have the same problem, a more complete patch is
> > > needed.
> >
> > [PATCH] rcu: fix a race in rcu lists traverse macros
> >
> > Some network functions (udp4_lib_lookup2(), for instance) use the
> > rcu lists traverse macros (hlist_nulls_for_each_entry_rcu, for instance)
> > in a way that assumes restarting of a loop. In this case, it is strictly
> > necessary to reread the head->first value from the memory before each scan.
> > Without additional hints, gcc caches this value in a register. In this case,
> > if a cached node is moved to another chain during the scan, we can loop
> > forever getting wrong nulls values and restarting the loop uninterruptedly.
> >
> > Signed-off-by: Roman Gushchin <[email protected]>
> > Reported-by: Boris Zhmurov <[email protected]>
> > ---
> > include/linux/compiler.h | 6 ++++++
> > include/linux/rculist.h | 6 ++++--
> > include/linux/rculist_nulls.h | 3 ++-
> > 3 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 92669cd..0e05d7c 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
> > */
> > #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> >
> > +/*
> > + * Prevent the compiler from optimizing accesses to structure member.
> > + */
> > +#define GET_STRUCT_FIELD_VOLATILE(struct_ptr, field) \
> > + (((volatile typeof(*struct_ptr) *)struct_ptr)->field)
> > +
> > /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
> > #ifdef CONFIG_KPROBES
> > # define __kprobes __attribute__((__section__(".kprobes.text")))
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index 8089e35..6c3eea7 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
> > * as long as the traversal is guarded by rcu_read_lock().
> > */
> > #define list_for_each_entry_rcu(pos, head, member) \
> > - for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
> > + for (pos = list_entry_rcu(GET_STRUCT_FIELD_VOLATILE(head, next), \
> > + typeof(*pos), member); \
> > &pos->member != (head); \
> > pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> >
> > @@ -348,7 +349,8 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
> > /*
> > * return the first or the next element in an RCU protected hlist
> > */
> > -#define hlist_first_rcu(head) (*((struct hlist_node __rcu **)(&(head)->first)))
> > +#define hlist_first_rcu(head) (*((struct hlist_node __rcu **) \
> > + (&GET_STRUCT_FIELD_VOLATILE(head, first))))
> > #define hlist_next_rcu(node) (*((struct hlist_node __rcu **)(&(node)->next)))
> > #define hlist_pprev_rcu(node) (*((struct hlist_node __rcu **)((node)->pprev)))
> >
> > diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> > index 2ae1371..b04fd0a 100644
> > --- a/include/linux/rculist_nulls.h
> > +++ b/include/linux/rculist_nulls.h
> > @@ -38,7 +38,8 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
> > }
> >
> > #define hlist_nulls_first_rcu(head) \
> > - (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> > + (*((struct hlist_nulls_node __rcu __force **) \
> > + &(GET_STRUCT_FIELD_VOLATILE(head, first))))
> >
> > #define hlist_nulls_next_rcu(node) \
> > (*((struct hlist_nulls_node __rcu __force **)&(node)->next))

Now that I am more awake...

The RCU list macros assume that the list header is either statically
allocated (in which case no ACCESS_ONCE() or whatever is needed) or
that the caller did whatever was necessary to protect the list header,
whether that be holding the right lock, using rcu_dereference() when
traversing the pointer to the list header, or whatever.

Maybe I need to document this assumption...

Thanx, Paul

> Please use ACCESS_ONCE(), which is the standard way to deal with this,
> and remove the rcu_dereference_raw() in
> hlist_nulls_for_each_entry_rcu()
>
> something like : (for the nulls part only)
>
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..1d485d3 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -40,6 +40,9 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
> #define hlist_nulls_first_rcu(head) \
> (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
>
> +#define hlist_nulls_first_rcu_once(head) \
> + ACCESS_ONCE(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
> +
> #define hlist_nulls_next_rcu(node) \
> (*((struct hlist_nulls_node __rcu __force **)&(node)->next))
>
> @@ -107,7 +110,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
> *
> */
> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
> - for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> + for (pos = hlist_nulls_first_rcu_once(head); \
> (!is_a_nulls(pos)) && \
> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
>
>

2013-05-22 11:58:27

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 22.05.2013 09:49, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 19:01 -0700, Eric Dumazet wrote:
>
>> Please use ACCESS_ONCE(), which is the standard way to deal with this,
>> and remove the rcu_dereference_raw() in
>> hlist_nulls_for_each_entry_rcu()
>>
>> something like : (for the nulls part only)
>
> Thinking a bit more about this, I am a bit worried by other uses of
> ACCESS_ONCE(ptr->field)
>
> rcu_dereference(ptr->field) intent is to reload ptr->field every time
> from memory.

Exactly.

>
> Do we really need a
>
> #define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>
> #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
>
> and change all rcu_dererence(ptr->field) occurrences ?

Yes.
But not all of them: only "head->first". The node variable (or any other iterator) is
always a local variable that changes on every iteration, so we can rely on gcc there.

>
> I probably miss something obvious.
>
> Anyway, following patch seems to also solve the problem, I need a sleep to understand why.
>
> struct hlist_nulls_node *node;
> @@ -420,7 +420,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
> begin:
> result = NULL;
> badness = 0;
> - udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
> + udp_portaddr_for_each_entry_rcu(sk, node, head) {
> score = compute_score2(sk, net, saddr, sport,
> daddr, hnum, dif);
> if (score > badness) {


IMHO, it doesn't really solve it.
Ok, ok, it does make the problem go away, but so does almost ANY modification of the related code.
For instance, adding something like
unsigned int c = 0;
udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
...
c++;
if (c > 1000000)
printk(...);
}
also "solves" the problem.

It seems semantically strange to use the ACCESS_(FIELD)_ONCE macro directly
to tell gcc to reread something every time, so I created corresponding
rcu_dereference_field(_raw/_bh) macros.

Please, find my third attempt to create a patch below.

Regards,
Roman

-----------------------------------------------------------------------

rcu: fix a race in rcu lists traverse macros

Some network functions (udp4_lib_lookup2(), for instance) use the
RCU list traversal macros (hlist_nulls_for_each_entry_rcu(), for instance)
in a way that assumes the loop may be restarted. In that case, it is strictly
necessary to reread the head->first value from memory before each scan.
Without additional hints, gcc caches this value in a register; then,
if a cached node is moved to another chain during the scan, we can loop
forever, getting wrong nulls values and restarting the loop endlessly.

Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Boris Zhmurov <[email protected]>
---
include/linux/compiler.h | 6 ++++++
include/linux/rculist.h | 9 +++++----
include/linux/rculist_nulls.h | 2 +-
include/linux/rcupdate.h | 9 +++++++++
4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..4109fab 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
*/
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/*
+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
+ * The main goal is preventing compiler to store &ptr->field in a register.
+ */
+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
+
/* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8089e35..7582cfe 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define list_for_each_entry_rcu(pos, head, member) \
- for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+ for (pos = list_entry_rcu(ACCESS_FIELD_ONCE(head, next), \
+ typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -439,7 +440,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
}

#define __hlist_for_each_rcu(pos, head) \
- for (pos = rcu_dereference(hlist_first_rcu(head)); \
+ for (pos = rcu_dereference_field(head, first); \
pos; \
pos = rcu_dereference(hlist_next_rcu(pos)))

@@ -454,7 +455,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define hlist_for_each_entry_rcu(pos, head, member) \
- for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
+ for (pos = hlist_entry_safe(rcu_dereference_field_raw(head, first), \
typeof(*(pos)), member); \
pos; \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
@@ -471,7 +472,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define hlist_for_each_entry_rcu_bh(pos, head, member) \
- for (pos = hlist_entry_safe(rcu_dereference_bh(hlist_first_rcu(head)),\
+ for (pos = hlist_entry_safe(rcu_dereference_field_bh(head, first), \
typeof(*(pos)), member); \
pos; \
pos = hlist_entry_safe(rcu_dereference_bh(hlist_next_rcu(\
diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..ef431b4 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
*
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
+ for (pos = rcu_dereference_field_raw(head, first); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4ccd68e..6fef0c2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -640,6 +640,9 @@ static inline void rcu_preempt_sleep_check(void)

#define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/

+#define rcu_dereference_field_raw(PTR, FIELD) \
+ rcu_dereference_raw(ACCESS_FIELD_ONCE(PTR, FIELD))
+
/**
* rcu_access_index() - fetch RCU index with no dereferencing
* @p: The index to read
@@ -704,6 +707,9 @@ static inline void rcu_preempt_sleep_check(void)
*/
#define rcu_dereference(p) rcu_dereference_check(p, 0)

+#define rcu_dereference_field(PTR, FIELD) \
+ rcu_dereference(ACCESS_FIELD_ONCE(PTR, FIELD))
+
/**
* rcu_dereference_bh() - fetch an RCU-bh-protected pointer for dereferencing
* @p: The pointer to read, prior to dereferencing
@@ -712,6 +718,9 @@ static inline void rcu_preempt_sleep_check(void)
*/
#define rcu_dereference_bh(p) rcu_dereference_bh_check(p, 0)

+#define rcu_dereference_field_bh(PTR, FIELD) \
+ rcu_dereference_bh(ACCESS_FIELD_ONCE(PTR, FIELD))
+
/**
* rcu_dereference_sched() - fetch RCU-sched-protected pointer for dereferencing
* @p: The pointer to read, prior to dereferencing
--
1.8.1.2

2013-05-22 12:28:51

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, 2013-05-22 at 02:58 -0700, Paul E. McKenney wrote:

> Now that I am more awake...
>
> The RCU list macros assume that the list header is either statically
> allocated (in which case no ACCESS_ONCE() or whatever is needed) or
> that the caller did whatever was necessary to protect the list header,
> whether that be holding the right lock, using rcu_dereference() when
> traversing the pointer to the list header, or whatever.

Not sure what you mean, we do hold rcu_read_lock() here.

But when we jump back to begin, we do not do
"rcu_read_unlock()/rcu_read_lock()" pair.


2013-05-22 12:30:42

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:

> +/*
> + * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> + * The main goal is preventing compiler to store &ptr->field in a register.

But &ptr->field is a constant during the whole duration of
udp4_lib_lookup2() and could be in a register; in my case, field is at
offset 0, and ptr is a parameter (so it could be in a 'register').

The bug you found is that the compiler caches the indirection (ptr->field)
in a register, not that the compiler stores &ptr->field in a register.

> + */
> +#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> +

Here we force the compiler to consider ptr as volatile, but semantically
it is not required in rcu_dereference(ptr->field)

We want field to be reloaded, not ptr.

So yes, the patch appears to fix the bug, but it does not seem logical to
me.


2013-05-22 13:00:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, May 22, 2013 at 05:28:47AM -0700, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 02:58 -0700, Paul E. McKenney wrote:
>
> > Now that I am more awake...
> >
> > The RCU list macros assume that the list header is either statically
> > allocated (in which case no ACCESS_ONCE() or whatever is needed) or
> > that the caller did whatever was necessary to protect the list header,
> > whether that be holding the right lock, using rcu_dereference() when
> > traversing the pointer to the list header, or whatever.
>
> Not sure what you mean, we do hold rcu_read_lock() here.
>
> But when we jump back to begin, we do not do
> "rcu_read_unlock()/rcu_read_lock()" pair.

Right, rcu_read_lock() is part of the protection, but rcu_dereference()
is the other part.

All that aside, I can't claim that I understand what problem the various
patches would solve. ;-)

Thanx, Paul

2013-05-22 13:07:25

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 22.05.2013 16:30, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
>
>> +/*
>> + * Same as ACCESS_ONCE(), but used for accessing field of a structure.
>> + * The main goal is preventing compiler to store &ptr->field in a register.
>
> But &ptr->field is a constant during the whole duration of
> udp4_lib_lookup2() and could be in a register, in my case field is at
> offset 0, and ptr is a parameter (so could be in a 'register')
>
> The bug you found is that compiler caches the indirection (ptr->field)
> into a register, not that compiler stores &ptr->field into a register.
>
>> + */
>> +#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>> +
>
> Here we force the compiler to consider ptr as volatile, but semantically
> it is not required in rcu_dereference(ptr->field)

Actually, we need to mark the "address of the place" where the field value is
located as volatile before dereferencing it. I see no other way to do it,
except with multiple casts and offsetof(), which, IMHO, would be even more complex:
ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))

>
> We want field to be reloaded, not ptr.
>
> So yes, the patch appears to fix the bug, but it sounds not logical to
> me.
>

Maybe we can improve it by providing better/more detailed comments here?
Do you have any suggestions?

Thanks,
Roman

2013-05-22 13:27:38

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

> So yes, the patch appears to fix the bug, but it sounds not logical to
> me.

I was confused because the copy of the code I found was different
(it has some checks for reuseaddr - which force a function call in the
loop).

The code being compiled is:

begin:
result = NULL;
badness = -1;
udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
score = compute_score2(sk, net, saddr, sport,
daddr, hnum, dif);
if (score > badness) {
result = sk;
badness = score;
if (score == SCORE2_MAX)
goto exact_match;
}
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != slot2)
goto begin;

Which is entirely inlined - so the compiler is allowed to assume
that no other code modifies any of the data.
Hence it is allowed to cache the list head value.
Indeed it could convert the last line to "for (;;);".

An asm volatile ("":::"memory") somewhere would fix it.

David


2013-05-22 13:36:45

by Eric Dumazet

[permalink] [raw]
Subject: RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, 2013-05-22 at 14:27 +0100, David Laight wrote:
> > So yes, the patch appears to fix the bug, but it sounds not logical to
> > me.
>
> I was confused because the copy of the code I found was different
> (it has some checks for reusaddr - which force a function call in the
> loop).
>
> The code being compiled is:
>
> begin:
> result = NULL;
> badness = -1;
> udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {

Here this loop begins with

someptr = rcu_dereference(somelocation);

My claim is that rcu_dereference() should force the compiler to read
somelocation again. That is done thanks to ACCESS_ONCE(). But apparently,
in the specific case of &hslot->head, it doesn't work.

When using head directly (i.e. the same value), ACCESS_ONCE() does
its job correctly.


> score = compute_score2(sk, net, saddr, sport,
> daddr, hnum, dif);
> if (score > badness) {
> result = sk;
> badness = score;
> if (score == SCORE2_MAX)
> goto exact_match;
> }
> }
> /*
> * if the nulls value we got at the end of this lookup is
> * not the expected one, we must restart lookup.
> * We probably met an item that was moved to another chain.
> */
> if (get_nulls_value(node) != slot2)
> goto begin;
>
> Which is entirely inlined - so the compiler is allowed to assume
> that no other code modifies any of the data.

You missed the rcu_dereference()/ACCESS_ONCE() we have in this loop.




2013-05-22 13:55:30

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 22.05.2013 17:27, David Laight wrote:
>> So yes, the patch appears to fix the bug, but it sounds not logical to
>> me.
>
> I was confused because the copy of the code I found was different
> (it has some checks for reusaddr - which force a function call in the
> loop).
>
> The code being compiled is:
>
> begin:
> result = NULL;
> badness = -1;
> udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
> score = compute_score2(sk, net, saddr, sport,
> daddr, hnum, dif);
> if (score > badness) {
> result = sk;
> badness = score;
> if (score == SCORE2_MAX)
> goto exact_match;
> }
> }
> /*
> * if the nulls value we got at the end of this lookup is
> * not the expected one, we must restart lookup.
> * We probably met an item that was moved to another chain.
> */
> if (get_nulls_value(node) != slot2)
> goto begin;
>
> Which is entirely inlined - so the compiler is allowed to assume
> that no other code modifies any of the data.
> Hence it is allowed to cache the list head value.
> Indeed it could convert the last line to "for (;;);".
>
> A asm volatile ("":::"memory") somewhere would fix it.
>

Yes, it does. It was my first attempt to fix the issue.
But putting such instructions into every such loop isn't a good idea, IMHO.

Regards,
Roman

2013-05-22 14:16:59

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, 2013-05-22 at 06:00 -0700, Paul E. McKenney wrote:

> Right, rcu_read_lock() is part of the protection, but rcu_dereference()
> is the other part.
>
> All that aside, I can't claim that I understand what problem the various
> patches would solve. ;-)

The problem is that rcu_dereference(expr) might be optimized by the compiler
to cache the dereferenced data in certain circumstances.

The following patch shows the difference if you look at the generated code.

This patch fixes the problem, and it's not really obvious why!

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0bf5d39..6aa8088 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -416,11 +416,12 @@ static struct sock *udp4_lib_lookup2(struct net *net,
struct hlist_nulls_node *node;
int score, badness, matches = 0, reuseport = 0;
u32 hash = 0;
+ struct hlist_nulls_head *head = &hslot2->head;

begin:
result = NULL;
badness = 0;
- udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+ udp_portaddr_for_each_entry_rcu(sk, node, head) {
score = compute_score2(sk, net, saddr, sport,
daddr, hnum, dif);
if (score > badness) {


2013-05-22 14:25:51

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

> Here this loops begin by
>
> someptr = rcu_dereference(somelocation);
>
> My claim is rcu_dereference() should force the compiler to read again
> somelocation. Its done thanks to ACCESS_ONCE(). But apparently in the
> specific case of &hslot->head, it doesnt work.

Hmmm....
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
That might be doomed to fail for much the same reason as:
struct foo tgt;
void x(struct foo *unaligned_ptr)
{
	char *p = (void *)unaligned_ptr;
	memcpy(&tgt, p, sizeof *unaligned_ptr);
}
generates alignment faults.
And that casts to a union type don't get around 'strict aliasing'.

Basically the compiler makes use of the fact that you should
cast addresses back to their original type before dereferencing them.

So I'm not sure you can use a cast to add a type qualifier.
The front-end lets you remove 'const', but I suspect the optimiser
is using the original types.
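As a small user-space illustration of this strict-aliasing point (float_bits is an invented name), reinterpreting an object's bytes through memcpy() is well-defined, whereas dereferencing a cast pointer of an unrelated type lets the optimiser assume the accesses cannot alias:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustration of the strict-aliasing point above (hypothetical name).
 * Reading a float's representation through a cast uint32_t* would let
 * the optimiser reorder or cache across the access; copying the bytes
 * with memcpy() is the sanctioned way to reinterpret them.
 */
static uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof u);	/* defined behaviour */
	return u;
}
```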

David


2013-05-22 17:46:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
> On 22.05.2013 16:30, Eric Dumazet wrote:
> >On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
> >
> >>+/*
> >>+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> >>+ * The main goal is preventing compiler to store &ptr->field in a register.
> >
> >But &ptr->field is a constant during the whole duration of
> >udp4_lib_lookup2() and could be in a register, in my case field is at
> >offset 0, and ptr is a parameter (so could be in a 'register')
> >
> >The bug you found is that compiler caches the indirection (ptr->field)
> >into a register, not that compiler stores &ptr->field into a register.
> >
> >>+ */
> >>+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> >>+
> >
> >Here we force the compiler to consider ptr as volatile, but semantically
> >it is not required in rcu_dereference(ptr->field)
>
> Actually, we need to mark an "address of a place" where the field value is
> located as volatile before dereferencing. I have no idea how to do it in another way,
> except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
> ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))

Why not just ACCESS_ONCE(ptr->field)? Or if it is the thing that
ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?

Or rcu_dereference(ptr->field), as appropriate?

Thanx, Paul

> >We want field to be reloaded, not ptr.
> >
> >So yes, the patch appears to fix the bug, but it sounds not logical to
> >me.
> >
>
> May be we can enhance it by providing better/more detailed comments here?
> Have you any suggestions?
>
> Thanks,
> Roman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2013-05-22 19:17:56

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 22.05.2013 21:45, Paul E. McKenney wrote:
> On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
>> On 22.05.2013 16:30, Eric Dumazet wrote:
>>> On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
>>>
>>>> +/*
>>>> + * Same as ACCESS_ONCE(), but used for accessing field of a structure.
>>>> + * The main goal is preventing compiler to store &ptr->field in a register.
>>>
>>> But &ptr->field is a constant during the whole duration of
>>> udp4_lib_lookup2() and could be in a register, in my case field is at
>>> offset 0, and ptr is a parameter (so could be in a 'register')
>>>
>>> The bug you found is that compiler caches the indirection (ptr->field)
>>> into a register, not that compiler stores &ptr->field into a register.
>>>
>>>> + */
>>>> +#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>>>> +
>>>
>>> Here we force the compiler to consider ptr as volatile, but semantically
>>> it is not required in rcu_dereference(ptr->field)
>>
>> Actually, we need to mark an "address of a place" where the field value is
>> located as volatile before dereferencing. I have no idea how to do it in another way,
>> except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
>> ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))

Probably I missed one more ACCESS_ONCE() in this expression. It should be something like
ACCESS_ONCE(typeof(&ptr->field)((char*)ACCESS_ONCE(ptr) + offsetof(typeof(*ptr), field))) .
But this is not a working example, just an illustration of the argument against using ACCESS_ONCE() here.

> Why not just ACCESS_ONCE(ptr->field)? Or if it is the thing that
> ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?
>
> Or rcu_dereference(ptr->field), as appropriate?

It's not enough. That is how the code works now, and it doesn't work as expected.
You can't just write (ptr->field) without ptr being marked as a volatile pointer.

I try to explain the problem once more from scratch:

1) We have the following pseudo-code (based on udp4_lib_lookup2()):

static void some_func(struct hlist_nulls_head *head) {
struct hlist_nulls_node *node;

begin:
for (node = rcu_dereference(head->first);
!is_nulls(node) && ...;
node = rcu_dereference(node->next)) {
<...>
}

if (restart_condition)
goto begin;

2) A problem occurs when restart_condition is true and we jump to the begin label.
We do not recalculate the (head + offsetof(head, first)) address, we just dereference
the OLD (head->first) pointer again. So, we get a node that WAS the first node at some
previous time instead of the current first node. That node can be dead or, for instance,
can be the head of another chain.

It is correct from gcc's point of view, since it doesn't expect the head pointer
to change, and this address is just (head + constant offset).

3) If we start with a wrong first element, restart_condition can be permanently true,
so we get an infinite loop. In any case, we do not scan the whole (socket) chain
as expected.

This scenario is absolutely real and causes our DNS servers to hang
sometimes under high load.

Note that there are no problems if we don't restart the loop. Also, it is highly
dependent on gcc options and on the code in the body of the loop. Even small changes
in the code (like adding a debugging print) prevent the issue from reproducing.
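A hypothetical user-space model of the scenario above (the names is_nulls, make_nulls and scan_len are invented for illustration; the real helpers live in rculist_nulls.h): the low pointer bit marks a "nulls" end marker carrying the chain id, and the volatile cast sketches the fix by forcing head->first to be re-read from memory on every restart:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of the nulls-list restart pattern. */
struct node { struct node *next; };
struct head { struct node *first; };

#define is_nulls(p)		((uintptr_t)(p) & 1)
#define get_nulls_value(p)	((uintptr_t)(p) >> 1)
#define make_nulls(v)		((struct node *)(((uintptr_t)(v) << 1) | 1))

/*
 * Scan the chain; if it ended on a foreign nulls marker, an item we
 * followed was moved to another chain, so restart. The volatile cast
 * models the fix: head->first is re-read from memory on each restart,
 * rather than reusing a value cached in a register.
 */
static int scan_len(struct head *h, uintptr_t slot)
{
	struct node *pos;
	int n;

begin:
	n = 0;
	for (pos = ((volatile struct head *)h)->first;
	     !is_nulls(pos);
	     pos = pos->next)
		n++;
	if (get_nulls_value(pos) != slot)
		goto begin;
	return n;
}
```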

Thanks!

Regards,
Roman

2013-05-26 08:52:47

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, May 22, 2013 at 11:17:46PM +0400, Roman Gushchin wrote:
> On 22.05.2013 21:45, Paul E. McKenney wrote:
> >On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
> >>On 22.05.2013 16:30, Eric Dumazet wrote:
> >>>On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
> >>>
> >>>>+/*
> >>>>+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> >>>>+ * The main goal is preventing compiler to store &ptr->field in a register.
> >>>
> >>>But &ptr->field is a constant during the whole duration of
> >>>udp4_lib_lookup2() and could be in a register, in my case field is at
> >>>offset 0, and ptr is a parameter (so could be in a 'register')
> >>>
> >>>The bug you found is that compiler caches the indirection (ptr->field)
> >>>into a register, not that compiler stores &ptr->field into a register.
> >>>
> >>>>+ */
> >>>>+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> >>>>+
> >>>
> >>>Here we force the compiler to consider ptr as volatile, but semantically
> >>>it is not required in rcu_dereference(ptr->field)
> >>
> >>Actually, we need to mark an "address of a place" where the field value is
> >>located as volatile before dereferencing. I have no idea how to do it in another way,
> >>except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
> >> ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))
>
> Probably I miss one more ACCESS_ONCE() in this expression. Should be something like
> ACCESS_ONCE(typeof(&ptr->field)((char*)ACCESS_ONCE(ptr) + offsetof(typeof(*ptr), field))) .
> But this is not a working example, just an illustration against using ACCESS_ONCE() here.
>
> >Why not just ACCESS_ONCE(ptr->field)? Or if it is the thing that
> >ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?
> >
> >Or rcu_dereference(ptr->field), as appropriate?
>
> It's not enough. So is the code now, and it doesn't work as expected.
> You can't write (ptr->field) without ptr being marked as a volatile pointer.
>
> I try to explain the problem once more from scratch:
>
> 1) We have the following pseudo-code (based on udp4_lib_lookup2()):
>
> static void some_func(struct hlist_nulls_head *head) {
> struct hlist_nulls_node *node;
>
> begin:
> for (node = rcu_dereference(head->first);
> !is_nulls(node) & ...;
> node = rcu_dereference(node->next)) {
> <...>
> }
>
> if (restart_condition)
> goto begin;
>
> 2) A problem occurs when restart_condition is true and we jump to the begin label.
> We do not recalculate (head + offsetof(head, first)) address, we just dereference
> again the OLD (head->first) pointer. So, we get a node, that WAS the first node in a
> previous time instead of current first node. That node can be dead, or, for instance,
> can be a head of another chain.

OK, this is what I was referring to when I said that the RCU list macros
assume that the list header is static (or equivalently, appropriately
protected).

With some_func() as written above, you would need to return some sort
of failure indication from some_func(), and have the caller refetch head.
Otherwise, as far as gcc is concerned, the value of the parameter head
is constant throughout some_func().

> It is correct from gcc's point of view, since it doesn't expect the head pointer
> to change, and this address is just (head + constant offset).

Agreed.

How does the caller calculate the value to pass in through the argument "head"?

> 3) If we start with a wrong first element, restart_condition can be always true, so,
> we get an infinite loop. In any case, we do not scan the whole (socket) chain,
> as expected.

Agreed.

> This scenario is absolutely real and causes our DNS servers to hang
> sometimes under high load.

I completely believe that such a hang could happen.

> Note, that there are no problems if we don't restart a loop. Also, it is highly
> dependent on gcc options, and the code in the body of the loop. Even small changes
> in the code (like adding debugging print) preventing reproducing of the issue.

Again, I believe that your retry logic needs to extend back into the
calling function for your some_func() example above.

Thanx, Paul

2013-05-27 11:34:44

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 25.05.2013 15:37, Paul E. McKenney wrote:
>> 2) A problem occurs when restart_condition is true and we jump to the begin label.
>> We do not recalculate (head + offsetof(head, first)) address, we just dereference
>> again the OLD (head->first) pointer. So, we get a node, that WAS the first node in a
>> previous time instead of current first node. That node can be dead, or, for instance,
>> can be a head of another chain.
>
> OK, this is what I was referring to when I said that the RCU list macros
> assume that the list header is static (or equivalently, appropriately
> protected).
>
> With some_func() as written above, you would need to return some sort
> of failure indication from some_func(), and have the caller refetch head.
> Otherwise, as far as gcc is concerned, the value of the parameter head
> is constant throughout some_func().

Personally I have nothing against this approach, but I'm not sure if the networking
maintainers will adopt the corresponding patch. I'll try to find out.

>
>> It is correct from gcc's point of view, since it doesn't expect the head pointer
>> to change, and this address is just (head + constant offset).
>
> Agreed.
>
> How does the caller calculate the value to pass in through the argument "head"?

struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
__be16 sport, __be32 daddr, __be16 dport,
int dif, struct udp_table *udptable)
{
...
unsigned int hash2, slot2, slot = udp_hashfn(net, hnum, udptable->mask);
struct udp_hslot *hslot2, *hslot = &udptable->hash[slot];
int score, badness;

rcu_read_lock();
if (hslot->count > 10) {
hash2 = udp4_portaddr_hash(net, daddr, hnum);
slot2 = hash2 & udptable->mask;
hslot2 = &udptable->hash2[slot2];
...
result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
hslot2, slot2);
...
}

static struct sock *udp4_lib_lookup2(struct net *net,
__be32 saddr, __be16 sport,
__be32 daddr, unsigned int hnum, int dif,
struct udp_hslot *hslot2, unsigned int slot2)
{
...
udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
...

Thank you!

Regards,
Roman

2013-05-27 17:55:39

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

Hi, Paul!

> On 25.05.2013 15:37, Paul E. McKenney wrote:
>> Again, I believe that your retry logic needs to extend back into the
>> calling function for your some_func() example above.

And what do you think about the following approach (diff below)?

It seems clear enough to me (especially with good accompanying comments)
and produces good binary code (without significant overhead).
Also, it removes a hidden pitfall in using RCU-protected (h)list traversals with restarts.

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 6f95e24..8d8b1eb 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -269,7 +269,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define list_for_each_entry_rcu(pos, head, member) \
- for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+ for (ACCESS_ONCE(*(head)), \
+ pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -459,7 +460,8 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define hlist_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw(hlist_first_rcu(head)); \
+ for (ACCESS_ONCE(*(head)), \
+ pos = rcu_dereference_raw(hlist_first_rcu(head)); \
pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_next_rcu(pos)))
@@ -476,7 +478,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
* as long as the traversal is guarded by rcu_read_lock().
*/
#define hlist_for_each_entry_rcu_bh(tpos, pos, head, member) \
- for (pos = rcu_dereference_bh((head)->first); \
+ for (ACCESS_ONCE(*(head)), pos = rcu_dereference_bh((head)->first); \
pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_bh(pos->next))
diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..4af5ee5 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
*
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
+ for (ACCESS_ONCE(*(head)), \
+ pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

---

Thanks!

Regards,
Roman

2013-05-28 00:12:14

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
> Hi, Paul!
>
> > On 25.05.2013 15:37, Paul E. McKenney wrote:
> >> Again, I believe that your retry logic needs to extend back into the
> >> calling function for your some_func() example above.
>
> And what do you think about the following approach (diff below)?
>
> It seems to me, it's enough clear (especially with good accompanying comments)
> and produces a good binary code (without significant overhead).
> Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
>

> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..4af5ee5 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
> *
> */
> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
> - for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> + for (ACCESS_ONCE(*(head)), \
> + pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> (!is_a_nulls(pos)) && \
> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

It looks like this still relies on gcc being friendly here.

I repeat again : @head here is a constant.

Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
caching the value is forbidden if we restart the loop
(aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)

Adding a barrier() is probably what we want.

I cooked the following patch and it fixes the problem.

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..4dc51b2 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -105,8 +105,12 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
* @head: the head for your list.
* @member: the name of the hlist_nulls_node within the struct.
*
+ * The barrier() is needed to make sure compiler doesn't cache first element,
+ * as this loop can be restarted.
+ * (cf Documentation/RCU/rculist_nulls.txt around line 146)
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
+ barrier(); \
for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \

2013-05-28 09:10:58

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 28.05.2013 04:12, Eric Dumazet wrote:
> On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
>> Hi, Paul!
>>
>>> On 25.05.2013 15:37, Paul E. McKenney wrote:
>>>> Again, I believe that your retry logic needs to extend back into the
>>>> calling function for your some_func() example above.
>>
>> And what do you think about the following approach (diff below)?
>>
>> It seems to me, it's enough clear (especially with good accompanying comments)
>> and produces a good binary code (without significant overhead).
>> Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
>>
>
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 2ae1371..4af5ee5 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
>> *
>> */
>> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
>> - for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>> + for (ACCESS_ONCE(*(head)), \
>> + pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>> (!is_a_nulls(pos)) && \
>> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
> It looks like this still relies on gcc being friendly here.
>
> I repeat again : @head here is a constant.

No.
Actually, there are two volatile objects: the pointer to the first element (as a part of the head structure),
and the first element itself. So, strictly speaking, head as a structure contains a volatile field.
head->first should be treated as a volatile pointer to volatile data, so the whole head object is volatile.

>
> Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
> caching the value is forbidden if we restart the loop
> (aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)

My patch seems to be correct, because ACCESS_ONCE(*(head)) will cause gcc to (re)read the head data from memory.
According to the gcc documentation:
"A scalar volatile object is read when it is accessed in a void context:
volatile int *src = somevalue;
*src;
Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to."
And this is exactly our case.
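A self-contained user-space sketch of that rule as the patch applies it (first_after_reread is an invented name, and __typeof__ stands in for the kernel's typeof):

```c
#include <assert.h>

/* The kernel's classic definition (pre-READ_ONCE), using __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct head { void *first; };

/*
 * Sketch of the proposed construct: the void-context volatile read in
 * the first comma operand makes gcc re-read *h from memory, so the
 * subsequent h->first load cannot legally use a stale cached value.
 */
static void *first_after_reread(struct head *h)
{
	void *p;

	ACCESS_ONCE(*h), p = h->first;
	return p;
}
```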

> Adding a barrier() is probably what we want.

I agree, inserting barrier() is also a correct and working fix.

Thanks!

Regards,
Roman

2013-05-29 00:35:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> On 28.05.2013 04:12, Eric Dumazet wrote:

> > Adding a barrier() is probably what we want.
>
> I agree, inserting barrier() is also a correct and working fix.

Yeah, but I can not find a clean way to put it inside the "for (;;)"

for (barrier();;) ->

error: expected expression before ‘__asm__’

No user currently does :

if (condition)
hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)

But who knows...

2013-05-29 01:20:04

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, May 28, 2013 at 01:10:46PM +0400, Roman Gushchin wrote:
> On 28.05.2013 04:12, Eric Dumazet wrote:
> >On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote:
> >>Hi, Paul!
> >>
> >>>On 25.05.2013 15:37, Paul E. McKenney wrote:
> >>>>Again, I believe that your retry logic needs to extend back into the
> >>>>calling function for your some_func() example above.
> >>
> >>And what do you think about the following approach (diff below)?
> >>
> >>It seems to me, it's enough clear (especially with good accompanying comments)
> >>and produces a good binary code (without significant overhead).
> >>Also, we will remove a hidden reef in using rcu-protected (h)list traverses with restarts.
> >>
> >
> >>diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> >>index 2ae1371..4af5ee5 100644
> >>--- a/include/linux/rculist_nulls.h
> >>+++ b/include/linux/rculist_nulls.h
> >>@@ -107,7 +107,8 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
> >> *
> >> */
> >> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
> >>- for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> >>+ for (ACCESS_ONCE(*(head)), \
> >>+ pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> >> (!is_a_nulls(pos)) && \
> >> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> >> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
> >
> >It looks like this still relies on gcc being friendly here.
> >
> >I repeat again : @head here is a constant.
>
> No.
> Actually, there are two volatile objects: pointer to the first element (as a part of the head structure),
> and the first element by itself. So, to be strict, head as a structure contains a volatile field.
> Head->first should be treated as a volatile pointer to a volatile data. So, the whole head object is volatile.
>
> >
> >Macro already uses ACCESS_ONCE(), we only have to instruct gcc that
> >caching the value is forbidden if we restart the loop
> >(aka "goto begin;" see Documentation/RCU/rculist_nulls.txt line 146)
>
> My patch seems to be correct, because, ACCESS_ONCE(*(head)) will cause gcc to (re)read head data from the memory.
> According to gcc documentation:
> "A scalar volatile object is read when it is accessed in a void context:
> volatile int *src = somevalue;
> *src;
> Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to."
> And this is exactly our case.
>
> >Adding a barrier() is probably what we want.
>
> I agree, inserting barrier() is also a correct and working fix.

I have to ask this question of both of you...

Are either of Eric's or Roman's fixes really guaranteed to work if gcc
elects not to inline the function containing the RCU list traversal?

Thanx, Paul

2013-05-29 01:31:19

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> > On 28.05.2013 04:12, Eric Dumazet wrote:
>
> > > Adding a barrier() is probably what we want.
> >
> > I agree, inserting barrier() is also a correct and working fix.
>
> Yeah, but I can not find a clean way to put it inside the "for (;;)"
>
> for (barrier();;) ->
>
> error: expected expression before ‘__asm__’
>
> No user currently does :
>
> if (condition)
> hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
>
> But who knows...

I still have my earlier question, but I suggest "({ barrier(); XXX })"
to put the barrier into the for loop, either in the second or third
clause, where XXX was the original second or third clause.

Thanx, Paul

2013-05-29 05:08:22

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
> On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
> > On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
> > > On 28.05.2013 04:12, Eric Dumazet wrote:
> >
> > > > Adding a barrier() is probably what we want.
> > >
> > > I agree, inserting barrier() is also a correct and working fix.
> >
> > Yeah, but I can not find a clean way to put it inside the "for (;;)"
> >
> > for (barrier();;) ->
> >
> > error: expected expression before ‘__asm__’
> >
> > No user currently does :
> >
> > if (condition)
> > hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
> >
> > But who knows...
>
> I still have my earlier question, but I suggest "({ barrier(); XXX })"
> to put the barrier into the for loop, either in the second or third
> clause, where XXX was the original second or third clause.
>
>

Hmm, it doesn't work, I wanted to replace :

barrier();
for (expr1 ; expr2; expr3) {

by :

for ( some_clever_thing ; expr2; expr3) {

So barrier() should not be in second or third clause : it must be done
once and before "expr1".

Not a big deal anyway, we're not adding new
hlist_nulls_for_each_entry_rcu() users :)


About your earlier question, I really don't know why compiler would
cache a memory read if we explicitly use barrier() to prevent this from
happening.

BTW Roman's patch generates a double load, as in:

2cb1: 49 8b 07 mov (%r15),%rax
2cb4: 49 8b 07 mov (%r15),%rax


...
2ea2: e8 f9 dc ff ff callq ba0 <sock_put>
2ea7: 8b 0c 24 mov (%rsp),%ecx
2eaa: e9 02 fe ff ff jmpq 2cb1 <udp4_lib_lookup2+0x91>

because of ACCESS_ONCE() used twice, once explicitly in
hlist_nulls_for_each_entry_rcu(), and once in rcu_dereference_raw()

While barrier();ptr = rcu_dereference(X); does generate a single load.


2013-05-29 09:18:04

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 29.05.2013 04:34, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
>> On 28.05.2013 04:12, Eric Dumazet wrote:
>
>>> Adding a barrier() is probably what we want.
>>
>> I agree, inserting barrier() is also a correct and working fix.
>
> Yeah, but I can not find a clean way to put it inside the "for (;;)"

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..74d5065 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
*
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
+ for (({barrier();}), pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))


2013-05-29 10:09:38

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 29.05.2013 09:08, Eric Dumazet wrote:
> On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote:
>> On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote:
>>> On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote:
>>>> On 28.05.2013 04:12, Eric Dumazet wrote:

> About your earlier question, I really don't know why compiler would
> cache a memory read if we explicitly use barrier() to prevent this from
> happening.

I agree.

> BTW Roman patch generates a double load as in :
>
> 2cb1: 49 8b 07 mov (%r15),%rax
> 2cb4: 49 8b 07 mov (%r15),%rax
>
>
> ...
> 2ea2: e8 f9 dc ff ff callq ba0 <sock_put>
> 2ea7: 8b 0c 24 mov (%rsp),%ecx
> 2eaa: e9 02 fe ff ff jmpq 2cb1 <udp4_lib_lookup2+0x91>
>
> because of ACCESS_ONCE() used twice, once explicitly in
> hlist_nulls_for_each_entry_rcu(), and once in rcu_dereference_raw()
>
> While barrier();ptr = rcu_dereference(X); does generate a single load.

It's true.

Unfortunately, using barrier() can also prevent gcc from making some
(acceptable) code optimizations, because barrier() has a global effect,
while all we want to reload is the (head->first) pointer.
So, to be absolutely precise, we would have to introduce and use
the ACCESS_FIELD_ONCE() macro.

In any case, it doesn't look like a big problem.
To my mind, the best solution is to use the ACCESS_FIELD_ONCE() macro,
but using barrier() is also an acceptable solution.
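For reference, the ACCESS_FIELD_ONCE() macro was posted earlier in the thread; a self-contained user-space sketch of it follows (read_first is an invented demo helper, and __typeof__ stands in for typeof):

```c
#include <assert.h>

/*
 * From the v2 patch earlier in the thread: reload PTR->FIELD from
 * memory by qualifying only the pointed-to object as volatile,
 * avoiding the global effect that barrier() has on every cached value.
 */
#define ACCESS_FIELD_ONCE(PTR, FIELD) \
	(((volatile __typeof__(*(PTR)) *)(PTR))->FIELD)

struct head { int first; };

static int read_first(struct head *h)
{
	return ACCESS_FIELD_ONCE(h, first);	/* always a fresh load */
}
```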

Regards,
Roman

2013-05-29 19:06:38

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:

> Unfortunately, using barrier() can also prevent gcc from making some
> (acceptable) code optimizations, because barrier() has a global effect,
> and everything we want to reload is the (head->first) pointer.
> So, to be absolutely precise, we have to introduce and use
> the ACCESS_FIELD_ONCE() macro.

Yep, ACCESS_ONCE() seems not to work for a field, at least not the
ACCESS_ONCE() used in __rcu_dereference_check().

>
> In any case, it doesn't look like a big problem.
> In my mind, the best solution is to use the ACCESS_FIELD_ONCE() macro,
> but using barrier() is also an acceptable solution.
>

True, these lookup functions are usually structured the same around the
hlist_nulls_for_each_entry_rcu() loop.

A barrier() right before the loop seems to be a benefit; the size of the
assembly code is reduced by 48 bytes.

And it's one of the documented ways to handle this kind of problem
(Documentation/atomic_ops.txt line 114)

I guess we should amend this documentation, eventually.

Thanks, please add your "Signed-off-by" if you agree with the patch.


[PATCH] net: force a reload of first item in hlist_nulls_for_each_entry_rcu

Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
first item in the rcu protected list, in case the loop was restarted.

This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37

rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
ptr->field:

In some cases, gcc caches the value of ptr->field in a register.

Use a barrier() to disallow such caching, as documented in
Documentation/atomic_ops.txt line 114

Thanks a lot to Roman for providing analysis and numerous patches.

Diagnosed-by: Roman Gushchin <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Boris Zhmurov <[email protected]>
---
include/linux/rculist_nulls.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..c7557fa 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -105,9 +105,14 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
* @head: the head for your list.
* @member: the name of the hlist_nulls_node within the struct.
*
+ * The barrier() is needed to make sure compiler doesn't cache first element [1],
+ * as this loop can be restarted [2]
+ * [1] Documentation/atomic_ops.txt around line 114
+ * [2] Documentation/RCU/rculist_nulls.txt around line 146
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
+ for (({barrier();}), \
+ pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
(!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

2013-05-30 08:25:45

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 29.05.2013 23:06, Eric Dumazet wrote:
> On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>
> True, these lookup functions are usually structured the same around the
> hlist_nulls_for_each_entry_rcu() loop.
>
> A barrier() right before the loop seems to be a benefit, the size of
> assembly code is reduced by 48 bytes.
>
> And its one of the documented way to handle this kind of problems
> (Documentation/atomic_ops.txt line 114)
>
> I guess we should amend this documentation, eventually.
>
> Thanks, please add you "Signed-off-by" if you agree with the patch.
>

Signed-off-by: Roman Gushchin <[email protected]>

Many thanks to you, Paul E. McKenney and David Laight for your
patches, help and participation in this discussion.

>
> [PATCH] net: force a reload of first item in hlist_nulls_for_each_entry_rcu
>
> Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
> first item in the rcu protected list, in case the loop was restarted.
>
> This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37
>
> rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
> ptr->field :
>
> In some cases, gcc caches the value or ptr->field in a register.
>
> Use a barrier() to disallow such caching, as documented in
> Documentation/atomic_ops.txt line 114
>
> Thanks a lot to Roman for providing analysis and numerous patches.
>
> Diagnosed-by: Roman Gushchin <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Reported-by: Boris Zhmurov <[email protected]>
> ---
> include/linux/rculist_nulls.h | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
> index 2ae1371..c7557fa 100644
> --- a/include/linux/rculist_nulls.h
> +++ b/include/linux/rculist_nulls.h
> @@ -105,9 +105,14 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
> * @head: the head for your list.
> * @member: the name of the hlist_nulls_node within the struct.
> *
> + * The barrier() is needed to make sure compiler doesn't cache first element [1],
> + * as this loop can be restarted [2]
> + * [1] Documentation/atomic_ops.txt around line 114
> + * [2] Documentation/RCU/rculist_nulls.txt around line 146
> */
> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
> - for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> + for (({barrier();}), \
> + pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
> (!is_a_nulls(pos)) && \
> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>
>

2013-06-02 23:31:43

by Eric Dumazet

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> On 29.05.2013 23:06, Eric Dumazet wrote:
> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >
> > True, these lookup functions are usually structured the same around the
> > hlist_nulls_for_each_entry_rcu() loop.
> >
> > A barrier() right before the loop seems to be a benefit, the size of
> > assembly code is reduced by 48 bytes.
> >
> > And its one of the documented way to handle this kind of problems
> > (Documentation/atomic_ops.txt line 114)
> >
> > I guess we should amend this documentation, eventually.
> >
> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >
>
> Signed-off-by: Roman Gushchin <[email protected]>
>
> Many thanks to you, Paul E. McKenney and David Laight for your
> patches, help and participation in this discussion.

Thanks to you !

David, is there any problem with the patch ?

http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
tell me what is needed to get it merged.

Thanks !


2013-06-03 02:58:33

by David Miller

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

From: Eric Dumazet <[email protected]>
Date: Sun, 02 Jun 2013 16:31:35 -0700

> On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
>> On 29.05.2013 23:06, Eric Dumazet wrote:
>> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>> >
>> > True, these lookup functions are usually structured the same around the
>> > hlist_nulls_for_each_entry_rcu() loop.
>> >
>> > A barrier() right before the loop seems to be a benefit, the size of
>> > assembly code is reduced by 48 bytes.
>> >
>> > And its one of the documented way to handle this kind of problems
>> > (Documentation/atomic_ops.txt line 114)
>> >
>> > I guess we should amend this documentation, eventually.
>> >
>> > Thanks, please add you "Signed-off-by" if you agree with the patch.
>> >
>>
>> Signed-off-by: Roman Gushchin <[email protected]>
>>
>> Many thanks to you, Paul E. McKenney and David Laight for your
>> patches, help and participation in this discussion.
>
> Thanks to you !
>
> David, is there any problem with the patch ?
>
> http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> tell me what is needed to get it merged.

It's not a networking patch, it's a patch for generic RCU upstream.
So it either goes through Paul McKenney, or directly via Linus on
linux-kernel.

2013-06-03 03:13:02

by Eric Dumazet

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Sun, 02 Jun 2013 16:31:35 -0700
>
> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >
> >> > True, these lookup functions are usually structured the same around the
> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >
> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> > assembly code is reduced by 48 bytes.
> >> >
> >> > And its one of the documented way to handle this kind of problems
> >> > (Documentation/atomic_ops.txt line 114)
> >> >
> >> > I guess we should amend this documentation, eventually.
> >> >
> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >
> >>
> >> Signed-off-by: Roman Gushchin <[email protected]>
> >>
> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> patches, help and participation in this discussion.
> >
> > Thanks to you !
> >
> > David, is there any problem with the patch ?
> >
> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> > tell me what is needed to get it merged.
>
> It's not a networking patch, it's a patch for generic RCU upstream.
> So it either goes through Paul McKenney, or directly via Linus on
> linux-kernel.

Well, the whole discussion went on linux-kernel, and you were the original
committer of this patch five years ago.

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11

So please Paul or David make sure the patch goes in.

My only concern is that Paul seems quite busy these days.

Thanks

2013-06-03 03:27:33

by David Miller

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

From: Eric Dumazet <[email protected]>
Date: Sun, 02 Jun 2013 20:12:49 -0700

> On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
>> From: Eric Dumazet <[email protected]>
>> Date: Sun, 02 Jun 2013 16:31:35 -0700
>>
>> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
>> >> On 29.05.2013 23:06, Eric Dumazet wrote:
>> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>> >> >
>> >> > True, these lookup functions are usually structured the same around the
>> >> > hlist_nulls_for_each_entry_rcu() loop.
>> >> >
>> >> > A barrier() right before the loop seems to be a benefit, the size of
>> >> > assembly code is reduced by 48 bytes.
>> >> >
>> >> > And its one of the documented way to handle this kind of problems
>> >> > (Documentation/atomic_ops.txt line 114)
>> >> >
>> >> > I guess we should amend this documentation, eventually.
>> >> >
>> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
>> >> >
>> >>
>> >> Signed-off-by: Roman Gushchin <[email protected]>
>> >>
>> >> Many thanks to you, Paul E. McKenney and David Laight for your
>> >> patches, help and participation in this discussion.
>> >
>> > Thanks to you !
>> >
>> > David, is there any problem with the patch ?
>> >
>> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
>> > tell me what is needed to get it merged.
>>
>> It's not a networking patch, it's a patch for generic RCU upstream.
>> So it either goes through Paul McKenney, or directly via Linus on
>> linux-kernel.
>
> Well, whole discussion went on linux kernel, and you were the original
> committer of this patch five years ago.
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
>
> So please Paul or David make sure the patch goes in.
>
> My only concern is that Paul seems quite busy these days.

Ok I can take it then, no problem.

2013-06-03 03:42:26

by Paul E. McKenney

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Sun, Jun 02, 2013 at 07:58:26PM -0700, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Sun, 02 Jun 2013 16:31:35 -0700
>
> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >
> >> > True, these lookup functions are usually structured the same around the
> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >
> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> > assembly code is reduced by 48 bytes.
> >> >
> >> > And its one of the documented way to handle this kind of problems
> >> > (Documentation/atomic_ops.txt line 114)
> >> >
> >> > I guess we should amend this documentation, eventually.
> >> >
> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >
> >>
> >> Signed-off-by: Roman Gushchin <[email protected]>
> >>
> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> patches, help and participation in this discussion.
> >
> > Thanks to you !
> >
> > David, is there any problem with the patch ?
> >
> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> > tell me what is needed to get it merged.
>
> It's not a networking patch, it's a patch for generic RCU upstream.
> So it either goes through Paul McKenney, or directly via Linus on
> linux-kernel.

I would be happy to take it.

Thanx, Paul

2013-06-03 03:43:18

by Paul E. McKenney

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Sun, Jun 02, 2013 at 08:27:03PM -0700, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Sun, 02 Jun 2013 20:12:49 -0700
>
> > On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> >> From: Eric Dumazet <[email protected]>
> >> Date: Sun, 02 Jun 2013 16:31:35 -0700
> >>
> >> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >> >
> >> >> > True, these lookup functions are usually structured the same around the
> >> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >> >
> >> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> >> > assembly code is reduced by 48 bytes.
> >> >> >
> >> >> > And its one of the documented way to handle this kind of problems
> >> >> > (Documentation/atomic_ops.txt line 114)
> >> >> >
> >> >> > I guess we should amend this documentation, eventually.
> >> >> >
> >> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >> >
> >> >>
> >> >> Signed-off-by: Roman Gushchin <[email protected]>
> >> >>
> >> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> >> patches, help and participation in this discussion.
> >> >
> >> > Thanks to you !
> >> >
> >> > David, is there any problem with the patch ?
> >> >
> >> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> >> > tell me what is needed to get it merged.
> >>
> >> It's not a networking patch, it's a patch for generic RCU upstream.
> >> So it either goes through Paul McKenney, or directly via Linus on
> >> linux-kernel.
> >
> > Well, whole discussion went on linux kernel, and you were the original
> > committer of this patch five years ago.
> >
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
> >
> > So please Paul or David make sure the patch goes in.
> >
> > My only concern is that Paul seems quite busy these days.
>
> Ok I can take it then, no problem.

Or if you would like to take it:

Acked-by: Paul E. McKenney <[email protected]>

Just let me know. ;-)

Thanx, Paul

2013-06-03 03:47:52

by Eric Dumazet

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Sun, 2013-06-02 at 20:42 -0700, Paul E. McKenney wrote:

> Or if you would like to take it:
>
> Acked-by: Paul E. McKenney <[email protected]>
>

Thanks guys !

2013-06-03 03:48:29

by Paul E. McKenney

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Sun, Jun 02, 2013 at 08:12:49PM -0700, Eric Dumazet wrote:
> On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> > From: Eric Dumazet <[email protected]>
> > Date: Sun, 02 Jun 2013 16:31:35 -0700
> >
> > > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> > >> On 29.05.2013 23:06, Eric Dumazet wrote:
> > >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> > >> >
> > >> > True, these lookup functions are usually structured the same around the
> > >> > hlist_nulls_for_each_entry_rcu() loop.
> > >> >
> > >> > A barrier() right before the loop seems to be a benefit, the size of
> > >> > assembly code is reduced by 48 bytes.
> > >> >
> > >> > And its one of the documented way to handle this kind of problems
> > >> > (Documentation/atomic_ops.txt line 114)
> > >> >
> > >> > I guess we should amend this documentation, eventually.
> > >> >
> > >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> > >> >
> > >>
> > >> Signed-off-by: Roman Gushchin <[email protected]>
> > >>
> > >> Many thanks to you, Paul E. McKenney and David Laight for your
> > >> patches, help and participation in this discussion.
> > >
> > > Thanks to you !
> > >
> > > David, is there any problem with the patch ?
> > >
> > > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> > > tell me what is needed to get it merged.
> >
> > It's not a networking patch, it's a patch for generic RCU upstream.
> > So it either goes through Paul McKenney, or directly via Linus on
> > linux-kernel.
>
> Well, whole discussion went on linux kernel, and you were the original
> committer of this patch five years ago.
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
>
> So please Paul or David make sure the patch goes in.
>
> My only concern is that Paul seems quite busy these days.

One reason for the longer-than-usual response delays is my current timezone,
IST. ;-) Back to PDT on the 10th.

Thanx, Paul

2013-06-03 03:49:23

by David Miller

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

From: "Paul E. McKenney" <[email protected]>
Date: Sun, 2 Jun 2013 20:42:54 -0700

> On Sun, Jun 02, 2013 at 08:27:03PM -0700, David Miller wrote:
>> From: Eric Dumazet <[email protected]>
>> Date: Sun, 02 Jun 2013 20:12:49 -0700
>>
>> > On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
>> >> From: Eric Dumazet <[email protected]>
>> >> Date: Sun, 02 Jun 2013 16:31:35 -0700
>> >>
>> >> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
>> >> >> On 29.05.2013 23:06, Eric Dumazet wrote:
>> >> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
>> >> >> >
>> >> >> > True, these lookup functions are usually structured the same around the
>> >> >> > hlist_nulls_for_each_entry_rcu() loop.
>> >> >> >
>> >> >> > A barrier() right before the loop seems to be a benefit, the size of
>> >> >> > assembly code is reduced by 48 bytes.
>> >> >> >
>> >> >> > And its one of the documented way to handle this kind of problems
>> >> >> > (Documentation/atomic_ops.txt line 114)
>> >> >> >
>> >> >> > I guess we should amend this documentation, eventually.
>> >> >> >
>> >> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
>> >> >> >
>> >> >>
>> >> >> Signed-off-by: Roman Gushchin <[email protected]>
>> >> >>
>> >> >> Many thanks to you, Paul E. McKenney and David Laight for your
>> >> >> patches, help and participation in this discussion.
>> >> >
>> >> > Thanks to you !
>> >> >
>> >> > David, is there any problem with the patch ?
>> >> >
>> >> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
>> >> > tell me what is needed to get it merged.
>> >>
>> >> It's not a networking patch, it's a patch for generic RCU upstream.
>> >> So it either goes through Paul McKenney, or directly via Linus on
>> >> linux-kernel.
>> >
>> > Well, whole discussion went on linux kernel, and you were the original
>> > committer of this patch five years ago.
>> >
>> > http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
>> >
>> > So please Paul or David make sure the patch goes in.
>> >
>> > My only concern is that Paul seems quite busy these days.
>>
>> Ok I can take it then, no problem.
>
> Or if you would like to take it:
>
> Acked-by: Paul E. McKenney <[email protected]>

Let's end this deadlock by me taking it :-)

2013-06-03 06:06:10

by Paul E. McKenney

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Sun, Jun 02, 2013 at 08:49:15PM -0700, David Miller wrote:
> From: "Paul E. McKenney" <[email protected]>
> Date: Sun, 2 Jun 2013 20:42:54 -0700
>
> > On Sun, Jun 02, 2013 at 08:27:03PM -0700, David Miller wrote:
> >> From: Eric Dumazet <[email protected]>
> >> Date: Sun, 02 Jun 2013 20:12:49 -0700
> >>
> >> > On Sun, 2013-06-02 at 19:58 -0700, David Miller wrote:
> >> >> From: Eric Dumazet <[email protected]>
> >> >> Date: Sun, 02 Jun 2013 16:31:35 -0700
> >> >>
> >> >> > On Thu, 2013-05-30 at 12:25 +0400, Roman Gushchin wrote:
> >> >> >> On 29.05.2013 23:06, Eric Dumazet wrote:
> >> >> >> > On Wed, 2013-05-29 at 14:09 +0400, Roman Gushchin wrote:
> >> >> >> >
> >> >> >> > True, these lookup functions are usually structured the same around the
> >> >> >> > hlist_nulls_for_each_entry_rcu() loop.
> >> >> >> >
> >> >> >> > A barrier() right before the loop seems to be a benefit, the size of
> >> >> >> > assembly code is reduced by 48 bytes.
> >> >> >> >
> >> >> >> > And its one of the documented way to handle this kind of problems
> >> >> >> > (Documentation/atomic_ops.txt line 114)
> >> >> >> >
> >> >> >> > I guess we should amend this documentation, eventually.
> >> >> >> >
> >> >> >> > Thanks, please add you "Signed-off-by" if you agree with the patch.
> >> >> >> >
> >> >> >>
> >> >> >> Signed-off-by: Roman Gushchin <[email protected]>
> >> >> >>
> >> >> >> Many thanks to you, Paul E. McKenney and David Laight for your
> >> >> >> patches, help and participation in this discussion.
> >> >> >
> >> >> > Thanks to you !
> >> >> >
> >> >> > David, is there any problem with the patch ?
> >> >> >
> >> >> > http://patchwork.ozlabs.org/patch/247360/ says "Not applicable", please
> >> >> > tell me what is needed to get it merged.
> >> >>
> >> >> It's not a networking patch, it's a patch for generic RCU upstream.
> >> >> So it either goes through Paul McKenney, or directly via Linus on
> >> >> linux-kernel.
> >> >
> >> > Well, whole discussion went on linux kernel, and you were the original
> >> > committer of this patch five years ago.
> >> >
> >> > http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
> >> >
> >> > So please Paul or David make sure the patch goes in.
> >> >
> >> > My only concern is that Paul seems quite busy these days.
> >>
> >> Ok I can take it then, no problem.
> >
> > Or if you would like to take it:
> >
> > Acked-by: Paul E. McKenney <[email protected]>
>
> Let's end this deadlock by me taking it :-)

You got it! ;-)

Thanx, Paul

2013-06-10 18:36:29

by Boris B. Zhmurov

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On 06/03/2013 07:49 AM, David Miller wrote:


>>>> Well, whole discussion went on linux kernel, and you were the original
>>>> committer of this patch five years ago.
>>>>
>>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=bbaffaca4810de1a25e32ecaf836eeaacc7a3d11
>>>>
>>>> So please Paul or David make sure the patch goes in.
>>>>
>>>> My only concern is that Paul seems quite busy these days.
>>>
>>> Ok I can take it then, no problem.
>>
>> Or if you would like to take it:
>>
>> Acked-by: Paul E. McKenney <[email protected]>
>
> Let's end this deadlock by me taking it :-)


Hello David,

Do you think this patch is "stable material" that should be included in
the 3.0-stable and 3.4-stable trees? Thanks.

--
Boris B. Zhmurov
Yandex.Search SRE Team
*************
tel.: +7 (495) 739-70-00 ext. 7160
http://yandex.ru
*************

2013-06-10 18:52:07

by Eric Dumazet

Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

On Mon, 2013-06-10 at 22:29 +0400, Boris B. Zhmurov wrote:

> Hello David,
>
> Do you think, this patch is "stable material" and should be included in
> 3.0-stable and 3.4-stable trees? Thanks.

Yes, it's in the stable queue:

http://patchwork.ozlabs.org/bundle/davem/stable/?state=*