From: Yuqi Jin <[email protected]>
atomic_try_cmpxchg() is called instead of atomic_cmpxchg(): it hands back the
observed value on failure, which reduces the number of accesses to the global
variable @p_id in the retry loop. Let's optimize it for performance.
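For reference, atomic_try_cmpxchg() behaves roughly like the open-coded helper
below (a sketch of the generic fallback, under a hypothetical name, not any
particular architecture's implementation): it issues one cmpxchg and, on
failure, writes the observed value back into *old, so the caller's retry loop
no longer needs its own atomic_read():

static inline bool sketch_try_cmpxchg(atomic_t *v, int *old, int new)
{
        int o = *old;
        int r = atomic_cmpxchg(v, o, new);      /* one atomic RMW access */

        if (r != o)
                *old = r;       /* refresh the caller's expected value */
        return r == o;
}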
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Yang Guo <[email protected]>
Signed-off-by: Yuqi Jin <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
---
net/ipv4/route.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 87e979f2b74a..7e28c7121c20 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -496,10 +496,10 @@ u32 ip_idents_reserve(u32 hash, int segs)
delta = prandom_u32_max(now - old);
/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+ old = (u32)atomic_read(p_id);
do {
- old = (u32)atomic_read(p_id);
new = old + delta + segs;
- } while (atomic_cmpxchg(p_id, old, new) != old);
+ } while (!atomic_try_cmpxchg(p_id, &old, new));
return new - segs;
}
--
2.7.4
From: Shaokun Zhang <[email protected]>
Date: Wed, 15 Jan 2020 11:23:40 +0800
> From: Yuqi Jin <[email protected]>
>
> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
> the access number of the global variable @p_id in the loop. Let's
> optimize it for performance.
>
> Cc: "David S. Miller" <[email protected]>
> Cc: Alexey Kuznetsov <[email protected]>
> Cc: Hideaki YOSHIFUJI <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Yang Guo <[email protected]>
> Signed-off-by: Yuqi Jin <[email protected]>
> Signed-off-by: Shaokun Zhang <[email protected]>
I doubt this makes any measurable improvement in performance.
If you can document a specific measurable improvement under
a useful set of circumstances for real usage, then put those
details into the commit message and resubmit.
Otherwise, I'm not applying this, sorry.
Hi David,
On 2020/1/16 20:27, David Miller wrote:
> From: Shaokun Zhang <[email protected]>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
>
>> From: Yuqi Jin <[email protected]>
>>
>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>> the access number of the global variable @p_id in the loop. Let's
>> optimize it for performance.
>>
>> Cc: "David S. Miller" <[email protected]>
>> Cc: Alexey Kuznetsov <[email protected]>
>> Cc: Hideaki YOSHIFUJI <[email protected]>
>> Cc: Eric Dumazet <[email protected]>
>> Cc: Yang Guo <[email protected]>
>> Signed-off-by: Yuqi Jin <[email protected]>
>> Signed-off-by: Shaokun Zhang <[email protected]>
>
> I doubt this makes any measurable improvement in performance.
>
> If you can document a specific measurable improvement under
> a useful set of circumstances for real usage, then put those
> details into the commit message and resubmit.
Ok, I will do it and resubmit.
Thanks for your reply,
Shaokun
>
> Otherwise, I'm not applying this, sorry.
>
On 1/16/20 4:27 AM, David Miller wrote:
> From: Shaokun Zhang <[email protected]>
> Date: Wed, 15 Jan 2020 11:23:40 +0800
>
>> From: Yuqi Jin <[email protected]>
>>
>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>> the access number of the global variable @p_id in the loop. Let's
>> optimize it for performance.
>>
>> Cc: "David S. Miller" <[email protected]>
>> Cc: Alexey Kuznetsov <[email protected]>
>> Cc: Hideaki YOSHIFUJI <[email protected]>
>> Cc: Eric Dumazet <[email protected]>
>> Cc: Yang Guo <[email protected]>
>> Signed-off-by: Yuqi Jin <[email protected]>
>> Signed-off-by: Shaokun Zhang <[email protected]>
>
> I doubt this makes any measurable improvement in performance.
>
> If you can document a specific measurable improvement under
> a useful set of circumstances for real usage, then put those
> details into the commit message and resubmit.
>
> Otherwise, I'm not applying this, sorry.
>
Real difference that could be made here is to
only use this cmpxchg() dance for CONFIG_UBSAN
When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
(Supposedly UBSAN should not warn about that either, but this depends on compiler version)
On 1/16/20 7:12 AM, Eric Dumazet wrote:
>
>
> On 1/16/20 4:27 AM, David Miller wrote:
>> From: Shaokun Zhang <[email protected]>
>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>
>>> From: Yuqi Jin <[email protected]>
>>>
>>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>>> the access number of the global variable @p_id in the loop. Let's
>>> optimize it for performance.
>>>
>>> Cc: "David S. Miller" <[email protected]>
>>> Cc: Alexey Kuznetsov <[email protected]>
>>> Cc: Hideaki YOSHIFUJI <[email protected]>
>>> Cc: Eric Dumazet <[email protected]>
>>> Cc: Yang Guo <[email protected]>
>>> Signed-off-by: Yuqi Jin <[email protected]>
>>> Signed-off-by: Shaokun Zhang <[email protected]>
>>
>> I doubt this makes any measurable improvement in performance.
>>
>> If you can document a specific measurable improvement under
>> a useful set of circumstances for real usage, then put those
>> details into the commit message and resubmit.
>>
>> Otherwise, I'm not applying this, sorry.
>>
>
>
> Real difference that could be made here is to
> only use this cmpxchg() dance for CONFIG_UBSAN
>
> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
>
> (Supposedly UBSAN should not warn about that either, but this depends on compiler version)
I will test something like :
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
if (old != now && cmpxchg(p_tstamp, old, now) == old)
delta = prandom_u32_max(now - old);
- /* Do not use atomic_add_return() as it makes UBSAN unhappy */
+#ifdef CONFIG_UBSAN
+ /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
do {
old = (u32)atomic_read(p_id);
new = old + delta + segs;
} while (atomic_cmpxchg(p_id, old, new) != old);
+#else
+ new = atomic_add_return(segs + delta, p_id);
+#endif
return new - segs;
}
+Cc Will Deacon,
On 2020/1/16 23:19, Eric Dumazet wrote:
>
>
> On 1/16/20 7:12 AM, Eric Dumazet wrote:
>>
>>
>> On 1/16/20 4:27 AM, David Miller wrote:
>>> From: Shaokun Zhang <[email protected]>
>>> Date: Wed, 15 Jan 2020 11:23:40 +0800
>>>
>>>> From: Yuqi Jin <[email protected]>
>>>>
>>>> atomic_try_cmpxchg is called instead of atomic_cmpxchg that can reduce
>>>> the access number of the global variable @p_id in the loop. Let's
>>>> optimize it for performance.
>>>>
>>>> Cc: "David S. Miller" <[email protected]>
>>>> Cc: Alexey Kuznetsov <[email protected]>
>>>> Cc: Hideaki YOSHIFUJI <[email protected]>
>>>> Cc: Eric Dumazet <[email protected]>
>>>> Cc: Yang Guo <[email protected]>
>>>> Signed-off-by: Yuqi Jin <[email protected]>
>>>> Signed-off-by: Shaokun Zhang <[email protected]>
>>>
>>> I doubt this makes any measurable improvement in performance.
>>>
>>> If you can document a specific measurable improvement under
>>> a useful set of circumstances for real usage, then put those
>>> details into the commit message and resubmit.
>>>
>>> Otherwise, I'm not applying this, sorry.
>>>
>>
>>
>> Real difference that could be made here is to
>> only use this cmpxchg() dance for CONFIG_UBSAN
>>
>> When CONFIG_UBSAN is not set, atomic_add_return() is just fine.
>>
>> (Supposedly UBSAN should not warn about that either, but this depends on compiler version)
>
> I will test something like :
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2010888e68ca96ae880481973a6d808d6c5612c5..e2fa972f5c78f2aefc801db6a45b2a81141c3028 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -495,11 +495,15 @@ u32 ip_idents_reserve(u32 hash, int segs)
> if (old != now && cmpxchg(p_tstamp, old, now) == old)
> delta = prandom_u32_max(now - old);
>
> - /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> +#ifdef CONFIG_UBSAN
I appreciate Eric's idea.
> + /* Do not use atomic_add_return() as it makes old UBSAN versions unhappy */
> do {
> old = (u32)atomic_read(p_id);
> new = old + delta + segs;
> } while (atomic_cmpxchg(p_id, old, new) != old);
But about this, I still think that we can use atomic_try_cmpxchg() instead
of atomic_cmpxchg(), like refcount_add_not_zero() in include/linux/refcount.h.
I abstracted the model as follows:
/* Variant 1: atomic64_cmpxchg(), the counter is re-read on every retry */
while (get_cycles() <= (time_temp + t_cnt)) {
        do {
                old_temp = wk_in->num.counter;
                oldt = atomic64_cmpxchg(&wk_in->num, old_temp, old_temp + 1);
        } while (oldt != old_temp);
        myndelay(delay_o);
        w_cnt += 1;
}

/* Variant 2: atomic64_try_cmpxchg(), the counter is read once before the
 * loop; a failed attempt refreshes @val with the value actually observed */
val = atomic64_read(&wk_in->num);
while (get_cycles() <= (time_temp + t_cnt)) {
        do {
                new_val = val + 1;
        } while (!atomic64_try_cmpxchg(&wk_in->num, &val, new_val));
        myndelay(delay_o);
        w_cnt += 1;
}
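For anyone who wants to try this outside the kernel, here is a self-contained
userspace approximation of the same two loops (an illustration with made-up
names only, not the benchmark that produced the numbers below: C11 atomics
stand in for atomic64_t, and a fixed iteration count replaces the
get_cycles()/myndelay() harness; run the variants from several threads to
see the contended case):

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t num;

/* Variant 1: re-read the counter on every retry (atomic64_cmpxchg style) */
static void bump_read_in_loop(long iters)
{
        for (long i = 0; i < iters; i++) {
                uint64_t old;
                do {
                        old = atomic_load(&num);        /* extra load per retry */
                } while (!atomic_compare_exchange_strong(&num, &old, old + 1));
        }
}

/* Variant 2: read once; a failed CAS refreshes the expected value
 * (atomic64_try_cmpxchg style)
 */
static void bump_read_once(long iters)
{
        uint64_t old = atomic_load(&num);

        for (long i = 0; i < iters; i++) {
                while (!atomic_compare_exchange_strong(&num, &old, old + 1))
                        ;       /* 'old' now holds the value actually observed */
        }
}

int main(void)
{
        bump_read_in_loop(1000000);
        bump_read_once(1000000);
        return atomic_load(&num) == 2000000 ? 0 : 1;
}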
If we take the access to the global variable out of the loop, the performance
becomes better on both x86 and arm64, as follows:
Kunpeng920, 24 cores calling it, successful operations per second:
    atomic64_read in loop:          6291034
    atomic64_read out of the loop:  8751962
Intel 6248, 20 cores calling it, successful operations per second:
    atomic64_read in loop:          8934792
    atomic64_read out of the loop:  9978998
So how about this? ;-)
delta = prandom_u32_max(now - old);
+#ifdef CONFIG_UBSAN
/* Do not use atomic_add_return() as it makes UBSAN unhappy */
+ old = (u32)atomic_read(p_id);
do {
- old = (u32)atomic_read(p_id);
new = old + delta + segs;
- } while (atomic_cmpxchg(p_id, old, new) != old);
+ } while (!atomic_try_cmpxchg(p_id, &old, new));
+#else
+ new = atomic_add_return(segs + delta, p_id);
+#endif
Thanks,
Shaokun
> +#else
> + new = atomic_add_return(segs + delta, p_id);
> +#endif
>
> return new - segs;
> }
>
>
On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
> So how about this? ;-)
>
> delta = prandom_u32_max(now - old);
>
> +#ifdef CONFIG_UBSAN
> /* Do not use atomic_add_return() as it makes UBSAN unhappy */
> + old = (u32)atomic_read(p_id);
> do {
> - old = (u32)atomic_read(p_id);
> new = old + delta + segs;
> - } while (atomic_cmpxchg(p_id, old, new) != old);
> + } while (!atomic_try_cmpxchg(p_id, &old, new));
> +#else
> + new = atomic_add_return(segs + delta, p_id);
> +#endif
That's crazy, just accept that UBSAN is taking bonghits and ignore it.
Use atomic_add_return() unconditionally.
On 1/17/20 4:32 AM, Peter Zijlstra wrote:
>
> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> Use atomic_add_return() unconditionally.
>
Yes, we might simply add a comment so that people do not bug us if
their compiler is too old.
/* If UBSAN reports an error there, please make sure your compiler
* supports -fno-strict-overflow before reporting it.
*/
return atomic_add_return(segs + delta, p_id) - segs;
On Fri, Jan 17, 2020 at 08:35:07AM -0800, Eric Dumazet wrote:
>
>
> On 1/17/20 4:32 AM, Peter Zijlstra wrote:
>
> >
> > That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> > Use atomic_add_return() unconditionally.
> >
>
> Yes, we might simply add a comment so that people do not bug us if
> their compiler is too old.
>
> /* If UBSAN reports an error there, please make sure your compiler
> * supports -fno-strict-overflow before reporting it.
> */
> return atomic_add_return(segs + delta, p_id) - segs;
>
Do we need that comment any more? The flag was apparently introduced in
gcc-4.2 and we only support 4.6+ anyway?
On 1/17/20 10:03 AM, Arvind Sankar wrote:
> On Fri, Jan 17, 2020 at 08:35:07AM -0800, Eric Dumazet wrote:
>>
>>
>> On 1/17/20 4:32 AM, Peter Zijlstra wrote:
>>
>>>
>>> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
>>> Use atomic_add_return() unconditionally.
>>>
>>
>> Yes, we might simply add a comment so that people do not bug us if
>> their compiler is too old.
>>
>> /* If UBSAN reports an error there, please make sure your compiler
>> * supports -fno-strict-overflow before reporting it.
>> */
>> return atomic_add_return(segs + delta, p_id) - segs;
>>
>
> Do we need that comment any more? The flag was apparently introduced in
> gcc-4.2 and we only support 4.6+ anyway?
Wasn't it the case back in 2016 already, for linux-4.8?
What will prevent someone from sending another report to netdev/lkml?
-fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
The fact that we kept the test_ubsan_add_overflow() and test_ubsan_sub_overflow()
code in lib/ubsan.c and lib/test_ubsan.c is disturbing.
commit adb03115f4590baa280ddc440a8eff08a6be0cb7
Author: Eric Dumazet <[email protected]>
Date: Tue Sep 20 18:06:17 2016 -0700
net: get rid of an signed integer overflow in ip_idents_reserve()
Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()
[] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
[] signed integer overflow:
[] -2117905507 + -695755206 cannot be represented in type 'int'
Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
so that the arithmetics can be done using unsigned int.
Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
> Wasn't it the case back in 2016 already for linux-4.8 ?
>
> What will prevent someone to send another report to netdev/lkml ?
>
> -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
>
> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for
> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
>
No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
On 1/17/20 10:38 AM, Arvind Sankar wrote:
> On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
>> Wasn't it the case back in 2016 already for linux-4.8 ?
>>
>> What will prevent someone to send another report to netdev/lkml ?
>>
>> -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
>>
>> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for
>> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
>>
>
> No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
> required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
>
This seems good to me, for gcc at least.
Maybe it is time to enforce -fno-strict-overflow in KBUILD_CFLAGS
instead of making it conditional.
Thanks.
Hi Peter,
On 2020/1/17 20:32, Peter Zijlstra wrote:
> On Fri, Jan 17, 2020 at 02:54:03PM +0800, Shaokun Zhang wrote:
>> So how about this? ;-)
>>
>> delta = prandom_u32_max(now - old);
>>
>> +#ifdef CONFIG_UBSAN
>> /* Do not use atomic_add_return() as it makes UBSAN unhappy */
>> + old = (u32)atomic_read(p_id);
>> do {
>> - old = (u32)atomic_read(p_id);
>> new = old + delta + segs;
>> - } while (atomic_cmpxchg(p_id, old, new) != old);
>> + } while (!atomic_try_cmpxchg(p_id, &old, new));
>> +#else
>> + new = atomic_add_return(segs + delta, p_id);
>> +#endif
>
> That's crazy, just accept that UBSAN is taking bonghits and ignore it.
> Use atomic_add_return() unconditionally.
We used atomic_add_return() before [1], but it made UBSAN unhappy, as noted in
the existing comment.
It seems that Eric also agreed to do it if some comments are added. I will do
it later.
Thanks,
Shaokun
[1] https://lkml.org/lkml/2019/7/26/217
>
On Sat, Jan 18, 2020 at 7:47 PM Shaokun Zhang
<[email protected]> wrote:
>
> We have used the atomic_add_return[1], but it makes the UBSAN unhappy followed
> by the comment.
> It seems that Eric also agreed to do it if some comments are added. I will do
> it later.
>
> Thanks,
> Shaokun
>
> [1] https://lkml.org/lkml/2019/7/26/217
>
In case you have missed it, we needed a proper analysis.
My feedback was quite simple :
<quote>
Have you first checked that current UBSAN versions will not complain anymore ?
</quote>
You never did this work and never replied to my question, and months later
you come back with a convoluted patch, while we can simply proceed with a
revert now that we are sure that Linux kernels are compiled with the proper
option.
As mentioned yesterday, no need for a comment.
Instead the changelog should be explaining why the revert is now safe.
On Fri, Jan 17, 2020 at 10:48:19AM -0800, Eric Dumazet wrote:
>
>
> On 1/17/20 10:38 AM, Arvind Sankar wrote:
> > On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
> >> Wasn't it the case back in 2016 already for linux-4.8 ?
> >>
> >> What will prevent someone to send another report to netdev/lkml ?
> >>
> >> -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
> >>
> >> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for
> >> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
> >>
> >
> > No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
> > required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
> >
>
> This seems good to me, for gcc at least.
>
> Maybe it is time to enforce -fno-strict-overflow in KBUILD_CFLAGS
> instead of making it conditional.
IIRC there was a bug in UBSAN vs -fwrapv/-fno-strict-overflow that was
only fixed in gcc-8 or 9 or so.
So while the -fwrapv/-fno-strict-overflow flag has been correctly
supported since like forever, UBSAN was buggy until quite recently when
used in conjunction with that flag.
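For completeness, the class of report in question is easy to reproduce in
userspace with plain gcc (hypothetical file name, nothing kernel-specific):

/* demo.c: a signed int wrapping past INT_MAX, the same kind of overflow
 * UBSAN flagged in ip_idents_reserve() back in 2016.
 *
 *   gcc -fsanitize=signed-integer-overflow demo.c && ./demo
 *       -> runtime "signed integer overflow" report
 *   gcc -fwrapv -fsanitize=signed-integer-overflow demo.c && ./demo
 *       -> with a recent enough gcc the check is not emitted, since -fwrapv
 *          makes the wraparound defined; older gcc versions got this
 *          combination wrong, as noted above.
 */
#include <limits.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        int id = INT_MAX - 10;  /* stands in for the shared ident counter */
        int segs = argc * 100;  /* non-constant so the add is not folded away */

        (void)argv;
        id += segs;             /* wraps: UB unless -fwrapv/-fno-strict-overflow */
        printf("%d\n", id);
        return 0;
}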
Hi Eric,
On 2020/1/19 12:12, Eric Dumazet wrote:
> On Sat, Jan 18, 2020 at 7:47 PM Shaokun Zhang
> <[email protected]> wrote:
>>
>
>> We have used the atomic_add_return[1], but it makes the UBSAN unhappy followed
>> by the comment.
>> It seems that Eric also agreed to do it if some comments are added. I will do
>> it later.
>>
>> Thanks,
>> Shaokun
>>
>> [1] https://lkml.org/lkml/2019/7/26/217
>>
>
> In case you have missed it, we needed a proper analysis.
> My feedback was quite simple :
>
> <quote>
> Have you first checked that current UBSAN versions will not complain anymore ?
> </quote>
>
> You never did this work, never replied to my question, and months
Yeah, I was not sure how to deal with the UBSAN issue, and you said that some of
you would work on this.
> later you come back
> with a convoluted patch while we simply can proceed with a revert now
After several months, we thought that we could do it like refcount_add_not_zero(),
so we submitted this patch.
> we are sure that linux kernels are compiled with the proper option.
>
> As mentioned yesterday, no need for a comment.
> Instead the changelog should be explaining why the revert is now safe.
>
Ok, this really needs to be considered.
Thanks,
Shaokun
On Sat, Jan 18, 2020 at 08:12:48PM -0800, Eric Dumazet wrote:
> Instead the changelog should be explaining why the revert is now safe.
FWIW, it was always safe, UBSAN was just buggy.
Hi Peter/Eric,
Shall we use atomic_add_return() unconditionally and add some comments? Or have I
missed something?
Thanks,
Shaokun
On 2020/1/20 16:18, Peter Zijlstra wrote:
> On Fri, Jan 17, 2020 at 10:48:19AM -0800, Eric Dumazet wrote:
>>
>>
>> On 1/17/20 10:38 AM, Arvind Sankar wrote:
>>> On Fri, Jan 17, 2020 at 10:16:45AM -0800, Eric Dumazet wrote:
>>>> Wasn't it the case back in 2016 already for linux-4.8 ?
>>>>
>>>> What will prevent someone to send another report to netdev/lkml ?
>>>>
>>>> -fno-strict-overflow support is not a prereq for CONFIG_UBSAN.
>>>>
>>>> Fact that we kept in lib/ubsan.c and lib/test_ubsan.c code for
>>>> test_ubsan_add_overflow() and test_ubsan_sub_overflow() is disturbing.
>>>>
>>>
>>> No, it was bumped in 2018 in commit cafa0010cd51 ("Raise the minimum
>>> required gcc version to 4.6"). That raised it from 3.2 -> 4.6.
>>>
>>
>> This seems good to me, for gcc at least.
>>
>> Maybe it is time to enforce -fno-strict-overflow in KBUILD_CFLAGS
>> instead of making it conditional.
>
> IIRC there was a bug in UBSAN vs -fwrapv/-fno-strict-overflow that was
> only fixed in gcc-8 or 9 or so.
>
> So while the -fwrapv/-fno-strict-overflow flag has been correctly
> supported since like forever, UBSAN was buggy until quite recent when
> used in conjunction with that flag.
>
On Thu, May 7, 2020 at 2:12 AM Shaokun Zhang <[email protected]> wrote:
>
> Hi Peter/Eric,
>
> Shall we use atomic_add_return() unconditionally and add some comments? Or I missed
> something.
>
Yes. A big fat comment, because I do not want yet another bogus
complaint from someone playing with a buggy UBSAN.
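For the record, what Peter and Eric are converging on would look roughly like
this (a sketch of the tail of ip_idents_reserve() with one possible "big fat
comment", not a specific upstream commit):

        /* If UBSAN reports a signed overflow here, first make sure your
         * compiler is recent enough that the sanitizer honours
         * -fwrapv/-fno-strict-overflow (gcc-8/9 or later, per the discussion
         * above); the wraparound is intentional and well defined under the
         * kernel's build flags.
         */
        return atomic_add_return(segs + delta, p_id) - segs;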