2019-11-06 17:50:52

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] hrtimer: Annotate lockless access to timer->state

syzbot reported various data-race caused by hrtimer_is_queued()
reading timer->state. We need a READ_ONCE() there to silence
the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

I chose to not use hrtimer_is_queued() helper in remove_hrtimer()
to avoid loading timer->state twice.

KCSAN reported these cases :

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xbb/0xe0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
__tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
__sys_sendto+0x21f/0x320 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto net/socket.c:1960 [inline]
__x64_sys_sendto+0x89/0xb0 net/socket.c:1960
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Reported-by: syzbot <[email protected]>
---
include/linux/hrtimer.h | 2 +-
kernel/time/hrtimer.c | 9 +++++----
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 1b9a51a1bccb78d4ef2a4019cfd547d0d1652b79..d55ec621221d2cbf48823445cc9cf2d7f6988f0c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -461,7 +461,7 @@ extern bool hrtimer_active(const struct hrtimer *timer);
*/
static inline int hrtimer_is_queued(struct hrtimer *timer)
{
- return timer->state & HRTIMER_STATE_ENQUEUED;
+ return READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
}

/*
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 65605530ee349c9682690c4fccb43aa9284d4287..a1cb0a959be32068025b625240cc1499b6416b63 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -966,7 +966,7 @@ static int enqueue_hrtimer(struct hrtimer *timer,

base->cpu_base->active_bases |= 1 << base->index;

- timer->state = HRTIMER_STATE_ENQUEUED;
+ WRITE_ONCE(timer->state, HRTIMER_STATE_ENQUEUED);

return timerqueue_add(&base->active, &timer->node);
}
@@ -988,7 +988,7 @@ static void __remove_hrtimer(struct hrtimer *timer,
struct hrtimer_cpu_base *cpu_base = base->cpu_base;
u8 state = timer->state;

- timer->state = newstate;
+ WRITE_ONCE(timer->state, newstate);
if (!(state & HRTIMER_STATE_ENQUEUED))
return;

@@ -1013,8 +1013,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
static inline int
remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
{
- if (hrtimer_is_queued(timer)) {
- u8 state = timer->state;
+ u8 state = timer->state;
+
+ if (state & HRTIMER_STATE_ENQUEUED) {
int reprogram;

/*
--
2.24.0.rc1.363.gb1bccd3e3d-goog


2019-11-06 18:10:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] hrtimer: Annotate lockless access to timer->state

On Wed, 6 Nov 2019, Eric Dumazet wrote:
> @@ -1013,8 +1013,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
> static inline int
> remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
> {
> - if (hrtimer_is_queued(timer)) {
> - u8 state = timer->state;
> + u8 state = timer->state;

Shouldn't that be a read once then at least for consistency sake?

> +
> + if (state & HRTIMER_STATE_ENQUEUED) {
> int reprogram;

Thanks,

tglx

2019-11-06 18:22:39

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] hrtimer: Annotate lockless access to timer->state

On Wed, Nov 6, 2019 at 10:09 AM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, 6 Nov 2019, Eric Dumazet wrote:
> > @@ -1013,8 +1013,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
> > static inline int
> > remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
> > {
> > - if (hrtimer_is_queued(timer)) {
> > - u8 state = timer->state;
> > + u8 state = timer->state;
>
> Shouldn't that be a read once then at least for consistency sake?

We own the lock here, this is not really needed ?

Note they are other timer->state reads I chose to leave unchanged.

But no big deal if you prefer I can add a READ_ONCE()

Thanks.
>
> > +
> > + if (state & HRTIMER_STATE_ENQUEUED) {
> > int reprogram;
>
> Thanks,
>
> tglx

2019-11-06 19:17:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] hrtimer: Annotate lockless access to timer->state

On Wed, 6 Nov 2019, Eric Dumazet wrote:
> On Wed, Nov 6, 2019 at 10:09 AM Thomas Gleixner <[email protected]> wrote:
> >
> > On Wed, 6 Nov 2019, Eric Dumazet wrote:
> > > @@ -1013,8 +1013,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
> > > static inline int
> > > remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
> > > {
> > > - if (hrtimer_is_queued(timer)) {
> > > - u8 state = timer->state;
> > > + u8 state = timer->state;
> >
> > Shouldn't that be a read once then at least for consistency sake?
>
> We own the lock here, this is not really needed ?
>
> Note they are other timer->state reads I chose to leave unchanged.
>
> But no big deal if you prefer I can add a READ_ONCE()

Nah. I can add it myself if at all.

I know that the READ_ONCE() is not required there, but I really hate to
twist my brain when staring at code why some places use it and some not.

So at least some commentry helps to avoid that. Something like the
below. If you have no objections I just queue the patch with this folded
in.

Thanks,

tglx

8<-------------
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -456,12 +456,18 @@ extern u64 hrtimer_next_event_without(co

extern bool hrtimer_active(const struct hrtimer *timer);

-/*
- * Helper function to check, whether the timer is on one of the queues
+/**
+ * hrtimer_is_queued = check, whether the timer is on one of the queues
+ * @timer: Timer to check
+ *
+ * Returns: True if the timer is queued, false otherwise
+ *
+ * The function can be used lockless, but it gives only a momentary snapshot.
*/
-static inline int hrtimer_is_queued(struct hrtimer *timer)
+static inline bool hrtimer_is_queued(struct hrtimer *timer)
{
- return READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
+ /* The READ_ONCE pairs with the update functions of timer->state */
+ return !!READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
}

/*
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -966,6 +966,7 @@ static int enqueue_hrtimer(struct hrtime

base->cpu_base->active_bases |= 1 << base->index;

+ /* Pairs with the lockless read in hrtimer_is_queued() */
WRITE_ONCE(timer->state, HRTIMER_STATE_ENQUEUED);

return timerqueue_add(&base->active, &timer->node);
@@ -988,6 +989,7 @@ static void __remove_hrtimer(struct hrti
struct hrtimer_cpu_base *cpu_base = base->cpu_base;
u8 state = timer->state;

+ /* Pairs with the lockless read in hrtimer_is_queued() */
WRITE_ONCE(timer->state, newstate);
if (!(state & HRTIMER_STATE_ENQUEUED))
return;

2019-11-06 20:02:26

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] hrtimer: Annotate lockless access to timer->state

On Wed, Nov 6, 2019 at 11:15 AM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, 6 Nov 2019, Eric Dumazet wrote:
> > On Wed, Nov 6, 2019 at 10:09 AM Thomas Gleixner <[email protected]> wrote:
> > >
> > > On Wed, 6 Nov 2019, Eric Dumazet wrote:
> > > > @@ -1013,8 +1013,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
> > > > static inline int
> > > > remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
> > > > {
> > > > - if (hrtimer_is_queued(timer)) {
> > > > - u8 state = timer->state;
> > > > + u8 state = timer->state;
> > >
> > > Shouldn't that be a read once then at least for consistency sake?
> >
> > We own the lock here, this is not really needed ?
> >
> > Note they are other timer->state reads I chose to leave unchanged.
> >
> > But no big deal if you prefer I can add a READ_ONCE()
>
> Nah. I can add it myself if at all.
>
> I know that the READ_ONCE() is not required there, but I really hate to
> twist my brain when staring at code why some places use it and some not.
>
> So at least some commentry helps to avoid that. Something like the
> below. If you have no objections I just queue the patch with this folded
> in.


This looks good to me, thanks !

>
> Thanks,
>
> tglx
>
> 8<-------------
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -456,12 +456,18 @@ extern u64 hrtimer_next_event_without(co
>
> extern bool hrtimer_active(const struct hrtimer *timer);
>
> -/*
> - * Helper function to check, whether the timer is on one of the queues
> +/**
> + * hrtimer_is_queued = check, whether the timer is on one of the queues
> + * @timer: Timer to check
> + *
> + * Returns: True if the timer is queued, false otherwise
> + *
> + * The function can be used lockless, but it gives only a momentary snapshot.
> */
> -static inline int hrtimer_is_queued(struct hrtimer *timer)
> +static inline bool hrtimer_is_queued(struct hrtimer *timer)
> {
> - return READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
> + /* The READ_ONCE pairs with the update functions of timer->state */
> + return !!READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
> }
>
> /*
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -966,6 +966,7 @@ static int enqueue_hrtimer(struct hrtime
>
> base->cpu_base->active_bases |= 1 << base->index;
>
> + /* Pairs with the lockless read in hrtimer_is_queued() */
> WRITE_ONCE(timer->state, HRTIMER_STATE_ENQUEUED);
>
> return timerqueue_add(&base->active, &timer->node);
> @@ -988,6 +989,7 @@ static void __remove_hrtimer(struct hrti
> struct hrtimer_cpu_base *cpu_base = base->cpu_base;
> u8 state = timer->state;
>
> + /* Pairs with the lockless read in hrtimer_is_queued() */
> WRITE_ONCE(timer->state, newstate);
> if (!(state & HRTIMER_STATE_ENQUEUED))
> return;

Subject: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

The following commit has been merged into the timers/core branch of tip:

Commit-ID: de4db39b9f0e682e59caa828a277632510901560
Gitweb: https://git.kernel.org/tip/de4db39b9f0e682e59caa828a277632510901560
Author: Eric Dumazet <[email protected]>
AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 06 Nov 2019 21:59:56 +01:00

hrtimer: Annotate lockless access to timer->state

syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.

KCSAN reported these cases:

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xbb/0xe0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
__tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
__sys_sendto+0x21f/0x320 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto net/socket.c:1960 [inline]
__x64_sys_sendto+0x89/0xb0 net/socket.c:1960
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

[ tglx: Added comments ]

Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
include/linux/hrtimer.h | 14 ++++++++++----
kernel/time/hrtimer.c | 11 +++++++----
2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 1b9a51a..6308952 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -456,12 +456,18 @@ extern u64 hrtimer_next_event_without(const struct hrtimer *exclude);

extern bool hrtimer_active(const struct hrtimer *timer);

-/*
- * Helper function to check, whether the timer is on one of the queues
+/**
+ * hrtimer_is_queued = check, whether the timer is on one of the queues
+ * @timer: Timer to check
+ *
+ * Returns: True if the timer is queued, false otherwise
+ *
+ * The function can be used lockless, but it gives only a current snapshot.
*/
-static inline int hrtimer_is_queued(struct hrtimer *timer)
+static inline bool hrtimer_is_queued(struct hrtimer *timer)
{
- return timer->state & HRTIMER_STATE_ENQUEUED;
+ /* The READ_ONCE pairs with the update functions of timer->state */
+ return !!READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
}

/*
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6560553..7f31932 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -966,7 +966,8 @@ static int enqueue_hrtimer(struct hrtimer *timer,

base->cpu_base->active_bases |= 1 << base->index;

- timer->state = HRTIMER_STATE_ENQUEUED;
+ /* Pairs with the lockless read in hrtimer_is_queued() */
+ WRITE_ONCE(timer->state, HRTIMER_STATE_ENQUEUED);

return timerqueue_add(&base->active, &timer->node);
}
@@ -988,7 +989,8 @@ static void __remove_hrtimer(struct hrtimer *timer,
struct hrtimer_cpu_base *cpu_base = base->cpu_base;
u8 state = timer->state;

- timer->state = newstate;
+ /* Pairs with the lockless read in hrtimer_is_queued() */
+ WRITE_ONCE(timer->state, newstate);
if (!(state & HRTIMER_STATE_ENQUEUED))
return;

@@ -1013,8 +1015,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
static inline int
remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
{
- if (hrtimer_is_queued(timer)) {
- u8 state = timer->state;
+ u8 state = timer->state;
+
+ if (state & HRTIMER_STATE_ENQUEUED) {
int reprogram;

/*

2019-11-06 22:04:09

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Wed, Nov 6, 2019 at 1:06 PM tip-bot2 for Eric Dumazet
<[email protected]> wrote:
>
> The following commit has been merged into the timers/core branch of tip:
>
> Commit-ID: de4db39b9f0e682e59caa828a277632510901560
> Gitweb: https://git.kernel.org/tip/de4db39b9f0e682e59caa828a277632510901560
> Author: Eric Dumazet <[email protected]>
> AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
> Committer: Thomas Gleixner <[email protected]>
> CommitterDate: Wed, 06 Nov 2019 21:59:56 +01:00
>
> hrtimer: Annotate lockless access to timer->state

...

> -/*
> - * Helper function to check, whether the timer is on one of the queues
> +/**
> + * hrtimer_is_queued = check, whether the timer is on one of the queues
> + * @timer: Timer to check
> + *
> + * Returns: True if the timer is queued, false otherwise
> + *
> + * The function can be used lockless, but it gives only a current snapshot.
> */
> -static inline int hrtimer_is_queued(struct hrtimer *timer)
> +static inline bool hrtimer_is_queued(struct hrtimer *timer)
> {
> - return timer->state & HRTIMER_STATE_ENQUEUED;
> + /* The READ_ONCE pairs with the update functions of timer->state */
> + return !!READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;

You probably meant :

return !!(READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED);

Sorry for not spotting this earlier.

2019-11-06 22:19:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Wed, 6 Nov 2019, Eric Dumazet wrote:
> > -static inline int hrtimer_is_queued(struct hrtimer *timer)
> > +static inline bool hrtimer_is_queued(struct hrtimer *timer)
> > {
> > - return timer->state & HRTIMER_STATE_ENQUEUED;
> > + /* The READ_ONCE pairs with the update functions of timer->state */
> > + return !!READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED;
>
> You probably meant :
>
> return !!(READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED);
>
> Sorry for not spotting this earlier.

Yes, I'm a moron....

Subject: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 56144737e67329c9aaed15f942d46a6302e2e3d8
Gitweb: https://git.kernel.org/tip/56144737e67329c9aaed15f942d46a6302e2e3d8
Author: Eric Dumazet <[email protected]>
AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 06 Nov 2019 23:18:31 +01:00

hrtimer: Annotate lockless access to timer->state

syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.

KCSAN reported these cases:

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
__remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
__run_hrtimer kernel/time/hrtimer.c:1496 [inline]
__hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
__do_softirq+0x115/0x33f kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0xbb/0xe0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
__tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
sk_backlog_rcv include/net/sock.h:945 [inline]
__release_sock+0x135/0x1e0 net/core/sock.c:2435
release_sock+0x61/0x160 net/core/sock.c:2951
sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
__sys_sendto+0x21f/0x320 net/socket.c:1952
__do_sys_sendto net/socket.c:1964 [inline]
__se_sys_sendto net/socket.c:1960 [inline]
__x64_sys_sendto+0x89/0xb0 net/socket.c:1960
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

[ tglx: Added comments ]

Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/hrtimer.h | 14 ++++++++++----
kernel/time/hrtimer.c | 11 +++++++----
2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 1b9a51a..1f98b52 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -456,12 +456,18 @@ extern u64 hrtimer_next_event_without(const struct hrtimer *exclude);

extern bool hrtimer_active(const struct hrtimer *timer);

-/*
- * Helper function to check, whether the timer is on one of the queues
+/**
+ * hrtimer_is_queued = check, whether the timer is on one of the queues
+ * @timer: Timer to check
+ *
+ * Returns: True if the timer is queued, false otherwise
+ *
+ * The function can be used lockless, but it gives only a current snapshot.
*/
-static inline int hrtimer_is_queued(struct hrtimer *timer)
+static inline bool hrtimer_is_queued(struct hrtimer *timer)
{
- return timer->state & HRTIMER_STATE_ENQUEUED;
+ /* The READ_ONCE pairs with the update functions of timer->state */
+ return !!(READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED);
}

/*
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 6560553..7f31932 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -966,7 +966,8 @@ static int enqueue_hrtimer(struct hrtimer *timer,

base->cpu_base->active_bases |= 1 << base->index;

- timer->state = HRTIMER_STATE_ENQUEUED;
+ /* Pairs with the lockless read in hrtimer_is_queued() */
+ WRITE_ONCE(timer->state, HRTIMER_STATE_ENQUEUED);

return timerqueue_add(&base->active, &timer->node);
}
@@ -988,7 +989,8 @@ static void __remove_hrtimer(struct hrtimer *timer,
struct hrtimer_cpu_base *cpu_base = base->cpu_base;
u8 state = timer->state;

- timer->state = newstate;
+ /* Pairs with the lockless read in hrtimer_is_queued() */
+ WRITE_ONCE(timer->state, newstate);
if (!(state & HRTIMER_STATE_ENQUEUED))
return;

@@ -1013,8 +1015,9 @@ static void __remove_hrtimer(struct hrtimer *timer,
static inline int
remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool restart)
{
- if (hrtimer_is_queued(timer)) {
- u8 state = timer->state;
+ u8 state = timer->state;
+
+ if (state & HRTIMER_STATE_ENQUEUED) {
int reprogram;

/*

2019-11-06 22:55:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Wed, Nov 6, 2019 at 2:24 PM tip-bot2 for Eric Dumazet
<[email protected]> wrote:
>
> The following commit has been merged into the timers/core branch of tip:
>
> Commit-ID: 56144737e67329c9aaed15f942d46a6302e2e3d8
> Gitweb: https://git.kernel.org/tip/56144737e67329c9aaed15f942d46a6302e2e3d8
> Author: Eric Dumazet <[email protected]>
> AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
> Committer: Thomas Gleixner <[email protected]>
> CommitterDate: Wed, 06 Nov 2019 23:18:31 +01:00
>
> hrtimer: Annotate lockless access to timer->state
>

I guess we also need to fix timer_pending(), since timer->entry.pprev
could change while we read it.

I will try to find a KCSAN report showing the issue.

Thanks !

2019-11-06 23:01:17

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Wed, Nov 6, 2019 at 2:53 PM Eric Dumazet <[email protected]> wrote:
>
> On Wed, Nov 6, 2019 at 2:24 PM tip-bot2 for Eric Dumazet
> <[email protected]> wrote:
> >
> > The following commit has been merged into the timers/core branch of tip:
> >
> > Commit-ID: 56144737e67329c9aaed15f942d46a6302e2e3d8
> > Gitweb: https://git.kernel.org/tip/56144737e67329c9aaed15f942d46a6302e2e3d8
> > Author: Eric Dumazet <[email protected]>
> > AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
> > Committer: Thomas Gleixner <[email protected]>
> > CommitterDate: Wed, 06 Nov 2019 23:18:31 +01:00
> >
> > hrtimer: Annotate lockless access to timer->state
> >
>
> I guess we also need to fix timer_pending(), since timer->entry.pprev
> could change while we read it.
>

It is interesting seeing hlist_add_head() has a WRITE_ONCE(h->first, n);,
but no WRITE_ONCE() for the pprev change.

The WRITE_ONCE() was added in commit 1c97be677f72b3c338312aecd36d8fff20322f32
("list: Use WRITE_ONCE() when adding to lists and hlists")

2019-11-07 08:54:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Wed, Nov 06, 2019 at 02:59:36PM -0800, Eric Dumazet wrote:
> On Wed, Nov 6, 2019 at 2:53 PM Eric Dumazet <[email protected]> wrote:
> >
> > On Wed, Nov 6, 2019 at 2:24 PM tip-bot2 for Eric Dumazet
> > <[email protected]> wrote:
> > >
> > > The following commit has been merged into the timers/core branch of tip:
> > >
> > > Commit-ID: 56144737e67329c9aaed15f942d46a6302e2e3d8
> > > Gitweb: https://git.kernel.org/tip/56144737e67329c9aaed15f942d46a6302e2e3d8
> > > Author: Eric Dumazet <[email protected]>
> > > AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
> > > Committer: Thomas Gleixner <[email protected]>
> > > CommitterDate: Wed, 06 Nov 2019 23:18:31 +01:00
> > >
> > > hrtimer: Annotate lockless access to timer->state
> > >
> >
> > I guess we also need to fix timer_pending(), since timer->entry.pprev
> > could change while we read it.
>
> It is interesting seeing hlist_add_head() has a WRITE_ONCE(h->first, n);,
> but no WRITE_ONCE() for the pprev change.
>
> The WRITE_ONCE() was added in commit 1c97be677f72b3c338312aecd36d8fff20322f32
> ("list: Use WRITE_ONCE() when adding to lists and hlists")

The theory is that while the ->next pointer is concurrently accessed by
RCU readers, the ->pprev pointer is accessed only by updaters, who need
to supply sufficient synchronization.

But what is this theory missing in practice?

Thanx, Paul

2019-11-07 15:51:30

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 7, 2019 at 12:53 AM Paul E. McKenney <[email protected]> wrote:
>
> On Wed, Nov 06, 2019 at 02:59:36PM -0800, Eric Dumazet wrote:
> > On Wed, Nov 6, 2019 at 2:53 PM Eric Dumazet <[email protected]> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 2:24 PM tip-bot2 for Eric Dumazet
> > > <[email protected]> wrote:
> > > >
> > > > The following commit has been merged into the timers/core branch of tip:
> > > >
> > > > Commit-ID: 56144737e67329c9aaed15f942d46a6302e2e3d8
> > > > Gitweb: https://git.kernel.org/tip/56144737e67329c9aaed15f942d46a6302e2e3d8
> > > > Author: Eric Dumazet <[email protected]>
> > > > AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
> > > > Committer: Thomas Gleixner <[email protected]>
> > > > CommitterDate: Wed, 06 Nov 2019 23:18:31 +01:00
> > > >
> > > > hrtimer: Annotate lockless access to timer->state
> > > >
> > >
> > > I guess we also need to fix timer_pending(), since timer->entry.pprev
> > > could change while we read it.
> >
> > It is interesting seeing hlist_add_head() has a WRITE_ONCE(h->first, n);,
> > but no WRITE_ONCE() for the pprev change.
> >
> > The WRITE_ONCE() was added in commit 1c97be677f72b3c338312aecd36d8fff20322f32
> > ("list: Use WRITE_ONCE() when adding to lists and hlists")
>
> The theory is that while the ->next pointer is concurrently accessed by
> RCU readers, the ->pprev pointer is accessed only by updaters, who need
> to supply sufficient synchronization.
>
> But what is this theory missing in practice?

Here is some context : I am helping triaging about 400 KCSAN data-race
splats in syzbot moderation queue.

Take a look at the timer related one in [1]

If we want to avoid potential load/store-tearing, minimall patch would be :

diff --git a/include/linux/list.h b/include/linux/list.h
index 85c92555e31f85f019354e54d6efb8e79c2aee17..9139803b851cc37bb759c8d7c12ee7e36c61f009
100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -761,7 +761,7 @@ static inline void __hlist_del(struct hlist_node *n)

WRITE_ONCE(*pprev, next);
if (next)
- next->pprev = pprev;
+ WRITE_ONCE(next->pprev, pprev);
}

static inline void hlist_del(struct hlist_node *n)
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1e6650ed066d5d28251b0bd385fc37ef94c96532..c7c8dd89f2797389ca96473e60c7297fd38d8259
100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
timer_list *timer) { }
*/
static inline int timer_pending(const struct timer_list * timer)
{
- return timer->entry.pprev != NULL;
+ return READ_ONCE(timer->entry.pprev) != NULL;
}

extern void add_timer_on(struct timer_list *timer, int cpu);


But really many other WRITE_ONCE() would be needed in include/linux/list.h


[1]

BUG: KCSAN: data-race in del_timer / detach_if_pending

write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
__hlist_del include/linux/list.h:764 [inline]
detach_timer kernel/time/timer.c:815 [inline]
detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
del_timer+0x3b/0xb0 kernel/time/timer.c:1198
sk_stop_timer+0x25/0x60 net/core/sock.c:2845
inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
inet_release+0x86/0x100 net/ipv4/af_inet.c:427
__sock_release+0x85/0x160 net/socket.c:590
sock_close+0x24/0x30 net/socket.c:1268
__fput+0x1e1/0x520 fs/file_table.c:280
____fput+0x1f/0x30 fs/file_table.c:313
task_work_run+0xf6/0x130 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
==================================================================

2019-11-07 16:14:04

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 07, 2019 at 07:48:50AM -0800, Eric Dumazet wrote:
> On Thu, Nov 7, 2019 at 12:53 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Wed, Nov 06, 2019 at 02:59:36PM -0800, Eric Dumazet wrote:
> > > On Wed, Nov 6, 2019 at 2:53 PM Eric Dumazet <[email protected]> wrote:
> > > >
> > > > On Wed, Nov 6, 2019 at 2:24 PM tip-bot2 for Eric Dumazet
> > > > <[email protected]> wrote:
> > > > >
> > > > > The following commit has been merged into the timers/core branch of tip:
> > > > >
> > > > > Commit-ID: 56144737e67329c9aaed15f942d46a6302e2e3d8
> > > > > Gitweb: https://git.kernel.org/tip/56144737e67329c9aaed15f942d46a6302e2e3d8
> > > > > Author: Eric Dumazet <[email protected]>
> > > > > AuthorDate: Wed, 06 Nov 2019 09:48:04 -08:00
> > > > > Committer: Thomas Gleixner <[email protected]>
> > > > > CommitterDate: Wed, 06 Nov 2019 23:18:31 +01:00
> > > > >
> > > > > hrtimer: Annotate lockless access to timer->state
> > > > >
> > > >
> > > > I guess we also need to fix timer_pending(), since timer->entry.pprev
> > > > could change while we read it.
> > >
> > > It is interesting seeing hlist_add_head() has a WRITE_ONCE(h->first, n);,
> > > but no WRITE_ONCE() for the pprev change.
> > >
> > > The WRITE_ONCE() was added in commit 1c97be677f72b3c338312aecd36d8fff20322f32
> > > ("list: Use WRITE_ONCE() when adding to lists and hlists")
> >
> > The theory is that while the ->next pointer is concurrently accessed by
> > RCU readers, the ->pprev pointer is accessed only by updaters, who need
> > to supply sufficient synchronization.
> >
> > But what is this theory missing in practice?
>
> Here is some context : I am helping triaging about 400 KCSAN data-race
> splats in syzbot moderation queue.
>
> Take a look at the timer related one in [1]
>
> If we want to avoid potential load/store-tearing, minimall patch would be :
>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 85c92555e31f85f019354e54d6efb8e79c2aee17..9139803b851cc37bb759c8d7c12ee7e36c61f009
> 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -761,7 +761,7 @@ static inline void __hlist_del(struct hlist_node *n)
>
> WRITE_ONCE(*pprev, next);
> if (next)
> - next->pprev = pprev;
> + WRITE_ONCE(next->pprev, pprev);
> }
>
> static inline void hlist_del(struct hlist_node *n)
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 1e6650ed066d5d28251b0bd385fc37ef94c96532..c7c8dd89f2797389ca96473e60c7297fd38d8259
> 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
> timer_list *timer) { }
> */
> static inline int timer_pending(const struct timer_list * timer)
> {
> - return timer->entry.pprev != NULL;
> + return READ_ONCE(timer->entry.pprev) != NULL;
> }
>
> extern void add_timer_on(struct timer_list *timer, int cpu);
>
> But really many other WRITE_ONCE() would be needed in include/linux/list.h
>
> [1]
>
> BUG: KCSAN: data-race in del_timer / detach_if_pending
>
> write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
> __hlist_del include/linux/list.h:764 [inline]
> detach_timer kernel/time/timer.c:815 [inline]
> detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
> try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
> del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
> schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
> rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
> rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
> kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
>
> read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
> del_timer+0x3b/0xb0 kernel/time/timer.c:1198
> sk_stop_timer+0x25/0x60 net/core/sock.c:2845
> inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
> tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
> tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
> inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
> tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
> inet_release+0x86/0x100 net/ipv4/af_inet.c:427
> __sock_release+0x85/0x160 net/socket.c:590
> sock_close+0x24/0x30 net/socket.c:1268
> __fput+0x1e1/0x520 fs/file_table.c:280
> ____fput+0x1f/0x30 fs/file_table.c:313
> task_work_run+0xf6/0x130 kernel/task_work.c:113
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> ==================================================================

OK, so this is due to timer_pending() lockless access to ->entry.pprev
to determine whether or not the timer is on the list. New one on me!

Given that use case, I don't have an objection to your patch to list.h.

Except...

Would it make sense to add a READ_ONCE() to hlist_unhashed()
and to then make timer_pending() invoke hlist_unhashed()? That
would better confine the needed uses of READ_ONCE().

Thanx, Paul

2019-11-07 16:36:39

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 7, 2019 at 8:11 AM Paul E. McKenney <[email protected]> wrote:
>
> OK, so this is due to timer_pending() lockless access to ->entry.pprev
> to determine whether or not the timer is on the list. New one on me!
>
> Given that use case, I don't have an objection to your patch to list.h.
>
> Except...
>
> Would it make sense to add a READ_ONCE() to hlist_unhashed()
> and to then make timer_pending() invoke hlist_unhashed()? That
> would better confine the needed uses of READ_ONCE().

Sounds good to me, I had the same idea but was too lazy to look at the
history of timer_pending()
to check if the pprev pointer check was really the same underlying idea.

2019-11-07 16:41:48

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 7, 2019 at 8:35 AM Eric Dumazet <[email protected]> wrote:
>
> On Thu, Nov 7, 2019 at 8:11 AM Paul E. McKenney <[email protected]> wrote:
> >
> > OK, so this is due to timer_pending() lockless access to ->entry.pprev
> > to determine whether or not the timer is on the list. New one on me!
> >
> > Given that use case, I don't have an objection to your patch to list.h.
> >
> > Except...
> >
> > Would it make sense to add a READ_ONCE() to hlist_unhashed()
> > and to then make timer_pending() invoke hlist_unhashed()? That
> > would better confine the needed uses of READ_ONCE().
>
> Sounds good to me, I had the same idea but was too lazy to look at the
> history of timer_pending()
> to check if the pprev pointer check was really the same underlying idea.

Note that forcing READ_ONCE() in hlist_unhashed() might force the compiler
to read the pprev pointer twice in some cases.

This was one of the reason for me to add skb_queue_empty_lockless()
variant in include/linux/skbuff.h

/**
* skb_queue_empty_lockless - check if a queue is empty
* @list: queue head
*
* Returns true if the queue is empty, false otherwise.
* This variant can be used in lockless contexts.
*/
static inline bool skb_queue_empty_lockless(const struct sk_buff_head *list)
{
return READ_ONCE(list->next) == (const struct sk_buff *) list;
}

So maybe add a hlist_unhashed_lockless() to clearly document why
callers are using the lockless variant ?

2019-11-07 16:57:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 07, 2019 at 08:39:42AM -0800, Eric Dumazet wrote:
> On Thu, Nov 7, 2019 at 8:35 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Thu, Nov 7, 2019 at 8:11 AM Paul E. McKenney <[email protected]> wrote:
> > >
> > > OK, so this is due to timer_pending() lockless access to ->entry.pprev
> > > to determine whether or not the timer is on the list. New one on me!
> > >
> > > Given that use case, I don't have an objection to your patch to list.h.
> > >
> > > Except...
> > >
> > > Would it make sense to add a READ_ONCE() to hlist_unhashed()
> > > and to then make timer_pending() invoke hlist_unhashed()? That
> > > would better confine the needed uses of READ_ONCE().
> >
> > Sounds good to me, I had the same idea but was too lazy to look at the
> > history of timer_pending()
> > to check if the pprev pointer check was really the same underlying idea.
>
> Note that forcing READ_ONCE() in hlist_unhashed() might force the compiler
> to read the pprev pointer twice in some cases.
>
> This was one of the reason for me to add skb_queue_empty_lockless()
> variant in include/linux/skbuff.h

Ouch!

> /**
> * skb_queue_empty_lockless - check if a queue is empty
> * @list: queue head
> *
> * Returns true if the queue is empty, false otherwise.
> * This variant can be used in lockless contexts.
> */
> static inline bool skb_queue_empty_lockless(const struct sk_buff_head *list)
> {
> return READ_ONCE(list->next) == (const struct sk_buff *) list;
> }
>
> So maybe add a hlist_unhashed_lockless() to clearly document why
> callers are using the lockless variant ?

That sounds like a reasonable approach to me. There aren't all that
many uses of hlist_unhashed(), so a name change should not be a problem.

Thanx, Paul

2019-11-07 17:03:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 7, 2019 at 8:54 AM Paul E. McKenney <[email protected]> wrote:
>
> On Thu, Nov 07, 2019 at 08:39:42AM -0800, Eric Dumazet wrote:
> > On Thu, Nov 7, 2019 at 8:35 AM Eric Dumazet <[email protected]> wrote:
> > >
> > > On Thu, Nov 7, 2019 at 8:11 AM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > OK, so this is due to timer_pending() lockless access to ->entry.pprev
> > > > to determine whether or not the timer is on the list. New one on me!
> > > >
> > > > Given that use case, I don't have an objection to your patch to list.h.
> > > >
> > > > Except...
> > > >
> > > > Would it make sense to add a READ_ONCE() to hlist_unhashed()
> > > > and to then make timer_pending() invoke hlist_unhashed()? That
> > > > would better confine the needed uses of READ_ONCE().
> > >
> > > Sounds good to me, I had the same idea but was too lazy to look at the
> > > history of timer_pending()
> > > to check if the pprev pointer check was really the same underlying idea.
> >
> > Note that forcing READ_ONCE() in hlist_unhashed() might force the compiler
> > to read the pprev pointer twice in some cases.
> >
> > This was one of the reason for me to add skb_queue_empty_lockless()
> > variant in include/linux/skbuff.h
>
> Ouch!
>
> > /**
> > * skb_queue_empty_lockless - check if a queue is empty
> > * @list: queue head
> > *
> > * Returns true if the queue is empty, false otherwise.
> > * This variant can be used in lockless contexts.
> > */
> > static inline bool skb_queue_empty_lockless(const struct sk_buff_head *list)
> > {
> > return READ_ONCE(list->next) == (const struct sk_buff *) list;
> > }
> >
> > So maybe add a hlist_unhashed_lockless() to clearly document why
> > callers are using the lockless variant ?
>
> That sounds like a reasonable approach to me. There aren't all that
> many uses of hlist_unhashed(), so a name change should not be a problem.

Maybe I was not clear : I did not rename skb_queue_empty()
I chose to add another helper.

Contexts that can safely use skb_queue_empty() still continue to use
it, since it might help
the compiler to generate better code.

So If I add hlist_unhashed_lockless(), I would only use it from
timer_pending() at first.

Then an audit of the code might reveal other potential users.

2019-11-07 17:08:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [tip: timers/core] hrtimer: Annotate lockless access to timer->state

On Thu, Nov 07, 2019 at 08:59:49AM -0800, Eric Dumazet wrote:
> On Thu, Nov 7, 2019 at 8:54 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Thu, Nov 07, 2019 at 08:39:42AM -0800, Eric Dumazet wrote:
> > > On Thu, Nov 7, 2019 at 8:35 AM Eric Dumazet <[email protected]> wrote:
> > > >
> > > > On Thu, Nov 7, 2019 at 8:11 AM Paul E. McKenney <[email protected]> wrote:
> > > > >
> > > > > OK, so this is due to timer_pending() lockless access to ->entry.pprev
> > > > > to determine whether or not the timer is on the list. New one on me!
> > > > >
> > > > > Given that use case, I don't have an objection to your patch to list.h.
> > > > >
> > > > > Except...
> > > > >
> > > > > Would it make sense to add a READ_ONCE() to hlist_unhashed()
> > > > > and to then make timer_pending() invoke hlist_unhashed()? That
> > > > > would better confine the needed uses of READ_ONCE().
> > > >
> > > > Sounds good to me, I had the same idea but was too lazy to look at the
> > > > history of timer_pending()
> > > > to check if the pprev pointer check was really the same underlying idea.
> > >
> > > Note that forcing READ_ONCE() in hlist_unhashed() might force the compiler
> > > to read the pprev pointer twice in some cases.
> > >
> > > This was one of the reason for me to add skb_queue_empty_lockless()
> > > variant in include/linux/skbuff.h
> >
> > Ouch!
> >
> > > /**
> > > * skb_queue_empty_lockless - check if a queue is empty
> > > * @list: queue head
> > > *
> > > * Returns true if the queue is empty, false otherwise.
> > > * This variant can be used in lockless contexts.
> > > */
> > > static inline bool skb_queue_empty_lockless(const struct sk_buff_head *list)
> > > {
> > > return READ_ONCE(list->next) == (const struct sk_buff *) list;
> > > }
> > >
> > > So maybe add a hlist_unhashed_lockless() to clearly document why
> > > callers are using the lockless variant ?
> >
> > That sounds like a reasonable approach to me. There aren't all that
> > many uses of hlist_unhashed(), so a name change should not be a problem.
>
> Maybe I was not clear : I did not rename skb_queue_empty()
> I chose to add another helper.
>
> Contexts that can safely use skb_queue_empty() still continue to use
> it, since it might help
> the compiler to generate better code.
>
> So If I add hlist_unhashed_lockless(), I would only use it from
> timer_pending() at first.
>
> Then an audit of the code might reveal other potential users.

OK, yes, that approach does make more sense, and thank you for the
clarification.

Thanx, Paul