2020-12-06 21:25:37

by Thomas Gleixner

Subject: [patch 0/3] tick: Annotate and document the intentionally racy tick_do_timer_cpu

There have been several reports about KCSAN complaints vs. the racy access
to tick_do_timer_cpu. The syzbot moderation queue has three different
patterns all related to this. There are a few more...

As I know that this is intentional and safe, I did not pay much attention
to it, but Marco actually made me feel bad a few days ago as he explained
that these intentional races generate too much noise to get to the
dangerous ones.

There was an earlier attempt to just silence KCSAN by slapping
READ/WRITE_ONCE() all over the place without even the faintest attempt at
reasoning, which is definitely the wrong thing to do.

The bad thing about tick_do_timer_cpu is that it is only barely documented
why it is safe and works at all, which makes it extremely hard for someone
not really familiar with the code to come up with the reasoning.

So Marco made me fast forward that item in my todo list and I have to admit
that it would have been damned helpful if that Gleixner dude had added
proper comments in the first place. Would have spared a lot of brain
twisting. :)

Staring at all usage sites unearthed a few silly things which are cleaned
up upfront. The actual annotation uses data_race() with proper comments as
READ/WRITE_ONCE() does not really buy anything under the assumption that
the compiler does not play silly buggers and tear the 32-bit stores/loads
into byte-wise ones. But even that would cause just potentially shorter
idle sleeps in the worst case and not a complete malfunction.
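
To make the annotation style concrete, here is a minimal userspace sketch.
It is not the code from the patches: the helper names model_cpu_goes_idle()
and model_cpu_handles_tick() are hypothetical, and data_race() is stubbed
out as a plain pass-through. In the kernel, data_race() tells KCSAN that
the race is intentional without changing the generated code.

```c
/*
 * Userspace stand-in (assumption): in the kernel, data_race() marks an
 * intentionally racy expression for KCSAN. Here it just evaluates the
 * expression.
 */
#define data_race(expr)		(expr)

/* Marker value meaning "no CPU currently owns the do_timer() duty". */
#define TICK_DO_TIMER_NONE	(-1)

static int tick_do_timer_cpu = TICK_DO_TIMER_NONE;

/*
 * Hypothetical helper: a CPU going idle drops the duty if it holds it.
 * Both the load and the store are racy on purpose. A stale value only
 * means the duty is picked up a bit later by some other CPU.
 */
static void model_cpu_goes_idle(int cpu)
{
	if (data_race(tick_do_timer_cpu) == cpu)
		data_race(tick_do_timer_cpu = TICK_DO_TIMER_NONE);
}

/*
 * Hypothetical helper: a CPU handling its tick takes over an unowned
 * duty. Returns 1 if this CPU owns the duty after the call, else 0.
 */
static int model_cpu_handles_tick(int cpu)
{
	int owner = data_race(tick_do_timer_cpu);

	if (owner == TICK_DO_TIMER_NONE) {
		data_race(tick_do_timer_cpu = cpu);
		owner = cpu;
	}
	return owner == cpu;
}
```

Calling model_cpu_handles_tick(2) on the initial state makes CPU 2 the
owner, a subsequent model_cpu_handles_tick(3) leaves it alone, and
model_cpu_goes_idle(2) resets the variable to TICK_DO_TIMER_NONE so that
the next CPU handling a tick can take over. The sketch is single-threaded
and shows only where the annotations go, not the actual concurrency.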

Thanks,

tglx
----
tick-common.c | 55 +++++++++++++++++++++++++++------
tick-sched.c | 96 ++++++++++++++++++++++++++++++++++++++++++----------------
2 files changed, 117 insertions(+), 34 deletions(-)


2020-12-07 11:08:19

by Marco Elver

Subject: Re: [patch 0/3] tick: Annotate and document the intentionally racy tick_do_timer_cpu

On Sun, 6 Dec 2020 at 22:21, Thomas Gleixner wrote:
> There have been several reports about KCSAN complaints vs. the racy access
> to tick_do_timer_cpu. The syzbot moderation queue has three different
> patterns all related to this. There are a few more...
>
> As I know that this is intentional and safe, I did not pay much attention
> to it, but Marco actually made me feel bad a few days ago as he explained
> that these intentional races generate too much noise to get to the
> dangerous ones.

My strategy so far was to inspect random data races and decide which
ones might be more interesting and send those, but I haven't had time
to chase data races the past few months. Thus, getting rid of the
intentional boring ones will definitely scale better -- relying on a
human to do filtering really is suboptimal. :-)

> There was an earlier attempt to just silence KCSAN by slapping
> READ/WRITE_ONCE() all over the place without even the faintest attempt at
> reasoning, which is definitely the wrong thing to do.
>
> The bad thing about tick_do_timer_cpu is that it is only barely documented
> why it is safe and works at all, which makes it extremely hard for someone
> not really familiar with the code to come up with the reasoning.
>
> So Marco made me fast forward that item in my todo list and I have to admit
> that it would have been damned helpful if that Gleixner dude had added
> proper comments in the first place. Would have spared a lot of brain
> twisting. :)
>
> Staring at all usage sites unearthed a few silly things which are cleaned
> up upfront. The actual annotation uses data_race() with proper comments as
> READ/WRITE_ONCE() does not really buy anything under the assumption that
> the compiler does not play silly buggers and tear the 32-bit stores/loads
> into byte-wise ones. But even that would cause just potentially shorter
> idle sleeps in the worst case and not a complete malfunction.

Ack -- thanks for marking the accesses!

Thanks,
-- Marco