tick_do_timer_cpu could be accessed concurrently where both plain writes
and plain reads are not protected by a lock. Thus, it could result in
data races. Fix them by adding pairs of READ|WRITE_ONCE(). The data
races were reported by KCSAN,
write to 0xffffffffb2dc94ac of 4 bytes by interrupt on cpu 98:
tick_sched_do_timer+0x77/0x90
tick_sched_do_timer at kernel/time/tick-sched.c:136
tick_sched_timer+0x35/0xc0
__hrtimer_run_queues+0x217/0x7c0
hrtimer_interrupt+0x1d4/0x3e0
smp_apic_timer_interrupt+0x107/0x460
apic_timer_interrupt+0xf/0x20
cpuidle_enter_state+0x15e/0x980
cpuidle_enter+0x69/0xc0
call_cpuidle+0x23/0x40
do_idle+0x248/0x280
cpu_startup_entry+0x1d/0x1f
start_secondary+0x1b2/0x230
secondary_startup_64+0xb6/0xc0
read to 0xffffffffb2dc94ac of 4 bytes by interrupt on cpu 67:
tick_sched_do_timer+0x31/0x90
tick_sched_do_timer at kernel/time/tick-sched.c:132
tick_sched_timer+0x35/0xc0
__hrtimer_run_queues+0x217/0x7c0
hrtimer_interrupt+0x1d4/0x3e0
smp_apic_timer_interrupt+0x107/0x460
apic_timer_interrupt+0xf/0x20
cpuidle_enter_state+0x15e/0x980
cpuidle_enter+0x69/0xc0
call_cpuidle+0x23/0x40
do_idle+0x248/0x280
cpu_startup_entry+0x1d/0x1f
start_secondary+0x1b2/0x230
secondary_startup_64+0xb6/0xc0
read to 0xffffffffb2dc94ac of 4 bytes by task 0 on cpu 107:
tick_nohz_idle_stop_tick+0x149/0x5f0
tick_nohz_stop_tick at kernel/time/tick-sched.c:774
(inlined by) __tick_nohz_idle_stop_tick at kernel/time/tick-sched.c:967
(inlined by) tick_nohz_idle_stop_tick at kernel/time/tick-sched.c:988
do_idle+0x235/0x280
cpu_startup_entry+0x1d/0x1f
start_secondary+0x1b2/0x230
secondary_startup_64+0xb6/0xc0
read to 0xffffffffb2dc94ac of 4 bytes by task 0 on cpu 21:
tick_nohz_next_event+0x19b/0x2c0
tick_nohz_next_event at kernel/time/tick-sched.c:740
tick_nohz_get_sleep_length+0xae/0xe0
menu_select+0x8b/0xc29
cpuidle_select+0x50/0x70
do_idle+0x214/0x280
cpu_startup_entry+0x1d/0x1f
start_secondary+0x1b2/0x230
secondary_startup_64+0xb6/0xc0
Signed-off-by: Qian Cai <[email protected]>
---
kernel/time/tick-sched.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a792d21cac64..54bcf4eff238 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -129,16 +129,16 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
* If nohz_full is enabled, this should not happen because the
* tick_do_timer_cpu never relinquishes.
*/
- if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+ if (unlikely(READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_NONE)) {
#ifdef CONFIG_NO_HZ_FULL
WARN_ON(tick_nohz_full_running);
#endif
- tick_do_timer_cpu = cpu;
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
}
#endif
/* Check, if the jiffies need an update */
- if (tick_do_timer_cpu == cpu)
+ if (READ_ONCE(tick_do_timer_cpu) == cpu)
tick_do_update_jiffies64(now);
if (ts->inidle)
@@ -737,8 +737,9 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
* Otherwise we can sleep as long as we want.
*/
delta = timekeeping_max_deferment();
- if (cpu != tick_do_timer_cpu &&
- (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
+ if (cpu != READ_ONCE(tick_do_timer_cpu) &&
+ (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE ||
+ !ts->do_timer_last))
delta = KTIME_MAX;
/* Calculate the next expiry time */
@@ -771,10 +772,10 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
* do_timer() never invoked. Keep track of the fact that it
* was the one which had the do_timer() duty last.
*/
- if (cpu == tick_do_timer_cpu) {
- tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+ if (cpu == READ_ONCE(tick_do_timer_cpu)) {
+ WRITE_ONCE(tick_do_timer_cpu, TICK_DO_TIMER_NONE);
ts->do_timer_last = 1;
- } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+ } else if (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE) {
ts->do_timer_last = 0;
}
--
2.21.0 (Apple Git-122.2)
Qian,
Qian Cai <[email protected]> writes:
> tick_do_timer_cpu could be accessed concurrently where both plain writes
> and plain reads are not protected by a lock. Thus, it could result in
> data races. Fix them by adding pairs of READ|WRITE_ONCE(). The data
> races were reported by KCSAN,
They are reported, but are they actually a real problem?
This completely lacks analysis why these 8 places need the
READ/WRITE_ONCE() treatment at all and if so why the other 14 places
accessing tick_do_timer_cpu are safe without it.
I definitely appreciate the work done with KCSAN, but just making the
tool shut up does not cut it.
Thanks,
tglx
On Wed, Mar 04, 2020 at 10:39:41AM +0100, Thomas Gleixner wrote:
> Qian,
>
> Qian Cai <[email protected]> writes:
> > tick_do_timer_cpu could be accessed concurrently where both plain writes
> > and plain reads are not protected by a lock. Thus, it could result in
> > data races. Fix them by adding pairs of READ|WRITE_ONCE(). The data
> > races were reported by KCSAN,
>
> They are reported, but are they actually a real problem?
>
> This completely lacks analysis why these 8 places need the
> READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> accessing tick_do_timer_cpu are safe without it.
>
> I definitely appreciate the work done with KCSAN, but just making the
> tool shut up does not cut it.
Worse:
+ if (cpu != READ_ONCE(tick_do_timer_cpu) &&
+ (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE ||
Doing that read twice is just utterly stupid. And the patch is full of
that :/
Please stop this non-thinking 'fix' generation!
> On Mar 4, 2020, at 4:39 AM, Thomas Gleixner <[email protected]> wrote:
>
> They are reported, but are they actually a real problem?
>
> This completely lacks analysis why these 8 places need the
> READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> accessing tick_do_timer_cpu are safe without it.
>
> I definitely appreciate the work done with KCSAN, but just making the
> tool shut up does not cut it.
Looks at tick_sched_do_timer(), for example,
if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
#ifdef CONFIG_NO_HZ_FULL
WARN_ON(tick_nohz_full_running);
#endif
tick_do_timer_cpu = cpu;
}
#endif
/* Check, if the jiffies need an update */
if (tick_do_timer_cpu == cpu)
tick_do_update_jiffies64(now);
Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to,
if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu)
tick_do_update_jiffies64(now);
So it could save a branch or/and realized that tick_do_timer_cpu is not used later in this cpu, so it could discard the store?
I am not all that familiar with all other 14 places if it is possible to happen concurrently as well, so I took a pragmatic approach that for now only deal with the places that KCSAN confirmed, and then look forward for an incremental approach if there are more places needs treatments later once confirmed.
> On Mar 4, 2020, at 4:56 AM, Peter Zijlstra <[email protected]> wrote:
>
> Worse:
>
> + if (cpu != READ_ONCE(tick_do_timer_cpu) &&
> + (READ_ONCE(tick_do_timer_cpu) != TICK_DO_TIMER_NONE ||
>
> Doing that read twice is just utterly stupid. And the patch is full of
> that :/
>
> Please stop this non-thinking 'fix' generation!
That is an oversight that could easily be fixed. However, blaming about “non-thinking” is rather harsh. Do you know I had spent weeks reading about memory-model.txt and alike just to get better at it?
On Wed, 2020-03-04 at 06:20 -0500, Qian Cai wrote:
> > On Mar 4, 2020, at 4:39 AM, Thomas Gleixner <[email protected]> wrote:
> >
> > They are reported, but are they actually a real problem?
> >
> > This completely lacks analysis why these 8 places need the
> > READ/WRITE_ONCE() treatment at all and if so why the other 14 places
> > accessing tick_do_timer_cpu are safe without it.
> >
> > I definitely appreciate the work done with KCSAN, but just making the
> > tool shut up does not cut it.
>
> Looks at tick_sched_do_timer(), for example,
>
> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
> #ifdef CONFIG_NO_HZ_FULL
> WARN_ON(tick_nohz_full_running);
> #endif
> tick_do_timer_cpu = cpu;
> }
> #endif
>
> /* Check, if the jiffies need an update */
> if (tick_do_timer_cpu == cpu)
> tick_do_update_jiffies64(now);
>
> Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to,
>
> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu)
> tick_do_update_jiffies64(now);
>
> So it could save a branch or/and realized that tick_do_timer_cpu is not used later in this cpu, so it could discard the store?
>
> I am not all that familiar with all other 14 places if it is possible to happen concurrently as well, so I took a pragmatic approach that for now only deal with the places that KCSAN confirmed, and then look forward for an incremental approach if there are more places needs treatments later once confirmed.
If you don't think that will be happening and dislike using READ/WRITE_ONCE()
for documentation purpose, we could annotate those using the data_race() macro.
Is that more acceptable?
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a792d21cac64..08ce4088da87 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -129,7 +129,7 @@ static void tick_sched_do_timer(struct tick_sched *ts,
ktime_t now)
* If nohz_full is enabled, this should not happen because the
* tick_do_timer_cpu never relinquishes.
*/
- if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
+ if (unlikely(data_race(tick_do_timer_cpu == TICK_DO_TIMER_NONE))) {
#ifdef CONFIG_NO_HZ_FULL
WARN_ON(tick_nohz_full_running);
#endif
@@ -138,7 +138,7 @@ static void tick_sched_do_timer(struct tick_sched *ts,
ktime_t now)
#endif
/* Check, if the jiffies need an update */
- if (tick_do_timer_cpu == cpu)
+ if (data_race(tick_do_timer_cpu == cpu))
tick_do_update_jiffies64(now);
if (ts->inidle)
@@ -737,8 +737,9 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts,
int cpu)
* Otherwise we can sleep as long as we want.
*/
delta = timekeeping_max_deferment();
- if (cpu != tick_do_timer_cpu &&
- (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
+ if (data_race(cpu != tick_do_timer_cpu) &&
+ (data_race(tick_do_timer_cpu != TICK_DO_TIMER_NONE) ||
+ !ts->do_timer_last))
delta = KTIME_MAX;
/* Calculate the next expiry time */
@@ -771,10 +772,10 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int
cpu)
* do_timer() never invoked. Keep track of the fact that it
* was the one which had the do_timer() duty last.
*/
- if (cpu == tick_do_timer_cpu) {
+ if (data_race(cpu == tick_do_timer_cpu)) {
tick_do_timer_cpu = TICK_DO_TIMER_NONE;
ts->do_timer_last = 1;
- } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+ } else if (data_race(tick_do_timer_cpu != TICK_DO_TIMER_NONE)) {
ts->do_timer_last = 0;
}
Qian Cai <[email protected]> writes:
> On Wed, 2020-03-04 at 06:20 -0500, Qian Cai wrote:
>> Looks at tick_sched_do_timer(), for example,
>>
>> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) {
>> #ifdef CONFIG_NO_HZ_FULL
>> WARN_ON(tick_nohz_full_running);
>> #endif
>> tick_do_timer_cpu = cpu;
>> }
>> #endif
>>
>> /* Check, if the jiffies need an update */
>> if (tick_do_timer_cpu == cpu)
>> tick_do_update_jiffies64(now);
>>
>> Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to,
>>
>> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu)
>> tick_do_update_jiffies64(now);
>>
>> So it could save a branch or/and realized that tick_do_timer_cpu is
>> not used later in this cpu, so it could discard the store?
Sorry, I can't make any sense of this.
If a compiler optimizes this in the way you are describing then it is
seriously broken. If that happens then lots of code will just fall apart
in colourful ways.
>> I am not all that familiar with all other 14 places if it is possible
>> to happen concurrently as well, so I took a pragmatic approach that
>> for now only deal with the places that KCSAN confirmed, and then look
>> forward for an incremental approach if there are more places needs
>> treatments later once confirmed.
Stop this please. I just wasted days to understand why code was
sprinkled with random rcu*enter/exit() invocations until I figured out
that the reason was exactly what you are proposing:
Curing the symptoms
That's the worst engineering principle ever, really.
Just shutting up the tool without doing a proper analysis is a
guaranteed recipe for disaster because it is going to paper over real or
latent bugs. Read the git log history carefully to find prime examples
for this.
So no, neither shutting up just the places KCSAN complained about and
ignoring the rest nor your new proposal
> If you don't think that will be happening and dislike using READ/WRITE_ONCE()
> for documentation purpose, we could annotate those using the data_race() macro.
> Is that more acceptable?
is leading anywhere. It's both based on cargo cult programming and lacks
proper root cause analysis.
I'm surely not ignoring the output of KCSAN, but please understand that
any attempt to silence its output purely based on mechanical duct tape
is not going anywhere.
I wasted actually months in the past to decode a subtle wreckage which
was caused by the back then 'silence uninitilized variable warnings'
frenzy. One of these blindly applied annotations papered over a real
bug and made it horribly hard to find.
As I'm busy with other urgent nightmares right now, this has to wait
until I get some spare cycles or you come up with some proper analysis.
Thanks,
tglx