2023-05-21 06:22:40

by Mike Galbraith

Subject: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

Greetings,

The locks added to prep for $subject induce invalid context moaning due
to not being raw locks, but if I do that, the hold time does very bad
things to RT. Nouveau apparently didn't appreciate ftrace recording
the 160us latency hit all that much either, because my GUI ceased
responding, and I had to ssh in to find out what happened.
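
For reference, the shape of the locking problem as a minimal sketch (made-up
names, not the actual r8169 code): on PREEMPT_RT a spinlock_t is a sleeping
rtmutex, so taking one in a handler that runs in hard interrupt context (as
rtl8169_interrupt does in the trace below) is what lockdep flags as an invalid
wait context, while converting it to raw_spinlock_t keeps interrupts
hard-disabled for the whole register access.

#include <linux/interrupt.h>
#include <linux/spinlock.h>

/* Illustrative only: demo names, not the actual r8169 code. */
static DEFINE_SPINLOCK(demo_cfg_lock);	/* spinlock_t: sleeps on PREEMPT_RT */

static irqreturn_t demo_interrupt(int irq, void *dev_id)
{
	unsigned long flags;

	/*
	 * Hard interrupt context: on PREEMPT_RT this tries to take a
	 * sleeping lock, and lockdep reports "BUG: Invalid wait context".
	 */
	spin_lock_irqsave(&demo_cfg_lock, flags);
	/* ... config register read-modify-write ... */
	spin_unlock_irqrestore(&demo_cfg_lock, flags);

	return IRQ_HANDLED;
}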

[ 783.226457] nouveau 0000:01:00.0: ce2: LAUNCHERR 0000000a [UNUSED_FIELD]
[ 787.530456] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 787.530501] nouveau 0000:01:00.0: fifo:000000:0002:[X[2677]] rc scheduled
[ 787.530504] nouveau 0000:01:00.0: fifo:000000: rc scheduled
[ 787.530516] nouveau 0000:01:00.0: fifo:000000:0002:0002:[X[2677]] errored - disabling channel
[ 787.530522] nouveau 0000:01:00.0: X[2677]: channel 2 killed!

# tracer: wakeup_RT
#
# wakeup_RT latency trace v1.1.5 on 6.4.0.gd635f6c-master-eevdf-rt
# --------------------------------------------------------------------
# latency: 160 us, #116/116, CPU#0 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: cyclictest-29050 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _--------=> CPU#
# / _-------=> irqs-off/BH-disabled
# | / _------=> need-resched
# || / _-----=> need-resched-lazy
# ||| / _----=> hardirq/softirq
# |||| / _---=> preempt-depth
# ||||| / _--=> preempt-lazy-depth
# |||||| / _-=> migrate-disable
# ||||||| / delay
# cmd pid |||||||| time | caller
# \ / |||||||| \ | /
<idle>-0 0dn.h5.. 0us : 0:120:R + [000] 29050: 0:R cyclictest
<idle>-0 0dn.h5.. 1us : 0
<idle>-0 0dn.h4.. 1us : task_woken_rt <-ttwu_do_activate
<idle>-0 0dn.h4.. 1us : preempt_count_sub <-__raw_spin_unlock
<idle>-0 0dn.h3.. 1us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 0dn.h3.. 1us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn.h2.. 2us : preempt_count_sub <-try_to_wake_up
<idle>-0 0dn.h1.. 2us : _raw_spin_lock_irq <-__hrtimer_run_queues
<idle>-0 0dn.h1.. 2us : preempt_count_add <-_raw_spin_lock_irq
<idle>-0 0dn.h2.. 2us : hrtimer_update_next_event <-hrtimer_interrupt
<idle>-0 0dn.h2.. 2us : __hrtimer_get_next_event <-hrtimer_update_next_event
<idle>-0 0dn.h2.. 2us : __hrtimer_next_event_base <-__hrtimer_get_next_event
<idle>-0 0dn.h2.. 3us : __hrtimer_get_next_event <-hrtimer_update_next_event
<idle>-0 0dn.h2.. 3us : __hrtimer_next_event_base <-hrtimer_update_next_event
<idle>-0 0dn.h2.. 3us : _raw_spin_unlock_irqrestore <-hrtimer_interrupt
<idle>-0 0dn.h2.. 3us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn.h1.. 3us : tick_program_event <-hrtimer_interrupt
<idle>-0 0dn.h1.. 3us : clockevents_program_event <-hrtimer_interrupt
<idle>-0 0dn.h1.. 3us : ktime_get <-clockevents_program_event
<idle>-0 0dn.h1.. 3us : lapic_next_deadline <-clockevents_program_event
<idle>-0 0dn.h1.. 4us : irq_exit_rcu <-sysvec_apic_timer_interrupt
<idle>-0 0dn..1.. 4us : idle_cpu <-irq_exit_rcu
<idle>-0 0dn..1.. 4us : irq_enter_rcu <-common_interrupt
<idle>-0 0dn.h1.. 4us : tick_irq_enter <-irq_enter_rcu
<idle>-0 0dn.h1.. 4us : tick_check_oneshot_broadcast_this_cpu <-tick_irq_enter
<idle>-0 0dn.h1.. 5us : __common_interrupt <-common_interrupt
<idle>-0 0dn.h1.. 5us : handle_edge_irq <-__common_interrupt
<idle>-0 0dn.h1.. 5us : _raw_spin_lock <-handle_edge_irq
<idle>-0 0dn.h1.. 5us : preempt_count_add <-_raw_spin_lock
<idle>-0 0dn.h2.. 5us : irq_may_run <-handle_edge_irq
<idle>-0 0dn.h2.. 5us : irq_chip_ack_parent <-handle_edge_irq
<idle>-0 0dn.h2.. 5us : apic_ack_irq <-handle_edge_irq
<idle>-0 0dn.h2.. 6us : handle_irq_event <-handle_edge_irq
<idle>-0 0dn.h2.. 6us : preempt_count_sub <-handle_irq_event
<idle>-0 0dn.h1.. 6us : handle_irq_event_percpu <-handle_irq_event
<idle>-0 0dn.h1.. 6us : __handle_irq_event_percpu <-handle_irq_event_percpu
<idle>-0 0dn.h1.. 6us+: rtl8169_interrupt <-__handle_irq_event_percpu
<idle>-0 0dn.h1.. 81us : napi_schedule_prep <-rtl8169_interrupt
<idle>-0 0dn.h1.. 81us : rtl_unlock_config_regs <-rtl8169_interrupt
<idle>-0 0dn.h1.. 81us : _raw_spin_lock_irqsave <-rtl_unlock_config_regs
<idle>-0 0dn.h1.. 81us : preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 0dn.h2.. 82us : _raw_spin_unlock_irqrestore <-rtl8169_interrupt
<idle>-0 0dn.h2.. 82us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn.h1.. 82us : rtl_hw_aspm_clkreq_enable <-rtl8169_interrupt
<idle>-0 0dn.h1.. 82us : rtl_mod_config2 <-rtl_hw_aspm_clkreq_enable
<idle>-0 0dn.h1.. 82us : _raw_spin_lock_irqsave <-rtl_mod_config2
<idle>-0 0dn.h1.. 82us+: preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 0dn.h2.. 110us : _raw_spin_unlock_irqrestore <-rtl_hw_aspm_clkreq_enable
<idle>-0 0dn.h2.. 110us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn.h1.. 110us : rtl_mod_config5 <-rtl8169_interrupt
<idle>-0 0dn.h1.. 110us : _raw_spin_lock_irqsave <-rtl_mod_config5
<idle>-0 0dn.h1.. 110us+: preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 0dn.h2.. 149us : _raw_spin_unlock_irqrestore <-rtl8169_interrupt
<idle>-0 0dn.h2.. 150us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn.h1.. 150us : rtl_lock_config_regs <-rtl8169_interrupt
<idle>-0 0dn.h1.. 150us : _raw_spin_lock_irqsave <-rtl_lock_config_regs
<idle>-0 0dn.h1.. 150us : preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 0dn.h2.. 150us : _raw_spin_unlock_irqrestore <-rtl8169_interrupt
<idle>-0 0dn.h2.. 150us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn.h1.. 151us : __napi_schedule <-rtl8169_interrupt
<idle>-0 0dn.h1.. 151us : __raise_softirq_irqoff <-__napi_schedule
<idle>-0 0dn.h1.. 151us : add_interrupt_randomness <-handle_irq_event_percpu
<idle>-0 0dn.h1.. 151us : fast_mix <-add_interrupt_randomness
<idle>-0 0dn.h1.. 151us : note_interrupt <-handle_irq_event_percpu
<idle>-0 0dn.h1.. 151us : _raw_spin_lock <-handle_irq_event
<idle>-0 0dn.h1.. 152us : preempt_count_add <-_raw_spin_lock
<idle>-0 0dn.h2.. 152us : preempt_count_sub <-handle_edge_irq
<idle>-0 0dn.h1.. 152us : irq_exit_rcu <-common_interrupt
<idle>-0 0dn..1.. 152us : wakeup_softirqd <-irq_exit_rcu
<idle>-0 0dn..1.. 152us : wake_up_process <-irq_exit_rcu
<idle>-0 0dn..1.. 152us : try_to_wake_up <-irq_exit_rcu
<idle>-0 0dn..1.. 153us : preempt_count_add <-try_to_wake_up
<idle>-0 0dn..2.. 153us : _raw_spin_lock_irqsave <-try_to_wake_up
<idle>-0 0dn..2.. 153us : preempt_count_add <-_raw_spin_lock_irqsave
<idle>-0 0dn..3.. 153us : kthread_is_per_cpu <-is_cpu_allowed
<idle>-0 0dn..3.. 153us : ttwu_queue_wakelist <-try_to_wake_up
<idle>-0 0dn..3.. 153us : preempt_count_add <-try_to_wake_up
<idle>-0 0dn..4.. 154us : _raw_spin_lock <-try_to_wake_up
<idle>-0 0dn..4.. 154us : preempt_count_add <-_raw_spin_lock
<idle>-0 0dn..5.. 154us : preempt_count_sub <-try_to_wake_up
<idle>-0 0dn..4.. 154us : update_rq_clock <-try_to_wake_up
<idle>-0 0dn..4.. 155us : update_rq_clock.part.110 <-try_to_wake_up
<idle>-0 0dn..4.. 155us : ttwu_do_activate <-try_to_wake_up
<idle>-0 0dn..4.. 155us : activate_task <-ttwu_do_activate
<idle>-0 0dn..4.. 155us : enqueue_task_fair <-activate_task
<idle>-0 0dn..4.. 155us : update_curr <-enqueue_task_fair
<idle>-0 0dn..4.. 155us : __update_load_avg_se <-enqueue_task_fair
<idle>-0 0dn..4.. 156us : __update_load_avg_cfs_rq <-enqueue_task_fair
<idle>-0 0dn..4.. 156us : place_entity <-enqueue_task_fair
<idle>-0 0dn..4.. 156us : avg_vruntime <-place_entity
<idle>-0 0dn..4.. 156us : __enqueue_entity <-enqueue_task_fair
<idle>-0 0dn..4.. 156us : hrtick_update <-activate_task
<idle>-0 0dn..4.. 156us : check_preempt_curr <-ttwu_do_activate
<idle>-0 0dn..4.. 157us : resched_curr <-check_preempt_curr
<idle>-0 0dn..5.. 157us : __traceiter_sched_wakeup <-ttwu_do_activate
<idle>-0 0dn..4.. 157us : preempt_count_sub <-__raw_spin_unlock
<idle>-0 0dn..3.. 157us : _raw_spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 0dn..3.. 157us : preempt_count_sub <-_raw_spin_unlock_irqrestore
<idle>-0 0dn..2.. 157us : preempt_count_sub <-try_to_wake_up
<idle>-0 0dn..1.. 158us : idle_cpu <-irq_exit_rcu
<idle>-0 0dn..1.. 158us : rcu_note_context_switch <-__schedule
<idle>-0 0dn..1.. 158us : rcu_qs <-rcu_note_context_switch
<idle>-0 0dn..1.. 158us : preempt_count_add <-__schedule
<idle>-0 0dn..2.. 158us : _raw_spin_lock <-__schedule
<idle>-0 0dn..2.. 159us : preempt_count_add <-_raw_spin_lock
<idle>-0 0dn..3.. 159us : preempt_count_sub <-__schedule
<idle>-0 0dn..2.. 159us : update_rq_clock <-__schedule
<idle>-0 0dn..2.. 159us : put_prev_task_idle <-__schedule
<idle>-0 0dn..2.. 159us : pick_next_task_stop <-__schedule
<idle>-0 0dn..2.. 159us : pick_next_task_dl <-__schedule
<idle>-0 0dn..2.. 159us : pick_next_task_rt <-__schedule
<idle>-0 0dn..2.. 160us : pick_next_rt_entity <-pick_next_task_rt
<idle>-0 0dn..2.. 160us : update_rt_rq_load_avg <-pick_next_task_rt
<idle>-0 0d...3.. 160us : __traceiter_sched_switch <-__schedule
<idle>-0 0d...3.. 160us : __traceiter_sched_switch
<idle>-0 0d...3.. 160us : 0:120:R ==> [000] 29050: 0:R cyclictest



2023-05-21 14:58:39

by Mike Galbraith

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On Sun, 2023-05-21 at 08:03 +0200, Mike Galbraith wrote:
> Greetings,
>
> The locks added to prep for $subject induce invalid context moaning due
> to not being raw locks, but if I do that, the hold time does very bad
> things to RT.

The locks aren't really 160us horrible. The GPU was enabled (oops),
nouveau then makes box horrible for RT (and graphics:). With GPU
turned back off, the largest trace I saw was 77us, which jibes pretty
well with new max of low to mid 50s without ftrace running, or up a tad
over 20us for this now pretty old commodity i7 desktop box.

-Mike

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On 2023-05-21 16:27:10 [+0200], Mike Galbraith wrote:
> On Sun, 2023-05-21 at 08:03 +0200, Mike Galbraith wrote:
> > Greetings,
> >
> > The locks added to prep for $subject induce invalid context moaning due
> > to not being raw locks, but if I do that, the hold time does very bad
> > things to RT.
>
> The locks aren't really 160us horrible. The GPU was enabled (oops),
> nouveau then makes box horrible for RT (and graphics:). With GPU
> turned back off, the largest trace I saw was 77us, which jibes pretty
> well with new max of low to mid 50s without ftrace running, or up a tad
> over 20us for this now pretty old commodity i7 desktop box.

So you are saying it is all good?
I was looking at the r8169 driver the other day and it seemed all good
since it was only scheduling workqueues and NAPI. But now I see these
mac_ocp_lock + config25_lock locks, which are acquired in hardirq
context but can't be. This needs to be taken care of.

> -Mike

Sebastian

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On 2023-05-22 12:32:38 [+0200], Heiner Kallweit wrote:
> What do you mean by "but can't"? Taking a spinlock in hardirq context
> is normal.

I'm crafting a patch, testing it, and will send it with a full explanation.

> >
> >> -Mike

Sebastian

2023-05-22 10:50:33

by Heiner Kallweit

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On 22.05.2023 10:06, Sebastian Andrzej Siewior wrote:
> On 2023-05-21 16:27:10 [+0200], Mike Galbraith wrote:
>> On Sun, 2023-05-21 at 08:03 +0200, Mike Galbraith wrote:
>>> Greetings,
>>>
>>> The locks added to prep for $subject induce invalid context moaning due
>>> to not being raw locks, but if I do that, the hold time does very bad
>>> things to RT.
>>
>> The locks aren't really 160us horrible. The GPU was enabled (oops),
>> nouveau then makes box horrible for RT (and graphics:). With GPU
>> turned back off, the largest trace I saw was 77us, which jibes pretty
>> well with new max of low to mid 50s without ftrace running, or up a tad
>> over 20us for this now pretty old commodity i7 desktop box.
>
> So you are saying it is all good?
> I was looking at the r8169 driver the other day and it seemed all good
> since it was only scheduling workqueues and NAPI. But now I see these
> mac_ocp_lock + config25_lock locks, which are acquired in hardirq
> context but can't be. This needs to be taken care of.

What do you mean by "but can't"? Taking a spinlock in hardirq context
is normal.

>
>> -Mike
>
> Sebastian


2023-05-22 10:52:45

by Mike Galbraith

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On Mon, 2023-05-22 at 10:06 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-05-21 16:27:10 [+0200], Mike Galbraith wrote:
> > On Sun, 2023-05-21 at 08:03 +0200, Mike Galbraith wrote:
> > > Greetings,
> > >
> > > The locks added to prep for $subject induce invalid context moaning due
> > > to not being raw locks, but if I do that, the hold time does very bad
> > > things to RT.
> >
> > The locks aren't really 160us horrible. The GPU was enabled (oops),
> > nouveau then makes box horrible for RT (and graphics:).  With GPU
> > turned back off, the largest trace I saw was 77us, which jibes pretty
> > well with new max of low to mid 50s without ftrace running, or up a tad
> > over 20us for this now pretty old commodity i7 desktop box.
>
> So you are saying it is all good?

I wouldn't say good, but not "the sky is falling" 160us bad.

> I was looking at the r8169 driver the other day and it seemed all good
> since it was only scheduling workqueues and NAPI. But now I see these
> mac_ocp_lock + config25_lock locks, which are acquired in hardirq
> context but can't be. This needs to be taken care of.

I did a patch converting the locks, but didn't post it hoping maybe
those locks weren't really really needed. It doesn't matter for my
box, as it doesn't do anything but give RT chances to trip over bugs.
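
The conversion itself is just the obvious s/spinlock_t/raw_spinlock_t/ plus
the matching lock calls; roughly the sketch below, with made-up names rather
than the unposted patch:

#include <linux/spinlock.h>

/* Sketch only: hypothetical names, not the unposted patch. */
struct demo_priv {
	raw_spinlock_t config_lock;	/* was: spinlock_t; raw_spin_lock_init() at probe time */
};

static void demo_mod_config(struct demo_priv *priv)
{
	unsigned long flags;

	/*
	 * raw_spinlock_t is fine in hardirq context even on PREEMPT_RT, but
	 * interrupts stay hard-disabled for the whole hold time, so slow
	 * config register accesses land directly in worst-case latency.
	 */
	raw_spin_lock_irqsave(&priv->config_lock, flags);
	/* ... register read-modify-write ... */
	raw_spin_unlock_irqrestore(&priv->config_lock, flags);
}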

-Mike

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On 2023-05-22 12:41:21 [+0200], Mike Galbraith wrote:
> > I was looking at the r8169 driver the other day and it seemed all good
> > since it was only scheduling workqueues and NAPI. But now I see these
> > mac_ocp_lock + config25_lock locks, which are acquired in hardirq
> > context but can't be. This needs to be taken care of.
>
> I did a patch converting the locks, but didn't post it hoping maybe
> those locks weren't really really needed. It doesn't matter for my
> box, as it doesn't do anything but give RT chances to trip over bugs.

I missed that part. Do you still see those 160us if you keep the locks
as-is and drop the no-threads flag? I'm trying to figure out whether
this is due to hwirq context or just the PCI bus being blocked due to
ASPM.
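
By "no-threads flag" I mean IRQF_NO_THREAD at request_irq() time; roughly the
choice below, with demo names rather than the r8169 ones:

#include <linux/interrupt.h>

/* Demo names only, not the r8169 request path. */
static irqreturn_t demo_interrupt(int irq, void *dev_id)
{
	return IRQ_HANDLED;
}

static int demo_setup_irq(int irq, void *dev)
{
	/*
	 * With IRQF_NO_THREAD the handler runs in hard interrupt context
	 * even on PREEMPT_RT, so only raw locks may be taken inside it.
	 */
	return request_irq(irq, demo_interrupt, IRQF_NO_THREAD, "demo", dev);

	/*
	 * Without the flag the handler is force-threaded on PREEMPT_RT and
	 * runs in a kthread with interrupts enabled, so the spinlock_t
	 * variants could stay as they are:
	 *
	 *	return request_irq(irq, demo_interrupt, IRQF_SHARED, "demo", dev);
	 */
}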

> -Mike

Sebastian

2023-05-22 17:15:09

by Mike Galbraith

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On Mon, 2023-05-22 at 15:43 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-05-22 12:41:21 [+0200], Mike Galbraith wrote:
> > > I was looking at the r8169 driver the other day and it seemed all good
> > > since it was only scheduling workqueues and NAPI. But now I see these
> > > mac_ocp_lock + config25_lock locks, which are acquired in hardirq
> > > context but can't be. This needs to be taken care of.
> >
> > I did a patch converting the locks, but didn't post it hoping maybe
> > those locks weren't really really needed.  It doesn't matter for my
> > box, as it doesn't do anything but give RT chances to trip over bugs.
>
> I missed that part. Do you still see those 160us if you keep the locks
> as-is and drop the no-threads flag? I'm trying to figure out whether
> this is due to hwirq context or just the PCI bus being blocked due to
> ASPM.

That 160us was with the nouveau left on by mistake, the worst ftrace
recorded with me beating up box was 77us (and it inflates). Without
ftrace and keeping box quiet except for youtube and netperf with lappy,
it looks like rtmutex v spinlock would eventually end in a near tie,
both being noticeable, but for a box without an RT mission, hohum.

Me preferring to not notice them, <poof> all better.

no locks
policy: fifo: loadavg: 13.22 9.21 4.16 12/970 5737

T: 0 ( 4385) P:99 I:1000 C: 329625 Min: 1 Act: 4 Avg: 3 Max: 31
T: 1 ( 4386) P:99 I:1500 C: 219749 Min: 1 Act: 5 Avg: 3 Max: 24
T: 2 ( 4387) P:99 I:2000 C: 164811 Min: 1 Act: 4 Avg: 3 Max: 24
T: 3 ( 4388) P:99 I:2500 C: 131849 Min: 1 Act: 6 Avg: 4 Max: 25
T: 4 ( 4389) P:99 I:3000 C: 109874 Min: 1 Act: 4 Avg: 3 Max: 24
T: 5 ( 4390) P:99 I:3500 C: 94177 Min: 1 Act: 5 Avg: 4 Max: 25
T: 6 ( 4391) P:99 I:4000 C: 82405 Min: 1 Act: 5 Avg: 4 Max: 23
T: 7 ( 4392) P:99 I:4500 C: 73248 Min: 1 Act: 5 Avg: 4 Max: 27

spinlock_t IRQF_SHARED
policy: fifo: loadavg: 13.99 10.01 4.63 11/925 5459

T: 0 ( 5188) P:99 I:1000 C: 330449 Min: 1 Act: 4 Avg: 3 Max: 41
T: 1 ( 5189) P:99 I:1500 C: 220298 Min: 1 Act: 6 Avg: 4 Max: 25
T: 2 ( 5190) P:99 I:2000 C: 165223 Min: 1 Act: 12 Avg: 4 Max: 58
T: 3 ( 5191) P:99 I:2500 C: 132178 Min: 1 Act: 8 Avg: 4 Max: 44
T: 4 ( 5192) P:99 I:3000 C: 110148 Min: 1 Act: 8 Avg: 4 Max: 43
T: 5 ( 5193) P:99 I:3500 C: 94412 Min: 1 Act: 6 Avg: 4 Max: 23
T: 6 ( 5194) P:99 I:4000 C: 82611 Min: 1 Act: 6 Avg: 5 Max: 24
T: 7 ( 5195) P:99 I:4500 C: 73431 Min: 1 Act: 10 Avg: 5 Max: 38

raw_spinlock_t IRQF_NO_THREAD
policy: fifo: loadavg: 14.84 10.35 4.81 4/1031 5744

T: 0 ( 4680) P:99 I:1000 C: 347816 Min: 1 Act: 6 Avg: 4 Max: 29
T: 1 ( 4681) P:99 I:1500 C: 231877 Min: 1 Act: 6 Avg: 4 Max: 49
T: 2 ( 4682) P:99 I:2000 C: 173907 Min: 1 Act: 7 Avg: 5 Max: 39
T: 3 ( 4683) P:99 I:2500 C: 139125 Min: 1 Act: 6 Avg: 5 Max: 46
T: 4 ( 4684) P:99 I:3000 C: 115937 Min: 1 Act: 4 Avg: 4 Max: 34
T: 5 ( 4685) P:99 I:3500 C: 99375 Min: 1 Act: 7 Avg: 5 Max: 32
T: 6 ( 4686) P:99 I:4000 C: 86953 Min: 1 Act: 9 Avg: 5 Max: 28
T: 7 ( 4687) P:99 I:4500 C: 77291 Min: 1 Act: 6 Avg: 5 Max: 37
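
(The tables above are cyclictest's per-thread output; the exact invocation
isn't shown, but something along the lines of

    cyclictest -t8 -p99 -i1000 -d500

run against each of the three lock variants would produce this Min/Act/Avg/Max
format, with per-thread intervals stepped by 500us.)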





2023-05-22 17:29:10

by Mike Galbraith

Subject: Re: r8169: disable ASPM during NAPI poll locking changes --> BUG: Invalid wait context --> PREEMPT_RT pain

On Mon, 2023-05-22 at 19:02 +0200, Mike Galbraith wrote:
> That 160us was with the nouveau left on by mistake, the worst ftrace
> recorded with me beating up box was 77us (and it inflates).

Grr, ftraced 160us was with the GPU mucking up box's RT performance,
ftraced 77us was without GPU. Box was very busy in both cases.

-Mike