2012-05-30 21:48:22

by Dave Jones

[permalink] [raw]
Subject: NOHZ: local_softirq_pending 08

Thomas,
I've been seeing these messages more and more lately.
When I start a kernel build for eg, it seems to spew regularly..

May 30 17:42:00 [ 1030.651976] NOHZ: local_softirq_pending 08
May 30 17:42:16 [ 1046.579491] NOHZ: local_softirq_pending 08
May 30 17:42:16 [ 1047.230344] NOHZ: local_softirq_pending 08
May 30 17:42:33 [ 1063.751879] NOHZ: local_softirq_pending 08
May 30 17:42:33 [ 1063.796340] NOHZ: local_softirq_pending 08
May 30 17:43:00 [ 1090.776493] NOHZ: local_softirq_pending 08
May 30 17:43:00 [ 1090.836544] NOHZ: local_softirq_pending 08
May 30 17:43:00 [ 1090.880271] NOHZ: local_softirq_pending 08
May 30 17:43:04 [ 1095.291939] NOHZ: local_softirq_pending 08
May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08



According to git, you ratelimited this back in 2007 with the message
"until we found the root cause of that problem."

Is this an old problem that has returned ?

Maybe interesting note: Both machines I see this on regularly are Atom cpus,
and it's always '08'.


Dave


2012-05-31 07:52:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Dave,

On Wed, 30 May 2012, Dave Jones wrote:

> May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
>
> According to git, you ratelimited this back in 2007 with the message
> "until we found the root cause of that problem."
>
> Is this an old problem that has returned ?

Looks like.

> Maybe interesting note: Both machines I see this on regularly are Atom cpus,
> and it's always '08'.

So it goes idle with NET_RX softirq pending. That really should not
happen. I fear you need to fire up the tracer :)

Thanks,

tglx

2012-05-31 07:56:31

by Eric Dumazet

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Thu, 2012-05-31 at 09:52 +0200, Thomas Gleixner wrote:
> Dave,
>
> On Wed, 30 May 2012, Dave Jones wrote:
>
> > May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
> >
> > According to git, you ratelimited this back in 2007 with the message
> > "until we found the root cause of that problem."
> >
> > Is this an old problem that has returned ?
>
> Looks like.
>
> > Maybe interesting note: Both machines I see this on regularly are Atom cpus,
> > and it's always '08'.
>
> So it goes idle with NET_RX softirq pending. That really should not
> happen. I fear you need to fire up the tracer :)

It might be a bug in the network driver, a race with NAPI completion.

Dave, which net drivers run on these machines ?

2012-05-31 14:04:41

by Dave Jones

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Thu, May 31, 2012 at 09:56:25AM +0200, Eric Dumazet wrote:
> On Thu, 2012-05-31 at 09:52 +0200, Thomas Gleixner wrote:
> > Dave,
> >
> > On Wed, 30 May 2012, Dave Jones wrote:
> >
> > > May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
> > >
> > > According to git, you ratelimited this back in 2007 with the message
> > > "until we found the root cause of that problem."
> > >
> > > Is this an old problem that has returned ?
> >
> > Looks like.
> >
> > > Maybe interesting note: Both machines I see this on regularly are Atom cpus,
> > > and it's always '08'.
> >
> > So it goes idle with NET_RX softirq pending. That really should not
> > happen. I fear you need to fire up the tracer :)
>
> It might be a bug in the network driver, a race with NAPI completion.
>
> Dave, which net drivers run on these machines ?

r8169.

I got an email from someone else asking if I used Realtek drivers, which I thought
was an uncanny guess.

Dave

2012-06-01 23:06:27

by Francois Romieu

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Thomas Gleixner <[email protected]> :
[
> So it goes idle with NET_RX softirq pending. That really should not
> happen. I fear you need to fire up the tracer :)

Can you elaborate what "should not happen" mean ? AFAIU nothing prevents
softirq to be interrupted after NET_RX processing and thus NET_RX_SOFTIRQ
to be pending again before tick_nohz_irq_exit() runs.

Dave can you:
- specify if there much traffic when the message appears
- send /proc/interrupts + ethtool -S ethX

Thanks.

--
Ueimor

2012-06-02 20:58:26

by Marc Dionne

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

If it helps, I've been seeing the same thing here on a system with an
r8169. The message appears consistently enough on network startup to
be able to track it down to this specific commit:
98ddf986fca17840e46e070354b7e2cd2169da15: r8169: bh locking redux
and task scheduling.

The softirq pending messages always come right after "link down"
messages for the interface.

Marc

2012-06-05 23:24:53

by Francois Romieu

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Marc Dionne <[email protected]> :
> If it helps, I've been seeing the same thing here on a system with an
> r8169. The message appears consistently enough on network startup to
> be able to track it down to this specific commit:
> 98ddf986fca17840e46e070354b7e2cd2169da15: r8169: bh locking redux
> and task scheduling.
>
> The softirq pending messages always come right after "link down"
> messages for the interface.

Let's see the traces.

I have added some code to trace the timing and values of the irq in
rtl8169_interrupt(). Patches against current git are attached.

This is the relevant part of kernel messages as seen through the serial
console:
[...]
[ 2441.972077] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 2441.977882] r8169 0000:01:00.0: enabling Mem-Wr-Inval
[ 2441.977939] r8169 0000:01:00.0: enabling bus mastering
[ 2441.978046] r8169 0000:01:00.0: irq 42 for MSI/MSI-X
[ 2441.979412] r8169 0000:01:00.0: eth0: RTL8168b/8111b at 0xffffc90000032000, 00:13:8f:ea:b1:5d, XID 18000000 IRQ 42
[ 2441.990081] r8169 0000:01:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
[...]
[ 2683.588223] r8169 0000:01:00.0: 8168b-lom: link down
[ 2683.593302] r8169 0000:01:00.0: 8168b-lom: link down
[ 2683.598341] NOHZ: local_softirq_pending 08
[ 2683.599356] IPv6: ADDRCONF(NETDEV_UP): 8168b-lom: link is not ready
[ 2685.779376] r8169 0000:01:00.0: 8168b-lom: link up
[ 2685.784824] IPv6: ADDRCONF(NETDEV_CHANGE): 8168b-lom: link becomes ready

Now the content of the IntrStatus register from rtl8169_interrupt() :

[35882] 0020 (LinkChg, see [ 2683.593302] r8169 ... 8168b-lom: link down)
[57792] 0020 (LinkChg, see [ 2685.779376] r8169 ... 8168b-lom: link up)
[57960] 0084
[64880] 0084
[74880] 0084
[06240] 0084
[14960] 0084
[55040] 0084


'ip link set dev 8168b-lom up' + 'cat /sys/kernel/debug/tracing/trace':

# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
[...]
<idle>-0 [001] d..1 2681.581946: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
ip-1437 [000] ...1 2681.582967: rtl8169_get_stats64 <-dev_get_stats
ip-1437 [000] ...1 2681.582974: rtl8169_get_stats64 <-dev_get_stats
ip-1437 [000] ...1 2681.582990: rtl8169_get_stats64 <-dev_get_stats
ip-1437 [000] ...1 2681.582996: rtl8169_get_stats64 <-dev_get_stats
ip-1437 [000] ...1 2681.583002: rtl8169_get_stats64 <-dev_get_stats
ip-1437 [000] .... 2681.583146: rtl_device_event <-notifier_call_chain
ip-1437 [000] .... 2681.583148: rtl_open <-__dev_open
ip-1437 [000] .... 2681.583153: rtl8169_init_ring <-rtl_open
ip-1437 [000] .... 2681.583155: rtl8169_alloc_rx_data <-rtl8169_init_ring
[...]
ip-1437 [000] .... 2681.583497: rtl8169_alloc_rx_data <-rtl8169_init_ring
ip-1437 [000] .... 2681.583498: rtl_request_uncached_firmware <-rtl_open
ip-1437 [000] .... 2681.583539: rtl8169_init_phy <-rtl_open
ip-1437 [000] .... 2681.583540: rtl_hw_phy_config <-rtl8169_init_phy
ip-1437 [000] .... 2681.583540: rtl_writephy_batch <-rtl_hw_phy_config
ip-1437 [000] .... 2681.583541: rtl_writephy <-rtl_writephy_batch
ip-1437 [000] .... 2681.583542: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.583669: rtl_writephy <-rtl_writephy_batch
ip-1437 [000] .... 2681.583669: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.583795: rtl_writephy <-rtl_writephy_batch
ip-1437 [000] .... 2681.583796: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.583924: rtl8169_xmii_reset_enable <-rtl8169_init_phy
ip-1437 [000] .... 2681.583925: rtl_readphy <-rtl8169_xmii_reset_enable
ip-1437 [000] .... 2681.583925: r8169_mdio_read <-rtl_readphy
ip-1437 [000] .... 2681.584053: rtl_writephy <-rtl8169_xmii_reset_enable
ip-1437 [000] .... 2681.584053: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.584180: rtl8169_xmii_reset_pending <-rtl8169_init_phy
ip-1437 [000] .... 2681.584180: rtl_readphy <-rtl8169_xmii_reset_pending
ip-1437 [000] .... 2681.584180: r8169_mdio_read <-rtl_readphy
ip-1437 [000] .... 2681.584308: rtl8169_set_speed <-rtl8169_init_phy
ip-1437 [000] .... 2681.584309: rtl8169_set_speed_xmii <-rtl8169_set_speed
ip-1437 [000] .... 2681.584309: rtl_writephy <-rtl8169_set_speed_xmii
ip-1437 [000] .... 2681.584309: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.584436: rtl_readphy <-rtl8169_set_speed_xmii
ip-1437 [000] .... 2681.584436: r8169_mdio_read <-rtl_readphy
ip-1437 [000] .... 2681.584564: rtl_readphy <-rtl8169_set_speed_xmii
ip-1437 [000] .... 2681.584564: r8169_mdio_read <-rtl_readphy
ip-1437 [000] .... 2681.584692: rtl_writephy <-rtl8169_set_speed_xmii
ip-1437 [000] .... 2681.584692: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.584819: rtl_writephy <-rtl8169_set_speed_xmii
ip-1437 [000] .... 2681.584819: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.584946: rtl_writephy <-rtl8169_set_speed_xmii
ip-1437 [000] .... 2681.584946: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.585074: rtl_writephy <-r8168_pll_power_up
ip-1437 [000] .... 2681.585075: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.585201: rtl_writephy <-r8168_pll_power_up
ip-1437 [000] .... 2681.585201: r8169_mdio_write <-rtl_writephy
<idle>-0 [001] d..2 2681.585306: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
ip-1437 [000] .... 2681.585326: rtl_writephy <-r8168_pll_power_up
ip-1437 [000] .... 2681.585327: r8169_mdio_write <-rtl_writephy
ip-1437 [000] .... 2681.585453: rtl_hw_start_8168 <-rtl_open
ip-1437 [000] .... 2681.585456: rtl_set_rx_tx_desc_registers <-rtl_hw_start_8168
ip-1437 [000] .... 2681.585456: rtl_set_rx_mode <-rtl_hw_start_8168
ip-1437 [000] .... 2681.585461: rtl_hw_start_8168bb <-rtl_hw_start_8168
ip-1437 [000] .... 2681.585464: rtl_tx_performance_tweak <-rtl_hw_start_8168bb
ip-1437 [000] .... 2681.585472: rtl8169_xmii_link_ok <-__rtl8169_check_link_status

[See first LinkChg event above]

ip-1437 [000] d.h. 2681.585477: rtl8169_interrupt <-handle_irq_event_percpu
ip-1437 [000] ..s1 2681.585481: rtl8169_poll <-net_rx_action

[r8169 NAPI handler schedules some work for the link event...]

ip-1437 [000] ..s1 2681.585483: rtl_schedule_task <-rtl8169_poll
<idle>-0 [001] d..2 2681.589280: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
ip-1437 [000] .Ns1 2681.590517: rtl_set_rx_mode <-__dev_set_rx_mode
ip-1437 [000] .Ns1 2681.590523: rtl_set_rx_mode <-__dev_set_rx_mode
ip-1437 [000] .N.. 2681.590533: rtl8169_get_stats64 <-dev_get_stats
ip-1437 [000] .Ns1 2681.590546: rtl_set_rx_mode <-__dev_set_rx_mode

[... and kworker kicks in]

kworker/0:2-44 [000] .... 2681.590565: rtl_task <-process_one_work
kworker/0:2-44 [000] .... 2681.590566: rtl_slow_event_work <-rtl_task

[Here comes the 'link down' message...]

kworker/0:2-44 [000] .... 2681.590568: rtl8169_xmii_link_ok <-__rtl8169_check_link_status
<idle>-0 [001] d..1 2681.590774: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
<idle>-0 [001] d..2 2681.593285: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit

[... and the 'local_softirq_pending' message takes place ~5ms later]

<idle>-0 [000] d..1 2681.595607: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
ip-1437 [001] .... 2681.596629: rtl_device_event <-notifier_call_chain

[NAPI was scheduled from 'rtl_slow_event_work'. rtl8169_poll experiences a
10 ms delay but I really don't care: there should not be much slow events
and it avoids shared locking between the slow work thread and the NAPI
handler]

<idle>-0 [000] .Ns2 2681.606110: rtl8169_poll <-net_rx_action
<idle>-0 [000] d..1 2681.606377: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
<idle>-0 [001] d..1 2681.606506: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
<idle>-0 [000] d..2 2681.606526: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [000] d..2 2681.607842: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [001] d..2 2681.609292: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [001] d..2 2681.613272: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [001] d..2 2681.617268: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [001] d..2 2681.621265: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [001] d..2 2681.625275: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
<idle>-0 [001] d..2 2681.629262: tick_nohz_stop_sched_tick <-tick_nohz_irq_exit
[...]
<idle>-0 [001] d..1 2683.551714: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
<idle>-0 [001] d..1 2683.751544: tick_nohz_stop_sched_tick <-tick_nohz_idle_enter
<idle>-0 [000] d.h1 2683.774846: rtl8169_interrupt <-handle_irq_event_percpu
<idle>-0 [000] ..s2 2683.774850: rtl8169_poll <-net_rx_action
<idle>-0 [000] ..s2 2683.774852: rtl_schedule_task <-rtl8169_poll
kworker/0:2-44 [000] .... 2683.774860: rtl_task <-process_one_work
kworker/0:2-44 [000] .... 2683.774861: rtl_slow_event_work <-rtl_task

[link up message, 2s after the 'link down' one]

kworker/0:2-44 [000] .... 2683.774863: rtl8169_xmii_link_ok <-__rtl8169_check_link_status
kworker/0:2-44 [000] .... 2683.774865: rtl_link_chg_patch <-__rtl8169_check_link_status

98ddf986fca17840e46e070354b7e2cd2169da15 removed bh locking, whence the
implicit do_softirq() called from _local_bh_enable_ip.

With my stupid driver maintainer hat on, I can only tell that
it works as designed.

It could be worth checking if Dave's traces look the same or not.

--
Ueimor


Attachments:
(No filename) (10.67 kB)
0001-r8169-debugfs-support.patch (6.33 kB)
0002-r8169-irq-debug-stuff.patch (3.17 kB)
Download all attachments

2012-06-06 01:47:13

by Dave Jones

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Wed, Jun 06, 2012 at 01:15:50AM +0200, Francois Romieu wrote:
> Marc Dionne <[email protected]> :
> > If it helps, I've been seeing the same thing here on a system with an
> > r8169. The message appears consistently enough on network startup to
> > be able to track it down to this specific commit:
> > 98ddf986fca17840e46e070354b7e2cd2169da15: r8169: bh locking redux
> > and task scheduling.
> >
> > The softirq pending messages always come right after "link down"
> > messages for the interface.

for me, they're during boot when the nic comes up, and then very occasionally afterwards.

> Let's see the traces.
>
> I have added some code to trace the timing and values of the irq in
> rtl8169_interrupt(). Patches against current git are attached.

with these patches applied, I get a lot of spew in dmesg like ..

[ 98.659314] r8169 0000:01:00.0: eth0: event ring full.
[ 98.763513] r8169 0000:01:00.0: eth0: event ring full.
[ 99.182103] r8169 0000:01:00.0: eth0: event ring full.
[ 99.294426] r8169 0000:01:00.0: eth0: event ring full.
[ 99.420255] r8169 0000:01:00.0: eth0: event ring full.
[ 99.455811] r8169 0000:01:00.0: eth0: event ring full.
[ 99.520541] r8169 0000:01:00.0: eth0: event ring full.
[ 99.730789] r8169 0000:01:00.0: eth0: event ring full.
[ 99.887643] r8169 0000:01:00.0: eth0: event ring full.
[ 103.798861] net_ratelimit: 43 callbacks suppressed

over and over..

> 'ip link set dev 8168b-lom up' + 'cat /sys/kernel/debug/tracing/trace':

What tracer do I need to configure to get make this work ? irqsoff ?

finally, /sys/kernel/debug/r8169/ is empty.

Dave

2012-06-06 05:51:53

by Francois Romieu

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Dave Jones <[email protected]> :
[...]
> with these patches applied, I get a lot of spew in dmesg like ..

Yes, I only set up a single array for the irq instead of a real ring
and pipe in debugfs. You can change RTL_DEBUG_EVENT_RING_SIZE (4 bytes
per timestamp + event record). seq_file will grow its buffer when it
overflows.

[...]
> [ 99.730789] r8169 0000:01:00.0: eth0: event ring full.
> [ 99.887643] r8169 0000:01:00.0: eth0: event ring full.
> [ 103.798861] net_ratelimit: 43 callbacks suppressed
>
> over and over..

You can remove the message in 'rtl_trace_event'

[...]
> > 'ip link set dev 8168b-lom up' + 'cat /sys/kernel/debug/tracing/trace':
>
> What tracer do I need to configure to get make this work ? irqsoff ?

'function'

My sequence was:

echo 0 > tracing_on
echo nop > current_tracer
echo 10000 > buffer_size_kb
echo function > current_tracer
echo rtl* > set_ftrace_filter
echo r8169* >> set_ftrace_filter
echo tick_nohz_stop_sched_tick >> set_ftrace_filter

I only enabled tracing_on right before 'ip link ... up'.
tick_nohz_stop_sched_tick fills the log rather fast.

[...]
> finally, /sys/kernel/debug/r8169/ is empty.

I temporary saw it once too. Don't ask why.

Do things look like:

$ objdump --syms drivers/net/ethernet/realtek/r8169.ko | grep -E 'rtl_debug_(show|fops)_([rt]x|irq)'
000000000000346d l F .text 0000000000000087 rtl_debug_show_irq
0000000000009bdc l F .text 00000000000000a0 rtl_debug_show_tx
000000000000355a l F .text 0000000000000063 rtl_debug_show_rx
00000000000019e0 l O .rodata 00000000000000d0 rtl_debug_fops_rx
0000000000001ab0 l O .rodata 00000000000000d0 rtl_debug_fops_tx
0000000000001b80 l O .rodata 00000000000000d0 rtl_debug_fops_irq

--
Ueimor

(off to work)

2012-06-08 02:34:31

by Dave Jones

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Wed, Jun 06, 2012 at 07:42:51AM +0200, Francois Romieu wrote:

> My sequence was:
>
> echo 0 > tracing_on
> echo nop > current_tracer
> echo 10000 > buffer_size_kb
> echo function > current_tracer
> echo rtl* > set_ftrace_filter
> echo r8169* >> set_ftrace_filter
> echo tick_nohz_stop_sched_tick >> set_ftrace_filter
>
> I only enabled tracing_on right before 'ip link ... up'.

Ok. This was a bit of a pain to trace, as I can't trigger the NOHZ message
when I bring up/down the interface by hand. It only seems to happen
during boot now for some reason. So I modified the networking ifup scripts
to start/stop the logging. It's still a pain, because there are two NICs
in the affected machine, so everything happens twice.
It also seems to have started too early & finished too late.. oh well.
hopefully you can get something interesting out of it.

> tick_nohz_stop_sched_tick fills the log rather fast.

I don't have that symbol for some reason.

The log is still huge though. (68M uncompressed) at http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz

Here's the order of events on that boot...

[ 53.086040] r8169 0000:01:00.0: eth0: link down
[ 53.086441] r8169 0000:01:00.0: eth0: link down
[ 53.086545] NOHZ: local_softirq_pending 08
[ 53.113025] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 54.773637] r8169 0000:01:00.0: eth0: link up
[ 54.783210] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 64.941664] r8169 0000:05:00.0: eth1: link down
[ 64.941748] r8169 0000:05:00.0: eth1: link down
[ 64.941792] NOHZ: local_softirq_pending 08
[ 64.968326] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 67.948677] r8169 0000:05:00.0: eth1: link up
[ 67.961872] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


Dave

2012-06-08 07:57:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Thu, 7 Jun 2012, Dave Jones wrote:
>
> The log is still huge though. (68M uncompressed) at
> http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz

Following that URL gives me a nice picture.

This is somewhat embarrassing, isn?t it? :)

Thanks,

tglx

2012-06-08 08:59:47

by Johannes Stezenbach

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Thu, Jun 07, 2012 at 10:34:20PM -0400, Dave Jones wrote:
> Ok. This was a bit of a pain to trace, as I can't trigger the NOHZ message
> when I bring up/down the interface by hand. It only seems to happen
> during boot now for some reason.

The message is only printed 10 times on each boot due to
the "ratelimit" in tick_nohz_stop_sched_tick. I once debugged
an issue with rt2x00 (20ed3166c) and found that fairly confusing at first.

HTH
Johannes

2012-06-08 14:23:52

by Dave Jones

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Fri, Jun 08, 2012 at 09:57:03AM +0200, Thomas Gleixner wrote:
> On Thu, 7 Jun 2012, Dave Jones wrote:
> >
> > The log is still huge though. (68M uncompressed) at
> > http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz
>
> Following that URL gives me a nice picture.
>
> This is somewhat embarrassing, isn’t it? :)

indeed ;-) Should be fixed.

Dave

2012-06-08 14:52:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Fri, 8 Jun 2012, Dave Jones wrote:

> On Fri, Jun 08, 2012 at 09:57:03AM +0200, Thomas Gleixner wrote:
> > On Thu, 7 Jun 2012, Dave Jones wrote:
> > >
> > > The log is still huge though. (68M uncompressed) at
> > > http://www.codemonkey.org.uk/junk/r8169-trace.txt.xz
> >
> > Following that URL gives me a nice picture.
> >
> > This is somewhat embarrassing, isn’t it? :)
>
> indeed ;-) Should be fixed.

I can't find the point where the warning is issued, but I think I
found the cause of the problem.

static void rtl_slow_event_work(struct rtl8169_private *tp)
{
.....
napi_schedule(&tp->napi);
--> __napi_schedule();
--> list_add_tail(&napi->poll_list, &sd->poll_list);
__raise_softirq_irqoff(NET_RX_SOFTIRQ);

This merily sets the softirq bit.


So this code is really wrong. It's called from full preemptible
context of the workqueue. And if the next thing is a context switch to
idle then the pending softirq check will trigger.

I let the network folks sort out the proper solution.

Thanks,

tglx

2012-06-08 16:46:31

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On 31/05/12 17:04, Dave Jones wrote:
> On Thu, May 31, 2012 at 09:56:25AM +0200, Eric Dumazet wrote:
> > On Thu, 2012-05-31 at 09:52 +0200, Thomas Gleixner wrote:
> > > Dave,
> > >
> > > On Wed, 30 May 2012, Dave Jones wrote:
> > >
> > > > May 30 17:43:04 [ 1095.335659] NOHZ: local_softirq_pending 08
> > > >
> > > > According to git, you ratelimited this back in 2007 with the message
> > > > "until we found the root cause of that problem."
> > > >
> > > > Is this an old problem that has returned ?
> > >
> > > Looks like.
> > >
> > > > Maybe interesting note: Both machines I see this on regularly are Atom cpus,
> > > > and it's always '08'.
> > >
> > > So it goes idle with NET_RX softirq pending. That really should not
> > > happen. I fear you need to fire up the tracer :)
> >
> > It might be a bug in the network driver, a race with NAPI completion.
> >
> > Dave, which net drivers run on these machines ?
>
> r8169.
>
> I got an email from someone else asking if I used Realtek drivers, which I thought
> was an uncanny guess.

Same thing happens here (kernel 3.4.1, also happened with 3.4.0):

r8169 0000:06:00.0: eth0: link down
r8169 0000:06:00.0: eth0: link down
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
r8169 0000:06:00.0: eth0: link up

2012-06-08 20:36:16

by Francois Romieu

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Thomas Gleixner <[email protected]> :
[...]
> I can't find the point where the warning is issued, but I think I
> found the cause of the problem.
>
> static void rtl_slow_event_work(struct rtl8169_private *tp)
> {
> .....
> napi_schedule(&tp->napi);
> --> __napi_schedule();
> --> list_add_tail(&napi->poll_list, &sd->poll_list);
> __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>
> This merily sets the softirq bit.
>
>
> So this code is really wrong. It's called from full preemptible
> context of the workqueue. And if the next thing is a context switch to
> idle then the pending softirq check will trigger.

void __napi_schedule(struct napi_struct *n)
{
unsigned long flags;

local_irq_save(flags);
____napi_schedule(&__get_cpu_var(softnet_data), n);
local_irq_restore(flags);
}

Are you saying that this stuff should be considered "preemptible" ?

--
Ueimor

2012-06-08 20:41:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Fri, 8 Jun 2012, Francois Romieu wrote:

> Thomas Gleixner <[email protected]> :
> [...]
> > I can't find the point where the warning is issued, but I think I
> > found the cause of the problem.
> >
> > static void rtl_slow_event_work(struct rtl8169_private *tp)
> > {
> > .....
> > napi_schedule(&tp->napi);
> > --> __napi_schedule();
> > --> list_add_tail(&napi->poll_list, &sd->poll_list);
> > __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> >
> > This merily sets the softirq bit.
> >
> >
> > So this code is really wrong. It's called from full preemptible
> > context of the workqueue. And if the next thing is a context switch to
> > idle then the pending softirq check will trigger.
>
> void __napi_schedule(struct napi_struct *n)
> {
> unsigned long flags;
>
> local_irq_save(flags);
> ____napi_schedule(&__get_cpu_var(softnet_data), n);
> local_irq_restore(flags);
> }
>
> Are you saying that this stuff should be considered "preemptible" ?

Gah, crap. Looked at the wrong ___________underscore level. I _____so
__love _that.

Though the problem is, that it is neither called in interrupt context
nor with bh disabled, so nothing invokes the softirq before it reaches
idle.

In hard interrupt context the pending flag is evaluated in irq_exit()
and the softirqs are invoked from there. If you call that from thread
context, then a bh_disable/enable pair will make sure that the pending
softirq is invoked. Did I miss some more ___underscore magic which
does that ?

Thanks,

tglx

2012-06-08 22:02:41

by Francois Romieu

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Thomas Gleixner <[email protected]> :
[...]
> Though the problem is, that it is neither called in interrupt context
> nor with bh disabled, so nothing invokes the softirq before it reaches
> idle.
>
> In hard interrupt context the pending flag is evaluated in irq_exit()
> and the softirqs are invoked from there. If you call that from thread
> context, then a bh_disable/enable pair will make sure that the pending
> softirq is invoked. Did I miss some more ___underscore magic which
> does that ?

No. You _are_ right.

That's why 98ddf986fca17840e46e070354b7e2cd2169da15 triggered the
message: it removed the last bh lock in the r8169 slow work context.

/me slaps head...

The whole thing is probably moot. Dave, can you apply the patch
below on top of a fresh kernel tree ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 4a05b68..d452441 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
if (status & LinkChg)
__rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);

- napi_disable(&tp->napi);
- rtl_irq_disable(tp);
-
- napi_enable(&tp->napi);
- napi_schedule(&tp->napi);
+ rtl_irq_enable_all(tp);
}

static void rtl_task(struct work_struct *work)

2012-06-08 22:33:12

by Marc Dionne

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Fri, Jun 8, 2012 at 5:53 PM, Francois Romieu <[email protected]> wrote:
> Thomas Gleixner <[email protected]> :
> [...]
>> Though the problem is, that it is neither called in interrupt context
>> nor with bh disabled, so nothing invokes the softirq before it reaches
>> idle.
>>
>> In hard interrupt context the pending flag is evaluated in irq_exit()
>> and the softirqs are invoked from there. If you call that from thread
>> context, then a bh_disable/enable pair will make sure that the pending
>> softirq is invoked. Did I miss some more ___underscore magic which
>> does that ?
>
> No. You _are_ right.
>
> That's why 98ddf986fca17840e46e070354b7e2cd2169da15 triggered the
> message: it removed the last bh lock in the r8169 slow work context.
>
> /me slaps head...
>
> The whole thing is probably moot. Dave, can you apply the patch
> below on top of a fresh kernel tree ?
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 4a05b68..d452441 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
> ? ? ? ?if (status & LinkChg)
> ? ? ? ? ? ? ? ?__rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
>
> - ? ? ? napi_disable(&tp->napi);
> - ? ? ? rtl_irq_disable(tp);
> -
> - ? ? ? napi_enable(&tp->napi);
> - ? ? ? napi_schedule(&tp->napi);
> + ? ? ? rtl_irq_enable_all(tp);
> ?}
>
> ?static void rtl_task(struct work_struct *work)

That works for me - no warnings after several reboots and bringing the
interface up/down many times.

Marc

2012-06-10 20:40:35

by Dave Jones

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Fri, Jun 08, 2012 at 06:33:09PM -0400, Marc Dionne wrote:
> On Fri, Jun 8, 2012 at 5:53 PM, Francois Romieu <[email protected]> wrote:
> > index 4a05b68..d452441 100644
> > --- a/drivers/net/ethernet/realtek/r8169.c
> > +++ b/drivers/net/ethernet/realtek/r8169.c
> > @@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
> > ? ? ? ?if (status & LinkChg)
> > ? ? ? ? ? ? ? ?__rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
> >
> > - ? ? ? napi_disable(&tp->napi);
> > - ? ? ? rtl_irq_disable(tp);
> > -
> > - ? ? ? napi_enable(&tp->napi);
> > - ? ? ? napi_schedule(&tp->napi);
> > + ? ? ? rtl_irq_enable_all(tp);
> > ?}
> >
> > ?static void rtl_task(struct work_struct *work)
>
> That works for me - no warnings after several reboots and bringing the
> interface up/down many times.

Yep, looks good to me too. Thanks Francois

Dave

2012-06-25 12:38:43

by Johannes Stezenbach

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Hi,

On Fri, Jun 08, 2012 at 11:53:28PM +0200, Francois Romieu wrote:
> Thomas Gleixner <[email protected]> :
> [...]
> > Though the problem is, that it is neither called in interrupt context
> > nor with bh disabled, so nothing invokes the softirq before it reaches
> > idle.
> >
> > In hard interrupt context the pending flag is evaluated in irq_exit()
> > and the softirqs are invoked from there. If you call that from thread
> > context, then a bh_disable/enable pair will make sure that the pending
> > softirq is invoked. Did I miss some more ___underscore magic which
> > does that ?
>
> No. You _are_ right.
>
> That's why 98ddf986fca17840e46e070354b7e2cd2169da15 triggered the
> message: it removed the last bh lock in the r8169 slow work context.
>
> /me slaps head...
>
> The whole thing is probably moot. Dave, can you apply the patch
> below on top of a fresh kernel tree ?
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 4a05b68..d452441 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -5934,11 +5934,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
> if (status & LinkChg)
> __rtl8169_check_link_status(dev, tp, tp->mmio_addr, true);
>
> - napi_disable(&tp->napi);
> - rtl_irq_disable(tp);
> -
> - napi_enable(&tp->napi);
> - napi_schedule(&tp->napi);
> + rtl_irq_enable_all(tp);
> }
>
> static void rtl_task(struct work_struct *work)

I just upgraded my rarely-used Atom based netbook from 3.3.5 to 3.4.4
and got the "NOHZ: local_softirq_pending 08" message.
Should this patch go into stable?

Johannes

2012-06-25 19:39:35

by Francois Romieu

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

Johannes Stezenbach <[email protected]> :
[7dbb491878a2c51d372a8890fa45a8ff80358af1]
> I just upgraded my rarely-used Atom based netbook from 3.3.5 to 3.4.4
> and got the "NOHZ: local_softirq_pending 08" message.
> Should this patch go into stable?

It should not change your life. I'll be fine with whatever policy davej,
tglx or davem see fit (assuming they share one :o) ).

--
Ueimor

2012-06-25 19:53:14

by Dave Jones

[permalink] [raw]
Subject: Re: NOHZ: local_softirq_pending 08

On Mon, Jun 25, 2012 at 09:28:39PM +0200, Francois Romieu wrote:
> Johannes Stezenbach <[email protected]> :
> [7dbb491878a2c51d372a8890fa45a8ff80358af1]
> > I just upgraded my rarely-used Atom based netbook from 3.3.5 to 3.4.4
> > and got the "NOHZ: local_softirq_pending 08" message.
> > Should this patch go into stable?
>
> It should not change your life. I'll be fine with whatever policy davej,
> tglx or davem see fit (assuming they share one :o) ).

it never seemed to cause any noticable problem that I observed (just the dmesg spew),
so I'm ambivalent about the whole thing now that it's fixed in 3.5rc :-)

Dave