2017-04-07 18:18:03

by Ville Syrjälä

Subject: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

Hi,

My old P3 laptop started to die on me in the middle of larger compile
jobs (using distcc) after v4.11-rc<something>. I bisected the problem
to 617f01211baf ("8139too: use napi_complete_done()").

Unfortunately I wasn't able to capture a full oops as the machine doesn't
have serial and ramoops failed me. I did get one partial oops on vgacon
which showed rtl8139_poll() being involved (EIP was around
_raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
bisect result.

So maybe some kind of nasty race going on between the hard irq and the
softirq? Perhaps UP related? I tried to stare at the locking around
rtl8139_poll() for a while, but it looked mostly sane to me.

--
Ville Syrjälä
Intel OTC


2017-04-07 18:39:01

by Eric Dumazet

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> Hi,
>
> My old P3 laptop started to die on me in the middle of larger compile
> jobs (using distcc) after v4.11-rc<something>. I bisected the problem
> to 617f01211baf ("8139too: use napi_complete_done()").
>
> Unfortunately I wasn't able to capture a full oops as the machine doesn't
> have serial and ramoops failed me. I did get one partial oops on vgacon
> which showed rtl8139_poll() being involved (EIP was around
> _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> bisect result.
>
> So maybe some kind of nasty thing going between the hard irq and
> softirq? Perhaps UP related? I tried to stare at the locking around
> rtl8139_poll() for a while but it looked mostly sane to me.
>

Thanks a lot for the detective work; I am so sorry for this!

Could you try the following patch?

I do not really see what could be wrong; the code should run just fine
on UP.

Thanks.

diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget)
 	if (likely(RTL_R16(IntrStatus) & RxAckBits))
 		work_done += rtl8139_rx(dev, tp, budget);
 
-	if (work_done < budget && napi_complete_done(napi, work_done)) {
+	if (work_done < budget) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&tp->lock, flags);
-		RTL_W16_F(IntrMask, rtl8139_intr_mask);
+		if (napi_complete_done(napi, work_done))
+			RTL_W16_F(IntrMask, rtl8139_intr_mask);
 		spin_unlock_irqrestore(&tp->lock, flags);
 	}
 	spin_unlock(&tp->rx_lock);



2017-04-07 18:42:30

by David Miller

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

From: Eric Dumazet <[email protected]>
Date: Fri, 07 Apr 2017 11:38:49 -0700

> I do not really see what could be wrong, the code should run just fine
> on UP.

One theory is that the interrupt masking isn't working properly
and interrupts are still arriving and hitting the NAPI state even
when we are actively polling NAPI.

And this problem was masked by the locking done here.

2017-04-08 10:32:56

by Francois Romieu

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

David Miller <[email protected]> :
[...]
> One theory is that the interrupt masking isn't working properly
> and interrupts are still arriving and hitting the NAPI state even
> when we are actively polling NAPI.
>
> And this problem was masked by the locking done here.

Yes.

Ville, can you rule out irq sharing between the 8139 and some other
device? It's a candidate for unexpected interrupt handler invocation
on older PCs, even with properly working hardware.

--
Ueimor

2017-04-10 12:11:09

by Ville Syrjälä

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote:
On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> > Hi,
> >
> > My old P3 laptop started to die on me in the middle of larger compile
> > jobs (using distcc) after v4.11-rc<something>. I bisected the problem
> > to 617f01211baf ("8139too: use napi_complete_done()").
> >
> > Unfortunately I wasn't able to capture a full oops as the machine doesn't
> > have serial and ramoops failed me. I did get one partial oops on vgacon
> > which showed rtl8139_poll() being involved (EIP was around
> > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> > bisect result.
> >
> > So maybe some kind of nasty thing going between the hard irq and
> > softirq? Perhaps UP related? I tried to stare at the locking around
> > rtl8139_poll() for a while but it looked mostly sane to me.
> >
>
> Thanks a lot for the detective work, I am so sorry for this !
>
> Could you try the following patch ?
>
> I do not really see what could be wrong, the code should run just fine
> on UP.
>
> Thanks.
>
> diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
> index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644
> --- a/drivers/net/ethernet/realtek/8139too.c
> +++ b/drivers/net/ethernet/realtek/8139too.c
> @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget)
> if (likely(RTL_R16(IntrStatus) & RxAckBits))
> work_done += rtl8139_rx(dev, tp, budget);
>
> - if (work_done < budget && napi_complete_done(napi, work_done)) {
> + if (work_done < budget) {
> unsigned long flags;
>
> spin_lock_irqsave(&tp->lock, flags);
> - RTL_W16_F(IntrMask, rtl8139_intr_mask);
> + if (napi_complete_done(napi, work_done))
> + RTL_W16_F(IntrMask, rtl8139_intr_mask);
> spin_unlock_irqrestore(&tp->lock, flags);
> }
> spin_unlock(&tp->rx_lock);
>
>

Yep, that patch does appear to make it stable again.

Tested-by: Ville Syrjälä <[email protected]>

--
Ville Syrjälä
Intel OTC

2017-04-10 14:22:35

by tedheadster

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Sat, Apr 8, 2017 at 6:23 AM, Francois Romieu <[email protected]> wrote:
> David Miller <[email protected]> :
> [...]
>> One theory is that the interrupt masking isn't working properly
>> and interrupts are still arriving and hitting the NAPI state even
>> when we are actively polling NAPI.
>>
>> And this problem was masked by the locking done here.
>
> Yes.
>
> Ville, can you rule out irq sharing between the 8139 and some other
> device ? It's a candidate for unexpected interrupt handler invocation
> with older pc, even with properly working hardware.
>

Eric,
If napi_complete_done() calls could affect drivers on older
hardware, I can test the following:

drivers/net/ethernet/3com/typhoon.c
drivers/net/ethernet/amd/pcnet32.c
drivers/net/ethernet/broadcom/tg3.c
drivers/net/ethernet/dec/tulip/interrupt.c
drivers/net/ethernet/intel/e100.c
drivers/net/ethernet/intel/e1000/e1000_main.c
drivers/net/ethernet/smsc/epic100.c
drivers/net/ethernet/via/via-rhine.c

- Matthew

2017-04-10 14:56:33

by Eric Dumazet

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Mon, Apr 10, 2017 at 7:22 AM, tedheadster <[email protected]> wrote:
> On Sat, Apr 8, 2017 at 6:23 AM, Francois Romieu <[email protected]> wrote:
>> David Miller <[email protected]> :
>> [...]
>>> One theory is that the interrupt masking isn't working properly
>>> and interrupts are still arriving and hitting the NAPI state even
>>> when we are actively polling NAPI.
>>>
>>> And this problem was masked by the locking done here.
>>
>> Yes.
>>
>> Ville, can you rule out irq sharing between the 8139 and some other
>> device ? It's a candidate for unexpected interrupt handler invocation
>> with older pc, even with properly working hardware.
>>
>
> Eric,
> If napi_complete_done() calls could affect drivers on older
> hardware, I can test the following:
>
> drivers/net/ethernet/3com/typhoon.c
> drivers/net/ethernet/amd/pcnet32.c
> drivers/net/ethernet/broadcom/tg3.c
> drivers/net/ethernet/dec/tulip/interrupt.c
> drivers/net/ethernet/intel/e100.c
> drivers/net/ethernet/intel/e1000/e1000_main.c
> drivers/net/ethernet/smsc/epic100.c
> drivers/net/ethernet/via/via-rhine.c

That would be great, thanks Matthew.

2017-04-21 11:40:09

by Ville Syrjälä

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

BTW I've also been getting some lockdep grief from r8169 netpoll stuff
recently. Not sure if it might be related to these changes or not, but
I don't remember seeing this sort of stuff until quite recently.

[ 251.911044] ======================================================
[ 251.911044] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
[ 251.911045] 4.11.0-rc6-elk+ #101 Not tainted
[ 251.911045] ------------------------------------------------------
[ 251.911046] kms_plane_blink/2132 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 251.911046] (&syncp->seq#2){+.-...}, at: [<c14398a3>] netpoll_poll_dev+0xb3/0x2a0
[ 251.911048]
[ 251.911048] and this task is already holding:
[ 251.911049] (target_list_lock){-.....}, at: [<f858693f>] write_msg+0x3f/0xd0 [netconsole]
[ 251.911050] which would create a new lock dependency:
[ 251.911050] (target_list_lock){-.....} -> (&syncp->seq#2){+.-...}
[ 251.911053]
[ 251.911053] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 251.911054] (target_list_lock){-.....}
[ 251.911055] ... which became HARDIRQ-irq-safe at:
[ 251.911055] __lock_acquire+0x6ec/0x1440
[ 251.911056] lock_acquire+0x7e/0x1a0
[ 251.911056] _raw_spin_lock_irqsave+0x2b/0x40
[ 251.911056] write_msg+0x3f/0xd0 [netconsole]
[ 251.911057] console_unlock+0x41c/0x570
[ 251.911057] vprintk_emit+0x26d/0x350
[ 251.911058] vprintk_default+0x12/0x20
[ 251.911058] vprintk_func+0x1c/0x50
[ 251.911058] printk+0xe/0x10
[ 251.911059] drm_printk+0x5a/0x60 [drm]
[ 251.911059] intel_get_hpd_pins+0x87/0xa0 [i915]
[ 251.911060] i9xx_hpd_irq_handler+0x96/0xe0 [i915]
[ 251.911060] i965_irq_handler+0x2c8/0x2f0 [i915]
[ 251.911060] __handle_irq_event_percpu+0x38/0x360
[ 251.911061] handle_irq_event_percpu+0x19/0x50
[ 251.911061] handle_irq_event+0x29/0x50
[ 251.911062] handle_edge_irq+0x65/0x110
[ 251.911062] handle_irq+0x9f/0xc0
[ 251.911062] do_IRQ+0x4f/0x110
[ 251.911063] common_interrupt+0x36/0x3c
[ 251.911063] cpuidle_enter_state+0xcc/0x390
[ 251.911064] cpuidle_enter+0xf/0x20
[ 251.911064] call_cpuidle+0x1c/0x40
[ 251.911064] do_idle+0x164/0x1d0
[ 251.911065] cpu_startup_entry+0x1d/0x20
[ 251.911065] start_secondary+0x104/0x170
[ 251.911066] startup_32_smp+0x16b/0x16d
[ 251.911066]
[ 251.911066] to a HARDIRQ-irq-unsafe lock:
[ 251.911067] (&syncp->seq#2){+.-...}
[ 251.911068] ... which became HARDIRQ-irq-unsafe at:
[ 251.911068] ... __lock_acquire+0x5d7/0x1440
[ 251.911069] lock_acquire+0x7e/0x1a0
[ 251.911069] rtl8169_poll+0x474/0x620 [r8169]
[ 251.911070] net_rx_action+0x1d0/0x3f0
[ 251.911070] __do_softirq+0x17d/0x460
[ 251.911070] do_softirq_own_stack+0x1d/0x30
[ 251.911071] irq_exit+0xa5/0xb0
[ 251.911071] do_IRQ+0x58/0x110
[ 251.911072] common_interrupt+0x36/0x3c
[ 251.911072] cpuidle_enter_state+0xcc/0x390
[ 251.911072] cpuidle_enter+0xf/0x20
[ 251.911073] call_cpuidle+0x1c/0x40
[ 251.911073] do_idle+0x164/0x1d0
[ 251.911073] cpu_startup_entry+0x1d/0x20
[ 251.911074] rest_init+0x10c/0x120
[ 251.911074] start_kernel+0x30d/0x312
[ 251.911075] i386_start_kernel+0x85/0x89
[ 251.911075] startup_32_smp+0x16b/0x16d
[ 251.911075]
[ 251.911076] other info that might help us debug this:
[ 251.911076]
[ 251.911077] Possible interrupt unsafe locking scenario:
[ 251.911077]
[ 251.911077] CPU0 CPU1
[ 251.911078] ---- ----
[ 251.911078] lock(&syncp->seq#2);
[ 251.911079] local_irq_disable();
[ 251.911080] lock(target_list_lock);
[ 251.911081] lock(&syncp->seq#2);
[ 251.911082] <Interrupt>
[ 251.911082] lock(target_list_lock);
[ 251.911083]
[ 251.911084] *** DEADLOCK ***
[ 251.911084]
[ 251.911084] 2 locks held by kms_plane_blink/2132:
[ 251.911085] #0: (console_lock){+.+.+.}, at: [<c10ad3b4>] vprintk_emit+0x264/0x350
[ 251.911086] #1: (target_list_lock){-.....}, at: [<f858693f>] write_msg+0x3f/0xd0 [netconsole]
[ 251.911088]
[ 251.911088] the dependencies between HARDIRQ-irq-safe lock and the holding lock:
[ 251.911089] -> (target_list_lock){-.....} ops: 482 {
[ 251.911090] IN-HARDIRQ-W at:
[ 251.911091] __lock_acquire+0x6ec/0x1440
[ 251.911092] lock_acquire+0x7e/0x1a0
[ 251.911092] _raw_spin_lock_irqsave+0x2b/0x40
[ 251.911092] write_msg+0x3f/0xd0 [netconsole]
[ 251.911093] console_unlock+0x41c/0x570
[ 251.911093] vprintk_emit+0x26d/0x350
[ 251.911094] vprintk_default+0x12/0x20
[ 251.911094] vprintk_func+0x1c/0x50
[ 251.911095] printk+0xe/0x10
[ 251.911095] drm_printk+0x5a/0x60 [drm]
[ 251.911096] intel_get_hpd_pins+0x87/0xa0 [i915]
[ 251.911096] i9xx_hpd_irq_handler+0x96/0xe0 [i915]
[ 251.911097] i965_irq_handler+0x2c8/0x2f0 [i915]
[ 251.911097] __handle_irq_event_percpu+0x38/0x360
[ 251.911098] handle_irq_event_percpu+0x19/0x50
[ 251.911098] handle_irq_event+0x29/0x50
[ 251.911098] handle_edge_irq+0x65/0x110
[ 251.911099] handle_irq+0x9f/0xc0
[ 251.911099] do_IRQ+0x4f/0x110
[ 251.911100] common_interrupt+0x36/0x3c
[ 251.911100] cpuidle_enter_state+0xcc/0x390
[ 251.911101] cpuidle_enter+0xf/0x20
[ 251.911101] call_cpuidle+0x1c/0x40
[ 251.911102] do_idle+0x164/0x1d0
[ 251.911102] cpu_startup_entry+0x1d/0x20
[ 251.911102] start_secondary+0x104/0x170
[ 251.911103] startup_32_smp+0x16b/0x16d
[ 251.911103] INITIAL USE at:
[ 251.911104] __lock_acquire+0x221/0x1440
[ 251.911104] lock_acquire+0x7e/0x1a0
[ 251.911105] _raw_spin_lock_irqsave+0x2b/0x40
[ 251.911105] make_netconsole_target+0x80/0xab [netconsole]
[ 251.911106] configfs_mkdir+0x312/0x3b0 [configfs]
[ 251.911106] vfs_mkdir+0xd2/0x1b0
[ 251.911107] SyS_mkdir+0x57/0xc0
[ 251.911107] do_fast_syscall_32+0x80/0x430
[ 251.911108] entry_SYSENTER_32+0x4c/0x7b
[ 251.911108] }
[ 251.911108] ... key at: [<f85882f0>] target_list_lock+0x10/0xffffed20 [netconsole]
[ 251.911109] ... acquired at:
[ 251.911109] check_irq_usage+0x42/0xb0
[ 251.911110] __lock_acquire+0xfbf/0x1440
[ 251.911110] lock_acquire+0x7e/0x1a0
[ 251.911110] rtl8169_poll+0x474/0x620 [r8169]
[ 251.911111] netpoll_poll_dev+0xb3/0x2a0
[ 251.911111] netpoll_send_skb_on_dev+0x180/0x240
[ 251.911112] netpoll_send_udp+0x292/0x440
[ 251.911112] write_msg+0x9c/0xd0 [netconsole]
[ 251.911112] console_unlock+0x41c/0x570
[ 251.911113] vprintk_emit+0x26d/0x350
[ 251.911113] printk_emit+0x1b/0x1d
[ 251.911114] devkmsg_write+0xeb/0x140
[ 251.911114] __vfs_write+0xb0/0x110
[ 251.911114] vfs_write+0xa1/0x1e0
[ 251.911115] SyS_write+0x3d/0x90
[ 251.911115] do_fast_syscall_32+0x80/0x430
[ 251.911116] entry_SYSENTER_32+0x4c/0x7b
[ 251.911116]
[ 251.911116]
[ 251.911117] the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
[ 251.911117] -> (&syncp->seq#2){+.-...} ops: 7439 {
[ 251.911119] HARDIRQ-ON-W at:
[ 251.911120] __lock_acquire+0x5d7/0x1440
[ 251.911120] lock_acquire+0x7e/0x1a0
[ 251.911121] rtl8169_poll+0x474/0x620 [r8169]
[ 251.911121] net_rx_action+0x1d0/0x3f0
[ 251.911122] __do_softirq+0x17d/0x460
[ 251.911122] do_softirq_own_stack+0x1d/0x30
[ 251.911122] irq_exit+0xa5/0xb0
[ 251.911123] do_IRQ+0x58/0x110
[ 251.911123] common_interrupt+0x36/0x3c
[ 251.911124] cpuidle_enter_state+0xcc/0x390
[ 251.911124] cpuidle_enter+0xf/0x20
[ 251.911125] call_cpuidle+0x1c/0x40
[ 251.911125] do_idle+0x164/0x1d0
[ 251.911126] cpu_startup_entry+0x1d/0x20
[ 251.911126] rest_init+0x10c/0x120
[ 251.911127] start_kernel+0x30d/0x312
[ 251.911127] i386_start_kernel+0x85/0x89
[ 251.911127] startup_32_smp+0x16b/0x16d
[ 251.911128] IN-SOFTIRQ-W at:
[ 251.911129] __lock_acquire+0x5b5/0x1440
[ 251.911129] lock_acquire+0x7e/0x1a0
[ 251.911129] rtl8169_poll+0x474/0x620 [r8169]
[ 251.911130] net_rx_action+0x1d0/0x3f0
[ 251.911130] __do_softirq+0x17d/0x460
[ 251.911131] do_softirq_own_stack+0x1d/0x30
[ 251.911131] irq_exit+0xa5/0xb0
[ 251.911132] do_IRQ+0x58/0x110
[ 251.911132] common_interrupt+0x36/0x3c
[ 251.911133] cpuidle_enter_state+0xcc/0x390
[ 251.911133] cpuidle_enter+0xf/0x20
[ 251.911133] call_cpuidle+0x1c/0x40
[ 251.911134] do_idle+0x164/0x1d0
[ 251.911134] cpu_startup_entry+0x1d/0x20
[ 251.911135] rest_init+0x10c/0x120
[ 251.911135] start_kernel+0x30d/0x312
[ 251.911136] i386_start_kernel+0x85/0x89
[ 251.911136] startup_32_smp+0x16b/0x16d
[ 251.911136] INITIAL USE at:
[ 251.911137] __lock_acquire+0x221/0x1440
[ 251.911138] lock_acquire+0x7e/0x1a0
[ 251.911138] rtl8169_get_stats64+0x81/0x320 [r8169]
[ 251.911139] dev_get_stats+0x30/0xa0
[ 251.911139] rtnl_fill_stats+0x30/0x110
[ 251.911139] rtnl_fill_ifinfo+0x4be/0xd80
[ 251.911140] rtmsg_ifinfo_build_skb+0x45/0xc0
[ 251.911140] rtmsg_ifinfo.part.33+0xd/0x30
[ 251.911141] rtmsg_ifinfo+0x1b/0x20
[ 251.911141] register_netdevice+0x495/0x5e0
[ 251.911142] register_netdev+0x12/0x20
[ 251.911142] rtl_init_one+0x86d/0x1060 [r8169]
[ 251.911143] pci_device_probe+0x8b/0x100
[ 251.911143] driver_probe_device+0x1ad/0x2a0
[ 251.911144] __driver_attach+0x89/0x90
[ 251.911144] bus_for_each_dev+0x4f/0x80
[ 251.911144] driver_attach+0x14/0x20
[ 251.911145]
[ 251.911146] Lost 63 message(s)!

--
Ville Syrjälä
Intel OTC

2017-04-21 13:29:41

by Eric Dumazet

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Fri, 2017-04-21 at 14:40 +0300, Ville Syrjälä wrote:
> BTW I've also been getting some lockdep grief from r8169 netpoll stuff
> recently. Not sure if it might be related to these changes or not, but
> I don't remember seeing this sort of stuff until quite recently.
>
> [lockdep splat snipped; quoted in full in the previous message]

Thanks for this report.

Interesting to see how many drivers got the netpoll stuff wrong :/

Can you try:

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 81f18a8335276495a59fa93219c4607c2b8a47aa..74e4c72c331d5a6cc5b653970ef4133c8ddf9999 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7668,7 +7668,7 @@ static void rtl8169_netpoll(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 
-	rtl8169_interrupt(tp->pci_dev->irq, dev);
+	napi_schedule(&tp->napi);
 }
 #endif




2017-04-21 17:39:11

by Eric Dumazet

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Fri, 2017-04-21 at 08:09 -0700, Eric Dumazet wrote:
> On Fri, 2017-04-21 at 06:29 -0700, Eric Dumazet wrote:
>
> > Thanks for this report.
> >
> > Interesting to see how many drivers got the netpoll stuff wrong :/
> >
> > Can you try :
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> > index 81f18a8335276495a59fa93219c4607c2b8a47aa..74e4c72c331d5a6cc5b653970ef4133c8ddf9999 100644
> > --- a/drivers/net/ethernet/realtek/r8169.c
> > +++ b/drivers/net/ethernet/realtek/r8169.c
> > @@ -7668,7 +7668,7 @@ static void rtl8169_netpoll(struct net_device *dev)
> > {
> > struct rtl8169_private *tp = netdev_priv(dev);
> >
> > - rtl8169_interrupt(tp->pci_dev->irq, dev);
> > + napi_schedule(&tp->napi);
>
> The problem is more likely that netconsole handling can call rtl_tx()
> from hard irq context, while standard NAPI poll calls it from BH
>
> Meaning that the following sequence triggers a lockdep warning.
>
> u64_stats_update_begin(&tp->tx_stats.syncp);
> tp->tx_stats.packets++;
> tp->tx_stats.bytes += tx_skb->skb->len;
> u64_stats_update_end(&tp->tx_stats.syncp);
>
> Lockdep does not know that poll_napi() (called from netpoll_poll_dev())
> uses a cmpxchg() to make sure that there is no race.
>
> I am not sure how we can teach lockdep to not splat in this case.

Well, could you try the following patch ?
Thanks !

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0a8f2817ea60f2172eb28177473a4879f85bd18a..f64f812b86029b772bb245e51cdc2263adc4e6ea 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7313,10 +7313,10 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
-			u64_stats_update_begin(&tp->tx_stats.syncp);
+			u64_stats_update_begin_raw(&tp->tx_stats.syncp);
 			tp->tx_stats.packets++;
 			tp->tx_stats.bytes += tx_skb->skb->len;
-			u64_stats_update_end(&tp->tx_stats.syncp);
+			u64_stats_update_end_raw(&tp->tx_stats.syncp);
 			dev_kfree_skb_any(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}



2017-04-21 17:41:01

by Eric Dumazet

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Fri, 2017-04-21 at 06:29 -0700, Eric Dumazet wrote:

> Thanks for this report.
>
> Interesting to see how many drivers got the netpoll stuff wrong :/
>
> Can you try :
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 81f18a8335276495a59fa93219c4607c2b8a47aa..74e4c72c331d5a6cc5b653970ef4133c8ddf9999 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -7668,7 +7668,7 @@ static void rtl8169_netpoll(struct net_device *dev)
> {
> struct rtl8169_private *tp = netdev_priv(dev);
>
> - rtl8169_interrupt(tp->pci_dev->irq, dev);
> + napi_schedule(&tp->napi);

The problem is more likely that netconsole handling can call rtl_tx()
from hard irq context, while standard NAPI poll calls it from BH context.

Meaning that the following sequence triggers a lockdep warning.

u64_stats_update_begin(&tp->tx_stats.syncp);
tp->tx_stats.packets++;
tp->tx_stats.bytes += tx_skb->skb->len;
u64_stats_update_end(&tp->tx_stats.syncp);

Lockdep does not know that poll_napi() (called from netpoll_poll_dev())
uses a cmpxchg() to make sure that there is no race.

I am not sure how we can teach lockdep to not splat in this case.




2017-06-19 14:44:49

by Michal Kubecek

Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote:
> On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> > Hi,
> >
> > My old P3 laptop started to die on me in the middle of larger compile
> > jobs (using distcc) after v4.11-rc<something>. I bisected the problem
> > to 617f01211baf ("8139too: use napi_complete_done()").
> >
> > Unfortunately I wasn't able to capture a full oops as the machine doesn't
> > have serial and ramoops failed me. I did get one partial oops on vgacon
> > which showed rtl8139_poll() being involved (EIP was around
> > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> > bisect result.
> >
> > So maybe some kind of nasty thing going between the hard irq and
> > softirq? Perhaps UP related? I tried to stare at the locking around
> > rtl8139_poll() for a while but it looked mostly sane to me.
> >
>
> Thanks a lot for the detective work, I am so sorry for this !
>
> Could you try the following patch ?
>
> I do not really see what could be wrong, the code should run just fine
> on UP.
>
> Thanks.
>
> diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
> index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644
> --- a/drivers/net/ethernet/realtek/8139too.c
> +++ b/drivers/net/ethernet/realtek/8139too.c
> @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget)
> if (likely(RTL_R16(IntrStatus) & RxAckBits))
> work_done += rtl8139_rx(dev, tp, budget);
>
> - if (work_done < budget && napi_complete_done(napi, work_done)) {
> + if (work_done < budget) {
> unsigned long flags;
>
> spin_lock_irqsave(&tp->lock, flags);
> - RTL_W16_F(IntrMask, rtl8139_intr_mask);
> + if (napi_complete_done(napi, work_done))
> + RTL_W16_F(IntrMask, rtl8139_intr_mask);
> spin_unlock_irqrestore(&tp->lock, flags);
> }
> spin_unlock(&tp->rx_lock);

Eric,

we have a bugreport of what seems to be the same problem:

https://bugzilla.suse.com/show_bug.cgi?id=1042208

Do you plan to submit the patch above or is the conclusion that this is
rather a hardware problem?

Michal Kubecek

2017-07-05 19:53:59

by Ville Syrjälä

[permalink] [raw]
Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Mon, Jun 19, 2017 at 04:44:45PM +0200, Michal Kubecek wrote:
> On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote:
> > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> > > Hi,
> > >
> > > My old P3 laptop started to die on me in the middle of larger compile
> > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem
> > > to 617f01211baf ("8139too: use napi_complete_done()").
> > >
> > > Unfortunately I wasn't able to capture a full oops as the machine doesn't
> > > have serial and ramoops failed me. I did get one partial oops on vgacon
> > > which showed rtl8139_poll() being involved (EIP was around
> > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> > > bisect result.
> > >
> > > So maybe some kind of nasty thing going between the hard irq and
> > > softirq? Perhaps UP related? I tried to stare at the locking around
> > > rtl8139_poll() for a while but it looked mostly sane to me.
> > >
> >
> > Thanks a lot for the detective work, I am so sorry for this !
> >
> > Could you try the following patch ?
> >
> > I do not really see what could be wrong, the code should run just fine
> > on UP.
> >
> > Thanks.
> >
> > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
> > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644
> > --- a/drivers/net/ethernet/realtek/8139too.c
> > +++ b/drivers/net/ethernet/realtek/8139too.c
> > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget)
> > if (likely(RTL_R16(IntrStatus) & RxAckBits))
> > work_done += rtl8139_rx(dev, tp, budget);
> >
> > - if (work_done < budget && napi_complete_done(napi, work_done)) {
> > + if (work_done < budget) {
> > unsigned long flags;
> >
> > spin_lock_irqsave(&tp->lock, flags);
> > - RTL_W16_F(IntrMask, rtl8139_intr_mask);
> > + if (napi_complete_done(napi, work_done))
> > + RTL_W16_F(IntrMask, rtl8139_intr_mask);
> > spin_unlock_irqrestore(&tp->lock, flags);
> > }
> > spin_unlock(&tp->rx_lock);
>
> Eric,
>
> we have a bugreport of what seems to be the same problem:
>
> https://bugzilla.suse.com/show_bug.cgi?id=1042208
>
> Do you plan to submit the patch above or is the conclusion that this is
> rather a hardware problem?

Could someone please push this patch forward? My machine just died again
with a fresh kernel because I forgot that I still need this patch.

--
Ville Syrjälä
Intel OTC

2017-09-18 19:46:43

by Ville Syrjälä

[permalink] [raw]
Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Mon, Apr 10, 2017 at 03:11:02PM +0300, Ville Syrjälä wrote:
> On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote:
> > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> > > Hi,
> > >
> > > My old P3 laptop started to die on me in the middle of larger compile
> > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem
> > > to 617f01211baf ("8139too: use napi_complete_done()").
> > >
> > > Unfortunately I wasn't able to capture a full oops as the machine doesn't
> > > have serial and ramoops failed me. I did get one partial oops on vgacon
> > > which showed rtl8139_poll() being involved (EIP was around
> > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> > > bisect result.
> > >
> > > So maybe some kind of nasty thing going between the hard irq and
> > > softirq? Perhaps UP related? I tried to stare at the locking around
> > > rtl8139_poll() for a while but it looked mostly sane to me.
> > >
> >
> > Thanks a lot for the detective work, I am so sorry for this !
> >
> > Could you try the following patch ?
> >
> > I do not really see what could be wrong, the code should run just fine
> > on UP.
> >
> > Thanks.
> >
> > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
> > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644
> > --- a/drivers/net/ethernet/realtek/8139too.c
> > +++ b/drivers/net/ethernet/realtek/8139too.c
> > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget)
> > if (likely(RTL_R16(IntrStatus) & RxAckBits))
> > work_done += rtl8139_rx(dev, tp, budget);
> >
> > - if (work_done < budget && napi_complete_done(napi, work_done)) {
> > + if (work_done < budget) {
> > unsigned long flags;
> >
> > spin_lock_irqsave(&tp->lock, flags);
> > - RTL_W16_F(IntrMask, rtl8139_intr_mask);
> > + if (napi_complete_done(napi, work_done))
> > + RTL_W16_F(IntrMask, rtl8139_intr_mask);
> > spin_unlock_irqrestore(&tp->lock, flags);
> > }
> > spin_unlock(&tp->rx_lock);
> >
> >
>
> Yep, that patch does appear to make it stable again.
>
> Tested-by: Ville Syrjälä <[email protected]>

And five months later I'm still waiting for this patch to land...

--
Ville Syrjälä
Intel OTC

2017-09-18 19:52:18

by Eric Dumazet

[permalink] [raw]
Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Mon, Sep 18, 2017 at 12:46 PM, Ville Syrjälä
<[email protected]> wrote:

> And five months later I'm still waiting for this patch to land...

Sure, while I was on vacation, I received thousands of emails that I
simply have not read.

2017-09-19 12:45:10

by Ville Syrjälä

[permalink] [raw]
Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Mon, Sep 18, 2017 at 12:52:15PM -0700, Eric Dumazet wrote:
> On Mon, Sep 18, 2017 at 12:46 PM, Ville Syrjälä
> <[email protected]> wrote:
>
> > And five months later I'm still waiting for this patch to land...
>
> Sure, while I was on vacation, I received thousands of emails that I
> simply have not read.

Ah that explains where my previous ping went. Pardon the snarky comment,
and thanks for taking care of this.

--
Ville Syrjälä
Intel OTC

2017-09-19 13:51:24

by Eric Dumazet

[permalink] [raw]
Subject: Re: [regression v4.11] 617f01211baf ("8139too: use napi_complete_done()")

On Tue, 2017-09-19 at 15:45 +0300, Ville Syrjälä wrote:
> On Mon, Sep 18, 2017 at 12:52:15PM -0700, Eric Dumazet wrote:
> > On Mon, Sep 18, 2017 at 12:46 PM, Ville Syrjälä
> > <[email protected]> wrote:
> >
> > > And five months later I'm still waiting for this patch to land...
> >
> > Sure, while I was on vacation, I received thousands of emails that I
> > simply have not read.
>
> Ah that explains where my previous ping went. Pardon the snarky comment,
> and thanks for taking care of this.
>

No problem, this was my fault not following this bug.

Thanks a lot !