abrt_version: 2.0.7
cmdline: BOOT_IMAGE=/vmlinuz-3.2.3 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
kernel: 3.2.3
reason: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
time: Sa 04 Feb 2012 15:31:22 CET
backtrace:
:WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
:Hardware name: Aspire 1810T
:NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
:Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_seq arc4 videodev usb_storage snd_seq_device v4l2_compat_ioctl32 snd_pcm snd_timer acer_wmi iwlwifi snd mac80211 sparse_keymap cfg80211 rfkill soundcore snd_page_alloc joydev wmi acerhdf pcspkr virtio_net virtio atl1c virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
:Pid: 0, comm: swapper Not tainted 3.2.3 #5
:Call Trace:
: <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
: [<ffffffff81018a18>] ? lapic_next_event+0x18/0x20
: [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
: [<ffffffff8104e9be>] ? hrtimer_interrupt+0xfe/0x1f0
: [<ffffffff8102c350>] ? wake_up_process+0x10/0x20
: [<ffffffff813e9397>] dev_watchdog+0x1e7/0x1f0
: [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
: [<ffffffff813e91b0>] ? qdisc_reset+0x50/0x50
: [<ffffffff81037827>] __do_softirq+0x87/0x110
: [<ffffffff814945fa>] call_softirq+0x1a/0x30
: [<ffffffff810038bd>] do_softirq+0x4d/0x80
: [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
: [<ffffffff8100375b>] do_IRQ+0x5b/0xd0
: [<ffffffff8148de29>] common_interrupt+0x69/0x69
: <EOI> [<ffffffff8124e115>] ? acpi_idle_enter_bm+0x1c0/0x1ff
: [<ffffffff8124e110>] ? acpi_idle_enter_bm+0x1bb/0x1ff
: [<ffffffff813a24b4>] ? menu_select+0xd4/0x390
: [<ffffffff813a150b>] cpuidle_idle_call+0x8b/0xe0
: [<ffffffff810011ed>] cpu_idle+0x9d/0xe0
: [<ffffffff8147d08e>] rest_init+0x62/0x64
: [<ffffffff81699b54>] start_kernel+0x358/0x363
: [<ffffffff81699322>] x86_64_start_reservations+0x132/0x136
: [<ffffffff81699416>] x86_64_start_kernel+0xf0/0xf7
: [<ffffffff81017e70>] ? acpi_suspend_lowlevel+0x1b0/0x1b0
END:
On Sun, Feb 12, 2012 at 11:16 AM, Thomas Meyer <[email protected]> wrote:
> abrt_version: 2.0.7
> cmdline: BOOT_IMAGE=/vmlinuz-3.2.3 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> kernel: 3.2.3
> reason: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> time: Sa 04 Feb 2012 15:31:22 CET
>
> backtrace:
> :WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> :Hardware name: Aspire 1810T
> :NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> :Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_seq arc4 videodev usb_storage snd_seq_device v4l2_compat_ioctl32 snd_pcm snd_timer acer_wmi iwlwifi snd mac80211 sparse_keymap cfg80211 rfkill soundcore snd_page_alloc joydev wmi acerhdf pcspkr virtio_net virtio atl1c virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> :Pid: 0, comm: swapper Not tainted 3.2.3 #5
> :Call Trace:
> : <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> : [<ffffffff81018a18>] ? lapic_next_event+0x18/0x20
> : [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> : [<ffffffff8104e9be>] ? hrtimer_interrupt+0xfe/0x1f0
> : [<ffffffff8102c350>] ? wake_up_process+0x10/0x20
> : [<ffffffff813e9397>] dev_watchdog+0x1e7/0x1f0
> : [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> : [<ffffffff813e91b0>] ? qdisc_reset+0x50/0x50
> : [<ffffffff81037827>] __do_softirq+0x87/0x110
> : [<ffffffff814945fa>] call_softirq+0x1a/0x30
> : [<ffffffff810038bd>] do_softirq+0x4d/0x80
> : [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> : [<ffffffff8100375b>] do_IRQ+0x5b/0xd0
> : [<ffffffff8148de29>] common_interrupt+0x69/0x69
> : <EOI> [<ffffffff8124e115>] ? acpi_idle_enter_bm+0x1c0/0x1ff
> : [<ffffffff8124e110>] ? acpi_idle_enter_bm+0x1bb/0x1ff
> : [<ffffffff813a24b4>] ? menu_select+0xd4/0x390
> : [<ffffffff813a150b>] cpuidle_idle_call+0x8b/0xe0
> : [<ffffffff810011ed>] cpu_idle+0x9d/0xe0
> : [<ffffffff8147d08e>] rest_init+0x62/0x64
> : [<ffffffff81699b54>] start_kernel+0x358/0x363
> : [<ffffffff81699322>] x86_64_start_reservations+0x132/0x136
> : [<ffffffff81699416>] x86_64_start_kernel+0xf0/0xf7
> : [<ffffffff81017e70>] ? acpi_suspend_lowlevel+0x1b0/0x1b0
We've seen this in Fedora for a long time now. As far as I know, nobody
really knows what is going on.
https://bugzilla.redhat.com/show_bug.cgi?id=717211
(Also, you probably should have sent this to netdev.)
josh
On Monday, 13 February 2012 at 16:00 -0500, Josh Boyer wrote:
> On Sun, Feb 12, 2012 at 11:16 AM, Thomas Meyer <[email protected]> wrote:
> > abrt_version: 2.0.7
> > cmdline: BOOT_IMAGE=/vmlinuz-3.2.3 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> > kernel: 3.2.3
> > reason: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > time: Sa 04 Feb 2012 15:31:22 CET
> >
> > backtrace:
> > :WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > :Hardware name: Aspire 1810T
> > :NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> > :Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_seq arc4 videodev usb_storage snd_seq_device v4l2_compat_ioctl32 snd_pcm snd_timer acer_wmi iwlwifi snd mac80211 sparse_keymap cfg80211 rfkill soundcore snd_page_alloc joydev wmi acerhdf pcspkr virtio_net virtio atl1c virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> > :Pid: 0, comm: swapper Not tainted 3.2.3 #5
> > :Call Trace:
> > : <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> > : [<ffffffff81018a18>] ? lapic_next_event+0x18/0x20
> > : [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> > : [<ffffffff8104e9be>] ? hrtimer_interrupt+0xfe/0x1f0
> > : [<ffffffff8102c350>] ? wake_up_process+0x10/0x20
> > : [<ffffffff813e9397>] dev_watchdog+0x1e7/0x1f0
> > : [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> > : [<ffffffff813e91b0>] ? qdisc_reset+0x50/0x50
> > : [<ffffffff81037827>] __do_softirq+0x87/0x110
> > : [<ffffffff814945fa>] call_softirq+0x1a/0x30
> > : [<ffffffff810038bd>] do_softirq+0x4d/0x80
> > : [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> > : [<ffffffff8100375b>] do_IRQ+0x5b/0xd0
> > : [<ffffffff8148de29>] common_interrupt+0x69/0x69
> > : <EOI> [<ffffffff8124e115>] ? acpi_idle_enter_bm+0x1c0/0x1ff
> > : [<ffffffff8124e110>] ? acpi_idle_enter_bm+0x1bb/0x1ff
> > : [<ffffffff813a24b4>] ? menu_select+0xd4/0x390
> > : [<ffffffff813a150b>] cpuidle_idle_call+0x8b/0xe0
> > : [<ffffffff810011ed>] cpu_idle+0x9d/0xe0
> > : [<ffffffff8147d08e>] rest_init+0x62/0x64
> > : [<ffffffff81699b54>] start_kernel+0x358/0x363
> > : [<ffffffff81699322>] x86_64_start_reservations+0x132/0x136
> > : [<ffffffff81699416>] x86_64_start_kernel+0xf0/0xf7
> > : [<ffffffff81017e70>] ? acpi_suspend_lowlevel+0x1b0/0x1b0
>
> We've seen this in Fedora for a long time now. As far as I know, nobody
> really knows what is going on.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=717211
>
> (Also, you probably should have sent this to netdev.)
CC netdev, Jay Cliburn <[email protected]>,
It seems this driver has partial support for two TX rings.
(TX completion only drains the first ring)
Please try the following patch.
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index b859124..1ff3c6d 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2244,10 +2244,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
 			dev_info(&adapter->pdev->dev, "tx locked\n");
 		return NETDEV_TX_LOCKED;
 	}
-	if (skb->mark == 0x01)
-		type = atl1c_trans_high;
-	else
-		type = atl1c_trans_normal;
 
 	if (atl1c_tpd_avail(adapter, type) < tpd_req) {
 		/* no enough descriptor, just stop queue */
On Tuesday, 14 February 2012 at 08:36 +0100, Eric Dumazet wrote:
> On Monday, 13 February 2012 at 16:00 -0500, Josh Boyer wrote:
> > On Sun, Feb 12, 2012 at 11:16 AM, Thomas Meyer <[email protected]> wrote:
> > > abrt_version: 2.0.7
> > > cmdline: BOOT_IMAGE=/vmlinuz-3.2.3 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> > > kernel: 3.2.3
> > > reason: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > > time: Sa 04 Feb 2012 15:31:22 CET
> > >
> > > backtrace:
> > > :WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > > :Hardware name: Aspire 1810T
> > > :NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> > > :Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_seq arc4 videodev usb_storage snd_seq_device v4l2_compat_ioctl32 snd_pcm snd_timer acer_wmi iwlwifi snd mac80211 sparse_keymap cfg80211 rfkill soundcore snd_page_alloc joydev wmi acerhdf pcspkr virtio_net virtio atl1c virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> > > :Pid: 0, comm: swapper Not tainted 3.2.3 #5
> > > :Call Trace:
> > > : <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> > > : [<ffffffff81018a18>] ? lapic_next_event+0x18/0x20
> > > : [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> > > : [<ffffffff8104e9be>] ? hrtimer_interrupt+0xfe/0x1f0
> > > : [<ffffffff8102c350>] ? wake_up_process+0x10/0x20
> > > : [<ffffffff813e9397>] dev_watchdog+0x1e7/0x1f0
> > > : [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> > > : [<ffffffff813e91b0>] ? qdisc_reset+0x50/0x50
> > > : [<ffffffff81037827>] __do_softirq+0x87/0x110
> > > : [<ffffffff814945fa>] call_softirq+0x1a/0x30
> > > : [<ffffffff810038bd>] do_softirq+0x4d/0x80
> > > : [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> > > : [<ffffffff8100375b>] do_IRQ+0x5b/0xd0
> > > : [<ffffffff8148de29>] common_interrupt+0x69/0x69
> > > : <EOI> [<ffffffff8124e115>] ? acpi_idle_enter_bm+0x1c0/0x1ff
> > > : [<ffffffff8124e110>] ? acpi_idle_enter_bm+0x1bb/0x1ff
> > > : [<ffffffff813a24b4>] ? menu_select+0xd4/0x390
> > > : [<ffffffff813a150b>] cpuidle_idle_call+0x8b/0xe0
> > > : [<ffffffff810011ed>] cpu_idle+0x9d/0xe0
> > > : [<ffffffff8147d08e>] rest_init+0x62/0x64
> > > : [<ffffffff81699b54>] start_kernel+0x358/0x363
> > > : [<ffffffff81699322>] x86_64_start_reservations+0x132/0x136
> > > : [<ffffffff81699416>] x86_64_start_kernel+0xf0/0xf7
> > > : [<ffffffff81017e70>] ? acpi_suspend_lowlevel+0x1b0/0x1b0
> >
> > We've seen this in Fedora for a long time now. As far as I know, nobody
> > really knows what is going on.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=717211
> >
> > (Also, you probably should have sent this to netdev.)
>
> CC netdev, Jay Cliburn <[email protected]>,
>
> It seems this driver has partial support for two TX rings.
>
> (TX completion only drains the first ring)
>
> Please try following patch.
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index b859124..1ff3c6d 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -2244,10 +2244,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
> dev_info(&adapter->pdev->dev, "tx locked\n");
> return NETDEV_TX_LOCKED;
> }
> - if (skb->mark == 0x01)
> - type = atl1c_trans_high;
> - else
> - type = atl1c_trans_normal;
>
> if (atl1c_tpd_avail(adapter, type) < tpd_req) {
> /* no enough descriptor, just stop queue */
>
Thomas, have you had a chance to test this patch?
Thanks!
On 16.02.2012 at 05:34, Eric Dumazet <[email protected]> wrote:
> On Tuesday, 14 February 2012 at 08:36 +0100, Eric Dumazet wrote:
>> On Monday, 13 February 2012 at 16:00 -0500, Josh Boyer wrote:
>>> On Sun, Feb 12, 2012 at 11:16 AM, Thomas Meyer <[email protected]> wrote:
>>>> abrt_version: 2.0.7
>>>> cmdline: BOOT_IMAGE=/vmlinuz-3.2.3 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
>>>> kernel: 3.2.3
>>>> reason: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
>>>> time: Sa 04 Feb 2012 15:31:22 CET
>>>>
>>>> backtrace:
>>>> :WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
>>>> :Hardware name: Aspire 1810T
>>>> :NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
>>>> :Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_seq arc4 videodev usb_storage snd_seq_device v4l2_compat_ioctl32 snd_pcm snd_timer acer_wmi iwlwifi snd mac80211 sparse_keymap cfg80211 rfkill soundcore snd_page_alloc joydev wmi acerhdf pcspkr virtio_net virtio atl1c virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
>>>> :Pid: 0, comm: swapper Not tainted 3.2.3 #5
>>>> :Call Trace:
>>>> : <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
>>>> : [<ffffffff81018a18>] ? lapic_next_event+0x18/0x20
>>>> : [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
>>>> : [<ffffffff8104e9be>] ? hrtimer_interrupt+0xfe/0x1f0
>>>> : [<ffffffff8102c350>] ? wake_up_process+0x10/0x20
>>>> : [<ffffffff813e9397>] dev_watchdog+0x1e7/0x1f0
>>>> : [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
>>>> : [<ffffffff813e91b0>] ? qdisc_reset+0x50/0x50
>>>> : [<ffffffff81037827>] __do_softirq+0x87/0x110
>>>> : [<ffffffff814945fa>] call_softirq+0x1a/0x30
>>>> : [<ffffffff810038bd>] do_softirq+0x4d/0x80
>>>> : [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
>>>> : [<ffffffff8100375b>] do_IRQ+0x5b/0xd0
>>>> : [<ffffffff8148de29>] common_interrupt+0x69/0x69
>>>> : <EOI> [<ffffffff8124e115>] ? acpi_idle_enter_bm+0x1c0/0x1ff
>>>> : [<ffffffff8124e110>] ? acpi_idle_enter_bm+0x1bb/0x1ff
>>>> : [<ffffffff813a24b4>] ? menu_select+0xd4/0x390
>>>> : [<ffffffff813a150b>] cpuidle_idle_call+0x8b/0xe0
>>>> : [<ffffffff810011ed>] cpu_idle+0x9d/0xe0
>>>> : [<ffffffff8147d08e>] rest_init+0x62/0x64
>>>> : [<ffffffff81699b54>] start_kernel+0x358/0x363
>>>> : [<ffffffff81699322>] x86_64_start_reservations+0x132/0x136
>>>> : [<ffffffff81699416>] x86_64_start_kernel+0xf0/0xf7
>>>> : [<ffffffff81017e70>] ? acpi_suspend_lowlevel+0x1b0/0x1b0
>>>
>>> We've seen this in Fedora for a long time now. As far as I know, nobody
>>> really knows what is going on.
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=717211
>>>
>>> (Also, you probably should have sent this to netdev.)
>>
>> CC netdev, Jay Cliburn <[email protected]>,
>>
>> It seems this driver has partial support for two TX rings.
>>
>> (TX completion only drains the first ring)
>>
>> Please try following patch.
>>
>> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> index b859124..1ff3c6d 100644
>> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> @@ -2244,10 +2244,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>> dev_info(&adapter->pdev->dev, "tx locked\n");
>> return NETDEV_TX_LOCKED;
>> }
>> - if (skb->mark == 0x01)
>> - type = atl1c_trans_high;
>> - else
>> - type = atl1c_trans_normal;
>>
>> if (atl1c_tpd_avail(adapter, type) < tpd_req) {
>> /* no enough descriptor, just stop queue */
>>
>
Hello Eric,
>
> Thomas, have you had the chance to test this patch ?
Yes, I've been running 3.2.6 with your patch applied for 3 days now. I haven't seen the above warning yet, but it was rather rare to begin with, so I'll keep an eye on it over the next few weeks.
Thanks for the patch!
Kind regards
Thomas
>
On Thursday, 16 February 2012 at 07:13 +0100, Thomas Meyer wrote:
> Hello Eric,
>
> >
> > Thomas, have you had the chance to test this patch ?
>
> Yes, I've been running 3.2.6 with your patch applied for 3 days now. I haven't seen the above warning yet, but it was rather rare to begin with, so I'll keep an eye on it over the next few weeks.
>
> Thanks for the patch!
>
Thanks for testing!
Do you have any idea what could set the skb mark to 1 on some packets in your
setup?
Some firewall rules or tc rules?
On 16.02.2012 at 07:18, Eric Dumazet <[email protected]> wrote:
> On Thursday, 16 February 2012 at 07:13 +0100, Thomas Meyer wrote:
>
>> Hello Eric,
>>
>>>
>>> Thomas, have you had the chance to test this patch ?
>>
>> Yes, I've been running 3.2.6 with your patch applied for 3 days now. I haven't seen the above warning yet, but it was rather rare to begin with, so I'll keep an eye on it over the next few weeks.
>>
>> Thanks for the patch!
>>
>
> Thanks for testing !
>
> Do you have any idea what could set the skb mark to 1 on some packets in your
> setup?
No idea.
>
> Some firewall rules or tc rules ?
I'm using a Fedora 16 installation, with no big changes to the configuration.
The warning seems to occur only when shutting down the remote computer.
>
>
>
This driver attempts to use two TX rings but lacks proper support:
1) The IRQ handler only takes care of TX completion on the first TX ring.
2) The stop/start logic uses the legacy functions (for non-multiqueue
drivers).
This means all packets with skb mark set to 1 are sent through the high
queue but are never cleaned, so the queue eventually fills and blocks the
device, triggering the infamous "NETDEV WATCHDOG" message.
Let's use a single TX ring to fix the problem; this driver is not a real
multiqueue one yet.
Minimal fix for stable kernels.
Reported-by: Thomas Meyer <[email protected]>
Tested-by: Thomas Meyer <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jay Cliburn <[email protected]>
Cc: Chris Snook <[email protected]>
---
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index b859124..1ff3c6d 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2244,10 +2244,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
 			dev_info(&adapter->pdev->dev, "tx locked\n");
 		return NETDEV_TX_LOCKED;
 	}
-	if (skb->mark == 0x01)
-		type = atl1c_trans_high;
-	else
-		type = atl1c_trans_normal;
 
 	if (atl1c_tpd_avail(adapter, type) < tpd_req) {
 		/* no enough descriptor, just stop queue */
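The failure mode described in the changelog above can be modelled in a few lines of userspace C. This is only a toy sketch, not driver code: the names (toy_nic, toy_xmit, toy_tx_complete, simulate) and the ring size of 8 are made up for illustration. It assumes, as the changelog says, that packets with mark 1 go to the high ring while completion only ever cleans the normal ring.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* hypothetical; real rings hold far more descriptors */

enum toy_queue { TOY_NORMAL = 0, TOY_HIGH = 1, TOY_QUEUES };

struct toy_nic {
	unsigned int in_flight[TOY_QUEUES];	/* descriptors queued, not yet cleaned */
	bool stopped;				/* models a stopped transmit queue */
};

/* Pre-patch behaviour when use_mark is true: mark 1 selects the high ring. */
static enum toy_queue toy_select(unsigned int mark, bool use_mark)
{
	return (use_mark && mark == 0x01) ? TOY_HIGH : TOY_NORMAL;
}

/* Queues one packet; returns false once the chosen ring is full. */
static bool toy_xmit(struct toy_nic *nic, unsigned int mark, bool use_mark)
{
	enum toy_queue q = toy_select(mark, use_mark);

	if (nic->in_flight[q] == RING_SIZE) {
		nic->stopped = true;	/* nothing will ever restart it here */
		return false;
	}
	nic->in_flight[q]++;
	assert(nic->in_flight[q] <= RING_SIZE);
	return true;
}

/* Models the completion path from the changelog: only the first ring is cleaned. */
static void toy_tx_complete(struct toy_nic *nic)
{
	nic->in_flight[TOY_NORMAL] = 0;
}

/* Sends `count` packets that all carry mark 1; returns how many were accepted. */
unsigned int simulate(unsigned int count, bool use_mark)
{
	struct toy_nic nic = { .in_flight = { 0, 0 }, .stopped = false };
	unsigned int accepted = 0;

	for (unsigned int i = 0; i < count; i++) {
		if (!toy_xmit(&nic, 0x01, use_mark))
			break;		/* stalled: the watchdog would fire later */
		accepted++;
		toy_tx_complete(&nic);
	}
	return accepted;
}
```

With these toy numbers, simulate(100, true) accepts only 8 packets (one ring's worth) before stalling, while simulate(100, false), which corresponds to the single-ring behaviour after the patch, accepts all 100.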
Sorry, the piece of code related to the mark was probably introduced by our internal test for multiple queues.
As far as I remember, the mark can be changed by an iptables rule.
Thanks
Xiong
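Besides firewall rules, a local application can also set the mark itself through the SO_MARK socket option on Linux. This was not mentioned in the thread and nothing here suggests it was the cause on Thomas's machine; the sketch below only shows that mechanism. The function name try_set_mark is made up for illustration, and setting the option normally requires CAP_NET_ADMIN, so an unprivileged caller should expect an error such as EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_MARK
#define SO_MARK 36	/* fallback only; Linux headers normally provide it */
#endif

/*
 * Tries to tag packets sent on a fresh UDP socket with `mark`.
 * Returns 0 on success, or the errno value on failure
 * (typically EPERM without CAP_NET_ADMIN).
 */
int try_set_mark(unsigned int mark)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int err = 0;

	assert(sizeof(mark) == 4);	/* SO_MARK takes a 32-bit value */
	if (fd < 0)
		return errno;
	if (setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)) < 0)
		err = errno;		/* save before close() can overwrite it */
	close(fd);
	return err;
}
```

Calling try_set_mark(1) as root would make packets from that socket carry mark 1, which is the value the removed driver code was testing for.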
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Eric Dumazet
> Sent: Thursday, February 16, 2012 14:18
> To: Thomas Meyer
> Cc: Linux Kernel Mailing List; [email protected]; [email protected];
> netdev; Josh Boyer
> Subject: Re: NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
>
> On Thursday, 16 February 2012 at 07:13 +0100, Thomas Meyer wrote:
>
> > Hello Eric,
> >
> > >
> > > Thomas, have you had the chance to test this patch ?
> >
> > Yes, I'm running 3.2.6 with your patch applied for 3 days now. I didn't see
> > above warning yet, but the warning was rather seldom, so I'll keep an eye on it
> > in the next weeks.
> >
> > Thanks for the patch!
> >
>
> Thanks for testing !
>
> Do you have any idea what could set the skb mark to 1 on some packets in your
> setup?
>
> Some firewall rules or tc rules ?
>
>
>
On Thu, Feb 16, 2012 at 1:43 AM, Eric Dumazet <[email protected]> wrote:
> This driver attempts to use two TX rings but lacks proper support :
>
> 1) IRQ handler only takes care of TX completion on first TX ring
> 2) the stop/start logic uses the legacy functions (for non multiqueue
> drivers)
>
> This means all packets with skb mark set to 1 are sent through the high
> queue but are never cleaned, so the queue eventually fills and blocks the
> device, triggering the infamous "NETDEV WATCHDOG" message.
>
> Let's use a single TX ring to fix the problem; this driver is not a real
> multiqueue one yet.
>
> Minimal fix for stable kernels.
>
> Reported-by: Thomas Meyer <[email protected]>
> Tested-by: Thomas Meyer <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Jay Cliburn <[email protected]>
> Cc: Chris Snook <[email protected]>
As I think David handles netdev patches a bit differently for stable releases,
I'd like to suggest this get included in the next batch for the 3.2 kernel.
We've been seeing the bug this patch fixes in Fedora for quite a while now.
josh
On Thursday, 16 February 2012 at 07:36 -0500, Josh Boyer wrote:
> As I think David handles netdev patches a bit differently for stable releases,
> I'd like to suggest this get included in the next batch for the 3.2 kernel.
> We've been seeing the bug this patch fixes in Fedora for quite a while now.
Hard to believe this bug lasted so long...
From: Eric Dumazet <[email protected]>
Date: Thu, 16 Feb 2012 07:43:11 +0100
> This driver attempts to use two TX rings but lacks proper support :
>
> 1) IRQ handler only takes care of TX completion on first TX ring
> 2) the stop/start logic uses the legacy functions (for non multiqueue
> drivers)
>
> This means all packets with skb mark set to 1 are sent through the high
> queue but are never cleaned, so the queue eventually fills and blocks the
> device, triggering the infamous "NETDEV WATCHDOG" message.
>
> Let's use a single TX ring to fix the problem; this driver is not a real
> multiqueue one yet.
>
> Minimal fix for stable kernels.
>
> Reported-by: Thomas Meyer <[email protected]>
> Tested-by: Thomas Meyer <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
Applied.
On Thursday, 16.02.2012, at 05:34 +0100, Eric Dumazet wrote:
> On Tuesday, 14 February 2012 at 08:36 +0100, Eric Dumazet wrote:
> > On Monday, 13 February 2012 at 16:00 -0500, Josh Boyer wrote:
> > > On Sun, Feb 12, 2012 at 11:16 AM, Thomas Meyer <[email protected]> wrote:
> > > > abrt_version: 2.0.7
> > > > cmdline: BOOT_IMAGE=/vmlinuz-3.2.3 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> > > > kernel: 3.2.3
> > > > reason: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > > > time: Sa 04 Feb 2012 15:31:22 CET
> > > >
> > > > backtrace:
> > > > :WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > > > :Hardware name: Aspire 1810T
> > > > :NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> > > > :Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_seq arc4 videodev usb_storage snd_seq_device v4l2_compat_ioctl32 snd_pcm snd_timer acer_wmi iwlwifi snd mac80211 sparse_keymap cfg80211 rfkill soundcore snd_page_alloc joydev wmi acerhdf pcspkr virtio_net virtio atl1c virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> > > > :Pid: 0, comm: swapper Not tainted 3.2.3 #5
> > > > :Call Trace:
> > > > : <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> > > > : [<ffffffff81018a18>] ? lapic_next_event+0x18/0x20
> > > > : [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> > > > : [<ffffffff8104e9be>] ? hrtimer_interrupt+0xfe/0x1f0
> > > > : [<ffffffff8102c350>] ? wake_up_process+0x10/0x20
> > > > : [<ffffffff813e9397>] dev_watchdog+0x1e7/0x1f0
> > > > : [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> > > > : [<ffffffff813e91b0>] ? qdisc_reset+0x50/0x50
> > > > : [<ffffffff81037827>] __do_softirq+0x87/0x110
> > > > : [<ffffffff814945fa>] call_softirq+0x1a/0x30
> > > > : [<ffffffff810038bd>] do_softirq+0x4d/0x80
> > > > : [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> > > > : [<ffffffff8100375b>] do_IRQ+0x5b/0xd0
> > > > : [<ffffffff8148de29>] common_interrupt+0x69/0x69
> > > > : <EOI> [<ffffffff8124e115>] ? acpi_idle_enter_bm+0x1c0/0x1ff
> > > > : [<ffffffff8124e110>] ? acpi_idle_enter_bm+0x1bb/0x1ff
> > > > : [<ffffffff813a24b4>] ? menu_select+0xd4/0x390
> > > > : [<ffffffff813a150b>] cpuidle_idle_call+0x8b/0xe0
> > > > : [<ffffffff810011ed>] cpu_idle+0x9d/0xe0
> > > > : [<ffffffff8147d08e>] rest_init+0x62/0x64
> > > > : [<ffffffff81699b54>] start_kernel+0x358/0x363
> > > > : [<ffffffff81699322>] x86_64_start_reservations+0x132/0x136
> > > > : [<ffffffff81699416>] x86_64_start_kernel+0xf0/0xf7
> > > > : [<ffffffff81017e70>] ? acpi_suspend_lowlevel+0x1b0/0x1b0
> > >
> > > We've seen this in Fedora for a long time now. As far as I know, nobody
> > > really knows what is going on.
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=717211
> > >
> > > (Also, you probably should have sent this to netdev.)
> >
> > CC netdev, Jay Cliburn <[email protected]>,
> >
> > It seems this driver has partial support for two TX rings.
> >
> > (TX completion only drains the first ring)
> >
> > Please try following patch.
> >
> > diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > index b859124..1ff3c6d 100644
> > --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > @@ -2244,10 +2244,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
> > dev_info(&adapter->pdev->dev, "tx locked\n");
> > return NETDEV_TX_LOCKED;
> > }
> > - if (skb->mark == 0x01)
> > - type = atl1c_trans_high;
> > - else
> > - type = atl1c_trans_normal;
> >
> > if (atl1c_tpd_avail(adapter, type) < tpd_req) {
> > /* no enough descriptor, just stop queue */
> >
>
>
> Thomas, have you had the chance to test this patch ?
>
Bad news. I just hit the issue again, with the above patch applied.
abrt_version: 2.0.7
cmdline: BOOT_IMAGE=/vmlinuz-3.2.6 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
kernel: 3.2.6
reason: [177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
time: Di 21 Feb 2012 18:09:16 CET
backtrace:
:[177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
:[177799.342254] Hardware name: Aspire 1810T
:[177799.342256] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
:[177799.342259] Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq arc4 snd_seq_device snd_pcm iwlwifi uvcvideo mac80211 snd_timer videodev cfg80211 snd usb_storage v4l2_compat_ioctl32 atl1c acer_wmi soundcore sparse_keymap snd_page_alloc rfkill wmi joydev acerhdf pcspkr virtio_net virtio virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
:[177799.342303] Pid: 4980, comm: alsa-sink Not tainted 3.2.6 #7
:[177799.342306] Call Trace:
:[177799.342309] <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
:[177799.342320] [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
:[177799.342325] [<ffffffff8100375b>] ? do_IRQ+0x5b/0xd0
:[177799.342330] [<ffffffff813e94b7>] dev_watchdog+0x1e7/0x1f0
:[177799.342336] [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
:[177799.342341] [<ffffffff813e92d0>] ? qdisc_reset+0x50/0x50
:[177799.342345] [<ffffffff81037827>] __do_softirq+0x87/0x110
:[177799.342350] [<ffffffff8149483a>] call_softirq+0x1a/0x30
:[177799.342354] [<ffffffff810038bd>] do_softirq+0x4d/0x80
:[177799.342358] [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
:[177799.342363] [<ffffffff8101903a>] smp_apic_timer_interrupt+0x5a/0x90
:[177799.342368] [<ffffffff814942a9>] apic_timer_interrupt+0x69/0x70
:[177799.342371] <EOI> [<ffffffff814948d7>] ? sysenter_dispatch+0x7/0x26
END:
On Tuesday, 21 February 2012 at 18:56 +0100, Thomas Meyer wrote:
> Bad news. I just hit the issue again, with the above patch applied.
>
> abrt_version: 2.0.7
> cmdline: BOOT_IMAGE=/vmlinuz-3.2.6 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> kernel: 3.2.6
> reason: [177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> time: Di 21 Feb 2012 18:09:16 CET
>
> backtrace:
> :[177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> :[177799.342254] Hardware name: Aspire 1810T
> :[177799.342256] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> :[177799.342259] Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq arc4 snd_seq_device snd_pcm iwlwifi uvcvideo mac80211 snd_timer videodev cfg80211 snd usb_storage v4l2_compat_ioctl32 atl1c acer_wmi soundcore sparse_keymap snd_page_alloc rfkill wmi joydev acerhdf pcspkr virtio_net virtio virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> :[177799.342303] Pid: 4980, comm: alsa-sink Not tainted 3.2.6 #7
> :[177799.342306] Call Trace:
> :[177799.342309] <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> :[177799.342320] [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> :[177799.342325] [<ffffffff8100375b>] ? do_IRQ+0x5b/0xd0
> :[177799.342330] [<ffffffff813e94b7>] dev_watchdog+0x1e7/0x1f0
> :[177799.342336] [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> :[177799.342341] [<ffffffff813e92d0>] ? qdisc_reset+0x50/0x50
> :[177799.342345] [<ffffffff81037827>] __do_softirq+0x87/0x110
> :[177799.342350] [<ffffffff8149483a>] call_softirq+0x1a/0x30
> :[177799.342354] [<ffffffff810038bd>] do_softirq+0x4d/0x80
> :[177799.342358] [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> :[177799.342363] [<ffffffff8101903a>] smp_apic_timer_interrupt+0x5a/0x90
> :[177799.342368] [<ffffffff814942a9>] apic_timer_interrupt+0x69/0x70
> :[177799.342371] <EOI> [<ffffffff814948d7>] ? sysenter_dispatch+0x7/0x26
>
> END:
>
Thanks
I suspect this driver's xmit function is racy, and several patches will be
needed to fix the bugs.
For example, it uses a tx_lock, but no other part of the driver uses it.
It's a copy/paste leftover from an LLTX driver.
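Why a lock taken only inside an already serialized path is redundant can be shown with a small userspace model. This is a sketch with pthreads, not kernel code: core_tx_lock stands in for the transmit lock the networking core holds around the driver's xmit routine for non-LLTX drivers, driver_tx_lock stands in for adapter->tx_lock, and all function names are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define THREADS 4
#define PACKETS_PER_THREAD 10000

/* Stand-in for the lock the core takes before calling the driver. */
static pthread_mutex_t core_tx_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for the driver-private tx_lock. */
static pthread_mutex_t driver_tx_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long trylock_failures;	/* protected by core_tx_lock */
static unsigned long packets_sent;	/* protected by core_tx_lock */

/* Driver transmit routine: always entered with core_tx_lock held. */
static void toy_driver_xmit(void)
{
	if (pthread_mutex_trylock(&driver_tx_lock) != 0) {
		trylock_failures++;	/* would be the "tx locked" path */
		return;
	}
	packets_sent++;
	pthread_mutex_unlock(&driver_tx_lock);
}

/* Sender thread: serializes on the outer lock before entering the driver. */
static void *toy_sender(void *unused)
{
	(void)unused;
	for (int i = 0; i < PACKETS_PER_THREAD; i++) {
		pthread_mutex_lock(&core_tx_lock);
		toy_driver_xmit();
		pthread_mutex_unlock(&core_tx_lock);
	}
	return NULL;
}

/* Runs all senders and returns how often the inner trylock failed. */
unsigned long run_senders(void)
{
	pthread_t tid[THREADS];

	trylock_failures = 0;
	packets_sent = 0;
	for (int i = 0; i < THREADS; i++) {
		int rc = pthread_create(&tid[i], NULL, toy_sender, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < THREADS; i++)
		pthread_join(tid[i], NULL);
	assert(packets_sent + trylock_failures ==
	       (unsigned long)THREADS * PACKETS_PER_THREAD);
	return trylock_failures;
}
```

Because every path to driver_tx_lock goes through core_tx_lock, the trylock can never observe contention and run_senders() returns 0. The inner lock adds cost without protecting anything that the outer lock does not already protect.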
[PATCH] atl1c: remove useless tx lock
This lock has no purpose, since the caller already runs in a serialized
context (it's not an LLTX driver).
Signed-off-by: Eric Dumazet <[email protected]>
---
drivers/net/ethernet/atheros/atl1c/atl1c.h | 1 -
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 10 ----------
2 files changed, 11 deletions(-)
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c.h b/drivers/net/ethernet/atheros/atl1c/atl1c.h
index ca70e16..3b2851b 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c.h
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c.h
@@ -576,7 +576,6 @@ struct atl1c_adapter {
u16 link_duplex;
spinlock_t mdio_lock;
- spinlock_t tx_lock;
atomic_t irq_sem;
struct work_struct common_task;
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 1ff3c6d..3dde956 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -784,7 +784,6 @@ static int __devinit atl1c_sw_init(struct atl1c_adapter *adapter)
atl1c_set_rxbufsize(adapter, adapter->netdev);
atomic_set(&adapter->irq_sem, 1);
spin_lock_init(&adapter->mdio_lock);
- spin_lock_init(&adapter->tx_lock);
set_bit(__AT_DOWN, &adapter->flags);
return 0;
@@ -2228,7 +2227,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
struct net_device *netdev)
{
struct atl1c_adapter *adapter = netdev_priv(netdev);
- unsigned long flags;
u16 tpd_req = 1;
struct atl1c_tpd_desc *tpd;
enum atl1c_trans_queue type = atl1c_trans_normal;
@@ -2239,16 +2237,10 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
}
tpd_req = atl1c_cal_tpd_req(skb);
- if (!spin_trylock_irqsave(&adapter->tx_lock, flags)) {
- if (netif_msg_pktdata(adapter))
- dev_info(&adapter->pdev->dev, "tx locked\n");
- return NETDEV_TX_LOCKED;
- }
if (atl1c_tpd_avail(adapter, type) < tpd_req) {
/* no enough descriptor, just stop queue */
netif_stop_queue(netdev);
- spin_unlock_irqrestore(&adapter->tx_lock, flags);
return NETDEV_TX_BUSY;
}
@@ -2256,7 +2248,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
/* do TSO and check sum */
if (atl1c_tso_csum(adapter, skb, &tpd, type) != 0) {
- spin_unlock_irqrestore(&adapter->tx_lock, flags);
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
@@ -2277,7 +2268,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
atl1c_tx_map(adapter, skb, tpd, type);
atl1c_tx_queue(adapter, skb, tpd, type);
- spin_unlock_irqrestore(&adapter->tx_lock, flags);
return NETDEV_TX_OK;
}
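The commit message's reasoning can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: for a driver that does not set NETIF_F_LLTX, the networking core takes the per-queue xmit lock before it calls the driver's transmit callback, so a private trylock inside the driver adds nothing. Every name below (`model_txq`, `model_core_xmit`, `model_driver_xmit`) is hypothetical, and an `atomic_flag` spinlock stands in for the core's lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one TX queue; not the kernel's data structure. */
struct model_txq {
	atomic_flag xmit_lock;	/* stands in for the core's per-queue xmit lock */
	bool lock_held;		/* true only while the core holds xmit_lock */
	unsigned int queued;	/* packets the driver has put on its ring */
};

static void model_txq_init(struct model_txq *q)
{
	atomic_flag_clear(&q->xmit_lock);
	q->lock_held = false;
	q->queued = 0;
}

/*
 * Driver transmit callback. It takes no lock of its own: it relies on
 * the caller's serialization and refuses to run without it.
 */
static int model_driver_xmit(struct model_txq *q)
{
	if (!q->lock_held)
		return -1;	/* called outside the serialized context */
	q->queued++;
	return 0;
}

/* Core-side wrapper: lock, call the driver, unlock. */
static int model_core_xmit(struct model_txq *q)
{
	int ret;

	while (atomic_flag_test_and_set_explicit(&q->xmit_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	assert(!q->lock_held);	/* nobody else can be inside the callback */
	q->lock_held = true;
	ret = model_driver_xmit(q);
	q->lock_held = false;

	atomic_flag_clear_explicit(&q->xmit_lock, memory_order_release);
	return ret;
}
```

In this model, two callers of `model_core_xmit()` can never be inside `model_driver_xmit()` at the same time, which is the property the removed tx_lock was trying to provide a second time.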
On Tuesday, 21 February 2012 at 19:14 +0100, Eric Dumazet wrote:
> On Tuesday, 21 February 2012 at 18:56 +0100, Thomas Meyer wrote:
>
> > Bad news. Just did hit the issue again, with above patch applied.
> >
> > abrt_version: 2.0.7
> > cmdline: BOOT_IMAGE=/vmlinuz-3.2.6 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> > kernel: 3.2.6
> > reason: [177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > time: Di 21 Feb 2012 18:09:16 CET
> >
> > backtrace:
> > :[177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > :[177799.342254] Hardware name: Aspire 1810T
> > :[177799.342256] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> > :[177799.342259] Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq arc4 snd_seq_device snd_pcm iwlwifi uvcvideo mac80211 snd_timer videodev cfg80211 snd usb_storage v4l2_compat_ioctl32 atl1c acer_wmi soundcore sparse_keymap snd_page_alloc rfkill wmi joydev acerhdf pcspkr virtio_net virtio virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> > :[177799.342303] Pid: 4980, comm: alsa-sink Not tainted 3.2.6 #7
> > :[177799.342306] Call Trace:
> > :[177799.342309] <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> > :[177799.342320] [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> > :[177799.342325] [<ffffffff8100375b>] ? do_IRQ+0x5b/0xd0
> > :[177799.342330] [<ffffffff813e94b7>] dev_watchdog+0x1e7/0x1f0
> > :[177799.342336] [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> > :[177799.342341] [<ffffffff813e92d0>] ? qdisc_reset+0x50/0x50
> > :[177799.342345] [<ffffffff81037827>] __do_softirq+0x87/0x110
> > :[177799.342350] [<ffffffff8149483a>] call_softirq+0x1a/0x30
> > :[177799.342354] [<ffffffff810038bd>] do_softirq+0x4d/0x80
> > :[177799.342358] [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> > :[177799.342363] [<ffffffff8101903a>] smp_apic_timer_interrupt+0x5a/0x90
> > :[177799.342368] [<ffffffff814942a9>] apic_timer_interrupt+0x69/0x70
> > :[177799.342371] <EOI> [<ffffffff814948d7>] ? sysenter_dispatch+0x7/0x26
> >
> > END:
> >
>
> Thanks
>
> This driver xmit function is racy I suspect, and several patches will be
> needed to fix bugs.
>
Here is a cumulative patch to hopefully remove the races in this driver,
could you please test it?
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c.h b/drivers/net/ethernet/atheros/atl1c/atl1c.h
index ca70e16..3b2851b 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c.h
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c.h
@@ -576,7 +576,6 @@ struct atl1c_adapter {
u16 link_duplex;

spinlock_t mdio_lock;
- spinlock_t tx_lock;
atomic_t irq_sem;

struct work_struct common_task;
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 1ff3c6d..896eb20 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -784,7 +784,6 @@ static int __devinit atl1c_sw_init(struct atl1c_adapter *adapter)
atl1c_set_rxbufsize(adapter, adapter->netdev);
atomic_set(&adapter->irq_sem, 1);
spin_lock_init(&adapter->mdio_lock);
- spin_lock_init(&adapter->tx_lock);
set_bit(__AT_DOWN, &adapter->flags);

return 0;
@@ -1653,9 +1652,11 @@ static bool atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
atomic_set(&tpd_ring->next_to_clean, next_to_clean);
}

- if (netif_queue_stopped(adapter->netdev) &&
- netif_carrier_ok(adapter->netdev)) {
- netif_wake_queue(adapter->netdev);
+ if (netif_carrier_ok(adapter->netdev)) {
+ /* make sure atl1c_xmit_frame() see our changes */
+ smp_mb();
+ if (netif_queue_stopped(adapter->netdev))
+ netif_wake_queue(adapter->netdev);
}

return true;
@@ -2228,7 +2229,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
struct net_device *netdev)
{
struct atl1c_adapter *adapter = netdev_priv(netdev);
- unsigned long flags;
u16 tpd_req = 1;
struct atl1c_tpd_desc *tpd;
enum atl1c_trans_queue type = atl1c_trans_normal;
@@ -2239,24 +2239,20 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
}

tpd_req = atl1c_cal_tpd_req(skb);
- if (!spin_trylock_irqsave(&adapter->tx_lock, flags)) {
- if (netif_msg_pktdata(adapter))
- dev_info(&adapter->pdev->dev, "tx locked\n");
- return NETDEV_TX_LOCKED;
- }

if (atl1c_tpd_avail(adapter, type) < tpd_req) {
/* no enough descriptor, just stop queue */
netif_stop_queue(netdev);
- spin_unlock_irqrestore(&adapter->tx_lock, flags);
- return NETDEV_TX_BUSY;
+ smp_mb();
+ if (atl1c_tpd_avail(adapter, type) < tpd_req)
+ return NETDEV_TX_BUSY;
+ netif_wake_queue(netdev);
}

tpd = atl1c_get_tpd(adapter, type);

/* do TSO and check sum */
if (atl1c_tso_csum(adapter, skb, &tpd, type) != 0) {
- spin_unlock_irqrestore(&adapter->tx_lock, flags);
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
@@ -2277,7 +2273,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
atl1c_tx_map(adapter, skb, tpd, type);
atl1c_tx_queue(adapter, skb, tpd, type);

- spin_unlock_irqrestore(&adapter->tx_lock, flags);
return NETDEV_TX_OK;
}
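The stop/recheck/wake handshake that the cumulative patch introduces can be sketched in user space with C11 atomics. This is a minimal model under stated assumptions, not the driver's code: `atomic_thread_fence(memory_order_seq_cst)` plays the role of `smp_mb()`, the ring uses free-running producer and consumer counters, and every name (`model_ring`, `model_xmit`, `model_clean`, `MODEL_RING_SIZE`) is hypothetical. The point is that each side publishes its own update, issues a full barrier, and only then reads the other side's state, so at least one of the two paths notices the other and the queue cannot stay stopped while descriptors are free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_RING_SIZE 8u	/* hypothetical descriptor count */

enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

struct model_ring {
	atomic_uint next_to_use;	/* advanced by the xmit path */
	atomic_uint next_to_clean;	/* advanced by the completion path */
	atomic_bool stopped;		/* models the stopped-queue flag */
};

static void model_ring_init(struct model_ring *r)
{
	atomic_init(&r->next_to_use, 0u);
	atomic_init(&r->next_to_clean, 0u);
	atomic_init(&r->stopped, false);
}

/* Free descriptors; the counters are free-running, so unsigned wrap is fine. */
static unsigned int model_avail(struct model_ring *r)
{
	unsigned int use = atomic_load_explicit(&r->next_to_use,
						memory_order_relaxed);
	unsigned int clean = atomic_load_explicit(&r->next_to_clean,
						  memory_order_relaxed);
	unsigned int used = use - clean;

	assert(used <= MODEL_RING_SIZE);
	return MODEL_RING_SIZE - used;
}

/* Transmit path: stop the queue, full barrier, then re-check the ring. */
static enum model_tx model_xmit(struct model_ring *r, unsigned int need)
{
	if (model_avail(r) < need) {
		atomic_store_explicit(&r->stopped, true, memory_order_relaxed);
		atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
		if (model_avail(r) < need)
			return MODEL_TX_BUSY;
		/* the completion path freed descriptors in the meantime */
		atomic_store_explicit(&r->stopped, false, memory_order_relaxed);
	}
	atomic_fetch_add_explicit(&r->next_to_use, need, memory_order_relaxed);
	return MODEL_TX_OK;
}

/* Completion path: publish freed descriptors, full barrier, then wake. */
static void model_clean(struct model_ring *r, unsigned int done)
{
	atomic_fetch_add_explicit(&r->next_to_clean, done,
				  memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	if (atomic_load_explicit(&r->stopped, memory_order_relaxed))
		atomic_store_explicit(&r->stopped, false,
				      memory_order_relaxed);
}
```

Without the barrier and the second availability check in `model_xmit()`, the completion path could free descriptors and test `stopped` just before the transmit path sets it, leaving the queue stopped with nothing left to wake it. That is one way a transmit-queue timeout could arise; the thread does not establish that it is the cause of the warning reported here.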
On Friday, 24.02.2012, at 20:20 +0100, Eric Dumazet wrote:
> On Tuesday, 21 February 2012 at 19:14 +0100, Eric Dumazet wrote:
> > On Tuesday, 21 February 2012 at 18:56 +0100, Thomas Meyer wrote:
> >
> > > Bad news. Just did hit the issue again, with above patch applied.
> > >
> > > abrt_version: 2.0.7
> > > cmdline: BOOT_IMAGE=/vmlinuz-3.2.6 root=/dev/sda2 rootfstype=ext4 rootflags=data=writeback ro quiet rhgb
> > > kernel: 3.2.6
> > > reason: [177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > > time: Di 21 Feb 2012 18:09:16 CET
> > >
> > > backtrace:
> > > :[177799.342250] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1e7/0x1f0()
> > > :[177799.342254] Hardware name: Aspire 1810T
> > > :[177799.342256] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> > > :[177799.342259] Modules linked in: vfat fat fuse bluetooth nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq arc4 snd_seq_device snd_pcm iwlwifi uvcvideo mac80211 snd_timer videodev cfg80211 snd usb_storage v4l2_compat_ioctl32 atl1c acer_wmi soundcore sparse_keymap snd_page_alloc rfkill wmi joydev acerhdf pcspkr virtio_net virtio virtio_ring kvm_intel kvm uinput ipv6 [last unloaded: scsi_wait_scan]
> > > :[177799.342303] Pid: 4980, comm: alsa-sink Not tainted 3.2.6 #7
> > > :[177799.342306] Call Trace:
> > > :[177799.342309] <IRQ> [<ffffffff810322ca>] warn_slowpath_common+0x7a/0xb0
> > > :[177799.342320] [<ffffffff810323a1>] warn_slowpath_fmt+0x41/0x50
> > > :[177799.342325] [<ffffffff8100375b>] ? do_IRQ+0x5b/0xd0
> > > :[177799.342330] [<ffffffff813e94b7>] dev_watchdog+0x1e7/0x1f0
> > > :[177799.342336] [<ffffffff8103cfcf>] run_timer_softirq+0xef/0x210
> > > :[177799.342341] [<ffffffff813e92d0>] ? qdisc_reset+0x50/0x50
> > > :[177799.342345] [<ffffffff81037827>] __do_softirq+0x87/0x110
> > > :[177799.342350] [<ffffffff8149483a>] call_softirq+0x1a/0x30
> > > :[177799.342354] [<ffffffff810038bd>] do_softirq+0x4d/0x80
> > > :[177799.342358] [<ffffffff81037a8e>] irq_exit+0x7e/0xa0
> > > :[177799.342363] [<ffffffff8101903a>] smp_apic_timer_interrupt+0x5a/0x90
> > > :[177799.342368] [<ffffffff814942a9>] apic_timer_interrupt+0x69/0x70
> > > :[177799.342371] <EOI> [<ffffffff814948d7>] ? sysenter_dispatch+0x7/0x26
> > >
> > > END:
> > >
> >
> > Thanks
> >
> > This driver xmit function is racy I suspect, and several patches will be
> > needed to fix bugs.
> >
>
> Here is a cumulative patch to hopefully remove the races in this driver,
> could you please test it?
Hi,
I'm just building a 3.2.7 kernel with your patch applied. I will watch
for the warning over the next few days.
Many thanks for the patch!
Kind regards,
Thomas
In February, 2012, Thomas Meyer wrote:
> On Friday, 24.02.2012, at 20:20 +0100, Eric Dumazet wrote:
>> Here is a cumulative patch to hopefully remove the races in this driver,
>> could you please test it?
[...]
> just building a 3.2.7 kernel with your patch applied. I will watch out
> for the warning in the next days.
Well, did it work? :)
In suspense,
Jonathan
On Thu, 2012-06-07 at 14:37 +0200, Thomas Meyer wrote:
> On Tuesday, 05.06.2012, at 19:38 -0500, Jonathan Nieder wrote:
> > In February, 2012, Thomas Meyer wrote:
> > > On Friday, 24.02.2012, at 20:20 +0100, Eric Dumazet wrote:
> >
> > >> Here is a cumulative patch to hopefully remove the races in this driver,
> > >> could you please test it?
> > [...]
> > > just building a 3.2.7 kernel with your patch applied. I will watch out
> > > for the warning in the next days.
> >
> > Well, did it work? :)
>
> Hi Jonathan,
>
> no it didn't. I still get these warnings.
I sent another patch today, you might try it ;)
https://lkml.org/lkml/2012/6/7/143
On Tuesday, 05.06.2012, at 19:38 -0500, Jonathan Nieder wrote:
> In February, 2012, Thomas Meyer wrote:
> > On Friday, 24.02.2012, at 20:20 +0100, Eric Dumazet wrote:
>
> >> Here is a cumulative patch to hopefully remove the races in this driver,
> >> could you please test it?
> [...]
> > just building a 3.2.7 kernel with your patch applied. I will watch out
> > for the warning in the next days.
>
> Well, did it work? :)
Hi Jonathan,
No, it didn't. I still get these warnings.
With kind regards,
Thomas
>
> In suspense,
> Jonathan
Thomas,
Are you using the latest atl1c code in the kernel? I recently updated some of the hardware configuration.
Thanks
Xiong
> -----Original Message-----
> From: [email protected] [mailto:netdev-
> [email protected]] On Behalf Of Thomas Meyer
> Sent: Thursday, June 07, 2012 20:38
> To: Jonathan Nieder
> Cc: Eric Dumazet; Linux Kernel Mailing List; [email protected];
> [email protected]; netdev; Josh Boyer
> Subject: Re: NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
>
> On Tuesday, 05.06.2012, at 19:38 -0500, Jonathan Nieder wrote:
> > In February, 2012, Thomas Meyer wrote:
> > > On Friday, 24.02.2012, at 20:20 +0100, Eric Dumazet wrote:
> >
> > >> Here is a cumulative patch to hopefully remove the races in this
> > >> driver, could you please test it?
> > [...]
> > > just building a 3.2.7 kernel with your patch applied. I will watch
> > > out for the warning in the next days.
> >
> > Well, did it work? :)
>
> Hi Jonathan,
>
> no it didn't. I still get these warnings.
>
> with kind regards
> thomas
>
> >
> > In suspense,
> > Jonathan
>
>