2009-09-09 07:13:11

by Alex Bennee

Subject: r8169 ethernet hangs after a pm-suspend (and resume)

Hi,

I've just recently gotten suspend working on my system. Unfortunately,
after the resume event I lose access to the network.
As far as the system is concerned the network is configured properly,
but every attempt to ping local nodes fails with "Host not reachable".

I've also seen an oops or two but I don't know if that is related:

[ 289.816066] ------------[ cut here ]------------
[ 289.816077] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x132/0x1da()
[ 289.816080] Hardware name: System Product Name
[ 289.816083] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 289.816085] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat joydev usb_storage usbhid usb_libusual bridge stp llc bnep rfcomm l2cap bluetooth ipv6 snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device kvm_intel kvm acpi_cpufreq snd_hda_codec_analog uhci_hcd usbcore snd_hda_intel snd_hda_codec firewire_ohci snd_hwdep snd_pcm snd_timer snd firewire_core crc_itu_t soundcore snd_page_alloc ide_cd_mod pcspkr evdev cdrom thermal processor nls_base unix [last unloaded: ehci_hcd]
[ 289.816135] Pid: 0, comm: swapper Not tainted 2.6.31-rc9-ajb-00012-g3ff323f-dirty #84
[ 289.816138] Call Trace:
[ 289.816140] <IRQ> [<ffffffff812aef27>] ? dev_watchdog+0x132/0x1da
[ 289.816152] [<ffffffff8103eb72>] warn_slowpath_common+0x7c/0xa9
[ 289.816157] [<ffffffff8103ec1e>] warn_slowpath_fmt+0x69/0x6b
[ 289.816165] [<ffffffffa0124cbb>] ? uhci_scan_schedule+0x194/0x86a [uhci_hcd]
[ 289.816169] [<ffffffff81048fbc>] ? lock_timer_base+0x2b/0x4f
[ 289.816174] [<ffffffff81049699>] ? mod_timer+0x111/0x123
[ 289.816180] [<ffffffffa0125d9a>] ? uhci_hub_status_data+0x16e/0x17d [uhci_hcd]
[ 289.816185] [<ffffffff8129d98d>] ? netdev_drivername+0x48/0x4f
[ 289.816189] [<ffffffff812aef27>] dev_watchdog+0x132/0x1da
[ 289.816211] [<ffffffffa00f0233>] ? usb_hcd_poll_rh_status+0x144/0x153 [usbcore]
[ 289.816215] [<ffffffff812aedf5>] ? dev_watchdog+0x0/0x1da
[ 289.816220] [<ffffffff81048d76>] run_timer_softirq+0x198/0x20d
[ 289.816226] [<ffffffff8101d0c6>] ? lapic_next_event+0x1d/0x21
[ 289.816231] [<ffffffff8104464f>] __do_softirq+0xd6/0x19a
[ 289.816235] [<ffffffff8100c19c>] call_softirq+0x1c/0x28
[ 289.816239] [<ffffffff8100d51d>] do_softirq+0x39/0x77
[ 289.816243] [<ffffffff8104430c>] irq_exit+0x44/0x7e
[ 289.816248] [<ffffffff8130b164>] smp_apic_timer_interrupt+0x8d/0x9b
[ 289.816253] [<ffffffff8100bb73>] apic_timer_interrupt+0x13/0x20
[ 289.816256] <EOI> [<ffffffff810117ac>] ? mwait_idle+0xb9/0xf0
[ 289.816264] [<ffffffff81309645>] ? atomic_notifier_call_chain+0x13/0x15
[ 289.816268] [<ffffffff8100a30a>] ? cpu_idle+0x57/0x98
[ 289.816273] [<ffffffff812f5422>] ? rest_init+0x66/0x68
[ 289.816278] [<ffffffff815319da>] ? start_kernel+0x343/0x34e
[ 289.816283] [<ffffffff8153103a>] ? x86_64_start_reservations+0xaa/0xae
[ 289.816287] [<ffffffff8153111f>] ? x86_64_start_kernel+0xe1/0xe8
[ 289.816290] ---[ end trace 01c3a2a7a5f34536 ]---
[ 290.635368] r8169: eth0: link up
[ 314.635844] r8169: eth0: link up

I'm currently running 2.6.31-rc9-ajb-00012-g3ff323f-dirty and am
willing to test any patches that might be going.

My card is:
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Subsystem: ASUSTeK Computer Inc. Device 81aa
Flags: bus master, fast devsel, latency 0, IRQ 25
I/O ports at e800 [size=256]
Memory at dffff000 (64-bit, non-prefetchable) [size=4K]
Memory at deff0000 (64-bit, prefetchable) [size=64K]
Expansion ROM at dffc0000 [disabled] [size=128K]
Capabilities: [40] Power Management version 2
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Count=1/1 Enable+
Capabilities: [70] Express Endpoint, MSI 08
Capabilities: [b0] MSI-X: Enable- Mask- TabSize=2
Capabilities: [d0] Vital Product Data <?>
Kernel driver in use: r8169


--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk


2009-09-09 09:23:33

by Francois Romieu

Subject: Re: r8169 ethernet hangs after a pm-suspend (and resume)

Alex Bennee <[email protected]> :
[...]
> I've just recently gotten suspend working on my system. Unfortunately
> after the resume event I lose access to the network.
> As far as the system is concerned the network is configured properly
> but every attempt to ping local nodes fails with "Host not reachable".

Can the problem be described as "gigabit link setting does not survive
suspend/resume" ?

--
Ueimor

2009-09-09 09:42:27

by Alex Bennee

Subject: Re: r8169 ethernet hangs after a pm-suspend (and resume)

2009/9/9 Francois Romieu <[email protected]>:
> Alex Bennee <[email protected]> :
> [...]
>> I've just recently gotten suspend working on my system. Unfortunately
>> after the resume event I lose access to the network.
>> As far as the system is concerned the network is configured properly
>> but every attempt to ping local nodes fails with "Host not reachable".
>
> Can the problem be described as "gigabit link setting does not survive
> suspend/resume" ?

How could I check?

AFAIK my network only runs at 100Mb/s anyway. I tried poking about with
ethtool to see if I could get any more diagnostics, but it didn't know
much about the card.
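
One way to check, sketched here in Python: run `ethtool eth0` before suspend and again after resume, then compare the negotiated speed, duplex and link state. The parser below only assumes the usual "Key: value" layout of ethtool's settings output. The SAMPLE text is hypothetical (not taken from this system) and the function name is made up for illustration.

```python
import re


def parse_ethtool(text):
    """Extract speed, duplex and link state from `ethtool <iface>` output."""
    fields = {}
    for line in text.splitlines():
        # Settings lines look like "<tab>Speed: 100Mb/s".
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    speed = re.match(r"(\d+)Mb/s", fields.get("Speed", ""))
    return {
        # None when ethtool reports an unknown speed.
        "speed_mbps": int(speed.group(1)) if speed else None,
        "duplex": fields.get("Duplex"),
        "link": fields.get("Link detected") == "yes",
    }


# Hypothetical sample output, NOT captured from the machine in this thread.
SAMPLE = """Settings for eth0:
\tSpeed: 100Mb/s
\tDuplex: Full
\tAuto-negotiation: on
\tLink detected: yes
"""

print(parse_ethtool(SAMPLE))
```

The real output could be captured with `subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout`. A speed or duplex value that changes across suspend/resume would support the "link setting does not survive" theory.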

>
> --
> Ueimor
>



--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk

2009-09-09 21:21:00

by Robby Workman

Subject: Re: r8169 ethernet hangs after a pm-suspend (and resume)

Replying again to fix CC's - sorry for the extra noise.

On Wed, 9 Sep 2009 20:55:56 +0000
[email protected] (Robby Workman) wrote:

> In linux.kernel, you wrote:
> > I've just recently gotten suspend working on my system.
> > Unfortunately after the resume event I lose access to the network.
> > As far as the system is concerned the network is configured properly
> > but every attempt to ping local nodes fails with "Host not
> > reachable".
> >
> > I've also seen an oops or two but I don't know if that is related:
> >
> > [...]
> >
> > I'm currently running 2.6.31-rc9-ajb-00012-g3ff323f-dirty and am
> > willing to test any patches that might be going.
>
>
> I recall experiencing something similar when I was on 2.6.27.x,
> but I thought it had gone away in 2.6.29.x; perhaps I forgot to
> actually remove my workaround though...
>
> In case you aren't aware, the workaround (assuming your system
> uses the pm-utils stuff to suspend) is to do this:
>
> Edit (or create) /etc/pm/defaults to add the following line:
> SUSPEND_MODULES="r8169"
>
> That will cause the r8169 module to be unloaded on sleep and
> reloaded on wakeup. That being said, I agree that the *real*
> fix is in the kernel driver.
>
> -RW
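
For anyone scripting that workaround, here is a minimal Python sketch that adds a module to SUSPEND_MODULES without duplicating it. It works on the file contents as a string. The function name is hypothetical, and it assumes the variable sits on a single double-quoted line as in the example above.

```python
import re

# Matches a line of the form SUSPEND_MODULES="mod1 mod2".
_SUSPEND_RE = re.compile(r'^SUSPEND_MODULES="(.*)"\s*$')


def with_suspend_module(config_text, module):
    """Return config_text with `module` listed in SUSPEND_MODULES."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        m = _SUSPEND_RE.match(line)
        if m:
            modules = m.group(1).split()
            if module not in modules:
                modules.append(module)
            lines[i] = 'SUSPEND_MODULES="%s"' % " ".join(modules)
            break
    else:
        # No existing assignment found, so append a new one.
        lines.append('SUSPEND_MODULES="%s"' % module)
    return "\n".join(lines) + "\n"


print(with_suspend_module("", "r8169"), end="")
```

To apply it, read the pm-utils config file mentioned above, pass its text through the function, and write the result back (as root).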

2009-09-10 06:49:14

by Alex Bennee

Subject: Re: r8169 ethernet hangs after a pm-suspend (and resume)

2009/9/9 Francois Romieu <[email protected]>:
> Alex Bennee <[email protected]> :
> [...]
>> I've just recently gotten suspend working on my system. Unfortunately
>> after the resume event I lose access to the network.
>> As far as the system is concerned the network is configured properly
>> but every attempt to ping local nodes fails with "Host not reachable".
>
> Can the problem be described as "gigabit link setting does not survive
> suspend/resume" ?

Further experimentation shows the failure is intermittent. The
following dmesg shows a successful resume with working 'net:

[ 475.800017] ACPI: Waking up from system sleep state S3
[ 475.800726] HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100006, writing 0x100002)
[ 475.800747] pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x60100, writing 0x6010a)
[ 475.800762] pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100107, writing 0x100507)
[ 475.800799] pci 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 475.800819] pci 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 475.800840] pci 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 475.800861] pci 0000:00:1d.3: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[ 475.800889] pci 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900006, writing 0x2900002)
[ 475.800967] PIIX_IDE 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2880005, writing 0x2800005)
[ 475.801050] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0x4, writing 0x8)
[ 475.801056] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100407)
[ 475.803466] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 475.803470] i915 0000:00:02.0: setting latency timer to 64
[ 475.864097] [drm] DAC-6: set mode 1440x900 2a
[ 475.936922] [drm] TMDS-8: set mode 1680x1050 2b
[ 476.108887] HDA Intel 0000:00:1b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 476.108892] HDA Intel 0000:00:1b.0: setting latency timer to 64
[ 476.548200] pci 0000:00:1d.7: PME# disabled
[ 476.548207] pci 0000:00:1e.0: setting latency timer to 64
[ 476.548216] PIIX_IDE 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 476.548223] PIIX_IDE 0000:00:1f.1: setting latency timer to 64
[ 476.548235] ata_piix 0000:00:1f.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[ 476.548248] ata_piix 0000:00:1f.2: setting latency timer to 64
[ 476.548352] r8169 0000:02:00.0: PME# disabled
[ 476.564404] r8169: eth0: link up

And now compare with a return from suspend that failed:

[12397.816024] ACPI: Waking up from system sleep state S3
[12397.816693] agpgart-intel 0000:00:00.0: restoring config space at offset 0x1 (was 0x30900006, writing 0x20900006)
[12397.816737] HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100006, writing 0x100002)
[12397.816757] pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x60100, writing 0x6010a)
[12397.816768] pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x2000e0e0, writing 0xe0e0)
[12397.816776] pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100107, writing 0x100507)
[12397.816813] uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[12397.816835] uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[12397.816856] uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[12397.816877] uhci_hcd 0000:00:1d.3: restoring config space at offset 0x1 (was 0x2800005, writing 0x2800001)
[12397.816906] pci 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900006, writing 0x2900002)
[12397.816929] pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x2280d0d0, writing 0xa280d0d0)
[12397.816987] PIIX_IDE 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2880005, writing 0x2800005)
[12397.832040] r8169 0000:02:00.0: restoring config space at offset 0xf (was 0xffffffff, writing 0x10a)
[12397.832045] r8169 0000:02:00.0: restoring config space at offset 0xe (was 0xffffffff, writing 0x0)
[12397.832050] r8169 0000:02:00.0: restoring config space at offset 0xd (was 0xffffffff, writing 0x40)
[12397.832055] r8169 0000:02:00.0: restoring config space at offset 0xc (was 0xffffffff, writing 0xdffc0000)
[12397.832061] r8169 0000:02:00.0: restoring config space at offset 0xb (was 0xffffffff, writing 0x81aa1043)
[12397.832066] r8169 0000:02:00.0: restoring config space at offset 0xa (was 0xffffffff, writing 0x0)
[12397.832071] r8169 0000:02:00.0: restoring config space at offset 0x9 (was 0xffffffff, writing 0x0)
[12397.832076] r8169 0000:02:00.0: restoring config space at offset 0x8 (was 0xffffffff, writing 0xdeff000c)
[12397.832081] r8169 0000:02:00.0: restoring config space at offset 0x7 (was 0xffffffff, writing 0x0)
[12397.832086] r8169 0000:02:00.0: restoring config space at offset 0x6 (was 0xffffffff, writing 0xdffff004)
[12397.832091] r8169 0000:02:00.0: restoring config space at offset 0x5 (was 0xffffffff, writing 0x0)
[12397.832096] r8169 0000:02:00.0: restoring config space at offset 0x4 (was 0xffffffff, writing 0xe801)
[12397.832101] r8169 0000:02:00.0: restoring config space at offset 0x3 (was 0xffffffff, writing 0x8)
[12397.832106] r8169 0000:02:00.0: restoring config space at offset 0x2 (was 0xffffffff, writing 0x2000002)
[12397.832111] r8169 0000:02:00.0: restoring config space at offset 0x1 (was 0xffffffff, writing 0x100407)
[12397.832117] r8169 0000:02:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0x816810ec)
[12397.834527] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[12397.834531] i915 0000:00:02.0: setting latency timer to 64
[12397.895209] [drm] DAC-6: set mode 1440x900 2a
[12397.968038] [drm] TMDS-8: set mode 1680x1050 2b
[12398.140006] HDA Intel 0000:00:1b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[12398.140011] HDA Intel 0000:00:1b.0: setting latency timer to 64
[12398.580194] uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[12398.580200] uhci_hcd 0000:00:1d.0: setting latency timer to 64
[12398.580224] usb usb2: root hub lost power or was reset
[12398.580250] uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[12398.580255] uhci_hcd 0000:00:1d.1: setting latency timer to 64
[12398.580273] usb usb3: root hub lost power or was reset
[12398.580291] uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[12398.580296] uhci_hcd 0000:00:1d.2: setting latency timer to 64
[12398.580314] usb usb4: root hub lost power or was reset
[12398.580332] uhci_hcd 0000:00:1d.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19
[12398.580337] uhci_hcd 0000:00:1d.3: setting latency timer to 64
[12398.580355] usb usb5: root hub lost power or was reset
[12398.580374] pci 0000:00:1d.7: PME# disabled
[12398.580380] pci 0000:00:1e.0: setting latency timer to 64
[12398.580387] PIIX_IDE 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[12398.580394] PIIX_IDE 0000:00:1f.1: setting latency timer to 64
[12398.580403] ata_piix 0000:00:1f.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[12398.580407] ata_piix 0000:00:1f.2: setting latency timer to 64
[12398.580512] r8169 0000:02:00.0: PME# disabled
[12398.660050] firewire_core: skipped bus generations, destroying all nodes
[12398.664833] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
[12398.665625] hda: skipping word 93 validity check
[12398.665627] hda: UDMA/66 mode selected
[12398.687404] sd 0:0:0:0: [sda] Starting disk
[12399.419164] r8169: eth0: link up

which has an oops further on:

[12434.816100] ------------[ cut here ]------------
[12434.816111] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x132/0x1da()
[12434.816114] Hardware name: System Product Name
[12434.816117] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[12434.816120] Modules linked in: bridge stp llc bnep rfcomm l2cap bluetooth ipv6 snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device kvm_intel kvm acpi_cpufreq snd_hda_codec_analog snd_hda_intel uhci_hcd snd_hda_codec snd_hwdep snd_pcm snd_timer ide_cd_mod firewire_ohci firewire_core snd soundcore usbcore r8169 cdrom processor crc_itu_t nls_base snd_page_alloc mii evdev thermal pcspkr unix [last unloaded: ehci_hcd]
[12434.816164] Pid: 0, comm: swapper Not tainted 2.6.31-rc9-ajb-00012-g3ff323f-dirty #86
[12434.816167] Call Trace:
[12434.816169] <IRQ> [<ffffffff812aa117>] ? dev_watchdog+0x132/0x1da
[12434.816180] [<ffffffff8103eb72>] warn_slowpath_common+0x7c/0xa9
[12434.816185] [<ffffffff8103ec1e>] warn_slowpath_fmt+0x69/0x6b
[12434.816190] [<ffffffff81039e47>] ? default_wake_function+0x12/0x14
[12434.816195] [<ffffffff8102c24c>] ? __wake_up_common+0x4b/0x7b
[12434.816200] [<ffffffff8102f793>] ? __wake_up+0x48/0x54
[12434.816205] [<ffffffff81298b7d>] ? netdev_drivername+0x48/0x4f
[12434.816209] [<ffffffff812aa117>] dev_watchdog+0x132/0x1da
[12434.816214] [<ffffffff810510f2>] ? __queue_work+0x3a/0x43
[12434.816218] [<ffffffff812a9fe5>] ? dev_watchdog+0x0/0x1da
[12434.816223] [<ffffffff81048d76>] run_timer_softirq+0x198/0x20d
[12434.816229] [<ffffffff8101d0c6>] ? lapic_next_event+0x1d/0x21
[12434.816234] [<ffffffff8104464f>] __do_softirq+0xd6/0x19a
[12434.816239] [<ffffffff8100c19c>] call_softirq+0x1c/0x28
[12434.816242] [<ffffffff8100d51d>] do_softirq+0x39/0x77
[12434.816246] [<ffffffff8104430c>] irq_exit+0x44/0x7e
[12434.816252] [<ffffffff81305914>] smp_apic_timer_interrupt+0x8d/0x9b
[12434.816258] [<ffffffff8100bb73>] apic_timer_interrupt+0x13/0x20
[12434.816260] <EOI> [<ffffffff810117ac>] ? mwait_idle+0xb9/0xf0
[12434.816269] [<ffffffff81303df5>] ? atomic_notifier_call_chain+0x13/0x15
[12434.816273] [<ffffffff8100a30a>] ? cpu_idle+0x57/0x98
[12434.816278] [<ffffffff812f0612>] ? rest_init+0x66/0x68
[12434.816283] [<ffffffff815299da>] ? start_kernel+0x343/0x34e
[12434.816288] [<ffffffff8152903a>] ? x86_64_start_reservations+0xaa/0xae
[12434.816292] [<ffffffff8152911f>] ? x86_64_start_kernel+0xe1/0xe8
[12434.816295] ---[ end trace 1353478188007667 ]---
[12435.635167] r8169: eth0: link up
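
As a rough aid for comparing good and bad resumes, the following Python sketch pulls the wake, link-up and watchdog events out of a dmesg capture together with their timestamps. The marker strings are copied from the logs above. The function and dictionary names are only for illustration. In the failing log the watchdog fires about 37 seconds after the wake message.

```python
import re

# Substrings copied from the logs in this thread.
EVENTS = {
    "wake": "ACPI: Waking up from system sleep state",
    "link_up": "r8169: eth0: link up",
    "tx_timeout": "NETDEV WATCHDOG: eth0 (r8169)",
}

# A dmesg line starts with "[seconds.microseconds]".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def timeline(dmesg_text):
    """Return (seconds, event name) pairs for the events listed in EVENTS."""
    found = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation line or unrelated text
        for name, needle in EVENTS.items():
            if needle in m.group(2):
                found.append((float(m.group(1)), name))
    return found
```

Feeding it the output of `dmesg` after each resume gives a compact record of how long the link stayed usable before the transmit queue timed out.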

At this point even unloading and reloading the r8169 module couldn't
bring the network back. I even tried unloading the module, doing a
pm-hibernate, resuming and reloading it, and still nothing, which was
odd as I thought the power cycle should have un-wedged any hardware.

A couple of questions:

1. It seems the failure case has a lot more "restoring config space"
going on. Is this a wider-ranging problem that just happens to hit r8169
harder?

2. Is the oops a red herring or could the failure to resume be because
the shutdown occurs before the hardware has flushed all in-flight
packets?
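
Regarding the first question, one detail in the failing log is that every r8169 register reads back as 0xffffffff before being restored, including the vendor/device ID at offset 0x0. A PCI config read of all ones usually means the device did not respond to the read, which would be consistent with the card having lost power or dropped off the bus, though the logs alone do not prove it. A small Python sketch (names are hypothetical) to pick out such devices from a dmesg capture, tolerating the line wrapping added by mail clients:

```python
import re
from collections import defaultdict

RESTORE_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"restoring config space at offset (?P<off>0x[0-9a-f]+) "
    r"\(was (?P<was>0x[0-9a-f]+), writing (?P<new>0x[0-9a-f]+)\)"
)


def restored_registers(dmesg_text):
    """Map PCI address -> list of (dword offset, old value, new value)."""
    # Wrapped log lines are rejoined by collapsing all whitespace runs.
    flat = " ".join(dmesg_text.split())
    result = defaultdict(list)
    for m in RESTORE_RE.finditer(flat):
        result[m["dev"]].append(
            (int(m["off"], 16), int(m["was"], 16), int(m["new"], 16))
        )
    return dict(result)


def lost_devices(dmesg_text):
    """Devices for which every restored register read back as all ones."""
    return sorted(
        dev
        for dev, regs in restored_registers(dmesg_text).items()
        if all(was == 0xFFFFFFFF for _, was, _ in regs)
    )
```

On the two logs above this would report nothing for the working resume and 0000:02:00.0 for the failing one, which narrows the extra "restoring config space" noise down to the r8169 device itself.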


--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk

2009-10-08 09:28:25

by Alex Bennee

Subject: Re: r8169 ethernet hangs after a pm-suspend (and resume)

2009/9/9 Francois Romieu <[email protected]>:
> Alex Bennee <[email protected]> :
> [...]
>> I've just recently gotten suspend working on my system. Unfortunately
>> after the resume event I lose access to the network.
>> As far as the system is concerned the network is configured properly
>> but every attempt to ping local nodes fails with "Host not reachable".
>
> Can the problem be described as "gigabit link setting does not survive
> suspend/resume" ?

Even further experimentation shows that ethernet functionality can
survive the resume for a few minutes before resetting. Once it gets
into this state even rmmod/modprobe of the r8169 driver won't unwedge
it.

The symptoms are either the driver detecting an unknown MAC or setting
the physical address to 0xfffffffff, which is obviously broken. I
suspect the hardware has gotten itself wedged somehow.

Is there any way to hard reset the chipset (without power cycling the
system)?
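
One thing that may be worth trying, with no guarantee that it un-wedges this particular chip: kernels that expose the `remove` and `rescan` files under /sys/bus/pci allow a single device to be detached and re-enumerated without a reboot. A minimal Python sketch, assuming those sysfs files exist on the running kernel and that it is run as root. The function name and the `sysfs_root` parameter are hypothetical; the parameter is there only so the logic can be tried against a scratch directory first.

```python
from pathlib import Path


def pci_remove_and_rescan(address, sysfs_root="/sys"):
    """Detach one PCI device from the kernel, then rescan the bus for it."""
    bus = Path(sysfs_root) / "bus" / "pci"
    remove = bus / "devices" / address / "remove"
    rescan = bus / "rescan"
    remove.write_text("1")  # drop the device and unbind its driver
    rescan.write_text("1")  # re-enumerate the bus so the device is found again
    return [str(remove), str(rescan)]
```

For the card in this thread the call would be `pci_remove_and_rescan("0000:02:00.0")`. This only makes the kernel forget and rediscover the device; whether the hardware itself comes back depends on why it stopped responding.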

--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk