2008-08-27 09:27:09

by Pascal Terjan

Subject: r8169 regression in 2.6.26.3 vs 2.6.26.2

Since updating to 2.6.26.3, networking no longer works on Acer Aspire
One.

PCI config is now always filled with ones.

Reverting "r8169: avoid thrashing PCI conf space above
RTL_GIGA_MAC_VER_06" makes it work again.

The device is 10ec:8136

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E PCI Express Fast Ethernet controller (rev 02)
	Subsystem: Acer Incorporated [ALI] Device 015b
	Flags: bus master, fast devsel, latency 0, IRQ 17
	I/O ports at 3000 [size=256]
	Memory at 31010000 (64-bit, prefetchable) [size=4K]
	Memory at 31000000 (64-bit, prefetchable) [size=64K]
	Expansion ROM at 31020000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [ac] MSI-X: Enable- Mask- TabSize=2
	Capabilities: [cc] Vital Product Data <?>
	Capabilities: [100] Advanced Error Reporting <?>
	Capabilities: [140] Virtual Channel <?>
	Capabilities: [160] Device Serial Number 00-00-ff-ff-00-00-00-04
	Kernel driver in use: r8169
	Kernel modules: r8169
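
One way to confirm the "filled with ones" symptom from user space is to read the device's config space and check whether every byte is 0xff. The following is only a minimal sketch, not part of the driver: both helper names are made up for illustration, and the sysfs path in the note below assumes PCI domain 0000 for the 02:00.0 device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte in buf is 0xff, which is
 * what reads from an unresponsive PCI device typically look like. */
static int config_is_all_ones(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	if (len == 0)
		return 0;
	for (i = 0; i < len; i++)
		if (buf[i] != 0xff)
			return 0;
	return 1;
}

/* Hypothetical helper: reads up to 64 bytes (the standard header) from a
 * sysfs "config" file. Returns 1 if all ones, 0 if not, -1 on read error. */
static int pci_config_file_is_all_ones(const char *path)
{
	unsigned char buf[64];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	if (n == 0)
		return -1;
	return config_is_all_ones(buf, n);
}
```

Under those assumptions, calling pci_config_file_is_all_ones("/sys/bus/pci/devices/0000:02:00.0/config") would return 1 while the device is in the broken state and 0 when the header reads back normally.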


2008-08-27 10:41:15

by Pascal Terjan

Subject: Re: r8169 regression in 2.6.26.3 vs 2.6.26.2

On Wednesday 27 August 2008 at 12:38 +0200, Marcus Sundberg wrote:
> Pascal Terjan wrote:
> > Since updating to 2.6.26.3, networking no longer works on Acer Aspire
> > One.
> >
> > PCI config is now always filled with ones.
> >
> > Reverting "r8169: avoid thrashing PCI conf space above
> > RTL_GIGA_MAC_VER_06" makes it work again.
> >
> > The device is 10ec:8136
>
> Ok, that's obviously not good. I checked the Realtek driver for
> 8101E (version r8101-1.009.00) and while it doesn't do any
> 8-bit writes to register 0x82 it does perform 32-bit writes to
> register 0x80 (EPHYAR) to communicate with the PHY. I have no
> idea what that 8-bit write does except it breaks the chipset in
> my LG P300...
>
> How does the kernel identify your chipset upon driver load?
> (grep for XID)

eth0: RTL8169 at 0xe04ee000, 00:1e:68:a0:07:b5, XID 24a00000 IRQ 17

2008-08-27 11:03:25

by Marcus Sundberg

Subject: Re: r8169 regression in 2.6.26.3 vs 2.6.26.2

Pascal Terjan wrote:
> Since updating to 2.6.26.3, networking no longer works on Acer Aspire
> One.
>
> PCI config is now always filled with ones.
>
> Reverting "r8169: avoid thrashing PCI conf space above
> RTL_GIGA_MAC_VER_06" makes it work again.
>
> The device is 10ec:8136

Ok, that's obviously not good. I checked the Realtek driver for
8101E (version r8101-1.009.00) and while it doesn't do any
8-bit writes to register 0x82 it does perform 32-bit writes to
register 0x80 (EPHYAR) to communicate with the PHY. I have no
idea what that 8-bit write does except it breaks the chipset in
my LG P300...
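
To make the overlap between the two accesses concrete: assuming the usual little-endian register layout, offset 0x82 is byte 2 of the 32-bit register at 0x80, so an 8-bit write of 0x01 to 0x82 changes bit 16 of EPHYAR. The sketch below uses a plain byte array in place of the real MMIO window; the reg_* helpers and the window size are illustrative, not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define REG_SPACE_SIZE 0x100	/* illustrative size of the fake window */
#define EPHYAR         0x80	/* 32-bit register named in the thread */

/* Fake register window standing in for the device's MMIO space. */
static uint8_t regs[REG_SPACE_SIZE];

/* 8-bit write, analogous to what RTL_W8(0x82, 0x01) does on real MMIO. */
static void reg_write8(unsigned int off, uint8_t val)
{
	assert(off < REG_SPACE_SIZE);
	regs[off] = val;
}

/* 32-bit little-endian read assembled byte by byte, so the result does
 * not depend on the host's endianness. */
static uint32_t reg_read32(unsigned int off)
{
	assert(off + 4 <= REG_SPACE_SIZE);
	return (uint32_t)regs[off] |
	       ((uint32_t)regs[off + 1] << 8) |
	       ((uint32_t)regs[off + 2] << 16) |
	       ((uint32_t)regs[off + 3] << 24);
}
```

Starting from a zeroed window, reg_write8(0x82, 0x01) followed by reg_read32(EPHYAR) yields 0x00010000. What the chip does with that bit is not something this sketch can tell.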

How does the kernel identify your chipset upon driver load?
(grep for XID)

//Marcus
--
---------------------------------------+--------------------------
Marcus Sundberg <[email protected]> | Firewalls with SIP & NAT
Software Developer, Ingate Systems AB | http://www.ingate.com/

2008-08-27 20:11:38

by Francois Romieu

Subject: Re: r8169 regression in 2.6.26.3 vs 2.6.26.2

Pascal Terjan <[email protected]> :
> On Wednesday 27 August 2008 at 12:38 +0200, Marcus Sundberg wrote:
> > Pascal Terjan wrote:
> > > Since updating to 2.6.26.3, networking no longer works on Acer Aspire
> > > One.
> > >
> > > PCI config is now always filled with ones.
> > >
> > > Reverting "r8169: avoid thrashing PCI conf space above
> > > RTL_GIGA_MAC_VER_06" makes it work again.
> > >
> > > The device is 10ec:8136
[...]
> > How does the kernel identify your chipset upon driver load?
> > (grep for XID)
>
> eth0: RTL8169 at 0xe04ee000, 00:1e:68:a0:07:b5, XID 24a00000 IRQ 17

As far as I can read rtl8169_get_mac_version, this XID should not match
any known device with 2.6.26.3 and should thus fall back to
RTL_GIGA_MAC_VER_01 (assuming that rtl8169_init_phy runs after
rtl8169_get_mac_version, which it appears to do so far).

Yes / no / -ECOFFEE ?
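
The fallback described above can be sketched in user space as a scan over a mask/value table that ends in a catch-all entry. This shows the pattern only and is not the 2.6.26.3 source: the table entries are placeholders, lookup_mac_version is a hypothetical name, and the "unknown MAC" message format is assumed from the log line reported elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum mac_version {
	RTL_GIGA_MAC_VER_01 = 1,	/* also used as the fallback */
	RTL_GIGA_MAC_VER_02,
	RTL_GIGA_MAC_VER_03,
};

struct mac_info {
	uint32_t mask;
	uint32_t val;
	enum mac_version ver;
};

/* Placeholder entries: NOT the driver's real table. Only the shape
 * matters: specific entries first, a mask of 0 as the last entry. */
static const struct mac_info mac_table[] = {
	{ 0xff000000, 0x30000000, RTL_GIGA_MAC_VER_03 },
	{ 0xff000000, 0x10000000, RTL_GIGA_MAC_VER_02 },
	{ 0x00000000, 0x00000000, RTL_GIGA_MAC_VER_01 },	/* catch-all */
};

static enum mac_version lookup_mac_version(uint32_t xid)
{
	const struct mac_info *p = mac_table;

	/* Terminates because the final entry matches every value. */
	while ((xid & p->mask) != p->val)
		p++;
	assert(p < mac_table + sizeof(mac_table) / sizeof(mac_table[0]));

	/* Reaching the catch-all means no specific entry matched. */
	if (p->mask == 0)
		printf("unknown MAC (%08x)\n", (unsigned int)xid);

	return p->ver;
}
```

With this placeholder table, an XID such as 0x24a00000 matches none of the specific entries and comes back as RTL_GIGA_MAC_VER_01.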

On a different topic, I would suggest trying patches #0001 ... #0006
at http://userweb.kernel.org/~romieu/r8169/2.6.27-rc3/20080818/ with
your chipset.

--
Ueimor

2008-08-27 20:45:06

by Pascal Terjan

Subject: Re: r8169 regression in 2.6.26.3 vs 2.6.26.2

On Wednesday 27 August 2008 at 22:11 +0200, Francois Romieu wrote:
> Pascal Terjan <[email protected]> :
> > On Wednesday 27 August 2008 at 12:38 +0200, Marcus Sundberg wrote:
> > > Pascal Terjan wrote:
> > > > Since updating to 2.6.26.3, networking no longer works on Acer Aspire
> > > > One.
> > > >
> > > > PCI config is now always filled with ones.
> > > >
> > > > Reverting "r8169: avoid thrashing PCI conf space above
> > > > RTL_GIGA_MAC_VER_06" makes it work again.
> > > >
> > > > The device is 10ec:8136
> [...]
> > > How does the kernel identify your chipset upon driver load?
> > > (grep for XID)
> >
> > eth0: RTL8169 at 0xe04ee000, 00:1e:68:a0:07:b5, XID 24a00000 IRQ 17
>
> As far as I can read rtl8169_get_mac_version, this XID should not match
> any known device with 2.6.26.3 and should thus fall back to
> RTL_GIGA_MAC_VER_01 (assuming that rtl8169_init_phy runs after
> rtl8169_get_mac_version, which it appears to do so far).
>
> Yes / no / -ECOFFEE ?
>

Yes, I took some time to look at it and now I do not understand what's
wrong.

Indeed I get "unknown MAC (27a00600)" so I get RTL_GIGA_MAC_VER_01 which
is <= RTL_GIGA_MAC_VER_06, so it should work fine with or without this
patch. And indeed it seems to work when I rebuild the module, even when
I do not revert the patch.

By the way, unrelated to this problem, register 0x82 is written twice
for RTL_GIGA_MAC_VER_02:

=====
if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
	dprintk("Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
	RTL_W8(0x82, 0x01);
}
[...]
if (tp->mac_version == RTL_GIGA_MAC_VER_02) {
	dprintk("Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
	RTL_W8(0x82, 0x01);
=====
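
The double write can be checked with a small user-space model of the two conditions in the excerpt above. This is a sketch under assumptions: the enum values are taken to be ordered numerically, MAC_VER_NEWER is a made-up placeholder for any version above RTL_GIGA_MAC_VER_06, and fake_w8 merely counts writes instead of touching hardware.

```c
#include <assert.h>

enum mac_version {
	RTL_GIGA_MAC_VER_01 = 1,
	RTL_GIGA_MAC_VER_02,
	RTL_GIGA_MAC_VER_03,
	RTL_GIGA_MAC_VER_04,
	RTL_GIGA_MAC_VER_05,
	RTL_GIGA_MAC_VER_06,
	MAC_VER_NEWER,		/* placeholder: anything above VER_06 */
};

static int writes_to_0x82;

/* Stand-in for RTL_W8(): only counts writes that target offset 0x82. */
static void fake_w8(unsigned int reg, unsigned char val)
{
	assert(reg <= 0xff);	/* 8-bit register offsets only */
	(void)val;
	if (reg == 0x82)
		writes_to_0x82++;
}

/* Mirrors the two conditions quoted above; everything elided by
 * "[...]" in the excerpt is left out here as well. */
static int count_0x82_writes(enum mac_version ver)
{
	writes_to_0x82 = 0;

	if (ver <= RTL_GIGA_MAC_VER_06)
		fake_w8(0x82, 0x01);

	if (ver == RTL_GIGA_MAC_VER_02)
		fake_w8(0x82, 0x01);

	return writes_to_0x82;
}
```

In this model count_0x82_writes() returns 2 for RTL_GIGA_MAC_VER_02, 1 for the other versions up to RTL_GIGA_MAC_VER_06, and 0 for the placeholder above it.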

To come back to the problem, this patch may not be faulty but I fail to
understand the issue...

- 2.6.26.2 always works
- 2.6.26.3 from my distro always fails
- 2.6.26.3 from my distro with r8169 rebuilt out of the kernel tree but
from the same source and with the same config, gcc, etc, works...

So either it is random and I am very unlucky that it always works with
one and always fails with the other, or there is some difference that I
have not thought of.

> On a different topic, I would suggest trying patches #0001 ... #0006
> at http://userweb.kernel.org/~romieu/r8169/2.6.27-rc3/20080818/ with
> your chipset.

OK I will

2008-08-29 17:51:22

by Pascal Terjan

Subject: Re: r8169 regression in 2.6.26.3 vs 2.6.26.2

On Wednesday 27 August 2008 at 22:44 +0200, Pascal Terjan wrote:
> On Wednesday 27 August 2008 at 22:11 +0200, Francois Romieu wrote:
> > Pascal Terjan <[email protected]> :
> > > On Wednesday 27 August 2008 at 12:38 +0200, Marcus Sundberg wrote:
> > > > Pascal Terjan wrote:
> > > > > Since updating to 2.6.26.3, networking no longer works on Acer Aspire
> > > > > One.
> > > > >
> > > > > PCI config is now always filled with ones.
> > > > >
> > > > > Reverting "r8169: avoid thrashing PCI conf space above
> > > > > RTL_GIGA_MAC_VER_06" makes it work again.
> > > > >
> > > > > The device is 10ec:8136
> > [...]
> > > > How does the kernel identify your chipset upon driver load?
> > > > (grep for XID)
> > >
> > > eth0: RTL8169 at 0xe04ee000, 00:1e:68:a0:07:b5, XID 24a00000 IRQ 17
> >
> > As far as I can read rtl8169_get_mac_version, this XID should not match
> > any known device with 2.6.26.3 and should thus fall back to
> > RTL_GIGA_MAC_VER_01 (assuming that rtl8169_init_phy runs after
> > rtl8169_get_mac_version, which it appears to do so far).
> >
> > Yes / no / -ECOFFEE ?
> >
>
> Yes, I took some time to look at it and now I do not understand what's
> wrong.
>
> Indeed I get "unknown MAC (27a00600)" so I get RTL_GIGA_MAC_VER_01 which
> is <= RTL_GIGA_MAC_VER_06, so it should work fine with or without this
> patch. And indeed it seems to work when I rebuild the module, even when
> I do not revert the patch.
>
> By the way, unrelated to this problem, register 0x82 is written twice
> for RTL_GIGA_MAC_VER_02:
>
> =====
> if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
> 	dprintk("Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
> 	RTL_W8(0x82, 0x01);
> }
> [...]
> if (tp->mac_version == RTL_GIGA_MAC_VER_02) {
> 	dprintk("Set MAC Reg C+CR Offset 0x82h = 0x01h\n");
> 	RTL_W8(0x82, 0x01);
> =====
>
> To come back to the problem, this patch may not be faulty but I fail to
> understand the issue...
>
> - 2.6.26.2 always works
> - 2.6.26.3 from my distro always fails
> - 2.6.26.3 from my distro with r8169 rebuilt out of the kernel tree but
> from the same source and with the same config, gcc, etc, works...
>
> So either it is random and I am very unlucky that it always works with
> one and always fails with the other, or there is some difference that I
> have not thought of.

OK, I was very unlucky...
For some reason 2.6.26.2 was OK, but 2.6.26.3 built inside the rpm
always fails (or maybe very often; that is harder to test).
When I rebuild the module by hand, it works most of the time, but I got
the bug after unloading/reloading it about 30 times in a row. So this
chip is simply affected by the random bug, and something makes it more
likely to happen.
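
As a rough illustration of why an intermittent failure can look deterministic over a handful of tries: if each module load fails independently with probability p, the chance of n clean loads in a row is (1 - p)^n. The sketch below just evaluates that expression. The independence assumption and any value of p are hypothetical, since no failure rate was measured here, and prob_all_loads_ok is a made-up name.

```c
#include <assert.h>

/* Probability that n independent loads all succeed when each load
 * fails with probability p_fail (0.0 <= p_fail <= 1.0). */
static double prob_all_loads_ok(double p_fail, unsigned int n)
{
	double ok = 1.0;
	unsigned int i;

	assert(p_fail >= 0.0 && p_fail <= 1.0);
	for (i = 0; i < n; i++)
		ok *= 1.0 - p_fail;
	return ok;
}
```

With a hypothetical p_fail of 1/30, prob_all_loads_ok(1.0 / 30, 30) comes out at roughly 0.36, so a run of 30 clean reloads would not be surprising even though the bug is present.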

> > On a different topic, I would suggest trying patches #0001 ... #0006
> > at http://userweb.kernel.org/~romieu/r8169/2.6.27-rc3/20080818/ with
> > your chipset.
>
> OK I will

I applied these patches, which means the write no longer occurs, and
tried to unload/load the module 200 times, but after some time I got an
oops in rtl_tx_performance_tweak (and could not capture it).