2019-07-18 19:01:15

by Thomas Voegtle

Subject: network problems with r8169


Hello,

I'm having network problems with the commits on r8169 since v5.2. There
is ping packet loss, sometimes 100%, sometimes 50%. In the end the network
is unusable.

v5.2 is fine, I bisected it down to:

a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
commit a2928d28643e3c064ff41397281d20c445525032
Author: Heiner Kallweit <[email protected]>
Date:   Sun Jun 2 10:53:49 2019 +0200

    r8169: use paged versions of phylib MDIO access functions

    Use paged versions of phylib MDIO access functions to simplify
    the code.

    Signed-off-by: Heiner Kallweit <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>


Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
for me (had to adjust the renaming to r8169_main.c).

I have a:
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
        Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
        Kernel driver in use: r8169

on a BIOSTAR H81MG motherboard.


greetings,

Thomas


2019-07-20 00:35:29

by Thomas Voegtle

Subject: Re: network problems with r8169

On Fri, 19 Jul 2019, Heiner Kallweit wrote:

> On 18.07.2019 20:50, Thomas Voegtle wrote:
>>
>> Hello,
>>
>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>>
>> v5.2 is fine, I bisected it down to:
>>
>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
>> commit a2928d28643e3c064ff41397281d20c445525032
>> Author: Heiner Kallweit <[email protected]>
>> Date:   Sun Jun 2 10:53:49 2019 +0200
>>
>>     r8169: use paged versions of phylib MDIO access functions
>>
>>     Use paged versions of phylib MDIO access functions to simplify
>>     the code.
>>
>>     Signed-off-by: Heiner Kallweit <[email protected]>
>>     Signed-off-by: David S. Miller <[email protected]>
>>
>>
>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
>> for me (had to adjust the renaming to r8169_main.c).
>>
>> I have a:
>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
>> 0c)
>>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>>         Kernel driver in use: r8169
>>
>> on a BIOSTAR H81MG motherboard.
>>
> Interesting. I have the same chip version (RTL8168g) and can't reproduce
> the issue. Can you provide a full dmesg output and test the patch below
> on top of linux-next? I'd be interested in the WARN_ON stack traces
> (if any) and would like to know whether the experimental change to
> __phy_modify_changed helps.
>
>>
>> greetings,
>>
>>   Thomas
>>
>>
> Heiner
>
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 8d7dd4c5f..26be73000 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
> struct phy_device *phydev = tp->phydev;
> int ret;
>
> + WARN_ON(phy_read(phydev, 0x1f));
> +
> switch (tp->mac_version) {
> case RTL_GIGA_MAC_VER_34:
> case RTL_GIGA_MAC_VER_35:
> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
> struct phy_device *phydev = tp->phydev;
> int ret;
>
> + WARN_ON(phy_read(phydev, 0x1f));
> +
> switch (tp->mac_version) {
> case RTL_GIGA_MAC_VER_34:
> case RTL_GIGA_MAC_VER_35:
> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
> struct phy_device *phydev = tp->phydev;
> int ret;
>
> + WARN_ON(phy_read(phydev, 0x1f));
> +
> switch (tp->mac_version) {
> case RTL_GIGA_MAC_VER_34:
> case RTL_GIGA_MAC_VER_35:
> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
> struct phy_device *phydev = tp->phydev;
> int ret = 0;
>
> + WARN_ON(phy_read(phydev, 0x1f));
> +
> switch (tp->mac_version) {
> case RTL_GIGA_MAC_VER_34:
> case RTL_GIGA_MAC_VER_35:
> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
> index 16667fbac..1aa1142b8 100644
> --- a/drivers/net/phy/phy-core.c
> +++ b/drivers/net/phy/phy-core.c
> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
> return ret;
>
> new = (ret & ~mask) | set;
> - if (new == ret)
> - return 0;
>
> - ret = __phy_write(phydev, regnum, new);
> + __phy_write(phydev, regnum, new);
>
> - return ret < 0 ? ret : 1;
> + return new != ret;
> }
> EXPORT_SYMBOL_GPL(__phy_modify_changed);
>
>

Took your patch on top of next-20190719.
See attached dmesg.
It didn't work. Same thing, lots of ping drops, no usable network.

like that:
44 packets transmitted, 2 received, 95% packet loss, time 44005ms


Maybe important:
I build a kernel with no modules.

I have to power off after booting a kernel that doesn't work; a (soft)
reboot into an older kernel (e.g. 4.9.y) doesn't fix the problem.
Powering off and on does.


greetings,

Thomas


Attachments:
dmesg-5.2.0-next-20190719-i5-dirty.txt (52.22 kB)

2019-07-20 03:10:34

by Heiner Kallweit

Subject: Re: network problems with r8169

On 18.07.2019 20:50, Thomas Voegtle wrote:
>
> Hello,
>
> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>
> v5.2 is fine, I bisected it down to:
>
> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
> commit a2928d28643e3c064ff41397281d20c445525032
> Author: Heiner Kallweit <[email protected]>
> Date:   Sun Jun 2 10:53:49 2019 +0200
>
>     r8169: use paged versions of phylib MDIO access functions
>
>     Use paged versions of phylib MDIO access functions to simplify
>     the code.
>
>     Signed-off-by: Heiner Kallweit <[email protected]>
>     Signed-off-by: David S. Miller <[email protected]>
>
>
> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
> for me (had to adjust the renaming to r8169_main.c).
>
> I have a:
> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
> 0c)
>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>         Kernel driver in use: r8169
>
> on a BIOSTAR H81MG motherboard.
>
Interesting. I have the same chip version (RTL8168g) and can't reproduce
the issue. Can you provide a full dmesg output and test the patch below
on top of linux-next? I'd be interested in the WARN_ON stack traces
(if any) and would like to know whether the experimental change to
__phy_modify_changed helps.

>
> greetings,
>
>   Thomas
>
>
Heiner


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 8d7dd4c5f..26be73000 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
 	struct phy_device *phydev = tp->phydev;
 	int ret;

+	WARN_ON(phy_read(phydev, 0x1f));
+
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
@@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
 	struct phy_device *phydev = tp->phydev;
 	int ret;

+	WARN_ON(phy_read(phydev, 0x1f));
+
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
@@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
 	struct phy_device *phydev = tp->phydev;
 	int ret;

+	WARN_ON(phy_read(phydev, 0x1f));
+
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
@@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
 	struct phy_device *phydev = tp->phydev;
 	int ret = 0;

+	WARN_ON(phy_read(phydev, 0x1f));
+
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 16667fbac..1aa1142b8 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
 		return ret;

 	new = (ret & ~mask) | set;
-	if (new == ret)
-		return 0;

-	ret = __phy_write(phydev, regnum, new);
+	__phy_write(phydev, regnum, new);

-	return ret < 0 ? ret : 1;
+	return new != ret;
 }
 EXPORT_SYMBOL_GPL(__phy_modify_changed);

--
2.22.0
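
Editorial sketch of the phy-core.c hunk above: the experimental change makes the read-modify-write helper always write the register, instead of skipping the write when the computed value equals what was read. The C below models both variants on a fake register array. Every name in it (fake_phy, fake_read, fake_write, modify_changed_skip, modify_changed_always) is hypothetical and only mirrors the logic visible in the diff; it is not the kernel implementation and does not touch real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_NUM_REGS 32

/* Hypothetical stand-in for a PHY: a flat register array, no MDIO bus. */
struct fake_phy {
	uint16_t regs[FAKE_NUM_REGS];
	int writes;	/* counts register writes so the variants can be compared */
};

static int fake_read(struct fake_phy *phy, uint32_t regnum)
{
	assert(regnum < FAKE_NUM_REGS);
	return phy->regs[regnum];
}

static int fake_write(struct fake_phy *phy, uint32_t regnum, uint16_t val)
{
	assert(regnum < FAKE_NUM_REGS);
	phy->regs[regnum] = val;
	phy->writes++;
	return 0;
}

/* Mirrors the removed lines: skip the write if nothing would change. */
static int modify_changed_skip(struct fake_phy *phy, uint32_t regnum,
			       uint16_t mask, uint16_t set)
{
	int new, ret;

	ret = fake_read(phy, regnum);
	if (ret < 0)
		return ret;

	new = (ret & ~mask) | set;
	if (new == ret)
		return 0;

	ret = fake_write(phy, regnum, (uint16_t)new);

	return ret < 0 ? ret : 1;
}

/* Mirrors the added lines: always write, report whether the value differs. */
static int modify_changed_always(struct fake_phy *phy, uint32_t regnum,
				 uint16_t mask, uint16_t set)
{
	int new, ret;

	ret = fake_read(phy, regnum);
	if (ret < 0)
		return ret;

	new = (ret & ~mask) | set;

	fake_write(phy, regnum, (uint16_t)new);

	return new != ret;
}
```

In this model, when a register already holds the requested value, modify_changed_skip() performs no write at all, while modify_changed_always() writes once and still returns 0. That difference is what the experimental hunk was meant to probe.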

2019-07-20 18:41:01

by Heiner Kallweit

Subject: Re: network problems with r8169

On 19.07.2019 23:12, Thomas Voegtle wrote:
> On Fri, 19 Jul 2019, Heiner Kallweit wrote:
>
>> On 18.07.2019 20:50, Thomas Voegtle wrote:
>>>
>>> Hello,
>>>
>>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>>>
>>> v5.2 is fine, I bisected it down to:
>>>
>>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
>>> commit a2928d28643e3c064ff41397281d20c445525032
>>> Author: Heiner Kallweit <[email protected]>
>>> Date:   Sun Jun 2 10:53:49 2019 +0200
>>>
>>>     r8169: use paged versions of phylib MDIO access functions
>>>
>>>     Use paged versions of phylib MDIO access functions to simplify
>>>     the code.
>>>
>>>     Signed-off-by: Heiner Kallweit <[email protected]>
>>>     Signed-off-by: David S. Miller <[email protected]>
>>>
>>>
>>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
>>> for me (had to adjust the renaming to r8169_main.c).
>>>
>>> I have a:
>>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
>>> 0c)
>>>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>>>         Kernel driver in use: r8169
>>>
>>> on a BIOSTAR H81MG motherboard.
>>>
>> Interesting. I have the same chip version (RTL8168g) and can't reproduce
>> the issue. Can you provide a full dmesg output and test the patch below
>> on top of linux-next? I'd be interested in the WARN_ON stack traces
>> (if any) and would like to know whether the experimental change to
>> __phy_modify_changed helps.
>>
>>>
>>> greetings,
>>>
>>>   Thomas
>>>
>>>
>> Heiner
>>
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 8d7dd4c5f..26be73000 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret = 0;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>> index 16667fbac..1aa1142b8 100644
>> --- a/drivers/net/phy/phy-core.c
>> +++ b/drivers/net/phy/phy-core.c
>> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
>>         return ret;
>>
>>     new = (ret & ~mask) | set;
>> -    if (new == ret)
>> -        return 0;
>>
>> -    ret = __phy_write(phydev, regnum, new);
>> +    __phy_write(phydev, regnum, new);
>>
>> -    return ret < 0 ? ret : 1;
>> +    return new != ret;
>> }
>> EXPORT_SYMBOL_GPL(__phy_modify_changed);
>>
>>
>
> Took your patch on top of next-20190719.
> See attached dmesg.
> It didn't work. Same thing, lots of ping drops, no usable network.
>
> like that:
> 44 packets transmitted, 2 received, 95% packet loss, time 44005ms
>
>
> Maybe important:
> I build a kernel with no modules.
>
> I have to power off when I booted a kernel which doesn't work, a (soft) reboot into a older kernel (e.g. 4.9.y)  doesn't
> fix the problem. Powering off and on does.
>

Then what you could do is reverse the hunks of the patch step by step,
or make them separate patches and bisect.
The relevant hunks are those from points 1 and 2.

1. first 5 hunks (I don't think you have to reverse them individually)
   EEE-related

2. rtl8168g_disable_aldps, rtl8168g_phy_adjust_10m_aldps, rtl8168g_1_hw_phy_config
   all of these hunks are in the path for RTL8168g

3. rtl8168h_1_hw_phy_config, rtl8168h_2_hw_phy_config, rtl8168ep_1_hw_phy_config,
   rtl8168ep_2_hw_phy_config
   not in the path for RTL8168g

>
> greetings,
>
>       Thomas
Heiner

2019-07-20 18:44:14

by Thomas Voegtle

Subject: Re: network problems with r8169

On Sat, 20 Jul 2019, Heiner Kallweit wrote:

> On 19.07.2019 23:12, Thomas Voegtle wrote:
>> On Fri, 19 Jul 2019, Heiner Kallweit wrote:
>>
>>> On 18.07.2019 20:50, Thomas Voegtle wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>>>>
>>>> v5.2 is fine, I bisected it down to:
>>>>
>>>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
>>>> commit a2928d28643e3c064ff41397281d20c445525032
>>>> Author: Heiner Kallweit <[email protected]>
>>>> Date:   Sun Jun 2 10:53:49 2019 +0200
>>>>
>>>>     r8169: use paged versions of phylib MDIO access functions
>>>>
>>>>     Use paged versions of phylib MDIO access functions to simplify
>>>>     the code.
>>>>
>>>>     Signed-off-by: Heiner Kallweit <[email protected]>
>>>>     Signed-off-by: David S. Miller <[email protected]>
>>>>
>>>>
>>>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
>>>> for me (had to adjust the renaming to r8169_main.c).
>>>>
>>>> I have a:
>>>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
>>>> 0c)
>>>>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>>>>         Kernel driver in use: r8169
>>>>
>>>> on a BIOSTAR H81MG motherboard.
>>>>
>>> Interesting. I have the same chip version (RTL8168g) and can't reproduce
>>> the issue. Can you provide a full dmesg output and test the patch below
>>> on top of linux-next? I'd be interested in the WARN_ON stack traces
>>> (if any) and would like to know whether the experimental change to
>>> __phy_modify_changed helps.
>>>
>>>>
>>>> greetings,
>>>>
>>>>   Thomas
>>>>
>>>>
>>> Heiner
>>>
>>>
>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>> index 8d7dd4c5f..26be73000 100644
>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
>>>     struct phy_device *phydev = tp->phydev;
>>>     int ret;
>>>
>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>> +
>>>     switch (tp->mac_version) {
>>>     case RTL_GIGA_MAC_VER_34:
>>>     case RTL_GIGA_MAC_VER_35:
>>> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
>>>     struct phy_device *phydev = tp->phydev;
>>>     int ret;
>>>
>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>> +
>>>     switch (tp->mac_version) {
>>>     case RTL_GIGA_MAC_VER_34:
>>>     case RTL_GIGA_MAC_VER_35:
>>> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
>>>     struct phy_device *phydev = tp->phydev;
>>>     int ret;
>>>
>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>> +
>>>     switch (tp->mac_version) {
>>>     case RTL_GIGA_MAC_VER_34:
>>>     case RTL_GIGA_MAC_VER_35:
>>> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
>>>     struct phy_device *phydev = tp->phydev;
>>>     int ret = 0;
>>>
>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>> +
>>>     switch (tp->mac_version) {
>>>     case RTL_GIGA_MAC_VER_34:
>>>     case RTL_GIGA_MAC_VER_35:
>>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>>> index 16667fbac..1aa1142b8 100644
>>> --- a/drivers/net/phy/phy-core.c
>>> +++ b/drivers/net/phy/phy-core.c
>>> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
>>>         return ret;
>>>
>>>     new = (ret & ~mask) | set;
>>> -    if (new == ret)
>>> -        return 0;
>>>
>>> -    ret = __phy_write(phydev, regnum, new);
>>> +    __phy_write(phydev, regnum, new);
>>>
>>> -    return ret < 0 ? ret : 1;
>>> +    return new != ret;
>>> }
>>> EXPORT_SYMBOL_GPL(__phy_modify_changed);
>>>
>>>
>>
>> Took your patch on top of next-20190719.
>> See attached dmesg.
>> It didn't work. Same thing, lots of ping drops, no usable network.
>>
>> like that:
>> 44 packets transmitted, 2 received, 95% packet loss, time 44005ms
>>
>>
>> Maybe important:
>> I build a kernel with no modules.
>>
>> I have to power off when I booted a kernel which doesn't work, a (soft) reboot into a older kernel (e.g. 4.9.y)  doesn't
>> fix the problem. Powering off and on does.
>>
>
> Then, what you could do is reversing the hunks of the patch step by step.
> Or make them separate patches and bisect.
> Relevant are the hunks from point 1 and 2.
>
> 1. first 5 hunks (I don't think you have to reverse them individually)
> EEE-related
>
> 2. rtl8168g_disable_aldps, rtl8168g_phy_adjust_10m_aldps, rtl8168g_1_hw_phy_config
> all of these hunks are in the path for RTL8168g
>
> 3. rtl8168h_1_hw_phy_config, rtl8168h_2_hw_phy_config, rtl8168ep_1_hw_phy_config,
> rtl8168ep_2_hw_phy_config
> not in the path for RTL8168g
>

this is the minimal revert:

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index efef5453b94f..267995a614b5 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -3249,12 +3249,14 @@ static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp)
 	else
 		phy_modify_paged(tp->phydev, 0x0bcc, 0x12, 0, BIT(15));

-	ret = phy_read_paged(tp->phydev, 0x0a46, 0x13);
-	if (ret & BIT(8))
-		phy_modify_paged(tp->phydev, 0x0c41, 0x12, 0, BIT(1));
-	else
-		phy_modify_paged(tp->phydev, 0x0c41, 0x12, BIT(1), 0);
-
+	rtl_writephy(tp, 0x1f, 0x0a46);
+	if (rtl_readphy(tp, 0x13) & 0x0100) {
+		rtl_writephy(tp, 0x1f, 0x0c41);
+		rtl_w0w1_phy(tp, 0x15, 0x0002, 0x0000);
+	} else {
+		rtl_writephy(tp, 0x1f, 0x0c41);
+		rtl_w0w1_phy(tp, 0x15, 0x0000, 0x0002);
+	}
 	/* Enable PHY auto speed down */
 	phy_modify_paged(tp->phydev, 0x0a44, 0x11, 0, BIT(3) | BIT(2));



Could it be that there is just a typo?

 	if (ret & BIT(8))
-		phy_modify_paged(tp->phydev, 0x0c41, 0x12, 0, BIT(1));
+		phy_modify_paged(tp->phydev, 0x0c41, 0x15, 0, BIT(1));
 	else
-		phy_modify_paged(tp->phydev, 0x0c41, 0x12, BIT(1), 0);
+		phy_modify_paged(tp->phydev, 0x0c41, 0x15, BIT(1), 0);




greetings,

Thomas
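
Editorial sketch of the suspected typo: the old code selected page 0x0c41 through register 0x1f and then changed bit 1 of register 0x15, whereas the converted call names register 0x12 on the same page. The C below models a paged register file to show where the bit ends up in each case. It is a toy simulation under stated assumptions: all sim_* names are hypothetical, unset registers read as 0, and nothing here is the driver or phylib code.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_BIT(n)	(1u << (n))
#define SIM_PAGE_SELECT	0x1f	/* register used for page selection in the thread */
#define SIM_MAX_CELLS	8

/* One stored (page, register) value; untouched registers read as 0. */
struct sim_cell { uint16_t page, reg, val; };

struct sim_phy {
	uint16_t page;	/* currently selected page */
	int used;
	struct sim_cell cells[SIM_MAX_CELLS];
};

static uint16_t *sim_cell(struct sim_phy *phy, uint16_t page, uint16_t reg)
{
	for (int i = 0; i < phy->used; i++)
		if (phy->cells[i].page == page && phy->cells[i].reg == reg)
			return &phy->cells[i].val;
	assert(phy->used < SIM_MAX_CELLS);
	phy->cells[phy->used].page = page;
	phy->cells[phy->used].reg = reg;
	phy->cells[phy->used].val = 0;
	return &phy->cells[phy->used++].val;
}

/* Old style: writing 0x1f selects the page, other registers hit that page. */
static void sim_write(struct sim_phy *phy, uint16_t reg, uint16_t val)
{
	if (reg == SIM_PAGE_SELECT)
		phy->page = val;
	else
		*sim_cell(phy, phy->page, reg) = val;
}

static uint16_t sim_read(struct sim_phy *phy, uint16_t reg)
{
	if (reg == SIM_PAGE_SELECT)
		return phy->page;
	return *sim_cell(phy, phy->page, reg);
}

/* Argument order as in the rtl_w0w1_phy calls: bits to set, then bits to clear. */
static void sim_w0w1(struct sim_phy *phy, uint16_t reg, uint16_t set, uint16_t clear)
{
	uint16_t val = sim_read(phy, reg);

	sim_write(phy, reg, (uint16_t)((val & ~clear) | set));
}

/* New style: page, register, mask (bits to clear) and set in a single call. */
static void sim_modify_paged(struct sim_phy *phy, uint16_t page, uint16_t reg,
			     uint16_t mask, uint16_t set)
{
	uint16_t saved = phy->page;

	phy->page = page;
	sim_w0w1(phy, reg, set, mask);
	phy->page = saved;	/* leave the page selection as it was found */
}

/* Inspect a (page, register) value without changing the selected page. */
static uint16_t sim_peek(struct sim_phy *phy, uint16_t page, uint16_t reg)
{
	return *sim_cell(phy, page, reg);
}
```

In this model, sim_write(&phy, 0x1f, 0x0c41) followed by sim_w0w1(&phy, 0x15, 0x0002, 0x0000) sets bit 1 of page 0x0c41 register 0x15. A call to sim_modify_paged(&phy, 0x0c41, 0x12, 0, SIM_BIT(1)) leaves register 0x15 untouched and sets the bit in register 0x12 instead, which is the mismatch the proposed 0x12 to 0x15 change addresses.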

2019-07-20 18:47:06

by Heiner Kallweit

Subject: Re: network problems with r8169

On 20.07.2019 16:22, Thomas Voegtle wrote:
> On Sat, 20 Jul 2019, Heiner Kallweit wrote:
>
>> On 19.07.2019 23:12, Thomas Voegtle wrote:
>>> On Fri, 19 Jul 2019, Heiner Kallweit wrote:
>>>
>>>> On 18.07.2019 20:50, Thomas Voegtle wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>>>>>
>>>>> v5.2 is fine, I bisected it down to:
>>>>>
>>>>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
>>>>> commit a2928d28643e3c064ff41397281d20c445525032
>>>>> Author: Heiner Kallweit <[email protected]>
>>>>> Date:   Sun Jun 2 10:53:49 2019 +0200
>>>>>
>>>>>     r8169: use paged versions of phylib MDIO access functions
>>>>>
>>>>>     Use paged versions of phylib MDIO access functions to simplify
>>>>>     the code.
>>>>>
>>>>>     Signed-off-by: Heiner Kallweit <[email protected]>
>>>>>     Signed-off-by: David S. Miller <[email protected]>
>>>>>
>>>>>
>>>>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
>>>>> for me (had to adjust the renaming to r8169_main.c).
>>>>>
>>>>> I have a:
>>>>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
>>>>> 0c)
>>>>>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>>>>>         Kernel driver in use: r8169
>>>>>
>>>>> on a BIOSTAR H81MG motherboard.
>>>>>
>>>> Interesting. I have the same chip version (RTL8168g) and can't reproduce
>>>> the issue. Can you provide a full dmesg output and test the patch below
>>>> on top of linux-next? I'd be interested in the WARN_ON stack traces
>>>> (if any) and would like to know whether the experimental change to
>>>> __phy_modify_changed helps.
>>>>
>>>>>
>>>>> greetings,
>>>>>
>>>>>   Thomas
>>>>>
>>>>>
>>>> Heiner
>>>>
>>>>
>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>>> index 8d7dd4c5f..26be73000 100644
>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
>>>>     struct phy_device *phydev = tp->phydev;
>>>>     int ret;
>>>>
>>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>>> +
>>>>     switch (tp->mac_version) {
>>>>     case RTL_GIGA_MAC_VER_34:
>>>>     case RTL_GIGA_MAC_VER_35:
>>>> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
>>>>     struct phy_device *phydev = tp->phydev;
>>>>     int ret;
>>>>
>>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>>> +
>>>>     switch (tp->mac_version) {
>>>>     case RTL_GIGA_MAC_VER_34:
>>>>     case RTL_GIGA_MAC_VER_35:
>>>> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
>>>>     struct phy_device *phydev = tp->phydev;
>>>>     int ret;
>>>>
>>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>>> +
>>>>     switch (tp->mac_version) {
>>>>     case RTL_GIGA_MAC_VER_34:
>>>>     case RTL_GIGA_MAC_VER_35:
>>>> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
>>>>     struct phy_device *phydev = tp->phydev;
>>>>     int ret = 0;
>>>>
>>>> +    WARN_ON(phy_read(phydev, 0x1f));
>>>> +
>>>>     switch (tp->mac_version) {
>>>>     case RTL_GIGA_MAC_VER_34:
>>>>     case RTL_GIGA_MAC_VER_35:
>>>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>>>> index 16667fbac..1aa1142b8 100644
>>>> --- a/drivers/net/phy/phy-core.c
>>>> +++ b/drivers/net/phy/phy-core.c
>>>> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
>>>>         return ret;
>>>>
>>>>     new = (ret & ~mask) | set;
>>>> -    if (new == ret)
>>>> -        return 0;
>>>>
>>>> -    ret = __phy_write(phydev, regnum, new);
>>>> +    __phy_write(phydev, regnum, new);
>>>>
>>>> -    return ret < 0 ? ret : 1;
>>>> +    return new != ret;
>>>> }
>>>> EXPORT_SYMBOL_GPL(__phy_modify_changed);
>>>>
>>>>
>>>
>>> Took your patch on top of next-20190719.
>>> See attached dmesg.
>>> It didn't work. Same thing, lots of ping drops, no usable network.
>>>
>>> like that:
>>> 44 packets transmitted, 2 received, 95% packet loss, time 44005ms
>>>
>>>
>>> Maybe important:
>>> I build a kernel with no modules.
>>>
>>> I have to power off when I booted a kernel which doesn't work, a (soft) reboot into a older kernel (e.g. 4.9.y)  doesn't
>>> fix the problem. Powering off and on does.
>>>
>>
>> Then, what you could do is reversing the hunks of the patch step by step.
>> Or make them separate patches and bisect.
>> Relevant are the hunks from point 1 and 2.
>>
>> 1. first 5 hunks (I don't think you have to reverse them individually)
>>   EEE-related
>>
>> 2. rtl8168g_disable_aldps, rtl8168g_phy_adjust_10m_aldps, rtl8168g_1_hw_phy_config
>>   all of these hunks are in the path for RTL8168g
>>
>> 3. rtl8168h_1_hw_phy_config, rtl8168h_2_hw_phy_config, rtl8168ep_1_hw_phy_config,
>>   rtl8168ep_2_hw_phy_config
>>   not in the path for RTL8168g
>>
>
> this is the minimal revert:
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index efef5453b94f..267995a614b5 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -3249,12 +3249,14 @@ static void rtl8168g_1_hw_phy_config(struct rtl8169_private *tp)
>         else
>                 phy_modify_paged(tp->phydev, 0x0bcc, 0x12, 0, BIT(15));
>
> -       ret = phy_read_paged(tp->phydev, 0x0a46, 0x13);
> -       if (ret & BIT(8))
> -               phy_modify_paged(tp->phydev, 0x0c41, 0x12, 0, BIT(1));
> -       else
> -               phy_modify_paged(tp->phydev, 0x0c41, 0x12, BIT(1), 0);
> -
> +       rtl_writephy(tp, 0x1f, 0x0a46);
> +       if (rtl_readphy(tp, 0x13) & 0x0100) {
> +               rtl_writephy(tp, 0x1f, 0x0c41);
> +               rtl_w0w1_phy(tp, 0x15, 0x0002, 0x0000);
> +       } else {
> +               rtl_writephy(tp, 0x1f, 0x0c41);
> +               rtl_w0w1_phy(tp, 0x15, 0x0000, 0x0002);
> +       }
>         /* Enable PHY auto speed down */
>         phy_modify_paged(tp->phydev, 0x0a44, 0x11, 0, BIT(3) | BIT(2));
>
>
>
> Could it be, that there is just a typo?
>
I looked a hundred times over this piece of code ..
Yes, it's simply a typo. I'll submit a patch for it.
Thanks a lot for your testing efforts!

>         if (ret & BIT(8))
> -               phy_modify_paged(tp->phydev, 0x0c41, 0x12, 0, BIT(1));
> +               phy_modify_paged(tp->phydev, 0x0c41, 0x15, 0, BIT(1));
>         else
> -               phy_modify_paged(tp->phydev, 0x0c41, 0x12, BIT(1), 0);
> +               phy_modify_paged(tp->phydev, 0x0c41, 0x15, BIT(1), 0);
>
>
>
>
> greetings,
>
>       Thomas
Heiner

2019-07-20 19:21:01

by Heiner Kallweit

Subject: Re: network problems with r8169

On 19.07.2019 23:12, Thomas Voegtle wrote:
> On Fri, 19 Jul 2019, Heiner Kallweit wrote:
>
>> On 18.07.2019 20:50, Thomas Voegtle wrote:
>>>
>>> Hello,
>>>
>>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>>>
>>> v5.2 is fine, I bisected it down to:
>>>
>>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
>>> commit a2928d28643e3c064ff41397281d20c445525032
>>> Author: Heiner Kallweit <[email protected]>
>>> Date:   Sun Jun 2 10:53:49 2019 +0200
>>>
>>>     r8169: use paged versions of phylib MDIO access functions
>>>
>>>     Use paged versions of phylib MDIO access functions to simplify
>>>     the code.
>>>
>>>     Signed-off-by: Heiner Kallweit <[email protected]>
>>>     Signed-off-by: David S. Miller <[email protected]>
>>>
>>>
>>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
>>> for me (had to adjust the renaming to r8169_main.c).
>>>
>>> I have a:
>>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
>>> 0c)
>>>         Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>>>         Kernel driver in use: r8169
>>>
>>> on a BIOSTAR H81MG motherboard.
>>>
>> Interesting. I have the same chip version (RTL8168g) and can't reproduce
>> the issue. Can you provide a full dmesg output and test the patch below
>> on top of linux-next? I'd be interested in the WARN_ON stack traces
>> (if any) and would like to know whether the experimental change to
>> __phy_modify_changed helps.
>>
>>>
>>> greetings,
>>>
>>>   Thomas
>>>
>>>
>> Heiner
>>
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 8d7dd4c5f..26be73000 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
>>     struct phy_device *phydev = tp->phydev;
>>     int ret = 0;
>>
>> +    WARN_ON(phy_read(phydev, 0x1f));
>> +
>>     switch (tp->mac_version) {
>>     case RTL_GIGA_MAC_VER_34:
>>     case RTL_GIGA_MAC_VER_35:
>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>> index 16667fbac..1aa1142b8 100644
>> --- a/drivers/net/phy/phy-core.c
>> +++ b/drivers/net/phy/phy-core.c
>> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
>>         return ret;
>>
>>     new = (ret & ~mask) | set;
>> -    if (new == ret)
>> -        return 0;
>>
>> -    ret = __phy_write(phydev, regnum, new);
>> +    __phy_write(phydev, regnum, new);
>>
>> -    return ret < 0 ? ret : 1;
>> +    return new != ret;
>> }
>> EXPORT_SYMBOL_GPL(__phy_modify_changed);
>>
>>
>
> Took your patch on top of next-20190719.
> See attached dmesg.
> It didn't work. Same thing, lots of ping drops, no usable network.
>
> like that:
> 44 packets transmitted, 2 received, 95% packet loss, time 44005ms
>
I remember that I once had problems with this chip version and 100Mbps.
Could you check whether you face the same issues with 1Gbps?

>
> Maybe important:
> I build a kernel with no modules.
>
> I have to power off when I booted a kernel which doesn't work, a (soft) reboot into a older kernel (e.g. 4.9.y)  doesn't
> fix the problem. Powering off and on does.
>
>
> greetings,
>
>       Thomas