2023-10-20 21:08:50

by Doug Anderson

Subject: [PATCH v5 0/8] r8152: Avoid writing garbage to the adapter's registers

This series is the result of a cooperative debug effort between
Realtek and the ChromeOS team. On ChromeOS, we've noticed that Realtek
Ethernet adapters can sometimes get so wedged that even a reboot of
the host can't get them to enumerate again, assuming that the adapter
was on a powered hub and didn't lose power when the host rebooted. This
is sometimes seen in the ChromeOS automated testing lab. The only way
to recover adapters in this state is to manually power cycle them.

I managed to reproduce one instance of this wedging (unknown if this
is truly related to what the test lab sees) by doing this:
1. Start a flood ping from a host to the device.
2. Drop the device into kdb.
3. Wait 90 seconds.
4. Resume from kdb (the "g" command).
5. Wait another 45 seconds.

Upon analysis, Realtek realized this was happening:

1. The Linux driver was getting a "Tx timeout" after resuming from kdb
and then trying to reset itself.
2. As part of the reset, the Linux driver was attempting to do a
read-modify-write of the adapter's registers.
3. The read would fail (due to a timeout) and the driver pretended
that the register contained all 0xFFs. See commit f53a7ad18959
("r8152: Set memory to all 0xFFs on failed reg reads")
4. The driver would take this value of all 0xFFs, modify it, and
attempt to write it back to the adapter.
5. By this time the USB channel seemed to recover and thus we'd
successfully write a value that was mostly 0xFFs to the adapter.
6. The adapter didn't like this and would wedge itself.
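
For illustration only, the sequence above can be modelled in user
space. This is a minimal sketch and not driver code: fake_adapter,
fake_read(), fake_write(), SOME_BIT and both rmw_*() helpers are
hypothetical names invented for this example. The only behaviour taken
from the driver is that a failed read fills the buffer with 0xFF bytes.

```c
#include <stdint.h>
#include <string.h>

#define SOME_BIT 0x0001u    /* hypothetical bit the caller wants to set */

/* Hypothetical stand-in for the adapter: one 16-bit register. */
struct fake_adapter {
    uint16_t reg;       /* current register contents */
    int fail_reads;     /* number of upcoming reads that will fail */
};

/* Mimics a failed register read: fill the buffer with 0xFF bytes. */
static int fake_read(struct fake_adapter *a, uint16_t *val)
{
    if (a->fail_reads > 0) {
        a->fail_reads--;
        memset(val, 0xff, sizeof(*val));
        return -1;
    }
    *val = a->reg;
    return 0;
}

/* In this model writes always succeed (the channel has "recovered"). */
static int fake_write(struct fake_adapter *a, uint16_t val)
{
    a->reg = val;
    return 0;
}

/* Read-modify-write that ignores the read's return value. */
static void rmw_unchecked(struct fake_adapter *a)
{
    uint16_t x;

    fake_read(a, &x);       /* error ignored */
    x |= SOME_BIT;
    fake_write(a, x);       /* may write modified garbage */
}

/* Read-modify-write that refuses to write after a failed read. */
static int rmw_checked(struct fake_adapter *a)
{
    uint16_t x;
    int ret = fake_read(a, &x);

    if (ret < 0)
        return ret;
    x |= SOME_BIT;
    return fake_write(a, x);
}
```

With one failing read queued, rmw_unchecked() leaves the register
holding all ones, while rmw_checked() leaves it untouched and reports
the error.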

Another engineer also managed to reproduce wedging of the Realtek
Ethernet adapter during a reboot test on an AMD Chromebook. In that
case he was sometimes seeing -EPIPE returned from the control
transfers.

This patch series fixes both issues.

Changes in v5:
- ("Run the unload routine if we have errors during probe") new for v5.
- ("Cancel hw_phy_work if we have an error in probe") new for v5.
- ("Release firmware if we have an error in probe") new for v5.
- Removed extra mutex_unlock() left over in v4.
- Fixed minor typos.
- Don't queue an unbind/bind reset if probe fails; just retry probe.

Changes in v4:
- Took out some unnecessary locks/unlocks of the control mutex.
- Added comment about version read causing probe fail if all 3 tries fail.
- Added text to commit msg about the potential unbind/bind loop.

Changes in v3:
- Fixed v2 changelog ending up in the commit message.
- farmework -> framework in comments.

Changes in v2:
- ("Check for unplug in rtl_phy_patch_request()") new for v2.
- ("Check for unplug in r8153b_ups_en() / r8153c_ups_en()") new for v2.
- ("Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE") new for v2.
- Reset patch no longer based on retry patch, since that was dropped.
- Reset patch should be robust even if failures happen in probe.
- Switched booleans to bits in the "flags" variable.
- Check for -ENODEV instead of "udev->state == USB_STATE_NOTATTACHED"

Douglas Anderson (8):
r8152: Increase USB control msg timeout to 5000ms as per spec
r8152: Run the unload routine if we have errors during probe
r8152: Cancel hw_phy_work if we have an error in probe
r8152: Release firmware if we have an error in probe
r8152: Check for unplug in rtl_phy_patch_request()
r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()
r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE
r8152: Block future register access if register access fails

drivers/net/usb/r8152.c | 303 ++++++++++++++++++++++++++++++----------
1 file changed, 230 insertions(+), 73 deletions(-)

--
2.42.0.758.gaed0368e0e-goog


2023-10-20 21:08:52

by Doug Anderson

Subject: [PATCH v5 3/8] r8152: Cancel hw_phy_work if we have an error in probe

The error handling in rtl8152_probe() is missing a call to cancel the
hw_phy_work. Add it in to match what's in the cleanup code in
rtl8152_disconnect().

Fixes: a028a9e003f2 ("r8152: move the settings of PHY to a work queue")
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v5:
- ("Cancel hw_phy_work if we have an error in probe") new for v5.

drivers/net/usb/r8152.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 201c688e3e3f..d10b0886b652 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -9783,6 +9783,7 @@ static int rtl8152_probe(struct usb_interface *intf,

out1:
tasklet_kill(&tp->tx_tl);
+ cancel_delayed_work_sync(&tp->hw_phy_work);
if (tp->rtl_ops.unload)
tp->rtl_ops.unload(tp);
usb_set_intfdata(intf, NULL);
--
2.42.0.758.gaed0368e0e-goog

2023-10-20 21:09:04

by Doug Anderson

Subject: [PATCH v5 4/8] r8152: Release firmware if we have an error in probe

The error handling in rtl8152_probe() is missing a call to release
firmware. Add it in to match what's in the cleanup code in
rtl8152_disconnect().

Fixes: 9370f2d05a2a ("r8152: support request_firmware for RTL8153")
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v5:
- ("Release firmware if we have an error in probe") new for v5.

drivers/net/usb/r8152.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index d10b0886b652..656fe90734fc 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -9786,6 +9786,7 @@ static int rtl8152_probe(struct usb_interface *intf,
cancel_delayed_work_sync(&tp->hw_phy_work);
if (tp->rtl_ops.unload)
tp->rtl_ops.unload(tp);
+ rtl8152_release_firmware(tp);
usb_set_intfdata(intf, NULL);
out:
free_netdev(netdev);
--
2.42.0.758.gaed0368e0e-goog

2023-10-20 21:09:04

by Doug Anderson

Subject: [PATCH v5 6/8] r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()

If the adapter is unplugged while we're looping in r8153b_ups_en() /
r8153c_ups_en() we could end up looping for 10 seconds (20 ms * 500
loops). Add code similar to what's done in other places in the driver
to check for unplug and bail.
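
As a rough user-space illustration of the arithmetic and of the early
bail-out, consider the sketch below. It is not the driver code:
fake_dev, fake_msleep(), fake_autoload_done() and wait_autoload() are
hypothetical names, and the model assumes that a poll on an unplugged
device never reports completion.

```c
#include <stdbool.h>

#define POLL_ITERATIONS 500     /* loop count from the commit message */
#define POLL_DELAY_MS   20      /* per-iteration delay from the commit message */

/* Hypothetical device model used only for this sketch. */
struct fake_dev {
    bool unplugged;         /* stands in for the unplug flag */
    int ready_after;        /* poll index at which "done" appears; <0 = never */
    unsigned int slept_ms;  /* total simulated sleep time */
};

/* Simulated sleep: just account for the time. */
static void fake_msleep(struct fake_dev *d, unsigned int ms)
{
    d->slept_ms += ms;
}

/* Assumption: an unplugged device never reports completion. */
static bool fake_autoload_done(const struct fake_dev *d, int i)
{
    return !d->unplugged && d->ready_after >= 0 && i >= d->ready_after;
}

/*
 * Poll for completion. Returns true if the done flag was seen, false on
 * timeout or when bailing out because the device is gone.
 */
static bool wait_autoload(struct fake_dev *d, bool check_unplug)
{
    for (int i = 0; i < POLL_ITERATIONS; i++) {
        if (check_unplug && d->unplugged)
            return false;   /* bail immediately */
        if (fake_autoload_done(d, i))
            return true;
        fake_msleep(d, POLL_DELAY_MS);
    }
    return false;
}
```

Without the check, an unplugged device costs the full 500 * 20 ms =
10000 ms of sleeping; with the check, the loop returns before the
first sleep.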

Signed-off-by: Douglas Anderson <[email protected]>
---

(no changes since v2)

Changes in v2:
- ("Check for unplug in r8153b_ups_en() / r8153c_ups_en()") new for v2.

drivers/net/usb/r8152.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 9888bc43e903..982f9ca03e7a 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3663,6 +3663,8 @@ static void r8153b_ups_en(struct r8152 *tp, bool enable)
int i;

for (i = 0; i < 500; i++) {
+ if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ return;
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
break;
@@ -3703,6 +3705,8 @@ static void r8153c_ups_en(struct r8152 *tp, bool enable)
int i;

for (i = 0; i < 500; i++) {
+ if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ return;
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
break;
--
2.42.0.758.gaed0368e0e-goog

2023-10-20 21:09:10

by Doug Anderson

Subject: [PATCH v5 5/8] r8152: Check for unplug in rtl_phy_patch_request()

If the adapter is unplugged while we're looping in
rtl_phy_patch_request() we could end up looping for 10 seconds (2 ms *
5000 loops). Add code similar to what's done in other places in the
driver to check for unplug and bail.

Signed-off-by: Douglas Anderson <[email protected]>
---

(no changes since v2)

Changes in v2:
- ("Check for unplug in rtl_phy_patch_request()") new for v2.

drivers/net/usb/r8152.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 656fe90734fc..9888bc43e903 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -4046,6 +4046,9 @@ static int rtl_phy_patch_request(struct r8152 *tp, bool request, bool wait)
for (i = 0; wait && i < 5000; i++) {
u32 ocp_data;

+ if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ break;
+
usleep_range(1000, 2000);
ocp_data = ocp_reg_read(tp, OCP_PHY_PATCH_STAT);
if ((ocp_data & PATCH_READY) ^ check)
--
2.42.0.758.gaed0368e0e-goog

2023-10-20 21:09:18

by Doug Anderson

Subject: [PATCH v5 2/8] r8152: Run the unload routine if we have errors during probe

The rtl8152_probe() function lacks a call to the chip-specific
unload() routine when it sees an error in probe. Add it in to match
the cleanup code in rtl8152_disconnect().

Fixes: ac718b69301c ("net/usb: new driver for RTL8152")
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v5:
- ("Run the unload routine if we have errors during probe") new for v5.

drivers/net/usb/r8152.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 482957beae66..201c688e3e3f 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -9783,6 +9783,8 @@ static int rtl8152_probe(struct usb_interface *intf,

out1:
tasklet_kill(&tp->tx_tl);
+ if (tp->rtl_ops.unload)
+ tp->rtl_ops.unload(tp);
usb_set_intfdata(intf, NULL);
out:
free_netdev(netdev);
--
2.42.0.758.gaed0368e0e-goog

2023-10-20 21:09:19

by Doug Anderson

Subject: [PATCH v5 7/8] r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE

Whenever RTL8152_UNPLUG is set, it tells the driver that all accesses
will fail and that we should bail immediately. A future patch will use
this same concept at a time when the adapter hasn't actually been
unplugged but is about to be reset. Rename the flag in preparation for
the future patch.

This is a no-op change and just a search and replace.

Signed-off-by: Douglas Anderson <[email protected]>
---

(no changes since v2)

Changes in v2:
- ("Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE") new for v2.

drivers/net/usb/r8152.c | 96 ++++++++++++++++++++---------------------
1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 982f9ca03e7a..65232848b31d 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -764,7 +764,7 @@ enum rtl_register_content {

/* rtl8152 flags */
enum rtl8152_flags {
- RTL8152_UNPLUG = 0,
+ RTL8152_INACCESSIBLE = 0,
RTL8152_SET_RX_MODE,
WORK_ENABLE,
RTL8152_LINK_CHG,
@@ -1245,7 +1245,7 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
static void rtl_set_unplug(struct r8152 *tp)
{
if (tp->udev->state == USB_STATE_NOTATTACHED) {
- set_bit(RTL8152_UNPLUG, &tp->flags);
+ set_bit(RTL8152_INACCESSIBLE, &tp->flags);
smp_mb__after_atomic();
}
}
@@ -1256,7 +1256,7 @@ static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
u16 limit = 64;
int ret = 0;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

/* both size and indix must be 4 bytes align */
@@ -1300,7 +1300,7 @@ static int generic_ocp_write(struct r8152 *tp, u16 index, u16 byteen,
u16 byteen_start, byteen_end, byen;
u16 limit = 512;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

/* both size and indix must be 4 bytes align */
@@ -1537,7 +1537,7 @@ static int read_mii_word(struct net_device *netdev, int phy_id, int reg)
struct r8152 *tp = netdev_priv(netdev);
int ret;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

if (phy_id != R8152_PHY_ID)
@@ -1553,7 +1553,7 @@ void write_mii_word(struct net_device *netdev, int phy_id, int reg, int val)
{
struct r8152 *tp = netdev_priv(netdev);

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

if (phy_id != R8152_PHY_ID)
@@ -1758,7 +1758,7 @@ static void read_bulk_callback(struct urb *urb)
if (!tp)
return;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

if (!test_bit(WORK_ENABLE, &tp->flags))
@@ -1850,7 +1850,7 @@ static void write_bulk_callback(struct urb *urb)
if (!test_bit(WORK_ENABLE, &tp->flags))
return;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

if (!skb_queue_empty(&tp->tx_queue))
@@ -1871,7 +1871,7 @@ static void intr_callback(struct urb *urb)
if (!test_bit(WORK_ENABLE, &tp->flags))
return;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

switch (status) {
@@ -2615,7 +2615,7 @@ static void bottom_half(struct tasklet_struct *t)
{
struct r8152 *tp = from_tasklet(tp, t, tx_tl);

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

if (!test_bit(WORK_ENABLE, &tp->flags))
@@ -2658,7 +2658,7 @@ int r8152_submit_rx(struct r8152 *tp, struct rx_agg *agg, gfp_t mem_flags)
int ret;

/* The rx would be stopped, so skip submitting */
- if (test_bit(RTL8152_UNPLUG, &tp->flags) ||
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags) ||
!test_bit(WORK_ENABLE, &tp->flags) || !netif_carrier_ok(tp->netdev))
return 0;

@@ -3058,7 +3058,7 @@ static int rtl_enable(struct r8152 *tp)

static int rtl8152_enable(struct r8152 *tp)
{
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

set_tx_qlen(tp);
@@ -3145,7 +3145,7 @@ static int rtl8153_enable(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

set_tx_qlen(tp);
@@ -3177,7 +3177,7 @@ static void rtl_disable(struct r8152 *tp)
u32 ocp_data;
int i;

- if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
rtl_drop_queued_tx(tp);
return;
}
@@ -3631,7 +3631,7 @@ static u16 r8153_phy_status(struct r8152 *tp, u16 desired)
}

msleep(20);
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
break;
}

@@ -3663,7 +3663,7 @@ static void r8153b_ups_en(struct r8152 *tp, bool enable)
int i;

for (i = 0; i < 500; i++) {
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
@@ -3705,7 +3705,7 @@ static void r8153c_ups_en(struct r8152 *tp, bool enable)
int i;

for (i = 0; i < 500; i++) {
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;
if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
AUTOLOAD_DONE)
@@ -4050,8 +4050,8 @@ static int rtl_phy_patch_request(struct r8152 *tp, bool request, bool wait)
for (i = 0; wait && i < 5000; i++) {
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
- break;
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
+ return -ENODEV;

usleep_range(1000, 2000);
ocp_data = ocp_reg_read(tp, OCP_PHY_PATCH_STAT);
@@ -6009,7 +6009,7 @@ static int rtl8156_enable(struct r8152 *tp)
u32 ocp_data;
u16 speed;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

r8156_fc_parameter(tp);
@@ -6067,7 +6067,7 @@ static int rtl8156b_enable(struct r8152 *tp)
u32 ocp_data;
u16 speed;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

set_tx_qlen(tp);
@@ -6253,7 +6253,7 @@ static int rtl8152_set_speed(struct r8152 *tp, u8 autoneg, u32 speed, u8 duplex,

static void rtl8152_up(struct r8152 *tp)
{
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8152_aldps_en(tp, false);
@@ -6263,7 +6263,7 @@ static void rtl8152_up(struct r8152 *tp)

static void rtl8152_down(struct r8152 *tp)
{
- if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
rtl_drop_queued_tx(tp);
return;
}
@@ -6278,7 +6278,7 @@ static void rtl8153_up(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153_u1u2en(tp, false);
@@ -6318,7 +6318,7 @@ static void rtl8153_down(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
rtl_drop_queued_tx(tp);
return;
}
@@ -6339,7 +6339,7 @@ static void rtl8153b_up(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153b_u1u2en(tp, false);
@@ -6363,7 +6363,7 @@ static void rtl8153b_down(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
rtl_drop_queued_tx(tp);
return;
}
@@ -6400,7 +6400,7 @@ static void rtl8153c_up(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153b_u1u2en(tp, false);
@@ -6481,7 +6481,7 @@ static void rtl8156_up(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153b_u1u2en(tp, false);
@@ -6554,7 +6554,7 @@ static void rtl8156_down(struct r8152 *tp)
{
u32 ocp_data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
rtl_drop_queued_tx(tp);
return;
}
@@ -6692,7 +6692,7 @@ static void rtl_work_func_t(struct work_struct *work)
/* If the device is unplugged or !netif_running(), the workqueue
* doesn't need to wake the device, and could return directly.
*/
- if (test_bit(RTL8152_UNPLUG, &tp->flags) || !netif_running(tp->netdev))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags) || !netif_running(tp->netdev))
return;

if (usb_autopm_get_interface(tp->intf) < 0)
@@ -6731,7 +6731,7 @@ static void rtl_hw_phy_work_func_t(struct work_struct *work)
{
struct r8152 *tp = container_of(work, struct r8152, hw_phy_work.work);

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

if (usb_autopm_get_interface(tp->intf) < 0)
@@ -6858,7 +6858,7 @@ static int rtl8152_close(struct net_device *netdev)
netif_stop_queue(netdev);

res = usb_autopm_get_interface(tp->intf);
- if (res < 0 || test_bit(RTL8152_UNPLUG, &tp->flags)) {
+ if (res < 0 || test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
rtl_drop_queued_tx(tp);
rtl_stop_rx(tp);
} else {
@@ -6891,7 +6891,7 @@ static void r8152b_init(struct r8152 *tp)
u32 ocp_data;
u16 data;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

data = r8152_mdio_read(tp, MII_BMCR);
@@ -6935,7 +6935,7 @@ static void r8153_init(struct r8152 *tp)
u16 data;
int i;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153_u1u2en(tp, false);
@@ -6946,7 +6946,7 @@ static void r8153_init(struct r8152 *tp)
break;

msleep(20);
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
break;
}

@@ -7075,7 +7075,7 @@ static void r8153b_init(struct r8152 *tp)
u16 data;
int i;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153b_u1u2en(tp, false);
@@ -7086,7 +7086,7 @@ static void r8153b_init(struct r8152 *tp)
break;

msleep(20);
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
break;
}

@@ -7157,7 +7157,7 @@ static void r8153c_init(struct r8152 *tp)
u16 data;
int i;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153b_u1u2en(tp, false);
@@ -7177,7 +7177,7 @@ static void r8153c_init(struct r8152 *tp)
break;

msleep(20);
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;
}

@@ -8006,7 +8006,7 @@ static void r8156_init(struct r8152 *tp)
u16 data;
int i;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

ocp_data = ocp_read_byte(tp, MCU_TYPE_USB, USB_ECM_OP);
@@ -8027,7 +8027,7 @@ static void r8156_init(struct r8152 *tp)
break;

msleep(20);
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;
}

@@ -8102,7 +8102,7 @@ static void r8156b_init(struct r8152 *tp)
u16 data;
int i;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

ocp_data = ocp_read_byte(tp, MCU_TYPE_USB, USB_ECM_OP);
@@ -8136,7 +8136,7 @@ static void r8156b_init(struct r8152 *tp)
break;

msleep(20);
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;
}

@@ -9165,7 +9165,7 @@ static int rtl8152_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
struct mii_ioctl_data *data = if_mii(rq);
int res;

- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return -ENODEV;

res = usb_autopm_get_interface(tp->intf);
@@ -9267,7 +9267,7 @@ static const struct net_device_ops rtl8152_netdev_ops = {

static void rtl8152_unload(struct r8152 *tp)
{
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

if (tp->version != RTL_VER_01)
@@ -9276,7 +9276,7 @@ static void rtl8152_unload(struct r8152 *tp)

static void rtl8153_unload(struct r8152 *tp)
{
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153_power_cut_en(tp, false);
@@ -9284,7 +9284,7 @@ static void rtl8153_unload(struct r8152 *tp)

static void rtl8153b_unload(struct r8152 *tp)
{
- if (test_bit(RTL8152_UNPLUG, &tp->flags))
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
return;

r8153b_power_cut_en(tp, false);
--
2.42.0.758.gaed0368e0e-goog

2023-10-20 21:09:39

by Doug Anderson

Subject: [PATCH v5 8/8] r8152: Block future register access if register access fails

Even though the functions to read/write registers can fail, most of
the places in the r8152 driver that read/write register values don't
check error codes. The lack of error code checking is problematic in
at least two ways.

The first problem is that the r8152 driver often uses code patterns
similar to this:
x = read_register()
x = x | SOME_BIT;
write_register(x);

...with the above pattern, if the read_register() fails and returns
garbage then we'll end up trying to write modified garbage back to the
Realtek adapter. If the write_register() succeeds that's bad. Note
that as of commit f53a7ad18959 ("r8152: Set memory to all 0xFFs on
failed reg reads") the "garbage" returned by read_register() will at
least be consistent garbage, but it is still garbage.

It turns out that this problem is very serious. Writing garbage to
some of the hardware registers on the Ethernet adapter can put the
adapter in such a bad state that it needs to be power cycled (fully
unplugged and plugged in again) before it can enumerate again.

The second problem is that the r8152 driver generally has functions
that are long sequences of register writes. Assuming everything will
be OK if a random register write fails in the middle isn't a great
assumption.

One might wonder if the above two problems are real. You could ask if
we would really have a successful write after a failed read. It turns
out that the answer appears to be "yes, this can happen". In fact,
we've seen at least two distinct failure modes where this happens.

On a sc7180-trogdor Chromebook if you drop into kdb for a while and
then resume, you can see:
1. We get a "Tx timeout"
2. The "Tx timeout" queues up a USB reset.
3. In rtl8152_pre_reset() we try to reinit the hardware.
4. The first several (2-9) register accesses fail with a timeout, then
things recover.

The above test case was actually fixed by the patch ("r8152: Increase
USB control msg timeout to 5000ms as per spec") but at least shows
that we really can see successful calls after failed ones.

On a different, AMD-based Chromebook with a particular adapter, we
found that during reboot tests we'd also sometimes get a transitory
failure. In this case we saw -EPIPE being returned sometimes. Retrying
worked, but retrying is not always safe for all register accesses
since reading/writing some registers might have side effects (like
registers that clear on read).

Let's fully lock out all register access if a register access fails.
When we do this, we'll try to queue up a USB reset and try to unlock
register access after the reset. This is slightly trickier than it
sounds since the r8152 driver has an optimized reset sequence that
only works reliably after probe happens. In order to handle this, we
avoid the optimized reset if probe didn't finish. Instead, we simply
retry the probe routine in this case.

When locking out access, we'll use the existing infrastructure that
the driver was using when it detected we were unplugged. This keeps us
from getting stuck in delay loops in some parts of the driver.
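
The lockout idea can be sketched in user space as follows. This is a
simplified model under stated assumptions, not the code in this patch:
fake_tp, guarded_transfer() and post_reset() are hypothetical names,
the outcome of the underlying transfer is passed in by the caller, and
the probe-retry and pre-reset special cases are left out. Only the
limit of 3 resets is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_RESETS 3    /* same limit the patch uses */

/* Hypothetical, stripped-down stand-in for the driver's private data. */
struct fake_tp {
    bool inaccessible;                  /* models the INACCESSIBLE flag */
    unsigned int reset_count;           /* consecutive resets w/o success */
    unsigned int resets_queued;         /* how many resets were requested */
    unsigned int transfers_attempted;   /* transfers that reached the bus */
};

/*
 * Wrapper around a transfer whose outcome is supplied by the caller as
 * next_result (negative errno on failure, byte count on success).
 */
static int guarded_transfer(struct fake_tp *tp, int next_result)
{
    /* Once locked out, nothing reaches the hardware. */
    if (tp->inaccessible)
        return -ENODEV;

    tp->transfers_attempted++;

    /* Unplugged: report the error, no reset needed. */
    if (next_result == -ENODEV)
        return next_result;

    /* Success clears the consecutive-reset counter. */
    if (next_result >= 0) {
        tp->reset_count = 0;
        return next_result;
    }

    /* Any other failure blocks all future access until a reset. */
    tp->inaccessible = true;

    /* Request a reset, unless we already tried too many in a row. */
    if (tp->reset_count < MAX_RESETS) {
        tp->resets_queued++;
        tp->reset_count++;
    }

    return next_result;
}

/* Models the post-reset hook: access is allowed again. */
static void post_reset(struct fake_tp *tp)
{
    tp->inaccessible = false;
}
```

In this model a single failure blocks every later transfer with
-ENODEV until post_reset() runs, and after three resets in a row
without an intervening success no further reset is requested, so
access simply stays blocked.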

Signed-off-by: Douglas Anderson <[email protected]>
---
Originally when looking at this problem I thought that the obvious
solution was to "just" add better error handling to the driver. This
_sounds_ appealing, but it's a massive change and touches a
significant portion of the lines in this driver. It's also not always
obvious what the driver should be doing to handle errors.

If you feel like you need to be convinced and to see what it looked
like to add better error handling, I put up my "work in progress"
patch when I was investigating this at: https://crrev.com/c/4937290

There is still some active debate between the two approaches, though,
so it would be interesting to hear if anyone had any opinions.

NOTE: Grant's review tag was removed in v5 since v5 changed somewhat
significantly.

Changes in v5:
- Removed extra mutex_unlock() left over in v4.
- Fixed minor typos.
- Don't queue an unbind/bind reset if probe fails; just retry probe.

Changes in v4:
- Took out some unnecessary locks/unlocks of the control mutex.
- Added comment about version read causing probe fail if all 3 tries fail.
- Added text to commit msg about the potential unbind/bind loop.

Changes in v3:
- Fixed v2 changelog ending up in the commit message.
- farmework -> framework in comments.

Changes in v2:
- Reset patch no longer based on retry patch, since that was dropped.
- Reset patch should be robust even if failures happen in probe.
- Switched booleans to bits in the "flags" variable.
- Check for -ENODEV instead of "udev->state == USB_STATE_NOTATTACHED"

drivers/net/usb/r8152.c | 207 ++++++++++++++++++++++++++++++++++------
1 file changed, 176 insertions(+), 31 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 65232848b31d..afb20c0ed688 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -773,6 +773,9 @@ enum rtl8152_flags {
SCHEDULE_TASKLET,
GREEN_ETHERNET,
RX_EPROTO,
+ IN_PRE_RESET,
+ PROBED_WITH_NO_ERRORS,
+ PROBE_SHOULD_RETRY,
};

#define DEVICE_ID_LENOVO_USB_C_TRAVEL_HUB 0x721e
@@ -953,6 +956,8 @@ struct r8152 {
u8 version;
u8 duplex;
u8 autoneg;
+
+ unsigned int reg_access_reset_count;
};

/**
@@ -1200,6 +1205,96 @@ static unsigned int agg_buf_sz = 16384;

#define RTL_LIMITED_TSO_SIZE (size_to_mtu(agg_buf_sz) - sizeof(struct tx_desc))

+/* If register access fails then we block access and issue a reset. If this
+ * happens too many times in a row without a successful access then we stop
+ * trying to reset and just leave access blocked.
+ */
+#define REGISTER_ACCESS_MAX_RESETS 3
+
+static void rtl_set_inaccessible(struct r8152 *tp)
+{
+ set_bit(RTL8152_INACCESSIBLE, &tp->flags);
+ smp_mb__after_atomic();
+}
+
+static void rtl_set_accessible(struct r8152 *tp)
+{
+ clear_bit(RTL8152_INACCESSIBLE, &tp->flags);
+ smp_mb__after_atomic();
+}
+
+static
+int r8152_control_msg(struct r8152 *tp, unsigned int pipe, __u8 request,
+ __u8 requesttype, __u16 value, __u16 index, void *data,
+ __u16 size, const char *msg_tag)
+{
+ struct usb_device *udev = tp->udev;
+ int ret;
+
+ if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
+ return -ENODEV;
+
+ ret = usb_control_msg(udev, pipe, request, requesttype,
+ value, index, data, size,
+ USB_CTRL_GET_TIMEOUT);
+
+ /* No need to issue a reset to report an error if the USB device got
+ * unplugged; just return immediately.
+ */
+ if (ret == -ENODEV)
+ return ret;
+
+ /* If the write was successful then we're done */
+ if (ret >= 0) {
+ tp->reg_access_reset_count = 0;
+ return ret;
+ }
+
+ dev_err(&udev->dev,
+ "Failed to %s %d bytes at %#06x/%#06x (%d)\n",
+ msg_tag, size, value, index, ret);
+
+ /* Block all future register access until we reset. Much of the code
+ * in the driver doesn't check for errors. Notably, many parts of the
+ * driver do a read/modify/write of a register value without
+ * confirming that the read succeeded. Writing back modified garbage
+ * like this can fully wedge the adapter, requiring a power cycle.
+ */
+ rtl_set_inaccessible(tp);
+
+ /* If probe hasn't yet finished, then we'll request a retry of the
+ * whole probe routine if we get any control transfer errors. We
+ * never have to clear this bit since we free/reallocate the whole "tp"
+ * structure if we retry probe.
+ */
+ if (!test_bit(PROBED_WITH_NO_ERRORS, &tp->flags)) {
+ set_bit(PROBE_SHOULD_RETRY, &tp->flags);
+ return ret;
+ }
+
+ /* Failing to access registers in pre-reset is not surprising since we
+ * wouldn't be resetting if things were behaving normally. The register
+ * access we do in pre-reset isn't truly mandatory--we're just reusing
+ * the disable() function and trying to be nice by powering the
+ * adapter down before resetting it. Thus, if we're in pre-reset,
+ * we'll return right away and not try to queue up yet another reset.
+ * We know the post-reset is already coming.
+ */
+ if (test_bit(IN_PRE_RESET, &tp->flags))
+ return ret;
+
+ if (tp->reg_access_reset_count < REGISTER_ACCESS_MAX_RESETS) {
+ usb_queue_reset_device(tp->intf);
+ tp->reg_access_reset_count++;
+ } else if (tp->reg_access_reset_count == REGISTER_ACCESS_MAX_RESETS) {
+ dev_err(&udev->dev,
+ "Tried to reset %d times; giving up.\n",
+ REGISTER_ACCESS_MAX_RESETS);
+ }
+
+ return ret;
+}
+
static
int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
{
@@ -1210,9 +1305,10 @@ int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
if (!tmp)
return -ENOMEM;

- ret = usb_control_msg(tp->udev, tp->pipe_ctrl_in,
- RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
- value, index, tmp, size, USB_CTRL_GET_TIMEOUT);
+ ret = r8152_control_msg(tp, tp->pipe_ctrl_in,
+ RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
+ value, index, tmp, size, "read");
+
if (ret < 0)
memset(data, 0xff, size);
else
@@ -1233,9 +1329,9 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
if (!tmp)
return -ENOMEM;

- ret = usb_control_msg(tp->udev, tp->pipe_ctrl_out,
- RTL8152_REQ_SET_REGS, RTL8152_REQT_WRITE,
- value, index, tmp, size, USB_CTRL_SET_TIMEOUT);
+ ret = r8152_control_msg(tp, tp->pipe_ctrl_out,
+ RTL8152_REQ_SET_REGS, RTL8152_REQT_WRITE,
+ value, index, tmp, size, "write");

kfree(tmp);

@@ -1244,10 +1340,8 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)

static void rtl_set_unplug(struct r8152 *tp)
{
- if (tp->udev->state == USB_STATE_NOTATTACHED) {
- set_bit(RTL8152_INACCESSIBLE, &tp->flags);
- smp_mb__after_atomic();
- }
+ if (tp->udev->state == USB_STATE_NOTATTACHED)
+ rtl_set_inaccessible(tp);
}

static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
@@ -8262,7 +8356,7 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
struct r8152 *tp = usb_get_intfdata(intf);
struct net_device *netdev;

- if (!tp)
+ if (!tp || !test_bit(PROBED_WITH_NO_ERRORS, &tp->flags))
return 0;

netdev = tp->netdev;
@@ -8277,7 +8371,9 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
napi_disable(&tp->napi);
if (netif_carrier_ok(netdev)) {
mutex_lock(&tp->control);
+ set_bit(IN_PRE_RESET, &tp->flags);
tp->rtl_ops.disable(tp);
+ clear_bit(IN_PRE_RESET, &tp->flags);
mutex_unlock(&tp->control);
}

@@ -8290,9 +8386,11 @@ static int rtl8152_post_reset(struct usb_interface *intf)
struct net_device *netdev;
struct sockaddr sa;

- if (!tp)
+ if (!tp || !test_bit(PROBED_WITH_NO_ERRORS, &tp->flags))
return 0;

+ rtl_set_accessible(tp);
+
/* reset the MAC address in case of policy change */
if (determine_ethernet_addr(tp, &sa) >= 0) {
rtnl_lock();
@@ -9494,17 +9592,29 @@ static u8 __rtl_get_hw_ver(struct usb_device *udev)
__le32 *tmp;
u8 version;
int ret;
+ int i;

tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
if (!tmp)
return 0;

- ret = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0),
- RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
- PLA_TCR0, MCU_TYPE_PLA, tmp, sizeof(*tmp),
- USB_CTRL_GET_TIMEOUT);
- if (ret > 0)
- ocp_data = (__le32_to_cpu(*tmp) >> 16) & VERSION_MASK;
+ /* Retry up to 3 times in case there is a transitory error. We do this
+ * since retrying a read of the version is always safe and this
+ * function doesn't take advantage of r8152_control_msg().
+ */
+ for (i = 0; i < 3; i++) {
+ ret = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0),
+ RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
+ PLA_TCR0, MCU_TYPE_PLA, tmp, sizeof(*tmp),
+ USB_CTRL_GET_TIMEOUT);
+ if (ret > 0) {
+ ocp_data = (__le32_to_cpu(*tmp) >> 16) & VERSION_MASK;
+ break;
+ }
+ }
+
+ if (i != 0 && ret > 0)
+ dev_warn(&udev->dev, "Needed %d retries to read version\n", i);

kfree(tmp);

@@ -9603,25 +9713,14 @@ static bool rtl8152_supports_lenovo_macpassthru(struct usb_device *udev)
return 0;
}

-static int rtl8152_probe(struct usb_interface *intf,
- const struct usb_device_id *id)
+static int rtl8152_probe_once(struct usb_interface *intf,
+ const struct usb_device_id *id, u8 version)
{
struct usb_device *udev = interface_to_usbdev(intf);
struct r8152 *tp;
struct net_device *netdev;
- u8 version;
int ret;

- if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
- return -ENODEV;
-
- if (!rtl_check_vendor_ok(intf))
- return -ENODEV;
-
- version = rtl8152_get_version(intf);
- if (version == RTL_VER_UNKNOWN)
- return -ENODEV;
-
usb_reset_device(udev);
netdev = alloc_etherdev(sizeof(struct r8152));
if (!netdev) {
@@ -9784,10 +9883,20 @@ static int rtl8152_probe(struct usb_interface *intf,
else
device_set_wakeup_enable(&udev->dev, false);

+ /* If we saw a control transfer error while probing then we may
+ * want to try probe() again. Consider this an error.
+ */
+ if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
+ goto out2;
+
+ set_bit(PROBED_WITH_NO_ERRORS, &tp->flags);
netif_info(tp, probe, netdev, "%s\n", DRIVER_VERSION);

return 0;

+out2:
+ unregister_netdev(netdev);
+
out1:
tasklet_kill(&tp->tx_tl);
cancel_delayed_work_sync(&tp->hw_phy_work);
@@ -9796,10 +9905,46 @@ static int rtl8152_probe(struct usb_interface *intf,
rtl8152_release_firmware(tp);
usb_set_intfdata(intf, NULL);
out:
+ if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
+ ret = -EAGAIN;
+
free_netdev(netdev);
return ret;
}

+#define RTL8152_PROBE_TRIES 3
+
+static int rtl8152_probe(struct usb_interface *intf,
+ const struct usb_device_id *id)
+{
+ u8 version;
+ int ret;
+ int i;
+
+ if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
+ return -ENODEV;
+
+ if (!rtl_check_vendor_ok(intf))
+ return -ENODEV;
+
+ version = rtl8152_get_version(intf);
+ if (version == RTL_VER_UNKNOWN)
+ return -ENODEV;
+
+ for (i = 0; i < RTL8152_PROBE_TRIES; i++) {
+ ret = rtl8152_probe_once(intf, id, version);
+ if (ret != -EAGAIN)
+ break;
+ }
+ if (ret == -EAGAIN) {
+ dev_err(&intf->dev,
+ "r8152 failed probe after %d tries; giving up\n", i);
+ return -ENODEV;
+ }
+
+ return ret;
+}
+
static void rtl8152_disconnect(struct usb_interface *intf)
{
struct r8152 *tp = usb_get_intfdata(intf);
--
2.42.0.758.gaed0368e0e-goog

2023-10-21 14:51:11

by Grant Grundler

Subject: Re: [PATCH v5 2/8] r8152: Run the unload routine if we have errors during probe

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> The rtl8152_probe() function lacks a call to the chip-specific
> unload() routine when it sees an error in probe. Add it in to match
> the cleanup code in rtl8152_disconnect().
>
> Fixes: ac718b69301c ("net/usb: new driver for RTL8152")
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>

> ---
>
> Changes in v5:
> - ("Run the unload routine if we have errors during probe") new for v5.
>
> drivers/net/usb/r8152.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 482957beae66..201c688e3e3f 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -9783,6 +9783,8 @@ static int rtl8152_probe(struct usb_interface *intf,
>
> out1:
> tasklet_kill(&tp->tx_tl);
> + if (tp->rtl_ops.unload)
> + tp->rtl_ops.unload(tp);
> usb_set_intfdata(intf, NULL);
> out:
> free_netdev(netdev);
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-21 14:54:05

by Grant Grundler

Subject: Re: [PATCH v5 3/8] r8152: Cancel hw_phy_work if we have an error in probe

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> The error handling in rtl8152_probe() is missing a call to cancel the
> hw_phy_work. Add it in to match what's in the cleanup code in
> rtl8152_disconnect().

Sounds like there is a future opportunity for someone (not Doug) to
refactor code.

> Fixes: a028a9e003f2 ("r8152: move the settings of PHY to a work queue")
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>

> ---
>
> Changes in v5:
> - ("Cancel hw_phy_work if we have an error in probe") new for v5.
>
> drivers/net/usb/r8152.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 201c688e3e3f..d10b0886b652 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -9783,6 +9783,7 @@ static int rtl8152_probe(struct usb_interface *intf,
>
> out1:
> tasklet_kill(&tp->tx_tl);
> + cancel_delayed_work_sync(&tp->hw_phy_work);
> if (tp->rtl_ops.unload)
> tp->rtl_ops.unload(tp);
> usb_set_intfdata(intf, NULL);
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-21 15:02:39

by Grant Grundler

Subject: Re: [PATCH v5 4/8] r8152: Release firmware if we have an error in probe

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> The error handling in rtl8152_probe() is missing a call to release
> firmware. Add it in to match what's in the cleanup code in
> rtl8152_disconnect().
>
> Fixes: 9370f2d05a2a ("r8152: support request_firmware for RTL8153")
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>

> ---
>
> Changes in v5:
> - ("Release firmware if we have an error in probe") new for v5.
>
> drivers/net/usb/r8152.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index d10b0886b652..656fe90734fc 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -9786,6 +9786,7 @@ static int rtl8152_probe(struct usb_interface *intf,
> cancel_delayed_work_sync(&tp->hw_phy_work);
> if (tp->rtl_ops.unload)
> tp->rtl_ops.unload(tp);
> + rtl8152_release_firmware(tp);
> usb_set_intfdata(intf, NULL);
> out:
> free_netdev(netdev);
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-21 15:03:56

by Grant Grundler

Subject: Re: [PATCH v5 5/8] r8152: Check for unplug in rtl_phy_patch_request()

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> If the adapter is unplugged while we're looping in
> rtl_phy_patch_request() we could end up looping for 10 seconds (2 ms *
> 5000 loops). Add code similar to what's done in other places in the
> driver to check for unplug and bail.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>
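
For readers following along: the worst case quoted above is 5000 loops * 2 ms = 10 seconds of polling a device that is already gone. A minimal userspace sketch of the "check the flag before each poll" idea follows. It is not driver code: struct fake_dev, fake_status_ready() and poll_with_bail() are hypothetical names, the sleep is left out, and returning -1 on bail is an assumption of the sketch (the patch below simply breaks out of the loop).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct fake_dev {
	bool inaccessible;	/* models the RTL8152_UNPLUG flag */
	int ready_after;	/* poll number at which the status bit appears */
	int polls;		/* number of status reads performed so far */
};

/* Models one status read; the real loop sleeps 1-2 ms before each read. */
static bool fake_status_ready(struct fake_dev *d)
{
	d->polls++;
	return d->polls >= d->ready_after;
}

/* Returns 0 once the status is ready, -1 on timeout or if the device is gone. */
static int poll_with_bail(struct fake_dev *d, int max_loops)
{
	for (int i = 0; i < max_loops; i++) {
		/* Bail before touching the hardware at all. */
		if (d->inaccessible)
			return -1;
		if (fake_status_ready(d))
			return 0;
	}
	return -1;
}
```

With the flag set, poll_with_bail() returns without performing a single status read, so the caller no longer waits out the full loop budget.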

> ---
>
> (no changes since v2)
>
> Changes in v2:
> - ("Check for unplug in rtl_phy_patch_request()") new for v2.
>
> drivers/net/usb/r8152.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 656fe90734fc..9888bc43e903 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -4046,6 +4046,9 @@ static int rtl_phy_patch_request(struct r8152 *tp, bool request, bool wait)
> for (i = 0; wait && i < 5000; i++) {
> u32 ocp_data;
>
> + if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + break;
> +
> usleep_range(1000, 2000);
> ocp_data = ocp_reg_read(tp, OCP_PHY_PATCH_STAT);
> if ((ocp_data & PATCH_READY) ^ check)
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-21 15:06:22

by Grant Grundler

Subject: Re: [PATCH v5 6/8] r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> If the adapter is unplugged while we're looping in r8153b_ups_en() /
> r8153c_ups_en() we could end up looping for 10 seconds (20 ms * 500
> loops). Add code similar to what's done in other places in the driver
> to check for unplug and bail.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>

> ---
>
> (no changes since v2)
>
> Changes in v2:
> - ("Check for unplug in r8153b_ups_en() / r8153c_ups_en()") new for v2.
>
> drivers/net/usb/r8152.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 9888bc43e903..982f9ca03e7a 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -3663,6 +3663,8 @@ static void r8153b_ups_en(struct r8152 *tp, bool enable)
> int i;
>
> for (i = 0; i < 500; i++) {
> + if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + return;
> if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
> AUTOLOAD_DONE)
> break;
> @@ -3703,6 +3705,8 @@ static void r8153c_ups_en(struct r8152 *tp, bool enable)
> int i;
>
> for (i = 0; i < 500; i++) {
> + if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + return;
> if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
> AUTOLOAD_DONE)
> break;
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-21 15:06:52

by Grant Grundler

Subject: Re: [PATCH v5 7/8] r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> Whenever the RTL8152_UNPLUG is set that just tells the driver that all
> accesses will fail and we should just immediately bail. A future patch
> will use this same concept at a time when the driver hasn't actually
> been unplugged but is about to be reset. Rename the flag in
> preparation for the future patch.
>
> This is a no-op change and just a search and replace.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>

> ---
>
> (no changes since v2)
>
> Changes in v2:
> - ("Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE") new for v2.
>
> drivers/net/usb/r8152.c | 96 ++++++++++++++++++++---------------------
> 1 file changed, 48 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 982f9ca03e7a..65232848b31d 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -764,7 +764,7 @@ enum rtl_register_content {
>
> /* rtl8152 flags */
> enum rtl8152_flags {
> - RTL8152_UNPLUG = 0,
> + RTL8152_INACCESSIBLE = 0,
> RTL8152_SET_RX_MODE,
> WORK_ENABLE,
> RTL8152_LINK_CHG,
> @@ -1245,7 +1245,7 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
> static void rtl_set_unplug(struct r8152 *tp)
> {
> if (tp->udev->state == USB_STATE_NOTATTACHED) {
> - set_bit(RTL8152_UNPLUG, &tp->flags);
> + set_bit(RTL8152_INACCESSIBLE, &tp->flags);
> smp_mb__after_atomic();
> }
> }
> @@ -1256,7 +1256,7 @@ static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
> u16 limit = 64;
> int ret = 0;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> /* both size and indix must be 4 bytes align */
> @@ -1300,7 +1300,7 @@ static int generic_ocp_write(struct r8152 *tp, u16 index, u16 byteen,
> u16 byteen_start, byteen_end, byen;
> u16 limit = 512;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> /* both size and indix must be 4 bytes align */
> @@ -1537,7 +1537,7 @@ static int read_mii_word(struct net_device *netdev, int phy_id, int reg)
> struct r8152 *tp = netdev_priv(netdev);
> int ret;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> if (phy_id != R8152_PHY_ID)
> @@ -1553,7 +1553,7 @@ void write_mii_word(struct net_device *netdev, int phy_id, int reg, int val)
> {
> struct r8152 *tp = netdev_priv(netdev);
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> if (phy_id != R8152_PHY_ID)
> @@ -1758,7 +1758,7 @@ static void read_bulk_callback(struct urb *urb)
> if (!tp)
> return;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> if (!test_bit(WORK_ENABLE, &tp->flags))
> @@ -1850,7 +1850,7 @@ static void write_bulk_callback(struct urb *urb)
> if (!test_bit(WORK_ENABLE, &tp->flags))
> return;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> if (!skb_queue_empty(&tp->tx_queue))
> @@ -1871,7 +1871,7 @@ static void intr_callback(struct urb *urb)
> if (!test_bit(WORK_ENABLE, &tp->flags))
> return;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> switch (status) {
> @@ -2615,7 +2615,7 @@ static void bottom_half(struct tasklet_struct *t)
> {
> struct r8152 *tp = from_tasklet(tp, t, tx_tl);
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> if (!test_bit(WORK_ENABLE, &tp->flags))
> @@ -2658,7 +2658,7 @@ int r8152_submit_rx(struct r8152 *tp, struct rx_agg *agg, gfp_t mem_flags)
> int ret;
>
> /* The rx would be stopped, so skip submitting */
> - if (test_bit(RTL8152_UNPLUG, &tp->flags) ||
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags) ||
> !test_bit(WORK_ENABLE, &tp->flags) || !netif_carrier_ok(tp->netdev))
> return 0;
>
> @@ -3058,7 +3058,7 @@ static int rtl_enable(struct r8152 *tp)
>
> static int rtl8152_enable(struct r8152 *tp)
> {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> set_tx_qlen(tp);
> @@ -3145,7 +3145,7 @@ static int rtl8153_enable(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> set_tx_qlen(tp);
> @@ -3177,7 +3177,7 @@ static void rtl_disable(struct r8152 *tp)
> u32 ocp_data;
> int i;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
> rtl_drop_queued_tx(tp);
> return;
> }
> @@ -3631,7 +3631,7 @@ static u16 r8153_phy_status(struct r8152 *tp, u16 desired)
> }
>
> msleep(20);
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> break;
> }
>
> @@ -3663,7 +3663,7 @@ static void r8153b_ups_en(struct r8152 *tp, bool enable)
> int i;
>
> for (i = 0; i < 500; i++) {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
> if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
> AUTOLOAD_DONE)
> @@ -3705,7 +3705,7 @@ static void r8153c_ups_en(struct r8152 *tp, bool enable)
> int i;
>
> for (i = 0; i < 500; i++) {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
> if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
> AUTOLOAD_DONE)
> @@ -4050,8 +4050,8 @@ static int rtl_phy_patch_request(struct r8152 *tp, bool request, bool wait)
> for (i = 0; wait && i < 5000; i++) {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> - break;
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> + return -ENODEV;
>
> usleep_range(1000, 2000);
> ocp_data = ocp_reg_read(tp, OCP_PHY_PATCH_STAT);
> @@ -6009,7 +6009,7 @@ static int rtl8156_enable(struct r8152 *tp)
> u32 ocp_data;
> u16 speed;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> r8156_fc_parameter(tp);
> @@ -6067,7 +6067,7 @@ static int rtl8156b_enable(struct r8152 *tp)
> u32 ocp_data;
> u16 speed;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> set_tx_qlen(tp);
> @@ -6253,7 +6253,7 @@ static int rtl8152_set_speed(struct r8152 *tp, u8 autoneg, u32 speed, u8 duplex,
>
> static void rtl8152_up(struct r8152 *tp)
> {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8152_aldps_en(tp, false);
> @@ -6263,7 +6263,7 @@ static void rtl8152_up(struct r8152 *tp)
>
> static void rtl8152_down(struct r8152 *tp)
> {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
> rtl_drop_queued_tx(tp);
> return;
> }
> @@ -6278,7 +6278,7 @@ static void rtl8153_up(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153_u1u2en(tp, false);
> @@ -6318,7 +6318,7 @@ static void rtl8153_down(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
> rtl_drop_queued_tx(tp);
> return;
> }
> @@ -6339,7 +6339,7 @@ static void rtl8153b_up(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153b_u1u2en(tp, false);
> @@ -6363,7 +6363,7 @@ static void rtl8153b_down(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
> rtl_drop_queued_tx(tp);
> return;
> }
> @@ -6400,7 +6400,7 @@ static void rtl8153c_up(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153b_u1u2en(tp, false);
> @@ -6481,7 +6481,7 @@ static void rtl8156_up(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153b_u1u2en(tp, false);
> @@ -6554,7 +6554,7 @@ static void rtl8156_down(struct r8152 *tp)
> {
> u32 ocp_data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags)) {
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
> rtl_drop_queued_tx(tp);
> return;
> }
> @@ -6692,7 +6692,7 @@ static void rtl_work_func_t(struct work_struct *work)
> /* If the device is unplugged or !netif_running(), the workqueue
> * doesn't need to wake the device, and could return directly.
> */
> - if (test_bit(RTL8152_UNPLUG, &tp->flags) || !netif_running(tp->netdev))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags) || !netif_running(tp->netdev))
> return;
>
> if (usb_autopm_get_interface(tp->intf) < 0)
> @@ -6731,7 +6731,7 @@ static void rtl_hw_phy_work_func_t(struct work_struct *work)
> {
> struct r8152 *tp = container_of(work, struct r8152, hw_phy_work.work);
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> if (usb_autopm_get_interface(tp->intf) < 0)
> @@ -6858,7 +6858,7 @@ static int rtl8152_close(struct net_device *netdev)
> netif_stop_queue(netdev);
>
> res = usb_autopm_get_interface(tp->intf);
> - if (res < 0 || test_bit(RTL8152_UNPLUG, &tp->flags)) {
> + if (res < 0 || test_bit(RTL8152_INACCESSIBLE, &tp->flags)) {
> rtl_drop_queued_tx(tp);
> rtl_stop_rx(tp);
> } else {
> @@ -6891,7 +6891,7 @@ static void r8152b_init(struct r8152 *tp)
> u32 ocp_data;
> u16 data;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> data = r8152_mdio_read(tp, MII_BMCR);
> @@ -6935,7 +6935,7 @@ static void r8153_init(struct r8152 *tp)
> u16 data;
> int i;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153_u1u2en(tp, false);
> @@ -6946,7 +6946,7 @@ static void r8153_init(struct r8152 *tp)
> break;
>
> msleep(20);
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> break;
> }
>
> @@ -7075,7 +7075,7 @@ static void r8153b_init(struct r8152 *tp)
> u16 data;
> int i;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153b_u1u2en(tp, false);
> @@ -7086,7 +7086,7 @@ static void r8153b_init(struct r8152 *tp)
> break;
>
> msleep(20);
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> break;
> }
>
> @@ -7157,7 +7157,7 @@ static void r8153c_init(struct r8152 *tp)
> u16 data;
> int i;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153b_u1u2en(tp, false);
> @@ -7177,7 +7177,7 @@ static void r8153c_init(struct r8152 *tp)
> break;
>
> msleep(20);
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
> }
>
> @@ -8006,7 +8006,7 @@ static void r8156_init(struct r8152 *tp)
> u16 data;
> int i;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> ocp_data = ocp_read_byte(tp, MCU_TYPE_USB, USB_ECM_OP);
> @@ -8027,7 +8027,7 @@ static void r8156_init(struct r8152 *tp)
> break;
>
> msleep(20);
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
> }
>
> @@ -8102,7 +8102,7 @@ static void r8156b_init(struct r8152 *tp)
> u16 data;
> int i;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> ocp_data = ocp_read_byte(tp, MCU_TYPE_USB, USB_ECM_OP);
> @@ -8136,7 +8136,7 @@ static void r8156b_init(struct r8152 *tp)
> break;
>
> msleep(20);
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
> }
>
> @@ -9165,7 +9165,7 @@ static int rtl8152_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
> struct mii_ioctl_data *data = if_mii(rq);
> int res;
>
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return -ENODEV;
>
> res = usb_autopm_get_interface(tp->intf);
> @@ -9267,7 +9267,7 @@ static const struct net_device_ops rtl8152_netdev_ops = {
>
> static void rtl8152_unload(struct r8152 *tp)
> {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> if (tp->version != RTL_VER_01)
> @@ -9276,7 +9276,7 @@ static void rtl8152_unload(struct r8152 *tp)
>
> static void rtl8153_unload(struct r8152 *tp)
> {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153_power_cut_en(tp, false);
> @@ -9284,7 +9284,7 @@ static void rtl8153_unload(struct r8152 *tp)
>
> static void rtl8153b_unload(struct r8152 *tp)
> {
> - if (test_bit(RTL8152_UNPLUG, &tp->flags))
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> return;
>
> r8153b_power_cut_en(tp, false);
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-21 15:36:27

by Grant Grundler

Subject: Re: [PATCH v5 8/8] r8152: Block future register access if register access fails

On Fri, Oct 20, 2023 at 2:08 PM Douglas Anderson <[email protected]> wrote:
>
> Even though the functions to read/write registers can fail, most of
> the places in the r8152 driver that read/write register values don't
> check error codes. The lack of error code checking is problematic in
> at least two ways.
>
> The first problem is that the r8152 driver often uses code patterns
> similar to this:
> x = read_register()
> x = x | SOME_BIT;
> write_register(x);
>
> ...with the above pattern, if the read_register() fails and returns
> garbage then we'll end up trying to write modified garbage back to the
> Realtek adapter. If the write_register() succeeds that's bad. Note
> that as of commit f53a7ad18959 ("r8152: Set memory to all 0xFFs on
> failed reg reads") the "garbage" returned by read_register() will at
> least be consistent garbage, but it is still garbage.
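
To make the failure concrete, here is a small userspace model of the pattern quoted above. It is only a sketch under stated assumptions: struct fake_adapter, fake_read(), fake_write() and SOME_BIT are hypothetical, and the only behaviour taken from the thread is that a failed read leaves the buffer filled with 0xFF while a later write can still succeed.

```c
#include <stdint.h>
#include <string.h>

#define SOME_BIT 0x0001u	/* hypothetical bit the caller wants to set */

struct fake_adapter {
	uint32_t reg;		/* register contents on the adapter */
	int read_fails;		/* nonzero: the next read times out */
};

/* On failure, fill the buffer with 0xFF and return an error, as described
 * for commit f53a7ad18959. The channel then "recovers".
 */
static int fake_read(struct fake_adapter *a, uint32_t *val)
{
	if (a->read_fails) {
		a->read_fails = 0;
		memset(val, 0xff, sizeof(*val));
		return -1;
	}
	*val = a->reg;
	return 0;
}

static void fake_write(struct fake_adapter *a, uint32_t val)
{
	a->reg = val;
}

/* The unchecked pattern: the read error is ignored. */
static void set_bit_unchecked(struct fake_adapter *a)
{
	uint32_t x;

	fake_read(a, &x);
	fake_write(a, x | SOME_BIT);
}

/* A checked variant: skip the write when the read failed. */
static int set_bit_checked(struct fake_adapter *a)
{
	uint32_t x;
	int ret = fake_read(a, &x);

	if (ret < 0)
		return ret;
	fake_write(a, x | SOME_BIT);
	return 0;
}
```

In this model the unchecked helper turns a register holding 0x1200 into 0xffffffff after one failed read, while the checked helper leaves it alone. Note that the series does not add per-call checks like set_bit_checked(); it blocks all later accesses in one central place instead.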
>
> It turns out that this problem is very serious. Writing garbage to
> some of the hardware registers on the Ethernet adapter can put the
> adapter in such a bad state that it needs to be power cycled (fully
> unplugged and plugged in again) before it can enumerate again.
>
> The second problem is that the r8152 driver generally has functions
> that are long sequences of register writes. Assuming everything will
> be OK if a random register write fails in the middle isn't a great
> assumption.
>
> One might wonder if the above two problems are real. You could ask if
> we would really have a successful write after a failed read. It turns
> out that the answer appears to be "yes, this can happen". In fact,
> we've seen at least two distinct failure modes where this happens.
>
> On a sc7180-trogdor Chromebook if you drop into kdb for a while and
> then resume, you can see:
> 1. We get a "Tx timeout"
> 2. The "Tx timeout" queues up a USB reset.
> 3. In rtl8152_pre_reset() we try to reinit the hardware.
> 4. The first several (2-9) register accesses fail with a timeout, then
> things recover.
>
> The above test case was actually fixed by the patch ("r8152: Increase
> USB control msg timeout to 5000ms as per spec") but at least shows
> that we really can see successful calls after failed ones.
>
> On a different (AMD) based Chromebook with a particular adapter, we
> found that during reboot tests we'd also sometimes get a transitory
> failure. In this case we saw -EPIPE being returned sometimes. Retrying
> worked, but retrying is not always safe for all register accesses
> since reading/writing some registers might have side effects (like
> registers that clear on read).
>
> Let's fully lock out all register access if a register access fails.
> When we do this, we'll try to queue up a USB reset and try to unlock
> register access after the reset. This is slightly trickier than it
> sounds since the r8152 driver has an optimized reset sequence that
> only works reliably after probe happens. In order to handle this, we
> avoid the optimized reset if probe didn't finish. Instead, we simply
> retry the probe routine in this case.
>
> When locking out access, we'll use the existing infrastructure that
> the driver was using when it detected we were unplugged. This keeps us
> from getting stuck in delay loops in some parts of the driver.
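
A simplified userspace model of the lockout described above may help when reading the diff further down. It is a sketch, not the patch: struct fake_tp, guarded_access() and fake_post_reset() are hypothetical names, the transfer result is passed in by the caller, and the probe-retry and pre-reset special cases are deliberately left out. MAX_RESETS mirrors REGISTER_ACCESS_MAX_RESETS from the patch.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_RESETS 3	/* mirrors REGISTER_ACCESS_MAX_RESETS */

struct fake_tp {
	bool inaccessible;		/* models RTL8152_INACCESSIBLE */
	unsigned int reset_count;	/* failures since the last success */
	unsigned int resets_queued;	/* how many resets were requested */
};

/* xfer_result is what the underlying transfer would return (< 0 is an
 * error). Once one access fails, every later access is refused until a
 * reset completes.
 */
static int guarded_access(struct fake_tp *tp, int xfer_result)
{
	if (tp->inaccessible)
		return -ENODEV;

	if (xfer_result >= 0) {
		tp->reset_count = 0;
		return xfer_result;
	}

	/* Block all further access, then ask for a reset if budget remains. */
	tp->inaccessible = true;
	if (tp->reset_count < MAX_RESETS) {
		tp->resets_queued++;
		tp->reset_count++;
	}
	return xfer_result;
}

/* Models the post-reset hook re-enabling access. */
static void fake_post_reset(struct fake_tp *tp)
{
	tp->inaccessible = false;
}
```

The point of the counter is that a success clears it, so only consecutive failures eat into the reset budget; once the budget is used up, access simply stays blocked.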
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Grant Grundler <[email protected]>

> ---
> Originally when looking at this problem I thought that the obvious
> solution was to "just" add better error handling to the driver. This
> _sounds_ appealing, but it's a massive change and touches a
> significant portion of the lines in this driver. It's also not always
> obvious what the driver should be doing to handle errors.

This needs to be done one code path at a time, not in one massive
change. This is the driver equivalent to removing the BKL
(https://kernelnewbies.org/BigKernelLock).

For two years, I worked on an HPUX SCSI driver (1990s) that supported
the equivalent of hotplug (but it was called "Power Fail" on that HPUX
server). We continually found places in the driver where the error
handling wasn't exactly right or the device FW wasn't responding to
the error handling the way we expected. Even after two years, the rate
of issue discovery was constant despite refactoring most of the driver
(reducing the driver from 15k lines to 11k lines).

> If you feel like you need to be convinced and to see what it looked
> like to add better error handling, I put up my "work in progress"
> patch when I was investigating this at: https://crrev.com/c/4937290

And this still isn't anywhere near "complete".

> There is still some active debate between the two approaches, though,
> so it would be interesting to hear if anyone had any opinions.

I have a strong opinion that the "fix all error handling" needs to be
done by someone who is willing and able to invest several years into
this driver. In other words, based on my experience, I don't think
it's worth doing for this driver.

The approach proposed in Doug's patches can easily be removed if/when
someone is able to fix up all the error handling (and even then, maybe
keep it around "just in case").

> NOTE: Grant's review tag was removed in v5 since v5 changed somewhat
> significantly.

No worries - I'm quite happy to review this again. It's a non-trivial
change but the most important one in the series.

cheers,
grant

> Changes in v5:
> - Removed extra mutex_unlock() left over in v4.
> - Fixed minor typos.
> - Don't queue an unbind/bind reset if probe fails; just retry probe.
>
> Changes in v4:
> - Took out some unnecessary locks/unlocks of the control mutex.
> - Added comment about reading version causing probe fail if 3 fails.
> - Added text to commit msg about the potential unbind/bind loop.
>
> Changes in v3:
> - Fixed v2 changelog ending up in the commit message.
> - farmework -> framework in comments.
>
> Changes in v2:
> - Reset patch no longer based on retry patch, since that was dropped.
> - Reset patch should be robust even if failures happen in probe.
> - Switched booleans to bits in the "flags" variable.
> - Check for -ENODEV instead of "udev->state == USB_STATE_NOTATTACHED"
>
> drivers/net/usb/r8152.c | 207 ++++++++++++++++++++++++++++++++++------
> 1 file changed, 176 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 65232848b31d..afb20c0ed688 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -773,6 +773,9 @@ enum rtl8152_flags {
> SCHEDULE_TASKLET,
> GREEN_ETHERNET,
> RX_EPROTO,
> + IN_PRE_RESET,
> + PROBED_WITH_NO_ERRORS,
> + PROBE_SHOULD_RETRY,
> };
>
> #define DEVICE_ID_LENOVO_USB_C_TRAVEL_HUB 0x721e
> @@ -953,6 +956,8 @@ struct r8152 {
> u8 version;
> u8 duplex;
> u8 autoneg;
> +
> + unsigned int reg_access_reset_count;
> };
>
> /**
> @@ -1200,6 +1205,96 @@ static unsigned int agg_buf_sz = 16384;
>
> #define RTL_LIMITED_TSO_SIZE (size_to_mtu(agg_buf_sz) - sizeof(struct tx_desc))
>
> +/* If register access fails then we block access and issue a reset. If this
> + * happens too many times in a row without a successful access then we stop
> + * trying to reset and just leave access blocked.
> + */
> +#define REGISTER_ACCESS_MAX_RESETS 3
> +
> +static void rtl_set_inaccessible(struct r8152 *tp)
> +{
> + set_bit(RTL8152_INACCESSIBLE, &tp->flags);
> + smp_mb__after_atomic();
> +}
> +
> +static void rtl_set_accessible(struct r8152 *tp)
> +{
> + clear_bit(RTL8152_INACCESSIBLE, &tp->flags);
> + smp_mb__after_atomic();
> +}
> +
> +static
> +int r8152_control_msg(struct r8152 *tp, unsigned int pipe, __u8 request,
> + __u8 requesttype, __u16 value, __u16 index, void *data,
> + __u16 size, const char *msg_tag)
> +{
> + struct usb_device *udev = tp->udev;
> + int ret;
> +
> + if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> + return -ENODEV;
> +
> + ret = usb_control_msg(udev, pipe, request, requesttype,
> + value, index, data, size,
> + USB_CTRL_GET_TIMEOUT);
> +
> + /* No need to issue a reset to report an error if the USB device got
> + * unplugged; just return immediately.
> + */
> + if (ret == -ENODEV)
> + return ret;
> +
> + /* If the write was successful then we're done */
> + if (ret >= 0) {
> + tp->reg_access_reset_count = 0;
> + return ret;
> + }
> +
> + dev_err(&udev->dev,
> + "Failed to %s %d bytes at %#06x/%#06x (%d)\n",
> + msg_tag, size, value, index, ret);
> +
> + /* Block all future register access until we reset. Much of the code
> + * in the driver doesn't check for errors. Notably, many parts of the
> + * driver do a read/modify/write of a register value without
> + * confirming that the read succeeded. Writing back modified garbage
> + * like this can fully wedge the adapter, requiring a power cycle.
> + */
> + rtl_set_inaccessible(tp);
> +
> + /* If probe hasn't yet finished, then we'll request a retry of the
> + * whole probe routine if we get any control transfer errors. We
> + * never have to clear this bit since we free/reallocate the whole "tp"
> + * structure if we retry probe.
> + */
> + if (!test_bit(PROBED_WITH_NO_ERRORS, &tp->flags)) {
> + set_bit(PROBE_SHOULD_RETRY, &tp->flags);
> + return ret;
> + }
> +
> + /* Failing to access registers in pre-reset is not surprising since we
> + * wouldn't be resetting if things were behaving normally. The register
> + * access we do in pre-reset isn't truly mandatory--we're just reusing
> + * the disable() function and trying to be nice by powering the
> + * adapter down before resetting it. Thus, if we're in pre-reset,
> + * we'll return right away and not try to queue up yet another reset.
> + * We know the post-reset is already coming.
> + */
> + if (test_bit(IN_PRE_RESET, &tp->flags))
> + return ret;
> +
> + if (tp->reg_access_reset_count < REGISTER_ACCESS_MAX_RESETS) {
> + usb_queue_reset_device(tp->intf);
> + tp->reg_access_reset_count++;
> + } else if (tp->reg_access_reset_count == REGISTER_ACCESS_MAX_RESETS) {
> + dev_err(&udev->dev,
> + "Tried to reset %d times; giving up.\n",
> + REGISTER_ACCESS_MAX_RESETS);
> + }
> +
> + return ret;
> +}
> +
> static
> int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
> {
> @@ -1210,9 +1305,10 @@ int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
> if (!tmp)
> return -ENOMEM;
>
> - ret = usb_control_msg(tp->udev, tp->pipe_ctrl_in,
> - RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
> - value, index, tmp, size, USB_CTRL_GET_TIMEOUT);
> + ret = r8152_control_msg(tp, tp->pipe_ctrl_in,
> + RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
> + value, index, tmp, size, "read");
> +
> if (ret < 0)
> memset(data, 0xff, size);
> else
> @@ -1233,9 +1329,9 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
> if (!tmp)
> return -ENOMEM;
>
> - ret = usb_control_msg(tp->udev, tp->pipe_ctrl_out,
> - RTL8152_REQ_SET_REGS, RTL8152_REQT_WRITE,
> - value, index, tmp, size, USB_CTRL_SET_TIMEOUT);
> + ret = r8152_control_msg(tp, tp->pipe_ctrl_out,
> + RTL8152_REQ_SET_REGS, RTL8152_REQT_WRITE,
> + value, index, tmp, size, "write");
>
> kfree(tmp);
>
> @@ -1244,10 +1340,8 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
>
> static void rtl_set_unplug(struct r8152 *tp)
> {
> - if (tp->udev->state == USB_STATE_NOTATTACHED) {
> - set_bit(RTL8152_INACCESSIBLE, &tp->flags);
> - smp_mb__after_atomic();
> - }
> + if (tp->udev->state == USB_STATE_NOTATTACHED)
> + rtl_set_inaccessible(tp);
> }
>
> static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
> @@ -8262,7 +8356,7 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
> struct r8152 *tp = usb_get_intfdata(intf);
> struct net_device *netdev;
>
> - if (!tp)
> + if (!tp || !test_bit(PROBED_WITH_NO_ERRORS, &tp->flags))
> return 0;
>
> netdev = tp->netdev;
> @@ -8277,7 +8371,9 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
> napi_disable(&tp->napi);
> if (netif_carrier_ok(netdev)) {
> mutex_lock(&tp->control);
> + set_bit(IN_PRE_RESET, &tp->flags);
> tp->rtl_ops.disable(tp);
> + clear_bit(IN_PRE_RESET, &tp->flags);
> mutex_unlock(&tp->control);
> }
>
> @@ -8290,9 +8386,11 @@ static int rtl8152_post_reset(struct usb_interface *intf)
> struct net_device *netdev;
> struct sockaddr sa;
>
> - if (!tp)
> + if (!tp || !test_bit(PROBED_WITH_NO_ERRORS, &tp->flags))
> return 0;
>
> + rtl_set_accessible(tp);
> +
> /* reset the MAC address in case of policy change */
> if (determine_ethernet_addr(tp, &sa) >= 0) {
> rtnl_lock();
> @@ -9494,17 +9592,29 @@ static u8 __rtl_get_hw_ver(struct usb_device *udev)
> __le32 *tmp;
> u8 version;
> int ret;
> + int i;
>
> tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
> if (!tmp)
> return 0;
>
> - ret = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0),
> - RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
> - PLA_TCR0, MCU_TYPE_PLA, tmp, sizeof(*tmp),
> - USB_CTRL_GET_TIMEOUT);
> - if (ret > 0)
> - ocp_data = (__le32_to_cpu(*tmp) >> 16) & VERSION_MASK;
> + /* Retry up to 3 times in case there is a transitory error. We do this
> + * since retrying a read of the version is always safe and this
> + * function doesn't take advantage of r8152_control_msg().
> + */
> + for (i = 0; i < 3; i++) {
> + ret = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0),
> + RTL8152_REQ_GET_REGS, RTL8152_REQT_READ,
> + PLA_TCR0, MCU_TYPE_PLA, tmp, sizeof(*tmp),
> + USB_CTRL_GET_TIMEOUT);
> + if (ret > 0) {
> + ocp_data = (__le32_to_cpu(*tmp) >> 16) & VERSION_MASK;
> + break;
> + }
> + }
> +
> + if (i != 0 && ret > 0)
> + dev_warn(&udev->dev, "Needed %d retries to read version\n", i);
>
> kfree(tmp);
>
> @@ -9603,25 +9713,14 @@ static bool rtl8152_supports_lenovo_macpassthru(struct usb_device *udev)
> return 0;
> }
>
> -static int rtl8152_probe(struct usb_interface *intf,
> - const struct usb_device_id *id)
> +static int rtl8152_probe_once(struct usb_interface *intf,
> + const struct usb_device_id *id, u8 version)
> {
> struct usb_device *udev = interface_to_usbdev(intf);
> struct r8152 *tp;
> struct net_device *netdev;
> - u8 version;
> int ret;
>
> - if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
> - return -ENODEV;
> -
> - if (!rtl_check_vendor_ok(intf))
> - return -ENODEV;
> -
> - version = rtl8152_get_version(intf);
> - if (version == RTL_VER_UNKNOWN)
> - return -ENODEV;
> -
> usb_reset_device(udev);
> netdev = alloc_etherdev(sizeof(struct r8152));
> if (!netdev) {
> @@ -9784,10 +9883,20 @@ static int rtl8152_probe(struct usb_interface *intf,
> else
> device_set_wakeup_enable(&udev->dev, false);
>
> + /* If we saw a control transfer error while probing then we may
> + * want to try probe() again. Consider this an error.
> + */
> + if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
> + goto out2;
> +
> + set_bit(PROBED_WITH_NO_ERRORS, &tp->flags);
> netif_info(tp, probe, netdev, "%s\n", DRIVER_VERSION);
>
> return 0;
>
> +out2:
> + unregister_netdev(netdev);
> +
> out1:
> tasklet_kill(&tp->tx_tl);
> cancel_delayed_work_sync(&tp->hw_phy_work);
> @@ -9796,10 +9905,46 @@ static int rtl8152_probe(struct usb_interface *intf,
> rtl8152_release_firmware(tp);
> usb_set_intfdata(intf, NULL);
> out:
> + if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
> + ret = -EAGAIN;
> +
> free_netdev(netdev);
> return ret;
> }
>
> +#define RTL8152_PROBE_TRIES 3
> +
> +static int rtl8152_probe(struct usb_interface *intf,
> + const struct usb_device_id *id)
> +{
> + u8 version;
> + int ret;
> + int i;
> +
> + if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
> + return -ENODEV;
> +
> + if (!rtl_check_vendor_ok(intf))
> + return -ENODEV;
> +
> + version = rtl8152_get_version(intf);
> + if (version == RTL_VER_UNKNOWN)
> + return -ENODEV;
> +
> + for (i = 0; i < RTL8152_PROBE_TRIES; i++) {
> + ret = rtl8152_probe_once(intf, id, version);
> + if (ret != -EAGAIN)
> + break;
> + }
> + if (ret == -EAGAIN) {
> + dev_err(&intf->dev,
> + "r8152 failed probe after %d tries; giving up\n", i);
> + return -ENODEV;
> + }
> +
> + return ret;
> +}
> +
> static void rtl8152_disconnect(struct usb_interface *intf)
> {
> struct r8152 *tp = usb_get_intfdata(intf);
> --
> 2.42.0.758.gaed0368e0e-goog
>
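
The register-access gating that r8152_control_msg() adds in the patch above can be modeled in plain userspace C. The following is only a minimal sketch under stated assumptions, not the driver code: `struct fake_adapter`, `gated_control_msg()`, `post_reset()` and the `transfer_result` parameter are hypothetical names, the probe-retry and pre-reset branches are left out, and a counter stands in for usb_queue_reset_device().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_RESETS 3	/* mirrors REGISTER_ACCESS_MAX_RESETS in the patch */

/* Hypothetical stand-in for the relevant parts of struct r8152. */
struct fake_adapter {
	bool inaccessible;		/* models the RTL8152_INACCESSIBLE flag */
	unsigned int reset_count;	/* resets since the last good transfer */
	unsigned int resets_queued;	/* total resets requested, for inspection */
};

/* transfer_result plays the role of usb_control_msg()'s return value. */
static int gated_control_msg(struct fake_adapter *a, int transfer_result)
{
	assert(a);

	/* Access is blocked: refuse without touching the "hardware". */
	if (a->inaccessible)
		return -ENODEV;

	/* Device unplugged: a reset would not help, just report it. */
	if (transfer_result == -ENODEV)
		return transfer_result;

	/* Success clears the consecutive-reset counter. */
	if (transfer_result >= 0) {
		a->reset_count = 0;
		return transfer_result;
	}

	/* Any other failure blocks access so no garbage gets written back. */
	a->inaccessible = true;

	/* Request a reset unless too many have already been tried in a row. */
	if (a->reset_count < MAX_RESETS) {
		a->resets_queued++;	/* stands in for usb_queue_reset_device() */
		a->reset_count++;
	}

	return transfer_result;
}

/* Models rtl8152_post_reset() re-enabling access after the reset ran. */
static void post_reset(struct fake_adapter *a)
{
	a->inaccessible = false;
}
```

In this model a failed read can no longer be followed by a write of modified 0xFF data: once a transfer fails, every later call returns -ENODEV until post_reset() runs, and after three consecutive failed attempts no further reset is requested.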

2023-10-22 10:51:08

by patchwork-bot+netdevbpf

Subject: Re: [PATCH v5 0/8] r8152: Avoid writing garbage to the adapter's registers

Hello:

This series was applied to netdev/net.git (main)
by David S. Miller <[email protected]>:

On Fri, 20 Oct 2023 14:06:51 -0700 you wrote:
> This series is the result of a cooperative debug effort between
> Realtek and the ChromeOS team. On ChromeOS, we've noticed that Realtek
> Ethernet adapters can sometimes get so wedged that even a reboot of
> the host can't get them to enumerate again, assuming that the adapter
> was on a powered hub and didn't lose power when the host rebooted. This
> is sometimes seen in the ChromeOS automated testing lab. The only way
> to recover adapters in this state is to manually power cycle them.
>
> [...]

Here is the summary with links:
- [v5,1/8] r8152: Increase USB control msg timeout to 5000ms as per spec
https://git.kernel.org/netdev/net/c/a5feba71ec9c
- [v5,2/8] r8152: Run the unload routine if we have errors during probe
https://git.kernel.org/netdev/net/c/5dd176895269
- [v5,3/8] r8152: Cancel hw_phy_work if we have an error in probe
https://git.kernel.org/netdev/net/c/bb8adff9123e
- [v5,4/8] r8152: Release firmware if we have an error in probe
https://git.kernel.org/netdev/net/c/b8d35024d405
- [v5,5/8] r8152: Check for unplug in rtl_phy_patch_request()
https://git.kernel.org/netdev/net/c/dc90ba37a8c3
- [v5,6/8] r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()
https://git.kernel.org/netdev/net/c/bc65cc42af73
- [v5,7/8] r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE
https://git.kernel.org/netdev/net/c/715f67f33af4
- [v5,8/8] r8152: Block future register access if register access fails
https://git.kernel.org/netdev/net/c/d9962b0d4202

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


2023-10-24 01:24:52

by Florian Fainelli

Subject: Re: [PATCH v5 2/8] r8152: Run the unload routine if we have errors during probe



On 10/20/2023 2:06 PM, Douglas Anderson wrote:
> The rtl8152_probe() function lacks a call to the chip-specific
> unload() routine when it sees an error in probe. Add it in to match
> the cleanup code in rtl8152_disconnect().
>
> Fixes: ac718b69301c ("net/usb: new driver for RTL8152")
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Florian Fainelli <[email protected]>
--
Florian

2023-10-24 01:25:09

by Florian Fainelli

Subject: Re: [PATCH v5 3/8] r8152: Cancel hw_phy_work if we have an error in probe



On 10/20/2023 2:06 PM, Douglas Anderson wrote:
> The error handling in rtl8152_probe() is missing a call to cancel the
> hw_phy_work. Add it in to match what's in the cleanup code in
> rtl8152_disconnect().
>
> Fixes: a028a9e003f2 ("r8152: move the settings of PHY to a work queue")
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Florian Fainelli <[email protected]>
--
Florian

2023-10-24 01:25:49

by Florian Fainelli

Subject: Re: [PATCH v5 5/8] r8152: Check for unplug in rtl_phy_patch_request()



On 10/20/2023 2:06 PM, Douglas Anderson wrote:
> If the adapter is unplugged while we're looping in
> rtl_phy_patch_request() we could end up looping for 10 seconds (2 ms *
> 5000 loops). Add code similar to what's done in other places in the
> driver to check for unplug and bail.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Florian Fainelli <[email protected]>
--
Florian
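
The commit message quoted above describes a polling loop whose worst case is 2 ms * 5000 loops = 10 seconds, and a fix that bails out as soon as the adapter is marked inaccessible. A minimal userspace sketch of that idea follows. It assumes a hypothetical `struct fake_dev` and helper names, accumulates simulated milliseconds instead of sleeping, and the -ENODEV and -ETIMEDOUT return values are illustrative choices rather than a statement of what the driver returns.

```c
#include <errno.h>
#include <stdbool.h>

#define POLL_LOOPS	5000	/* loop count from the commit message */
#define POLL_DELAY_MS	2	/* delay per loop from the commit message */

/* Hypothetical device model used only for this illustration. */
struct fake_dev {
	bool inaccessible;	/* models the RTL8152_INACCESSIBLE flag */
	int ready_after;	/* polls needed before ready; -1 means never */
	int unplug_after;	/* poll at which the device vanishes; -1 means never */
	int polls;		/* number of polls performed so far */
};

/* One poll of the device; may flip the flag to simulate an unplug. */
static bool dev_ready(struct fake_dev *d)
{
	d->polls++;
	if (d->unplug_after >= 0 && d->polls >= d->unplug_after)
		d->inaccessible = true;
	return !d->inaccessible && d->ready_after >= 0 &&
	       d->polls > d->ready_after;
}

/* Poll until ready, bailing early once the device is inaccessible. */
static int wait_ready(struct fake_dev *d, int *waited_ms)
{
	int i;

	*waited_ms = 0;
	for (i = 0; i < POLL_LOOPS; i++) {
		/* Without this check an unplugged device costs the full 10 s. */
		if (d->inaccessible)
			return -ENODEV;
		if (dev_ready(d))
			return 0;
		*waited_ms += POLL_DELAY_MS;	/* stands in for a 2 ms sleep */
	}
	return -ETIMEDOUT;
}
```

With a device that never becomes ready the loop accumulates the full 10000 ms, while one that disappears during the second poll is abandoned after 4 ms.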

2023-10-24 01:25:55

by Florian Fainelli

Subject: Re: [PATCH v5 4/8] r8152: Release firmware if we have an error in probe



On 10/20/2023 2:06 PM, Douglas Anderson wrote:
> The error handling in rtl8152_probe() is missing a call to release
> firmware. Add it in to match what's in the cleanup code in
> rtl8152_disconnect().
>
> Fixes: 9370f2d05a2a ("r8152: support request_firmware for RTL8153")
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Florian Fainelli <[email protected]>
--
Florian

2023-10-24 01:26:02

by Florian Fainelli

Subject: Re: [PATCH v5 6/8] r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()



On 10/20/2023 2:06 PM, Douglas Anderson wrote:
> If the adapter is unplugged while we're looping in r8153b_ups_en() /
> r8153c_ups_en() we could end up looping for 10 seconds (20 ms * 500
> loops). Add code similar to what's done in other places in the driver
> to check for unplug and bail.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Florian Fainelli <[email protected]>
--
Florian

2023-10-24 01:26:35

by Florian Fainelli

Subject: Re: [PATCH v5 7/8] r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE



On 10/20/2023 2:06 PM, Douglas Anderson wrote:
> Whenever the RTL8152_UNPLUG is set that just tells the driver that all
> accesses will fail and we should just immediately bail. A future patch
> will use this same concept at a time when the driver hasn't actually
> been unplugged but is about to be reset. Rename the flag in
> preparation for the future patch.
>
> This is a no-op change and just a search and replace.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: Florian Fainelli <[email protected]>
--
Florian

2023-10-24 01:27:28

by Florian Fainelli

Subject: Re: [PATCH v5 0/8] r8152: Avoid writing garbage to the adapter's registers



On 10/22/2023 3:50 AM, [email protected] wrote:
> Hello:
>
> This series was applied to netdev/net.git (main)
> by David S. Miller <[email protected]>:
>
> On Fri, 20 Oct 2023 14:06:51 -0700 you wrote:
>> This series is the result of a cooperative debug effort between
>> Realtek and the ChromeOS team. On ChromeOS, we've noticed that Realtek
>> Ethernet adapters can sometimes get so wedged that even a reboot of
>> the host can't get them to enumerate again, assuming that the adapter
>> was on a powered hub and didn't lose power when the host rebooted. This
>> is sometimes seen in the ChromeOS automated testing lab. The only way
>> to recover adapters in this state is to manually power cycle them.
>>
>> [...]

Oh well, late to the party, but this looks great, thanks!
--
Florian

2023-10-25 16:28:58

by Simon Horman

Subject: Re: [PATCH v5 8/8] r8152: Block future register access if register access fails

On Fri, Oct 20, 2023 at 02:06:59PM -0700, Douglas Anderson wrote:

...

> @@ -9603,25 +9713,14 @@ static bool rtl8152_supports_lenovo_macpassthru(struct usb_device *udev)
> return 0;
> }
>
> -static int rtl8152_probe(struct usb_interface *intf,
> - const struct usb_device_id *id)
> +static int rtl8152_probe_once(struct usb_interface *intf,
> + const struct usb_device_id *id, u8 version)
> {
> struct usb_device *udev = interface_to_usbdev(intf);
> struct r8152 *tp;
> struct net_device *netdev;
> - u8 version;
> int ret;
>
> - if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
> - return -ENODEV;
> -
> - if (!rtl_check_vendor_ok(intf))
> - return -ENODEV;
> -
> - version = rtl8152_get_version(intf);
> - if (version == RTL_VER_UNKNOWN)
> - return -ENODEV;
> -
> usb_reset_device(udev);
> netdev = alloc_etherdev(sizeof(struct r8152));
> if (!netdev) {
> @@ -9784,10 +9883,20 @@ static int rtl8152_probe(struct usb_interface *intf,
> else
> device_set_wakeup_enable(&udev->dev, false);
>
> + /* If we saw a control transfer error while probing then we may
> + * want to try probe() again. Consider this an error.
> + */
> + if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
> + goto out2;

Sorry for being a bit slow here, but if this is an error condition,
should ret be set to an error value?

As flagged by Smatch.

> +
> + set_bit(PROBED_WITH_NO_ERRORS, &tp->flags);
> netif_info(tp, probe, netdev, "%s\n", DRIVER_VERSION);
>
> return 0;
>
> +out2:
> + unregister_netdev(netdev);
> +
> out1:
> tasklet_kill(&tp->tx_tl);
> cancel_delayed_work_sync(&tp->hw_phy_work);

...

2023-10-25 20:26:38

by Doug Anderson

Subject: Re: [PATCH v5 8/8] r8152: Block future register access if register access fails

Hi,

On Wed, Oct 25, 2023 at 9:28 AM Simon Horman <[email protected]> wrote:
>
> On Fri, Oct 20, 2023 at 02:06:59PM -0700, Douglas Anderson wrote:
>
> ...
>
> > @@ -9603,25 +9713,14 @@ static bool rtl8152_supports_lenovo_macpassthru(struct usb_device *udev)
> > return 0;
> > }
> >
> > -static int rtl8152_probe(struct usb_interface *intf,
> > - const struct usb_device_id *id)
> > +static int rtl8152_probe_once(struct usb_interface *intf,
> > + const struct usb_device_id *id, u8 version)
> > {
> > struct usb_device *udev = interface_to_usbdev(intf);
> > struct r8152 *tp;
> > struct net_device *netdev;
> > - u8 version;
> > int ret;
> >
> > - if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
> > - return -ENODEV;
> > -
> > - if (!rtl_check_vendor_ok(intf))
> > - return -ENODEV;
> > -
> > - version = rtl8152_get_version(intf);
> > - if (version == RTL_VER_UNKNOWN)
> > - return -ENODEV;
> > -
> > usb_reset_device(udev);
> > netdev = alloc_etherdev(sizeof(struct r8152));
> > if (!netdev) {
> > @@ -9784,10 +9883,20 @@ static int rtl8152_probe(struct usb_interface *intf,
> > else
> > device_set_wakeup_enable(&udev->dev, false);
> >
> > + /* If we saw a control transfer error while probing then we may
> > + * want to try probe() again. Consider this an error.
> > + */
> > + if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
> > + goto out2;
>
> Sorry for being a bit slow here, but if this is an error condition,
> should ret be set to an error value?
>
> As flagged by Smatch.

Thanks for the note. I think we're OK, though. If you look at the
"out:" label, which is right after "out1", it tests for the same bit.
That will set "ret = -EAGAIN" for us.

I'll admit it probably violates the principle of least astonishment,
but there's a method to my madness. Specifically:

a) We need a test here to make sure we don't return "success" if the
bit is set. The driver doesn't error check for success when it
modifies HW registers so it might _think_ it was successful but still
have this bit set. ...so we need this check right before we return
"success".

b) We also need to test for this bit if we're in the error handling
code. Even though the driver doesn't check for success in lots of
places, there still could be some places that notice an error. It may
return any kind of error here, so we need to override it to -EAGAIN.

...so I just set "ret = -EAGAIN" in one place.

Does that make sense? If you want to submit a patch adjusting the
comment to make this more obvious, I'm happy to review it.

-Doug
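
The control flow described in points (a) and (b) above, together with the retry loop in rtl8152_probe(), can be sketched in userspace C. This is a simplified model under stated assumptions: `struct attempt`, `probe_once()` and `probe()` are hypothetical names, one boolean stands in for the PROBE_SHOULD_RETRY bit, and all resource cleanup is omitted.

```c
#include <errno.h>
#include <stdbool.h>

#define PROBE_TRIES 3	/* mirrors RTL8152_PROBE_TRIES in the patch */

/* Hypothetical description of one probe attempt, for illustration only. */
struct attempt {
	int step_error;		/* error an init step noticed itself, or 0 */
	bool transfer_failed;	/* models the PROBE_SHOULD_RETRY bit */
};

static int probe_once(const struct attempt *at)
{
	int ret = at->step_error;

	/* Some init step reported an error of its own. */
	if (ret)
		goto out;

	/* (a) Never report success while the retry bit is set. */
	if (at->transfer_failed)
		goto out;

	return 0;

out:
	/* (b) The single place where any error is overridden with -EAGAIN. */
	if (at->transfer_failed)
		ret = -EAGAIN;
	return ret;
}

/* attempts must point to at least PROBE_TRIES entries. */
static int probe(const struct attempt *attempts)
{
	int ret = -EAGAIN;
	int i;

	for (i = 0; i < PROBE_TRIES; i++) {
		ret = probe_once(&attempts[i]);
		if (ret != -EAGAIN)
			break;
	}

	/* Still -EAGAIN after the last try: give up on the device. */
	return ret == -EAGAIN ? -ENODEV : ret;
}
```

Because the override lives only at the `out:` label, the jump in case (a) can leave `ret` at 0 and still end up returning -EAGAIN, which is the behavior Smatch flagged and the reply above explains.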

2023-11-03 16:53:06

by Simon Horman

Subject: Re: [PATCH v5 8/8] r8152: Block future register access if register access fails

On Wed, Oct 25, 2023 at 01:24:55PM -0700, Doug Anderson wrote:
> Hi,
>
> On Wed, Oct 25, 2023 at 9:28 AM Simon Horman <[email protected]> wrote:
> >
> > On Fri, Oct 20, 2023 at 02:06:59PM -0700, Douglas Anderson wrote:
> >
> > ...
> >
> > > @@ -9603,25 +9713,14 @@ static bool rtl8152_supports_lenovo_macpassthru(struct usb_device *udev)
> > > return 0;
> > > }
> > >
> > > -static int rtl8152_probe(struct usb_interface *intf,
> > > - const struct usb_device_id *id)
> > > +static int rtl8152_probe_once(struct usb_interface *intf,
> > > + const struct usb_device_id *id, u8 version)
> > > {
> > > struct usb_device *udev = interface_to_usbdev(intf);
> > > struct r8152 *tp;
> > > struct net_device *netdev;
> > > - u8 version;
> > > int ret;
> > >
> > > - if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
> > > - return -ENODEV;
> > > -
> > > - if (!rtl_check_vendor_ok(intf))
> > > - return -ENODEV;
> > > -
> > > - version = rtl8152_get_version(intf);
> > > - if (version == RTL_VER_UNKNOWN)
> > > - return -ENODEV;
> > > -
> > > usb_reset_device(udev);
> > > netdev = alloc_etherdev(sizeof(struct r8152));
> > > if (!netdev) {
> > > @@ -9784,10 +9883,20 @@ static int rtl8152_probe(struct usb_interface *intf,
> > > else
> > > device_set_wakeup_enable(&udev->dev, false);
> > >
> > > + /* If we saw a control transfer error while probing then we may
> > > + * want to try probe() again. Consider this an error.
> > > + */
> > > + if (test_bit(PROBE_SHOULD_RETRY, &tp->flags))
> > > + goto out2;
> >
> > Sorry for being a bit slow here, but if this is an error condition,
> > should ret be set to an error value?
> >
> > As flagged by Smatch.
>
> Thanks for the note. I think we're OK, though. If you look at the
> "out:" label, which is right after "out1", it tests for the same bit.
> That will set "ret = -EAGAIN" for us.

Thanks, and sorry for being even slower than the previous time.
I see your point regarding "out:" and agree that the code is correct.

> I'll admit it probably violates the principle of least astonishment,
> but there's a method to my madness. Specifically:
>
> a) We need a test here to make sure we don't return "success" if the
> bit is set. The driver doesn't error check for success when it
> modifies HW registers so it might _think_ it was successful but still
> have this bit set. ...so we need this check right before we return
> "success".
>
> b) We also need to test for this bit if we're in the error handling
> code. Even though the driver doesn't check for success in lots of
> places, there still could be some places that notice an error. It may
> return any kind of error here, so we need to override it to -EAGAIN.
>
> ...so I just set "ret = -EAGAIN" in one place.
>
> Does that make sense? If you want to submit a patch adjusting the
> comment to make this more obvious, I'm happy to review it.

Thanks, it does make sense.
And I don't think any further action is required.