LinuxLists.cc - [PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling

2022-01-31 11:17:54

Subject: [PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling

Avoid a long delay when a busy bit is still set and has to be polled
again.

Measurements on a system with 2 Opals (6097F) and one Agate (6352)
show that even with this much tighter loop, we have about a 50% chance
of the bit being cleared on the first poll, all other accesses see the
bit being cleared on the second poll.

On a standard MDIO bus running MDC at 2.5MHz, a single access with 32
bits of preamble plus 32 bits of data takes 64*(1/2.5MHz) = 25.6us.

This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in
the fast scenario, but 26us + 1500us + 26us + CPU overhead in the slow
case - bringing the average close to 1ms.

With this change in place, the slow case is closer to 2*26us + CPU
overhead, with the average well below 100us - a 10x improvement.

This translates to real-world winnings. On a 3-chip 20-port system,
the modprobe time drops by 88%:

Before:

root@coronet:~# time modprobe mv88e6xxx
real 0m 15.99s
user 0m 0.00s
sys 0m 1.52s

After:

root@coronet:~# time modprobe mv88e6xxx
real 0m 2.21s
user 0m 0.00s
sys 0m 1.54s

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.c | 13 ++++++++++---
drivers/net/dsa/mv88e6xxx/smi.c | 11 +++++++++--
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 58ca684d73f7..1023e4549359 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -86,12 +86,16 @@ int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val)
int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
u16 mask, u16 val)
{
+ const unsigned long timeout = jiffies + msecs_to_jiffies(50);
u16 data;
int err;
int i;

- /* There's no bus specific operation to wait for a mask */
- for (i = 0; i < 16; i++) {
+ /* There's no bus specific operation to wait for a mask. Even
+ * if the initial poll takes longer than 50ms, always do at
+ * least one more attempt.
+ */
+ for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
err = mv88e6xxx_read(chip, addr, reg, &data);
if (err)
return err;
@@ -99,7 +103,10 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
if ((data & mask) == val)
return 0;

- usleep_range(1000, 2000);
+ if (i < 2)
+ cpu_relax();
+ else
+ usleep_range(1000, 2000);
}

dev_err(chip->dev, "Timeout while waiting for switch\n");
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 282fe08db050..728ef3f54ec5 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -55,11 +55,15 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip,
int dev, int reg, int bit, int val)
{
+ const unsigned long timeout = jiffies + msecs_to_jiffies(50);
u16 data;
int err;
int i;

- for (i = 0; i < 16; i++) {
+ /* Even if the initial poll takes longer than 50ms, always do
+ * at least one more attempt.
+ */
+ for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
err = mv88e6xxx_smi_direct_read(chip, dev, reg, &data);
if (err)
return err;
@@ -67,7 +71,10 @@ static int mv88e6xxx_smi_direct_wait(struct mv88e6xxx_chip *chip,
if (!!(data & BIT(bit)) == !!val)
return 0;

- usleep_range(1000, 2000);
+ if (i < 2)
+ cpu_relax();
+ else
+ usleep_range(1000, 2000);
}

return -ETIMEDOUT;
--
2.25.1

2022-01-31 11:22:47

by Andrew Lunn

[permalink] [raw]

Subject: Re: [PATCH v3 net-next 1/2] net: dsa: mv88e6xxx: Improve performance of busy bit polling

On Fri, Jan 28, 2022 at 05:26:49PM +0100, Tobias Waldekranz wrote:
> Avoid a long delay when a busy bit is still set and has to be polled
> again.
>
> Measurements on a system with 2 Opals (6097F) and one Agate (6352)
> show that even with this much tighter loop, we have about a 50% chance
> of the bit being cleared on the first poll, all other accesses see the
> bit being cleared on the second poll.
>
> On a standard MDIO bus running MDC at 2.5MHz, a single access with 32
> bits of preamble plus 32 bits of data takes 64*(1/2.5MHz) = 25.6us.
>
> This means that mv88e6xxx_smi_direct_wait took 26us + CPU overhead in
> the fast scenario, but 26us + 1500us + 26us + CPU overhead in the slow
> case - bringing the average close to 1ms.
>
> With this change in place, the slow case is closer to 2*26us + CPU
> overhead, with the average well below 100us - a 10x improvement.
>
> This translates to real-world winnings. On a 3-chip 20-port system,
> the modprobe time drops by 88%:
>
> Before:
>
> root@coronet:~# time modprobe mv88e6xxx
> real 0m 15.99s
> user 0m 0.00s
> sys 0m 1.52s
>
> After:
>
> root@coronet:~# time modprobe mv88e6xxx
> real 0m 2.21s
> user 0m 0.00s
> sys 0m 1.54s
>
> Signed-off-by: Tobias Waldekranz <[email protected]>

Reviewed-by: Andrew Lunn <[email protected]>

Andrew