With this patch the phc2sys synchronisation precision improved to +/-500ns.
Signed-off-by: Hubert Feurstein <[email protected]>
---
This patch should only be the base for a discussion about improving precision of
phc2sys (from the linuxptp package) in combination with a mv88e6xxx switch and
imx6-fec.
When I started my work on PTP at the beginning of this week, I was positively
supprised about the sync precision of ptp4l. After adding support for the
MV88E6220 I was able to synchronize two of our boards within +/- 10 nanoseconds.
Remebering that the PTP system in the MV88E6220 is clocked with 100MHz, this is
I think the best what can be expected. Big thanks to Richard and the other
developers who made this possible.
But then I tried to synchornize the PTP clock with the system clock by using
phc2sys (phc2sys -rr -amq -l6) and I quickly was very disapointed about the
precision.
Min: -17829 ns
Max: 21694 ns
StdDev: 8520 ns
Count: 127
So I tried to find the reason for this. And as you probably already know, the
reason for that is the MDIO latency, which is terrible (up to 800 usecs).
As a next step, I tried to to implement the gettimex64 callback (see: "[PATCH]
net: dsa: mv88e6xxx: extend PTP gettime function to read system clock"). With
this in place (and a patched linuxptp-master version which really uses the
PTP_SYS_OFFSET_EXTENDED-ioctl), I got the following results:
Min: -12144 ns
Max: 10891 ns
StdDev: 4046,71 ns
Count: 112
So, things improved, but this is still unacceptable. It was still not possible
to compensate the MDIO latency issue.
According to my understanding, the timestamps (by using
ptp_read_system_{pre|post}ts) have to be captured at a place where we have an
constant offset related to the PHC in the switch. The only point where these
timestamps can be captured is the mdio_write callback in the imx_fec. Because,
reading the PHC timestamp will result in the follwing MDIO accesses:
(several) reads of the AVB_CMD register (to poll for the busy-flag)
write AVB_CMD (AVBOp=110b Read with post-incerement of PHC timestamp)
read AVB_DATA (PTP Global Time [15:0])
read AVB_DATA (PTP Global Time [31:16])
With this sequence in mind, the Marvell switch has to snapshot the PHC
timestamp at the write-AVB_CMD in order to be able to get sane values later by
reading AVB_DATA. So the best place to capture the system timestamps is this
one and only write operation directly in the imx_fec. By using the patch below
(without the changes to the system clock resolution) I got the following
results:
Min: -464 ns
Max: 525 ns
StdDev: 210,31 ns
Count: 401
I would say that is a huge improvement.
I realized, that the system clock (at least on the imx6) has a resolution of
333ns. So I tried to speed up this clock by using the PER-clock instead of
OSC_PER. This gave me 15ns resolution. The results were:
Min: -476 ns
Max: 439 ns
StdDev: 176,52 ns
Count: 630
So, things got improved again a little bit (at least the StdDev).
According to my understanding, this is almost the best which is possible,
because there is one more clock which influences the results. This is the MDIO
bus clock, which is just 2.5MHz (or 400ns). So, I would say that +/- 400ns
jitter is caused only by the MDIO bus clock, since the changes in imx_fec should
not introduce any unpredictable latencies.
My question to the experienced kernel developers is, how can this be implemented
in a more generic way? Because this hack only works under these circumstances,
and of course can never be part of the mainline kernel.
I am not 100% sure that all my statements or assumptions are correct, so feel
free to correct me.
Hubert
drivers/clocksource/timer-imx-gpt.c | 9 ++++++++-
drivers/net/dsa/mv88e6xxx/chip.h | 2 ++
drivers/net/dsa/mv88e6xxx/ptp.c | 11 +++++++----
drivers/net/dsa/mv88e6xxx/smi.c | 2 +-
drivers/net/ethernet/freescale/fec_main.c | 24 ++++++++++++++++++-----
drivers/net/phy/mdio_bus.c | 16 +++++++++++++++
include/linux/mdio.h | 2 ++
include/linux/phy.h | 1 +
8 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c
index 706c0d0ff56c..84695a2d8ff7 100644
--- a/drivers/clocksource/timer-imx-gpt.c
+++ b/drivers/clocksource/timer-imx-gpt.c
@@ -471,8 +471,15 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t
/* Try osc_per first, and fall back to per otherwise */
imxtm->clk_per = of_clk_get_by_name(np, "osc_per");
- if (IS_ERR(imxtm->clk_per))
+
+ /* Force PER clock to be used, so we get 15ns instead of 333ns */
+ //if (IS_ERR(imxtm->clk_per)) {
+ if (1) {
imxtm->clk_per = of_clk_get_by_name(np, "per");
+ pr_info("==> Using PER clock\n");
+ } else {
+ pr_info("==> Using OSC_PER clock\n");
+ }
imxtm->type = type;
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 01963ee94c50..9e14dc406415 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -277,6 +277,8 @@ struct mv88e6xxx_chip {
struct ptp_clock_info ptp_clock_info;
struct delayed_work tai_event_work;
struct ptp_pin_desc pin_config[MV88E6XXX_MAX_GPIO];
+ struct ptp_system_timestamp *ptp_sts;
+
u16 trig_config;
u16 evcap_config;
u16 enable_count;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 073cbd0bb91b..cf6e52ee9e0a 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
return 0;
}
-static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
- struct timespec64 *ts)
+static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
+ struct timespec64 *ts,
+ struct ptp_system_timestamp *sts)
{
struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
u64 ns;
mv88e6xxx_reg_lock(chip);
+ chip->ptp_sts = sts;
ns = timecounter_read(&chip->tstamp_tc);
+ chip->ptp_sts = NULL;
mv88e6xxx_reg_unlock(chip);
*ts = ns_to_timespec64(ns);
@@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
struct timespec64 ts;
- mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
+ mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
schedule_delayed_work(&chip->overflow_work,
MV88E6XXX_TAI_OVERFLOW_PERIOD);
@@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
chip->ptp_clock_info.max_adj = MV88E6XXX_MAX_ADJ_PPB;
chip->ptp_clock_info.adjfine = mv88e6xxx_ptp_adjfine;
chip->ptp_clock_info.adjtime = mv88e6xxx_ptp_adjtime;
- chip->ptp_clock_info.gettime64 = mv88e6xxx_ptp_gettime;
+ chip->ptp_clock_info.gettimex64 = mv88e6xxx_ptp_gettimex;
chip->ptp_clock_info.settime64 = mv88e6xxx_ptp_settime;
chip->ptp_clock_info.enable = ptp_ops->ptp_enable;
chip->ptp_clock_info.verify = ptp_ops->ptp_verify;
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 5fc78a063843..801fd4abba5a 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -45,7 +45,7 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
{
int ret;
- ret = mdiobus_write_nested(chip->bus, dev, reg, data);
+ ret = mdiobus_write_nested_ptp(chip->bus, dev, reg, data, chip->ptp_sts);
if (ret < 0)
return ret;
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2f6057e7335d..20f589dc5b8b 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
reinit_completion(&fep->mdio_done);
- /* start a write op */
- writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
- FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
- FEC_MMFR_TA | FEC_MMFR_DATA(value),
- fep->hwp + FEC_MII_DATA);
+ if (bus->ptp_sts) {
+ unsigned long flags = 0;
+ local_irq_save(flags);
+ __iowmb();
+ /* >Take the timestamp *after* the memory barrier */
+ ptp_read_system_prets(bus->ptp_sts);
+ writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
+ FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+ FEC_MMFR_TA | FEC_MMFR_DATA(value),
+ fep->hwp + FEC_MII_DATA);
+ ptp_read_system_postts(bus->ptp_sts);
+ local_irq_restore(flags);
+ } else {
+ /* start a write op */
+ writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
+ FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+ FEC_MMFR_TA | FEC_MMFR_DATA(value),
+ fep->hwp + FEC_MII_DATA);
+ }
/* wait for end of transfer */
time_left = wait_for_completion_timeout(&fep->mdio_done,
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index bd04fe762056..f076487db11f 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -672,6 +672,22 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
}
EXPORT_SYMBOL(mdiobus_write_nested);
+int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts)
+{
+ int err;
+
+ BUG_ON(in_interrupt());
+
+ mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
+ bus->ptp_sts = ptp_sts;
+ err = __mdiobus_write(bus, addr, regnum, val);
+ bus->ptp_sts = NULL;
+ mutex_unlock(&bus->mdio_lock);
+
+ return err;
+}
+EXPORT_SYMBOL(mdiobus_write_nested_ptp);
+
/**
* mdiobus_write - Convenience function for writing a given MII mgmt register
* @bus: the mii_bus struct
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad88c81..7f9767babdc3 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -9,6 +9,7 @@
#include <uapi/linux/mdio.h>
#include <linux/mod_devicetable.h>
+struct ptp_system_timestamp;
struct gpio_desc;
struct mii_bus;
@@ -310,6 +311,7 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts);
int mdiobus_register_device(struct mdio_device *mdiodev);
int mdiobus_unregister_device(struct mdio_device *mdiodev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73f93..fd4e33654863 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -252,6 +252,7 @@ struct mii_bus {
int reset_delay_us;
/* RESET GPIO descriptor pointer */
struct gpio_desc *reset_gpiod;
+ struct ptp_system_timestamp *ptp_sts;
};
#define to_mii_bus(d) container_of(d, struct mii_bus, dev)
--
2.22.0
Hi Hubert,
On Fri, 2 Aug 2019 at 19:33, Hubert Feurstein <[email protected]> wrote:
>
> With this patch the phc2sys synchronisation precision improved to +/-500ns.
>
> Signed-off-by: Hubert Feurstein <[email protected]>
> ---
>
> This patch should only be the base for a discussion about improving precision of
> phc2sys (from the linuxptp package) in combination with a mv88e6xxx switch and
> imx6-fec.
>
> When I started my work on PTP at the beginning of this week, I was positively
> supprised about the sync precision of ptp4l. After adding support for the
> MV88E6220 I was able to synchronize two of our boards within +/- 10 nanoseconds.
> Remebering that the PTP system in the MV88E6220 is clocked with 100MHz, this is
> I think the best what can be expected. Big thanks to Richard and the other
> developers who made this possible.
>
> But then I tried to synchornize the PTP clock with the system clock by using
> phc2sys (phc2sys -rr -amq -l6) and I quickly was very disapointed about the
> precision.
>
> Min: -17829 ns
> Max: 21694 ns
> StdDev: 8520 ns
> Count: 127
>
> So I tried to find the reason for this. And as you probably already know, the
> reason for that is the MDIO latency, which is terrible (up to 800 usecs).
>
> As a next step, I tried to to implement the gettimex64 callback (see: "[PATCH]
> net: dsa: mv88e6xxx: extend PTP gettime function to read system clock"). With
> this in place (and a patched linuxptp-master version which really uses the
> PTP_SYS_OFFSET_EXTENDED-ioctl), I got the following results:
>
> Min: -12144 ns
> Max: 10891 ns
> StdDev: 4046,71 ns
> Count: 112
>
> So, things improved, but this is still unacceptable. It was still not possible
> to compensate the MDIO latency issue.
>
> According to my understanding, the timestamps (by using
> ptp_read_system_{pre|post}ts) have to be captured at a place where we have an
> constant offset related to the PHC in the switch. The only point where these
> timestamps can be captured is the mdio_write callback in the imx_fec. Because,
> reading the PHC timestamp will result in the follwing MDIO accesses:
>
> (several) reads of the AVB_CMD register (to poll for the busy-flag)
> write AVB_CMD (AVBOp=110b Read with post-incerement of PHC timestamp)
> read AVB_DATA (PTP Global Time [15:0])
> read AVB_DATA (PTP Global Time [31:16])
>
> With this sequence in mind, the Marvell switch has to snapshot the PHC
> timestamp at the write-AVB_CMD in order to be able to get sane values later by
> reading AVB_DATA. So the best place to capture the system timestamps is this
> one and only write operation directly in the imx_fec. By using the patch below
> (without the changes to the system clock resolution) I got the following
> results:
>
> Min: -464 ns
> Max: 525 ns
> StdDev: 210,31 ns
> Count: 401
>
> I would say that is a huge improvement.
>
> I realized, that the system clock (at least on the imx6) has a resolution of
> 333ns. So I tried to speed up this clock by using the PER-clock instead of
> OSC_PER. This gave me 15ns resolution. The results were:
>
> Min: -476 ns
> Max: 439 ns
> StdDev: 176,52 ns
> Count: 630
>
> So, things got improved again a little bit (at least the StdDev).
>
> According to my understanding, this is almost the best which is possible,
> because there is one more clock which influences the results. This is the MDIO
> bus clock, which is just 2.5MHz (or 400ns). So, I would say that +/- 400ns
> jitter is caused only by the MDIO bus clock, since the changes in imx_fec should
> not introduce any unpredictable latencies.
>
> My question to the experienced kernel developers is, how can this be implemented
> in a more generic way? Because this hack only works under these circumstances,
> and of course can never be part of the mainline kernel.
>
> I am not 100% sure that all my statements or assumptions are correct, so feel
> free to correct me.
>
> Hubert
>
You guessed correctly (since you copied me) that I'm battling much of
the same issues with the sja1105 and its spi-fsl-dspi controller
driver.
In fact I will refrain from saying anything about your
PTP_SYS_OFFSET_EXTENDED solution/hack combo, but will ask some
questions instead:
- You said you patched linuxptp master. Could you share the patch? Is
there anything else that's needed except compiling against the board's
real kernel headers (aka point KBUILD_OUTPUT to the extracted location
of /sys/kernel/kheaders.tar.xz)? I've done that and I do see phc2sys
probing and using the new ioctl, but I don't see a big improvement in
my case (that's probably because my SPI interface has real jitter
caused by peripheral clock instability, but I really need to put a
scope on it to be sure, so that's a discussion for another time).
- I really wonder what your jitter is caused by. Maybe it is just
contention on the mii_bus->mdio_lock? If that's the case, maybe you
don't need to go all the way down to the driver level, and taking the
ptp_sts at the subsystem (MDIO, SPI, I2C, etc) level is "good enough".
- Along the lines of the above, I wonder how badly would the
maintainers shout at the proposal of adding a struct
ptp_system_timestamp pointer and its associated lock in struct device.
That way at least you don't have to change the API, like you did with
mdiobus_write_nested_ptp. Relatively speaking, this is the least
amount of intrusion (although, again, far from beautiful).
- The software timestamps help you in the particular case of phc2sys,
but are they enough to cover all your needs? I haven't spent even 1
second looking at Marvell switch datasheets, but is its free-running
timer only used for PTP timestamping? At least the sja1105 does also
support time-based egress scheduling and ingress policing, so for that
scenario, the timecounter/cyclecounter implementation will no longer
cut it (you do need to synchronize the hardware clock). If your
hardware supports these PTP-based features as well, I can only assume
the reason why mv88e6xxx went for a timecounter is the same as the
reason why I did: the MDIO/SPI jitter while accessing the frequency
and offset correction registers is bad enough that the servo loop goes
haywire. And implementing gettimex64 will not solve that.
I also added Miroslav Lichvar, who originally created the
PTP_SYS_OFFSET_EXTENDED ioctl, maybe he can share some ideas on how it
is best served.
> drivers/clocksource/timer-imx-gpt.c | 9 ++++++++-
> drivers/net/dsa/mv88e6xxx/chip.h | 2 ++
> drivers/net/dsa/mv88e6xxx/ptp.c | 11 +++++++----
> drivers/net/dsa/mv88e6xxx/smi.c | 2 +-
> drivers/net/ethernet/freescale/fec_main.c | 24 ++++++++++++++++++-----
> drivers/net/phy/mdio_bus.c | 16 +++++++++++++++
> include/linux/mdio.h | 2 ++
> include/linux/phy.h | 1 +
> 8 files changed, 56 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c
> index 706c0d0ff56c..84695a2d8ff7 100644
> --- a/drivers/clocksource/timer-imx-gpt.c
> +++ b/drivers/clocksource/timer-imx-gpt.c
> @@ -471,8 +471,15 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t
>
> /* Try osc_per first, and fall back to per otherwise */
> imxtm->clk_per = of_clk_get_by_name(np, "osc_per");
> - if (IS_ERR(imxtm->clk_per))
> +
> + /* Force PER clock to be used, so we get 15ns instead of 333ns */
> + //if (IS_ERR(imxtm->clk_per)) {
> + if (1) {
> imxtm->clk_per = of_clk_get_by_name(np, "per");
> + pr_info("==> Using PER clock\n");
> + } else {
> + pr_info("==> Using OSC_PER clock\n");
> + }
>
> imxtm->type = type;
>
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 01963ee94c50..9e14dc406415 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -277,6 +277,8 @@ struct mv88e6xxx_chip {
> struct ptp_clock_info ptp_clock_info;
> struct delayed_work tai_event_work;
> struct ptp_pin_desc pin_config[MV88E6XXX_MAX_GPIO];
> + struct ptp_system_timestamp *ptp_sts;
> +
> u16 trig_config;
> u16 evcap_config;
> u16 enable_count;
> diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
> index 073cbd0bb91b..cf6e52ee9e0a 100644
> --- a/drivers/net/dsa/mv88e6xxx/ptp.c
> +++ b/drivers/net/dsa/mv88e6xxx/ptp.c
> @@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
> return 0;
> }
>
> -static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
> - struct timespec64 *ts)
> +static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
> + struct timespec64 *ts,
> + struct ptp_system_timestamp *sts)
> {
> struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
> u64 ns;
>
> mv88e6xxx_reg_lock(chip);
> + chip->ptp_sts = sts;
> ns = timecounter_read(&chip->tstamp_tc);
> + chip->ptp_sts = NULL;
> mv88e6xxx_reg_unlock(chip);
>
> *ts = ns_to_timespec64(ns);
> @@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
> struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
> struct timespec64 ts;
>
> - mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
> + mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
>
> schedule_delayed_work(&chip->overflow_work,
> MV88E6XXX_TAI_OVERFLOW_PERIOD);
> @@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
> chip->ptp_clock_info.max_adj = MV88E6XXX_MAX_ADJ_PPB;
> chip->ptp_clock_info.adjfine = mv88e6xxx_ptp_adjfine;
> chip->ptp_clock_info.adjtime = mv88e6xxx_ptp_adjtime;
> - chip->ptp_clock_info.gettime64 = mv88e6xxx_ptp_gettime;
> + chip->ptp_clock_info.gettimex64 = mv88e6xxx_ptp_gettimex;
> chip->ptp_clock_info.settime64 = mv88e6xxx_ptp_settime;
> chip->ptp_clock_info.enable = ptp_ops->ptp_enable;
> chip->ptp_clock_info.verify = ptp_ops->ptp_verify;
> diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
> index 5fc78a063843..801fd4abba5a 100644
> --- a/drivers/net/dsa/mv88e6xxx/smi.c
> +++ b/drivers/net/dsa/mv88e6xxx/smi.c
> @@ -45,7 +45,7 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
> {
> int ret;
>
> - ret = mdiobus_write_nested(chip->bus, dev, reg, data);
> + ret = mdiobus_write_nested_ptp(chip->bus, dev, reg, data, chip->ptp_sts);
> if (ret < 0)
> return ret;
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2f6057e7335d..20f589dc5b8b 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>
> reinit_completion(&fep->mdio_done);
>
> - /* start a write op */
> - writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> - FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> - FEC_MMFR_TA | FEC_MMFR_DATA(value),
> - fep->hwp + FEC_MII_DATA);
> + if (bus->ptp_sts) {
> + unsigned long flags = 0;
> + local_irq_save(flags);
> + __iowmb();
> + /* >Take the timestamp *after* the memory barrier */
> + ptp_read_system_prets(bus->ptp_sts);
> + writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> + ptp_read_system_postts(bus->ptp_sts);
> + local_irq_restore(flags);
> + } else {
> + /* start a write op */
> + writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> + }
>
> /* wait for end of transfer */
> time_left = wait_for_completion_timeout(&fep->mdio_done,
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index bd04fe762056..f076487db11f 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -672,6 +672,22 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
> }
> EXPORT_SYMBOL(mdiobus_write_nested);
>
> +int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts)
> +{
> + int err;
> +
> + BUG_ON(in_interrupt());
> +
> + mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
> + bus->ptp_sts = ptp_sts;
> + err = __mdiobus_write(bus, addr, regnum, val);
> + bus->ptp_sts = NULL;
> + mutex_unlock(&bus->mdio_lock);
> +
> + return err;
> +}
> +EXPORT_SYMBOL(mdiobus_write_nested_ptp);
> +
> /**
> * mdiobus_write - Convenience function for writing a given MII mgmt register
> * @bus: the mii_bus struct
> diff --git a/include/linux/mdio.h b/include/linux/mdio.h
> index e8242ad88c81..7f9767babdc3 100644
> --- a/include/linux/mdio.h
> +++ b/include/linux/mdio.h
> @@ -9,6 +9,7 @@
> #include <uapi/linux/mdio.h>
> #include <linux/mod_devicetable.h>
>
> +struct ptp_system_timestamp;
> struct gpio_desc;
> struct mii_bus;
>
> @@ -310,6 +311,7 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
> int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
> int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
> int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
> +int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts);
>
> int mdiobus_register_device(struct mdio_device *mdiodev);
> int mdiobus_unregister_device(struct mdio_device *mdiodev);
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 462b90b73f93..fd4e33654863 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -252,6 +252,7 @@ struct mii_bus {
> int reset_delay_us;
> /* RESET GPIO descriptor pointer */
> struct gpio_desc *reset_gpiod;
> + struct ptp_system_timestamp *ptp_sts;
> };
> #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
>
> --
> 2.22.0
>
Regards,
-Vladimir
On Mon, Aug 05, 2019 at 12:54:49PM +0300, Vladimir Oltean wrote:
> - Along the lines of the above, I wonder how badly would the
> maintainers shout at the proposal of adding a struct
> ptp_system_timestamp pointer and its associated lock in struct device.
> That way at least you don't have to change the API, like you did with
> mdiobus_write_nested_ptp. Relatively speaking, this is the least
> amount of intrusion (although, again, far from beautiful).
That would make sense to me, if there are other drivers that could use
it.
> I also added Miroslav Lichvar, who originally created the
> PTP_SYS_OFFSET_EXTENDED ioctl, maybe he can share some ideas on how it
> is best served.
The idea behind that ioctl was to allow drivers to take the system
timestamps as close to the reading of the HW clock as possible,
excluding delays (and jitter) due to other operations that are
necessary to get that timestamp. In the ethernet drivers that was a
single PCI read. If in this case it's a "write" operation that
triggers the reading of the HW clock, then I think the patch is
using is the ptp_read_system_*ts() calls correctly.
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> >
> > reinit_completion(&fep->mdio_done);
> >
> > - /* start a write op */
> > - writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> > - FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> > - FEC_MMFR_TA | FEC_MMFR_DATA(value),
> > - fep->hwp + FEC_MII_DATA);
> > + if (bus->ptp_sts) {
> > + unsigned long flags = 0;
> > + local_irq_save(flags);
> > + __iowmb();
> > + /* >Take the timestamp *after* the memory barrier */
> > + ptp_read_system_prets(bus->ptp_sts);
> > + writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> > + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> > + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> > + fep->hwp + FEC_MII_DATA);
> > + ptp_read_system_postts(bus->ptp_sts);
> > + local_irq_restore(flags);
> > + } else {
> > + /* start a write op */
> > + writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> > + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> > + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> > + fep->hwp + FEC_MII_DATA);
> > + }
--
Miroslav Lichvar
Hi Vladimir,
Am Mo., 5. Aug. 2019 um 11:55 Uhr schrieb Vladimir Oltean <[email protected]>:
[...]
> You guessed correctly (since you copied me) that I'm battling much of
> the same issues with the sja1105 and its spi-fsl-dspi controller
> driver.
I've copied you, because of this discussion on github:
https://github.com/openil/linuxptp/issues/5
There you said: "In fact any MDIO access will make the latency
unpredictable and hence
throw off the time."
I thought you might be interested in how to make MDIO latency predictable.
[...]
> - You said you patched linuxptp master. Could you share the patch? Is
> there anything else that's needed except compiling against the board's
> real kernel headers (aka point KBUILD_OUTPUT to the extracted location
> of /sys/kernel/kheaders.tar.xz)? I've done that and I do see phc2sys
> probing and using the new ioctl, but I don't see a big improvement in
> my case (that's probably because my SPI interface has real jitter
> caused by peripheral clock instability, but I really need to put a
> scope on it to be sure, so that's a discussion for another time).
My compiler used kernel headers where the ioctl was not yet defined.
So I simply defined it in the linuxptp source directly:
diff --git a/sysoff.c b/sysoff.c
index b993ee9..b23ad2f 100644
--- a/sysoff.c
+++ b/sysoff.c
@@ -27,6 +27,20 @@
#define NS_PER_SEC 1000000000LL
+#ifndef PTP_SYS_OFFSET_EXTENDED
+struct ptp_sys_offset_extended {
+ unsigned int n_samples; /* Desired number of measurements. */
+ unsigned int rsv[3]; /* Reserved for future use. */
+ /*
+ * Array of [system, phc, system] time stamps. The kernel will provide
+ * 3*n_samples time stamps.
+ */
+ struct ptp_clock_time ts[PTP_MAX_SAMPLES][3];
+};
+#define PTP_SYS_OFFSET_EXTENDED \
+ _IOWR(PTP_CLK_MAGIC, 9, struct ptp_sys_offset_extended)
+#endif
+
#ifdef PTP_SYS_OFFSET
static int64_t pctns(struct ptp_clock_time *t)
> - I really wonder what your jitter is caused by. Maybe it is just
> contention on the mii_bus->mdio_lock? If that's the case, maybe you
> don't need to go all the way down to the driver level, and taking the
> ptp_sts at the subsystem (MDIO, SPI, I2C, etc) level is "good enough".
I would say there are many places, where we introduce unpredictable jitter:
- The busy-flag polling
- The locking of the chip- and mdio-bus-mutex
- The mdio_done completion in the imx_fec
- Scheduling latencies
> - Along the lines of the above, I wonder how badly would the
> maintainers shout at the proposal of adding a struct
> ptp_system_timestamp pointer and its associated lock in struct device.
> That way at least you don't have to change the API, like you did with
> mdiobus_write_nested_ptp. Relatively speaking, this is the least
> amount of intrusion (although, again, far from beautiful).
It is important to make sure to hook up the right mdio_write access, that is
why the ptp_sts pointer is passed under the mdio_lock. Of course It would
be nicer, to pass through the pointer as an argument, instead of bypassing it to
the mii_bus struct. In the case of SPI it could be added to the spi_transfer
struct.
> - The software timestamps help you in the particular case of phc2sys,
> but are they enough to cover all your needs?
For now it's all I need.
> I haven't spent even 1
> second looking at Marvell switch datasheets, but is its free-running
> timer only used for PTP timestamping? At least the sja1105 does also
> support time-based egress scheduling and ingress policing, so for that
> scenario, the timecounter/cyclecounter implementation will no longer
> cut it (you do need to synchronize the hardware clock). If your
> hardware supports these PTP-based features as well, I can only assume
> the reason why mv88e6xxx went for a timecounter is the same as the
> reason why I did: the MDIO/SPI jitter while accessing the frequency
> and offset correction registers is bad enough that the servo loop goes
> haywire. And implementing gettimex64 will not solve that.
>
[...]
Hubert
On Fri, Aug 02, 2019 at 06:32:48PM +0200, Hubert Feurstein wrote:
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2f6057e7335d..20f589dc5b8b 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>
> reinit_completion(&fep->mdio_done);
>
> - /* start a write op */
> - writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> - FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> - FEC_MMFR_TA | FEC_MMFR_DATA(value),
> - fep->hwp + FEC_MII_DATA);
> + if (bus->ptp_sts) {
> + unsigned long flags = 0;
> + local_irq_save(flags);
> + __iowmb();
> + /* >Take the timestamp *after* the memory barrier */
> + ptp_read_system_prets(bus->ptp_sts);
> + writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> + ptp_read_system_postts(bus->ptp_sts);
Regarding generic support of this, see if you can't place the
ptp_read_system_prets/postts() calls at the mii_bus layer. This could
mean, for example, offering a two-part write API, to split this write
operation from...
> + local_irq_restore(flags);
> + } else {
> + /* start a write op */
> + writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> + }
>
> /* wait for end of transfer */
> time_left = wait_for_completion_timeout(&fep->mdio_done,
...this kind of thing ^^^
Thanks,
Richard