From: Hubert Feurstein <[email protected]>
Changelog:
v3: mv88e6xxx_smi_indirect_write: forward ptp_sts only on the last write
Copied Miroslav Lichvar because of PTP offset compensation patch
v2: Added patch for PTP offset compensation
Removed mdiobus_write_sts as there was no user
Removed ptp_sts_supported-boolean and introduced flags instead
With this patchset the phc2sys synchronisation precision improved to +/-555ns on
an IMX6DL with an MV88E6220 switch attached.
This patchset takes into account the comments from the following discussions:
- https://lkml.org/lkml/2019/8/2/1364
- https://lkml.org/lkml/2019/8/5/169
Patch 01 adds the required infrastructure in the MDIO layer.
Patch 02 adds additional PTP offset compensation.
Patch 03 adds support for the PTP_SYS_OFFSET_EXTENDED ioctl in the mv88e6xxx driver.
Patch 04 adds support for the PTP system timestamping in the imx-fec driver.
The following tests show the improvement caused by each patch. The system clock
precision was set to 15ns instead of 333ns (as described in https://lkml.org/lkml/2019/8/2/1364).
Without this patchset applied, the phc2sys synchronisation performance was very
poor:
offset: min -27120 max 28840 mean 2.44 stddev 8040.78 count 1236
delay: min 282103 max 386385 mean 352568.03 stddev 27814.27 count 1236
(test runtime 20 minutes)
Results after appling patch 01-03:
offset: min -12316 max 13314 mean -9.38 stddev 4274.82 count 1022
delay: min 69977 max 96266 mean 87939.04 stddev 6466.17 count 1022
(test runtime 16 minutes)
Results after appling patch 04:
offset: min -788 max 528 mean -0.06 stddev 185.02 count 7171
delay: min 1773 max 2031 mean 1909.43 stddev 33.74 count 7171
(test runtime 119 minutes)
Hubert Feurstein (4):
net: mdio: add support for passing a PTP system timestamp to the
mii_bus driver
net: mdio: add PTP offset compensation to mdiobus_write_sts
net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
net: fec: add support for PTP system timestamping for MDIO devices
drivers/net/dsa/mv88e6xxx/chip.h | 2 +
drivers/net/dsa/mv88e6xxx/ptp.c | 11 +--
drivers/net/dsa/mv88e6xxx/smi.c | 7 +-
drivers/net/ethernet/freescale/fec_main.c | 7 +-
drivers/net/phy/mdio_bus.c | 88 +++++++++++++++++++++++
include/linux/mdio.h | 5 ++
include/linux/phy.h | 42 +++++++++++
7 files changed, 156 insertions(+), 6 deletions(-)
--
2.22.1
From: Hubert Feurstein <[email protected]>
In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.
This patch adds the required infrastructure to solve the problem described
above.
Signed-off-by: Hubert Feurstein <[email protected]>
---
Changes in v2:
- Removed mdiobus_write_sts as there was no user
- Removed ptp_sts_supported-boolean and introduced flags instead
drivers/net/phy/mdio_bus.c | 76 ++++++++++++++++++++++++++++++++++++++
include/linux/mdio.h | 5 +++
include/linux/phy.h | 34 +++++++++++++++++
3 files changed, 115 insertions(+)
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index bd04fe762056..4dba2714495e 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -34,6 +34,7 @@
#include <linux/phy.h>
#include <linux/io.h>
#include <linux/uaccess.h>
+#include <linux/ptp_clock_kernel.h>
#define CREATE_TRACE_POINTS
#include <trace/events/mdio.h>
@@ -697,6 +698,81 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
}
EXPORT_SYMBOL(mdiobus_write);
+/**
+ * __mdiobus_write_sts - Unlocked version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * Write a MDIO bus register and request the MDIO bus driver to take the
+ * system timestamps when sts-pointer is valid. When the bus driver doesn't
+ * support this, the timestamps are taken in this function instead.
+ *
+ * In order to improve the synchronisation precision of phc2sys (from
+ * the linuxptp project) for devices like switches which are attached
+ * to the MDIO bus, it is necessary the get the system timestamps as
+ * close as possible to the access which causes the PTP timestamp
+ * register to be snapshotted in the switch hardware. Usually this is
+ * triggered by an MDIO write access, the snapshotted timestamp is then
+ * transferred by several MDIO reads.
+ *
+ * Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+ struct ptp_system_timestamp *sts)
+{
+ int retval;
+
+ WARN_ON_ONCE(!mutex_is_locked(&bus->mdio_lock));
+
+ if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
+ ptp_read_system_prets(sts);
+
+ bus->ptp_sts = sts;
+ retval = __mdiobus_write(bus, addr, regnum, val);
+ bus->ptp_sts = NULL;
+
+ if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
+ ptp_read_system_postts(sts);
+
+ return retval;
+}
+EXPORT_SYMBOL(__mdiobus_write_sts);
+
+/**
+ * mdiobus_write_sts_nested - Nested version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * In case of nested MDIO bus access avoid lockdep false positives by
+ * using mutex_lock_nested().
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+ struct ptp_system_timestamp *sts)
+{
+ int retval;
+
+ BUG_ON(in_interrupt());
+
+ mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
+ retval = __mdiobus_write_sts(bus, addr, regnum, val, sts);
+ mutex_unlock(&bus->mdio_lock);
+
+ return retval;
+}
+EXPORT_SYMBOL(mdiobus_write_sts_nested);
+
/**
* mdio_bus_match - determine if given MDIO driver supports the given
* MDIO device
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad88c81..482388341c7b 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -9,6 +9,7 @@
#include <uapi/linux/mdio.h>
#include <linux/mod_devicetable.h>
+struct ptp_system_timestamp;
struct gpio_desc;
struct mii_bus;
@@ -305,11 +306,15 @@ static inline void mii_10gbt_stat_mod_linkmode_lpa_t(unsigned long *advertising,
int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+ struct ptp_system_timestamp *sts);
int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+ struct ptp_system_timestamp *sts);
int mdiobus_register_device(struct mdio_device *mdiodev);
int mdiobus_unregister_device(struct mdio_device *mdiodev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index d26779f1fb6b..0b33662e0320 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -205,6 +205,13 @@ struct device;
struct phylink;
struct sk_buff;
+/* MII-bus flags:
+ * @MII_BUS_F_PTP_STS_SUPPORTED: The driver supports PTP system timestamping
+ */
+enum mii_bus_flags {
+ MII_BUS_F_PTP_STS_SUPPORTED = BIT(0)
+};
+
/*
* The Bus class for PHYs. Devices which provide access to
* PHYs should register using this structure
@@ -252,7 +259,34 @@ struct mii_bus {
int reset_delay_us;
/* RESET GPIO descriptor pointer */
struct gpio_desc *reset_gpiod;
+
+ /* Feature flags */
+ u32 flags;
+
+ /* PTP system timestamping support
+ *
+ * In order to improve the synchronisation precision of phc2sys (from
+ * the linuxptp project) for devices like switches which are attached
+ * to the MDIO bus, it is necessary the get the system timestamps as
+ * close as possible to the access which causes the PTP timestamp
+ * register to be snapshotted in the switch hardware. Usually this is
+ * triggered by an MDIO write access, the snapshotted timestamp is then
+ * transferred by several MDIO reads.
+ *
+ * The switch driver can use mdio_write_sts*() to pass through the
+ * system timestamp pointer @ptp_sts to the MDIO bus driver. The bus
+ * driver simply has to do the following calls in its write handler:
+ * ptp_read_system_prets(bus->ptp_sts);
+ * writel(value, mdio-register)
+ * ptp_read_system_postts(bus->ptp_sts);
+ *
+ * The ptp_read_system_*ts functions already check the ptp_sts pointer.
+ * The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
+ * MDIO bus driver takes the timestamps as described above.
+ */
+ struct ptp_system_timestamp *ptp_sts;
};
+
#define to_mii_bus(d) container_of(d, struct mii_bus, dev)
struct mii_bus *mdiobus_alloc_size(size_t);
--
2.22.1
From: Hubert Feurstein <[email protected]>
This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
Signed-off-by: Hubert Feurstein <[email protected]>
---
Changes in v3:
- mv88e6xxx_smi_indirect_write: forward ptp_sts only on the last write
drivers/net/dsa/mv88e6xxx/chip.h | 2 ++
drivers/net/dsa/mv88e6xxx/ptp.c | 11 +++++++----
drivers/net/dsa/mv88e6xxx/smi.c | 7 ++++++-
3 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index a406be2f5652..1bfde0d8a5a3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -275,6 +275,8 @@ struct mv88e6xxx_chip {
struct ptp_clock_info ptp_clock_info;
struct delayed_work tai_event_work;
struct ptp_pin_desc pin_config[MV88E6XXX_MAX_GPIO];
+ struct ptp_system_timestamp *ptp_sts;
+
u16 trig_config;
u16 evcap_config;
u16 enable_count;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 073cbd0bb91b..cf6e52ee9e0a 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
return 0;
}
-static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
- struct timespec64 *ts)
+static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
+ struct timespec64 *ts,
+ struct ptp_system_timestamp *sts)
{
struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
u64 ns;
mv88e6xxx_reg_lock(chip);
+ chip->ptp_sts = sts;
ns = timecounter_read(&chip->tstamp_tc);
+ chip->ptp_sts = NULL;
mv88e6xxx_reg_unlock(chip);
*ts = ns_to_timespec64(ns);
@@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
struct timespec64 ts;
- mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
+ mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
schedule_delayed_work(&chip->overflow_work,
MV88E6XXX_TAI_OVERFLOW_PERIOD);
@@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
chip->ptp_clock_info.max_adj = MV88E6XXX_MAX_ADJ_PPB;
chip->ptp_clock_info.adjfine = mv88e6xxx_ptp_adjfine;
chip->ptp_clock_info.adjtime = mv88e6xxx_ptp_adjtime;
- chip->ptp_clock_info.gettime64 = mv88e6xxx_ptp_gettime;
+ chip->ptp_clock_info.gettimex64 = mv88e6xxx_ptp_gettimex;
chip->ptp_clock_info.settime64 = mv88e6xxx_ptp_settime;
chip->ptp_clock_info.enable = ptp_ops->ptp_enable;
chip->ptp_clock_info.verify = ptp_ops->ptp_verify;
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 282fe08db050..1b5f78662158 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
{
int ret;
- ret = mdiobus_write_nested(chip->bus, dev, reg, data);
+ ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
+ chip->ptp_sts);
if (ret < 0)
return ret;
@@ -130,6 +131,7 @@ static int mv88e6xxx_smi_indirect_read(struct mv88e6xxx_chip *chip,
static int mv88e6xxx_smi_indirect_write(struct mv88e6xxx_chip *chip,
int dev, int reg, u16 data)
{
+ struct ptp_system_timestamp *ptp_sts = chip->ptp_sts;
int err;
err = mv88e6xxx_smi_direct_wait(chip, chip->sw_addr,
@@ -137,11 +139,14 @@ static int mv88e6xxx_smi_indirect_write(struct mv88e6xxx_chip *chip,
if (err)
return err;
+ chip->ptp_sts = NULL;
err = mv88e6xxx_smi_direct_write(chip, chip->sw_addr,
MV88E6XXX_SMI_DATA, data);
if (err)
return err;
+ /* Capture the PTP system timestamps only on *this* write operation */
+ chip->ptp_sts = ptp_sts;
err = mv88e6xxx_smi_direct_write(chip, chip->sw_addr,
MV88E6XXX_SMI_CMD,
MV88E6XXX_SMI_CMD_BUSY |
--
2.22.1
From: Hubert Feurstein <[email protected]>
In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.
The ptp_read_system_*ts functions already check the ptp_sts pointer.
Signed-off-by: Hubert Feurstein <[email protected]>
---
drivers/net/ethernet/freescale/fec_main.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index c01d3ec3e9af..dd1253683ac0 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1815,10 +1815,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
reinit_completion(&fep->mdio_done);
/* start a write op */
+ ptp_read_system_prets(bus->ptp_sts);
writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
FEC_MMFR_TA | FEC_MMFR_DATA(value),
fep->hwp + FEC_MII_DATA);
+ ptp_read_system_postts(bus->ptp_sts);
/* wait for end of transfer */
time_left = wait_for_completion_timeout(&fep->mdio_done,
@@ -1956,7 +1958,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
struct fec_enet_private *fep = netdev_priv(ndev);
struct device_node *node;
int err = -ENXIO;
- u32 mii_speed, holdtime;
+ u32 mii_speed, mii_period, holdtime;
/*
* The i.MX28 dual fec interfaces are not equal.
@@ -1993,6 +1995,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
* document.
*/
mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
+ mii_period = div_u64((u64)mii_speed * 2 * NSEC_PER_SEC, clk_get_rate(fep->clk_ipg));
if (fep->quirks & FEC_QUIRK_ENET_MAC)
mii_speed--;
if (mii_speed > 63) {
@@ -2034,6 +2037,8 @@ static int fec_enet_mii_init(struct platform_device *pdev)
pdev->name, fep->dev_id + 1);
fep->mii_bus->priv = fep;
fep->mii_bus->parent = &pdev->dev;
+ fep->mii_bus->flags = MII_BUS_F_PTP_STS_SUPPORTED;
+ fep->mii_bus->ptp_sts_offset = 32 * mii_period;
node = of_get_child_by_name(pdev->dev.of_node, "mdio");
err = of_mdiobus_register(fep->mii_bus, node);
--
2.22.1
From: Hubert Feurstein <[email protected]>
The slow MDIO access introduces quite a big offset (~13us) to the PTP
system time synchronisation. With this patch the driver has the possibility
to set the correct offset which can then be compensated.
Signed-off-by: Hubert Feurstein <[email protected]>
---
drivers/net/phy/mdio_bus.c | 12 ++++++++++++
include/linux/phy.h | 8 ++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 4dba2714495e..50a37cf46f96 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -739,6 +739,18 @@ int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
ptp_read_system_postts(sts);
+ /* PTP offset compensation:
+ * After the MDIO access is completed (from the chip perspective), the
+ * switch chip will snapshot the PHC timestamp. To make sure our system
+ * timestamp corresponds to the PHC timestamp, we have to add the
+ * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
+ * takes the average of pre_ts and post_ts to calculate the final
+ * system timestamp. With this in mind, we have to add ptp_sts_offset
+ * twice to post_ts, in order to not introduce an constant time offset.
+ */
+ if (sts)
+ timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
+
return retval;
}
EXPORT_SYMBOL(__mdiobus_write_sts);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 0b33662e0320..615df9c7f2c3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -283,8 +283,16 @@ struct mii_bus {
* The ptp_read_system_*ts functions already check the ptp_sts pointer.
* The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
* MDIO bus driver takes the timestamps as described above.
+ *
+ * @ptp_sts_offset: This is the compensation offset for the system
+ * timestamp which is introduced by the slow MDIO access duration. An
+ * MDIO access consists of 32 clock cycles. Usually the MDIO bus runs
+ * at ~2.5MHz, so we have to compensate ~12800ns offset.
+ * Set the ptp_sts_offset to the exact duration of one MDIO frame
+ * (= 32 * clock-period) in nano-seconds.
*/
struct ptp_system_timestamp *ptp_sts;
+ u32 ptp_sts_offset;
};
#define to_mii_bus(d) container_of(d, struct mii_bus, dev)
--
2.22.1
On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:
> + /* PTP offset compensation:
> + * After the MDIO access is completed (from the chip perspective), the
> + * switch chip will snapshot the PHC timestamp. To make sure our system
> + * timestamp corresponds to the PHC timestamp, we have to add the
> + * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> + * takes the average of pre_ts and post_ts to calculate the final
> + * system timestamp. With this in mind, we have to add ptp_sts_offset
> + * twice to post_ts, in order to not introduce an constant time offset.
> + */
> + if (sts)
> + timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
This correction looks good to me.
Is the MDIO write delay constant in reality, or does it at least have
an upper bound? That is, is it always true that the post_ts timestamp
does not point to a time before the PHC timestamp was actually taken?
This is important to not break the estimation of maximum error in the
measured offset. Applications using the ioctl may assume that the
maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
phc2sys). That would not work if the delay could be occasionally 50
microseconds for instance, i.e. the post_ts timestamp would be earlier
than the PHC timestamp.
--
Miroslav Lichvar
Hi Miroslav,
Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar
<[email protected]>:
>
> On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:
>
> > + /* PTP offset compensation:
> > + * After the MDIO access is completed (from the chip perspective), the
> > + * switch chip will snapshot the PHC timestamp. To make sure our system
> > + * timestamp corresponds to the PHC timestamp, we have to add the
> > + * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> > + * takes the average of pre_ts and post_ts to calculate the final
> > + * system timestamp. With this in mind, we have to add ptp_sts_offset
> > + * twice to post_ts, in order to not introduce an constant time offset.
> > + */
> > + if (sts)
> > + timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
>
> This correction looks good to me.
>
> Is the MDIO write delay constant in reality, or does it at least have
> an upper bound? That is, is it always true that the post_ts timestamp
> does not point to a time before the PHC timestamp was actually taken?
>
> This is important to not break the estimation of maximum error in the
> measured offset. Applications using the ioctl may assume that the
> maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> phc2sys). That would not work if the delay could be occasionally 50
> microseconds for instance, i.e. the post_ts timestamp would be earlier
> than the PHC timestamp.
>
If the timestamps are taken in the MDIO driver (imx-fec in my case), then
everything is quite deterministic (see results in the cover letter). Of course,
it still can be improved slightly, by splitting up the writel into iowmb and
write_relaxed and disable the interrupts while capturing the timestamps
(as I did in my original RFC patch). But phc2sys takes by default 5 measurements
and uses the one with the smallest delay, so this shouldn't be necessary.
Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts
the timestamp is aligned with the PHC timestamp, but this also increases
the delay which is reported by phc2sys (~26us). But the maximum error
which must be expected is definitely much less (< 1us). So maybe it is better
to shift both timestamps pre_ts and post_ts by ptp_sts_offset.
Hubert
On Tue, Aug 20, 2019 at 11:49:03AM +0200, Miroslav Lichvar wrote:
> On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:
>
> > + /* PTP offset compensation:
> > + * After the MDIO access is completed (from the chip perspective), the
> > + * switch chip will snapshot the PHC timestamp. To make sure our system
> > + * timestamp corresponds to the PHC timestamp, we have to add the
> > + * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> > + * takes the average of pre_ts and post_ts to calculate the final
> > + * system timestamp. With this in mind, we have to add ptp_sts_offset
> > + * twice to post_ts, in order to not introduce an constant time offset.
> > + */
> > + if (sts)
> > + timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
>
> This correction looks good to me.
>
> Is the MDIO write delay constant in reality, or does it at least have
> an upper bound? That is, is it always true that the post_ts timestamp
> does not point to a time before the PHC timestamp was actually taken?
The post_ts could be before the target hardware does anything. The
write triggers an MDIO bus transaction, sending about 64 bits of data
down a wire at around 2.5Mbps. So there is a minimum delay of 25uS
just sending the bits down the wire. It is unclear to me exactly when
the post_ts is taken, has the hardware actually sent the bits, or has
it just initiated sending the bits? In this case, there is an
interrupt sometime later indicating the transaction has completed, so
my guess would be, post_ts indicates the transaction has been
initiated.
Also, how long does the device on the end of the bus actually take to
decode the bits on the wire and do what it needs to do?
> This is important to not break the estimation of maximum error in the
> measured offset. Applications using the ioctl may assume that the
> maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> phc2sys). That would not work if the delay could be occasionally 50
> microseconds for instance, i.e. the post_ts timestamp would be earlier
> than the PHC timestamp.
Given my understanding of the hardware, post_ts-pre_ts should be
constant. But what it exactly measures is not clearly defined :-(
And different hardware will have different definitions.
But the real point is, by doing these timestamps here, we are as close
as possible to where the action really happens, and we are minimising
the number of undefined things we are measuring, so in general, the
error is minimised.
Andrew
On Tue, Aug 20, 2019 at 02:29:27PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar
> > This is important to not break the estimation of maximum error in the
> > measured offset. Applications using the ioctl may assume that the
> > maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> > phc2sys). That would not work if the delay could be occasionally 50
> > microseconds for instance, i.e. the post_ts timestamp would be earlier
> > than the PHC timestamp.
> >
> If the timestamps are taken in the MDIO driver (imx-fec in my case), then
> everything is quite deterministic (see results in the cover letter). Of course,
> it still can be improved slightly, by splitting up the writel into iowmb and
> write_relaxed and disable the interrupts while capturing the timestamps
> (as I did in my original RFC patch). But phc2sys takes by default 5 measurements
> and uses the one with the smallest delay, so this shouldn't be necessary.
The delay that phc2sys sees is the difference between post_ts and
pre_ts, which doesn't contain the actual delay, right? So, I'm not
sure how well the phc2sys filtering actually works. Even if the
measured delay was related to the write delay, would be 5 measurements
always enough to get a "correct" sample?
> Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts
> the timestamp is aligned with the PHC timestamp, but this also increases
> the delay which is reported by phc2sys (~26us). But the maximum error
> which must be expected is definitely much less (< 1us). So maybe it is better
> to shift both timestamps pre_ts and post_ts by ptp_sts_offset.
If you could guarantee that [pre_ts + ptp_sts_offset, post_ts +
ptp_sts_offset] always contains the PHC timestamps, then that would be
great. From what Andrew is writing, this doesn't seem to be the case.
I'd suggest a different approach:
- specify a minimum delay for the write and a minimum delay for the
completion (if they are not equal)
- take a second "post" system timestamp after the completion
- adjust pre_ts and post_ts so that the middle point is equal to what
you have now, but the interval contains both
pre_ts + min_write_delay and post2_ts - min_completion_delay
This way you get the best stability and also a delay that correctly
estimates the maximum error.
--
Miroslav Lichvar
> - take a second "post" system timestamp after the completion
For this hardware, completion is an interrupt, which has a lot of
jitter on it. But this hardware is odd, in that it uses an
interrupt. Every other MDIO bus controller uses polled IO, with an
mdelay(10) or similar between each poll. So the jitter is going to be
much larger.
Even though the FEC is special with its interrupt completion, i would
like to see the solution being reasonably generic so that others can
copy it into other MDIO bus drivers. That is what is nice about taking
the time stamp around the write which triggers the bus transaction. It
is independent of interrupt or polled, and should mean about the same
thing for different vendors hardware.
Andrew
On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > - take a second "post" system timestamp after the completion
>
> For this hardware, completion is an interrupt, which has a lot of
> jitter on it. But this hardware is odd, in that it uses an
> interrupt. Every other MDIO bus controller uses polled IO, with an
> mdelay(10) or similar between each poll. So the jitter is going to be
> much larger.
I think a large jitter is ok in this case. We just need to timestamp
something that we know for sure happened after the PHC timestamp. It
should have no impact on the offset and its stability, just the
reported delay. A test with phc2sys should be able to confirm that.
phc2sys selects the measurement with the shortest delay, which has
least uncertainty. I'd say that applies to both interrupt and polling.
If it is difficult to specify the minimum interrupt delay, I'd still
prefer an overly pessimistic interval assuming a zero delay.
--
Miroslav Lichvar
Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar
<[email protected]>:
>
> On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > > - take a second "post" system timestamp after the completion
> >
> > For this hardware, completion is an interrupt, which has a lot of
> > jitter on it. But this hardware is odd, in that it uses an
> > interrupt. Every other MDIO bus controller uses polled IO, with an
> > mdelay(10) or similar between each poll. So the jitter is going to be
> > much larger.
>
> I think a large jitter is ok in this case. We just need to timestamp
> something that we know for sure happened after the PHC timestamp. It
> should have no impact on the offset and its stability, just the
> reported delay. A test with phc2sys should be able to confirm that.
> phc2sys selects the measurement with the shortest delay, which has
> least uncertainty. I'd say that applies to both interrupt and polling.
>
> If it is difficult to specify the minimum interrupt delay, I'd still
> prefer an overly pessimistic interval assuming a zero delay.
>
Currently I do not see the benefit from this. The original intention was to
compensate for the remaining offset as good as possible. The current code
of phc2sys uses the delay only for the filtering of the measurement record
with the shortest delay and for reporting and statistics. Why not simple shift
the timestamps with the offset to the point where we expect the PHC timestamp
to be captured, and we have a very good result compared to where we came
from.
Hubert
On Tue, Aug 20, 2019 at 06:56:56PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar
> > I think a large jitter is ok in this case. We just need to timestamp
> > something that we know for sure happened after the PHC timestamp. It
> > should have no impact on the offset and its stability, just the
> > reported delay. A test with phc2sys should be able to confirm that.
> > phc2sys selects the measurement with the shortest delay, which has
> > least uncertainty. I'd say that applies to both interrupt and polling.
> >
> > If it is difficult to specify the minimum interrupt delay, I'd still
> > prefer an overly pessimistic interval assuming a zero delay.
> >
> Currently I do not see the benefit from this. The original intention was to
> compensate for the remaining offset as good as possible.
That's ok, but IMHO the change should not break the assumptions of
existing application and users.
> The current code
> of phc2sys uses the delay only for the filtering of the measurement record
> with the shortest delay and for reporting and statistics. Why not simple shift
> the timestamps with the offset to the point where we expect the PHC timestamp
> to be captured, and we have a very good result compared to where we came
> from.
Because those reports/statistics are important in calculation of
maximum error. If someone had a requirement for a clock to be accurate
to 1.5 microseconds and the ioctl returned a delay indicating a
sufficient accuracy when in reality it could be worse, that would be a
problem.
BTW, phc2sys is not the only user of the ioctl.
--
Miroslav Lichvar
On Wed, Aug 21, 2019 at 11:53:12AM +0200, Hubert Feurstein wrote:
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is.
That's ok. It's the same with PCIe devices. It takes about 500 ns to
read a PCI register, so we know in the worst case the offset is
accurate to 250 ns. It's probably much better, maybe to 50 ns, but we
don't really know. We don't know how much asymmetry there is in the
PCIe delay (it certainly is not zero), or how much time the NIC
actually needs to respond to the command and when exactly it reads the
clock.
> When the MDIO
> driver implements
> the PTP system timestamping as follows ...
>
> ptp_read_system_prets(bus->ptp_sts);
> writel(value, mdio-reg)
> ptp_read_system_postts(bus->ptp_sts);
>
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle.
Is this always the case?
> So if I subtract
> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO.
If I understand it correctly, this ignores the time needed for the
frame to be received, decoded and the clock to be read.
--
Miroslav Lichvar
On Wed, 21 Aug 2019 at 12:53, Hubert Feurstein <[email protected]> wrote:
>
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> <[email protected]>:
> > > Currently I do not see the benefit from this. The original intention was to
> > > compensate for the remaining offset as good as possible.
> >
> > That's ok, but IMHO the change should not break the assumptions of
> > existing application and users.
> >
> > > The current code
> > > of phc2sys uses the delay only for the filtering of the measurement record
> > > with the shortest delay and for reporting and statistics. Why not simple shift
> > > the timestamps with the offset to the point where we expect the PHC timestamp
> > > to be captured, and we have a very good result compared to where we came
> > > from.
> >
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is. When the MDIO
> driver implements
> the PTP system timestamping as follows ...
>
> ptp_read_system_prets(bus->ptp_sts);
> writel(value, mdio-reg)
> ptp_read_system_postts(bus->ptp_sts);
>
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle. So if I subtract
How do you know?
The MDIO controller is a memory-mapped peripheral so there will be a
transaction on the core <-> peripheral interconnect in the SoC.
Depending on the system load, the transaction might not be
instantaneous as you think. Additionally there will be clock domain
crossings in the MDIO controller when transferring the command from
the platform clock to the peripheral clock, which might also add some
jitter.
MDIO frames may also begin with 32 bits of preamble, with some
controllers being able to suppress it. Have you checked/accounted for
this?
The only reliable moment when you know the MDIO command has completed
is when the controller says it has. If there is additional jitter in
waiting for the completion event caused by the GIC and the scheduling
of the ISR, then just switch the driver to poll mode, like everybody
else.
> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO. And then we can shift both
> timestamps in the switch driver
> (in the gettimex handler) to compensate the switch depending offset.
> What do you think?
>
> Hubert
Regards,
-Vladimir
Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
<[email protected]>:
> > Currently I do not see the benefit from this. The original intention was to
> > compensate for the remaining offset as good as possible.
>
> That's ok, but IMHO the change should not break the assumptions of
> existing application and users.
>
> > The current code
> > of phc2sys uses the delay only for the filtering of the measurement record
> > with the shortest delay and for reporting and statistics. Why not simple shift
> > the timestamps with the offset to the point where we expect the PHC timestamp
> > to be captured, and we have a very good result compared to where we came
> > from.
>
> Because those reports/statistics are important in calculation of
> maximum error. If someone had a requirement for a clock to be accurate
> to 1.5 microseconds and the ioctl returned a delay indicating a
> sufficient accuracy when in reality it could be worse, that would be a
> problem.
>
Ok, I understand your point. But including the MDIO completion into
delay calculation
will indicate a much wore precision as it actually is. When the MDIO
driver implements
the PTP system timestamping as follows ...
ptp_read_system_prets(bus->ptp_sts);
writel(value, mdio-reg)
ptp_read_system_postts(bus->ptp_sts);
... then we catch already the error caused by interrupts which hit the
pre/post_ts section.
Now we only have the additional error of one MDIO clock cycle
(~400ns). Because I expect
the MDIO controller to shift out the MDIO frame on the next MDIO clock
cycle. So if I subtract
one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
post_ts the error indication
would be sufficiently corrected IMHO. And then we can shift both
timestamps in the switch driver
(in the gettimex handler) to compensate the switch depending offset.
What do you think?
Hubert
On Tue, 20 Aug 2019 at 11:49, Hubert Feurstein <[email protected]> wrote:
>
> From: Hubert Feurstein <[email protected]>
>
> In order to improve the synchronisation precision of phc2sys (from
> the linuxptp project) for devices like switches which are attached
> to the MDIO bus, it is necessary the get the system timestamps as
> close as possible to the access which causes the PTP timestamp
> register to be snapshotted in the switch hardware. Usually this is
> triggered by an MDIO write access, the snapshotted timestamp is then
> transferred by several MDIO reads.
>
> The ptp_read_system_*ts functions already check the ptp_sts pointer.
>
> Signed-off-by: Hubert Feurstein <[email protected]>
> ---
> drivers/net/ethernet/freescale/fec_main.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index c01d3ec3e9af..dd1253683ac0 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1815,10 +1815,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> reinit_completion(&fep->mdio_done);
>
> /* start a write op */
> + ptp_read_system_prets(bus->ptp_sts);
> writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> FEC_MMFR_TA | FEC_MMFR_DATA(value),
> fep->hwp + FEC_MII_DATA);
> + ptp_read_system_postts(bus->ptp_sts);
>
How do you know the core will not service an interrupt here?
Why are you not disabling (postponing) local interrupts after this
critical section? (which you were in the RFC)
If the argument is that you didn't notice any issue with phc2sys -N 5,
that's not a good argument. "Unlikely for a condition to happen" does
not mean deterministic.
Here is an example of the system servicing an interrupt during the
transmission of a 12-byte SPI transfer (proof that they can occur
anywhere where they aren't disabled):
https://drive.google.com/file/d/1rUZpfkBKHJGwQN4orFUWks_5i70wn-mj/view?usp=sharing
> /* wait for end of transfer */
> time_left = wait_for_completion_timeout(&fep->mdio_done,
> @@ -1956,7 +1958,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
> struct fec_enet_private *fep = netdev_priv(ndev);
> struct device_node *node;
> int err = -ENXIO;
> - u32 mii_speed, holdtime;
> + u32 mii_speed, mii_period, holdtime;
>
> /*
> * The i.MX28 dual fec interfaces are not equal.
> @@ -1993,6 +1995,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
> * document.
> */
> mii_speed = DIV_ROUND_UP(clk_get_rate(fep->clk_ipg), 5000000);
> + mii_period = div_u64((u64)mii_speed * 2 * NSEC_PER_SEC, clk_get_rate(fep->clk_ipg));
> if (fep->quirks & FEC_QUIRK_ENET_MAC)
> mii_speed--;
> if (mii_speed > 63) {
> @@ -2034,6 +2037,8 @@ static int fec_enet_mii_init(struct platform_device *pdev)
> pdev->name, fep->dev_id + 1);
> fep->mii_bus->priv = fep;
> fep->mii_bus->parent = &pdev->dev;
> + fep->mii_bus->flags = MII_BUS_F_PTP_STS_SUPPORTED;
> + fep->mii_bus->ptp_sts_offset = 32 * mii_period;
>
> node = of_get_child_by_name(pdev->dev.of_node, "mdio");
> err = of_mdiobus_register(fep->mii_bus, node);
> --
> 2.22.1
>
Regards,
-Vladimir
From: Hubert Feurstein <[email protected]>
Date: Tue, 20 Aug 2019 10:48:29 +0200
> From: Hubert Feurstein <[email protected]>
>
> Changelog:
> v3: mv88e6xxx_smi_indirect_write: forward ptp_sts only on the last write
> Copied Miroslav Lichvar because of PTP offset compensation patch
> v2: Added patch for PTP offset compensation
> Removed mdiobus_write_sts as there was no user
> Removed ptp_sts_supported-boolean and introduced flags instead
>
> With this patchset the phc2sys synchronisation precision improved to +/-555ns on
> an IMX6DL with an MV88E6220 switch attached.
>
> This patchset takes into account the comments from the following discussions:
> - https://lkml.org/lkml/2019/8/2/1364
> - https://lkml.org/lkml/2019/8/5/169
>
> Patch 01 adds the required infrastructure in the MDIO layer.
> Patch 02 adds additional PTP offset compensation.
> Patch 03 adds support for the PTP_SYS_OFFSET_EXTENDED ioctl in the mv88e6xxx driver.
> Patch 04 adds support for the PTP system timestamping in the imx-fec driver.
It looks like there is still some active discussion about these changes and
there will likely be another spin.