There is a possibility that the context descriptor still owned by the DMA
even the previous normal descriptor own bit is already cleared. Checking
the context descriptor readiness without delay might be not enough time
for the DMA to update the descriptor field, which causing failure in
getting HW Rx timestamp.
This patch introduces a 1us fsleep() in HW Rx timestamp checking loop
to give time for DMA to update/complete the context descriptor.
ptp4l Timestamp log without this patch:
-----------------------------------------------------------
$ echo 10000 > /sys/class/net/enp0s30f4/gro_flush_timeout
$ echo 10000 > /sys/class/net/enp0s30f4/napi_defer_hard_irqs
$ ptp4l -P2Hi enp0s30f4 --step_threshold=1 -m
ptp4l: selected /dev/ptp2 as PTP clock
ptp4l: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l: selected local clock 901210.fffe.b57df7 as best master
ptp4l: port 1: new foreign master 22bb22.fffe.bb22bb-1
ptp4l: selected best master clock 22bb22.fffe.bb22bb
ptp4l: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l: port 1: received SYNC without timestamp
ptp4l: rms 49 max 63 freq -9573 +/- 34 delay 71 +/- 1
ptp4l: rms 15 max 25 freq -9553 +/- 20 delay 72 +/- 0
ptp4l: port 1: received SYNC without timestamp
ptp4l: rms 9 max 18 freq -9540 +/- 11 delay 70 +/- 0
ptp4l: port 1: received PDELAY_REQ without timestamp
ptp4l: rms 16 max 29 freq -9519 +/- 12 delay 72 +/- 0
ptp4l: port 1: received PDELAY_REQ without timestamp
ptp4l: rms 9 max 18 freq -9527 +/- 12 delay 72 +/- 0
ptp4l: rms 5 max 9 freq -9530 +/- 7 delay 70 +/- 0
ptp4l: rms 11 max 20 freq -9530 +/- 16 delay 72 +/- 0
ptp4l: rms 5 max 11 freq -9530 +/- 7 delay 74 +/- 0
ptp4l: rms 6 max 9 freq -9522 +/- 7 delay 72 +/- 0
ptp4l: port 1: received PDELAY_REQ without timestamp
-----------------------------------------------------------
ptp4l Timestamp log with this patch:
-----------------------------------------------------------
$ echo 10000 > /sys/class/net/enp0s30f4/gro_flush_timeout
$ echo 10000 > /sys/class/net/enp0s30f4/napi_defer_hard_irqs
$ ptp4l -P2Hi enp0s30f4 --step_threshold=1 -m
ptp4l: selected /dev/ptp2 as PTP clock
ptp4l: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l: selected local clock 901210.fffe.b57df7 as best master
ptp4l: port 1: new foreign master 22bb22.fffe.bb22bb-1
ptp4l: selected best master clock 22bb22.fffe.bb22bb
ptp4l: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l: rms 30 max 45 freq -9400 +/- 23 delay 72 +/- 0
ptp4l: rms 7 max 16 freq -9414 +/- 10 delay 70 +/- 0
ptp4l: rms 6 max 9 freq -9422 +/- 6 delay 72 +/- 0
ptp4l: rms 13 max 20 freq -9436 +/- 13 delay 74 +/- 0
ptp4l: rms 12 max 27 freq -9446 +/- 11 delay 72 +/- 0
ptp4l: rms 9 max 12 freq -9453 +/- 6 delay 74 +/- 0
ptp4l: rms 9 max 15 freq -9438 +/- 11 delay 74 +/- 0
ptp4l: rms 10 max 16 freq -9435 +/- 12 delay 74 +/- 0
ptp4l: rms 8 max 18 freq -9428 +/- 8 delay 72 +/- 0
ptp4l: rms 8 max 18 freq -9423 +/- 8 delay 72 +/- 0
ptp4l: rms 9 max 16 freq -9431 +/- 12 delay 70 +/- 0
ptp4l: rms 9 max 18 freq -9441 +/- 9 delay 72 +/- 0
-----------------------------------------------------------
Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4")
Cc: <[email protected]> # 5.4.x
Signed-off-by: Song Yoong Siang <[email protected]>
Signed-off-by: Tan Tee Min <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index d3b4765c1a5b..289bf26a6105 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -279,10 +279,11 @@ static int dwmac4_wrback_get_rx_timestamp_status(void *desc, void *next_desc,
/* Check if timestamp is OK from context descriptor */
do {
ret = dwmac4_rx_check_timestamp(next_desc);
- if (ret < 0)
+ if (ret <= 0)
goto exit;
i++;
+ fsleep(1);
} while ((ret == 1) && (i < 10));
if (i == 10)
--
2.25.1
On Wed, Apr 13, 2022 at 05:59:15AM -0700, Richard Cochran wrote:
> On Wed, Apr 13, 2022 at 12:01:15PM +0800, Tan Tee Min wrote:
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > index d3b4765c1a5b..289bf26a6105 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > @@ -279,10 +279,11 @@ static int dwmac4_wrback_get_rx_timestamp_status(void *desc, void *next_desc,
> > /* Check if timestamp is OK from context descriptor */
> > do {
> > ret = dwmac4_rx_check_timestamp(next_desc);
> > - if (ret < 0)
> > + if (ret <= 0)
> > goto exit;
> > i++;
> >
> > + fsleep(1);
>
> This is nutty. Why isn't this code using proper deferral mechanisms
> like work or kthread?
Appreciate your comment.
The dwmac4_wrback_get_rx_timestamp_status() is called by stmmac_rx()
function which is scheduled by NAPI framework.
Do we still need to create deferred work inside NAPI work?
Would you mind to explain it more in detail?
>
> > } while ((ret == 1) && (i < 10));
> >
> > if (i == 10)
> > --
> > 2.25.1
> >
>
> Thanks,
> Richard
Thanks,
Tee Min
On Wed, Apr 13, 2022 at 12:01:15PM +0800, Tan Tee Min wrote:
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> index d3b4765c1a5b..289bf26a6105 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> @@ -279,10 +279,11 @@ static int dwmac4_wrback_get_rx_timestamp_status(void *desc, void *next_desc,
> /* Check if timestamp is OK from context descriptor */
> do {
> ret = dwmac4_rx_check_timestamp(next_desc);
> - if (ret < 0)
> + if (ret <= 0)
> goto exit;
> i++;
>
> + fsleep(1);
This is nutty. Why isn't this code using proper deferral mechanisms
like work or kthread?
> } while ((ret == 1) && (i < 10));
>
> if (i == 10)
> --
> 2.25.1
>
Thanks,
Richard
On Thu, 14 Apr 2022 15:29:34 +0800 Tan Tee Min wrote:
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > > @@ -279,10 +279,11 @@ static int dwmac4_wrback_get_rx_timestamp_status(void *desc, void *next_desc,
> > > /* Check if timestamp is OK from context descriptor */
> > > do {
> > > ret = dwmac4_rx_check_timestamp(next_desc);
> > > - if (ret < 0)
> > > + if (ret <= 0)
> > > goto exit;
> > > i++;
> > >
> > > + fsleep(1);
> >
> > This is nutty. Why isn't this code using proper deferral mechanisms
> > like work or kthread?
>
> Appreciate your comment.
> The dwmac4_wrback_get_rx_timestamp_status() is called by stmmac_rx()
> function which is scheduled by NAPI framework.
> Do we still need to create deferred work inside NAPI work?
> Would you mind to explain it more in detail?
fsleep() is a big hammer, can you try cpu_relax() and bumping the max
loop count a little?
On Thu, Apr 14, 2022 at 10:42:59AM +0200, Jakub Kicinski wrote:
> On Thu, 14 Apr 2022 15:29:34 +0800 Tan Tee Min wrote:
> > > > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> > > > @@ -279,10 +279,11 @@ static int dwmac4_wrback_get_rx_timestamp_status(void *desc, void *next_desc,
> > > > /* Check if timestamp is OK from context descriptor */
> > > > do {
> > > > ret = dwmac4_rx_check_timestamp(next_desc);
> > > > - if (ret < 0)
> > > > + if (ret <= 0)
> > > > goto exit;
> > > > i++;
> > > >
> > > > + fsleep(1);
> > >
> > > This is nutty. Why isn't this code using proper deferral mechanisms
> > > like work or kthread?
> >
> > Appreciate your comment.
> > The dwmac4_wrback_get_rx_timestamp_status() is called by stmmac_rx()
> > function which is scheduled by NAPI framework.
> > Do we still need to create deferred work inside NAPI work?
> > Would you mind to explain it more in detail?
>
> fsleep() is a big hammer, can you try cpu_relax() and bumping the max
> loop count a little?
Thanks for the suggestion!
I tried cpu_relax(), unfortunately the issue still happens when
the system is in a high-load situation.
I agree that the fsleep(1) (=1us) is a big hammer.
Thus in order to improve this, I’ve figured out a smaller delay
time that is enough for the context descriptor to be ready which
is ndelay(500) (=500ns).
Would you think this is more acceptable?
On Tue, Apr 19, 2022 at 08:52:20AM +0800, Tan Tee Min wrote:
> I agree that the fsleep(1) (=1us) is a big hammer.
> Thus in order to improve this, I’ve figured out a smaller delay
> time that is enough for the context descriptor to be ready which
> is ndelay(500) (=500ns).
Why isn't the context descriptor ready?
I mean, the frame already belongs to the CPU, right?
Thanks,
Richard
On Tue, Apr 19, 2022 at 06:28:53AM -0700, Richard Cochran wrote:
> On Tue, Apr 19, 2022 at 08:52:20AM +0800, Tan Tee Min wrote:
>
> > I agree that the fsleep(1) (=1us) is a big hammer.
> > Thus in order to improve this, I’ve figured out a smaller delay
> > time that is enough for the context descriptor to be ready which
> > is ndelay(500) (=500ns).
>
> Why isn't the context descriptor ready?
>
> I mean, the frame already belongs to the CPU, right?
No. The context descriptor (frame) is possibly still owned by the
DMA controller in this situation.
This is why a looping in the original code to wait for the descriptor
to be owned by the application CPU. However, when NAPI is busy polling,
the context descriptor might be still owned by the DMA controller even
after the looping.
Thus, we are adding an additional nanosecond delay inside the loop,
so that the DMA controller can get a short moment to breathe and
complete the context descriptor.
Thanks,
Tee Min
>
> Thanks,
> Richard
On Wed, Apr 20, 2022 at 01:15:08PM +0800, Tan Tee Min wrote:
> No. The context descriptor (frame) is possibly still owned by the
> DMA controller in this situation.
So that is a problem. The solution is to postpone this logic until
the driver owns the buffer. Doesn't the HW offer some means of
notification, like an interrupt for example?
Thanks,
Richard