2023-04-14 08:21:39

by Horatiu Vultur

Subject: [PATCH net-next] net: lan966x: Fix lan966x_ifh_get

From time to time, it was observed that the nanosecond part of the
received timestamp, which is extracted from the IFH, was actually
bigger than 1 second. As a result, the full received timestamp, which
combines the nanosecond part from the IFH with the second part read
from HW, was wrong.

The issue seems to be inside the function lan966x_ifh_get, which
extracts information from the IFH (which is a byte array) and returns the
value in a u64. The timestamp value starts at bit 192 of the IFH and has
a size of 32 bits. If the most significant bit of the timestamp was set,
this bit was sign-extended and the return value became 0xffffffff... .
To fix this, make sure to clear all the other bits before returning the
value.

Fixes: fd7627833ddf ("net: lan966x: Stop using packing library")
Signed-off-by: Horatiu Vultur <[email protected]>
---
drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 80e2ea7e6ce8a..508e494dcc342 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -608,6 +608,7 @@ static u64 lan966x_ifh_get(u8 *ifh, size_t pos, size_t length)
 			val |= (1 << i);
 	}

+	val &= GENMASK(length, 0);
 	return val;
 }

--
2.38.0


2023-04-14 17:06:03

by Alexander Lobakin

Subject: Re: [PATCH net-next] net: lan966x: Fix lan966x_ifh_get

From: Horatiu Vultur <[email protected]>
Date: Fri, 14 Apr 2023 10:20:47 +0200

> From time to time, it was observed that the nanosecond part of the
> received timestamp, which is extracted from the IFH, it was actually
> bigger than 1 second. So then when actually calculating the full
> received timestamp, based on the nanosecond part from IFH and the second
> part which is read from HW, it was actually wrong.
>
> The issue seems to be inside the function lan966x_ifh_get, which
> extracts information from an IFH(which is an byte array) and returns the
> value in a u64. When extracting the timestamp value from the IFH, which
> starts at bit 192 and have the size of 32 bits, then if the most
> significant bit was set in the timestamp, then this bit was extended
> then the return value became 0xffffffff... . To fix this, make sure to
> clear all the other bits before returning the value.

Ooooh, I remember I was having the same issue with sign extension :s
Pls see below.

>
> Fixes: fd7627833ddf ("net: lan966x: Stop using packing library")
> Signed-off-by: Horatiu Vultur <[email protected]>
> ---
> drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> index 80e2ea7e6ce8a..508e494dcc342 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> @@ -608,6 +608,7 @@ static u64 lan966x_ifh_get(u8 *ifh, size_t pos, size_t length)
> val |= (1 << i);

Alternatively, you can change that to (pick one that you like the most):

val |= 1ULL << i;
// or
val |= BIT_ULL(i);

The thing is that constants without any suffix (U, UL etc.) are treated
as signed ints, that's why `1 << 31` becomes 0xffffffff80000000 once it
is widened to u64. 1U / 1UL / 1ULL don't.

Adding the ULL suffix may also make it safer for wider fields, as
`1 << i` is a 32-bit value, so `1 << 48` may go wrong and/or even
trigger compiler warnings.
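
A minimal userspace sketch of the sign extension described above, not the
driver code. The helper names are hypothetical, and INT32_MIN stands in for
the value that `1 << 31` yields in practice, since that signed shift is
formally undefined in ISO C:

```c
#include <assert.h>
#include <stdint.h>

/* The sketch assumes the usual 32-bit int, so bit 31 is the sign bit. */
static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* Mimics `val |= (1 << 31)`: the right-hand side is a signed 32-bit value
 * with only the sign bit set, so widening it to 64 bits sign-extends. */
static uint64_t or_int_bit31(uint64_t val)
{
	int32_t bit = INT32_MIN;	/* stand-in for `1 << 31` */

	val |= bit;			/* bit is converted to uint64_t here */
	return val;
}

/* Mimics `val |= 1ULL << 31`: the constant is already an unsigned 64-bit
 * value, so no sign extension can happen. */
static uint64_t or_ull_bit31(uint64_t val)
{
	val |= 1ULL << 31;
	return val;
}
```

Under these assumptions, or_int_bit31(0) yields 0xffffffff80000000 while
or_ull_bit31(0) yields 0x80000000.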

> }
>
> + val &= GENMASK(length, 0);
> return val;
> }
>

(now blah not directly related to the fix)

I'm wondering a bit if lan966x_ifh_get() can be improved in general to
work with words rather than bits. You read one byte per bit on each
iteration there.
For example, byte arrays could be cast to __be{32,64} and you'd get
native byteorder for 32/64 bits via one __be*_to_cpu*() call.
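
A hedged userspace sketch of the word-based idea for byte-aligned 32-bit
fields. The names load_be32() and ifh_get_u32_aligned(), the IFH_LEN_BYTES
value, and the layout (one big-endian bit string with bit 0 in the last
byte) are assumptions for illustration; in the kernel, a helper such as
get_unaligned_be32() would play the role of load_be32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IFH_LEN_BYTES 28	/* assumed IFH size for this sketch */

/* Portable big-endian 32-bit load; it plays the role of a __be32 cast plus
 * be32_to_cpu() without relying on alignment or host byte order. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fetch a byte-aligned 32-bit field starting at bit `pos`, assuming the
 * IFH is one big-endian bit string whose bit 0 is the LSB of the last
 * byte of the array. */
static uint32_t ifh_get_u32_aligned(const uint8_t *ifh, size_t pos)
{
	assert(pos % 8 == 0 && pos + 32 <= IFH_LEN_BYTES * 8);

	return load_be32(ifh + IFH_LEN_BYTES - pos / 8 - 4);
}
```

With this layout, a 32-bit field at bit 192 maps to the first four bytes of
the array. Fields that are not byte-aligned would still need a shift and a
mask on top of the word load.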

Thanks,
Olek

2023-04-15 01:32:34

by Jakub Kicinski

Subject: Re: [PATCH net-next] net: lan966x: Fix lan966x_ifh_get

On Fri, 14 Apr 2023 19:00:20 +0200 Alexander Lobakin wrote:
> > @@ -608,6 +608,7 @@ static u64 lan966x_ifh_get(u8 *ifh, size_t pos, size_t length)
> > val |= (1 << i);
>
> Alternatively, you can change that to (pick one that you like the most):
>
> val |= 1ULL << i;
> // or
> val |= BIT_ULL(i);

// or
(u64)1 << i
// or, since you're only concerned about sign extension, even
1U << i

Having the correct type for the left operand of the shift seems cleaner
than masking, indeed.
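
To illustrate one reason the type fix can be cleaner than masking: the
kernel's GENMASK(h, l) sets bits l through h inclusive, so GENMASK(length, 0)
covers length + 1 bits. The sketch below uses a hypothetical userspace
helper, mask_ull(), that mirrors those inclusive semantics; it is not the
kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for an inclusive bit mask: bits l..h are set, both
 * ends included. */
static uint64_t mask_ull(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 64);

	return (~0ULL << l) & (~0ULL >> (63 - h));
}

/* Keep only the low `length` bits of val; note the `length - 1`. */
static uint64_t keep_low_bits(uint64_t val, unsigned int length)
{
	assert(length >= 1 && length <= 64);

	return val & mask_ull(length - 1, 0);
}
```

With a sign-extended 32-bit value of 0xffffffff80000000, masking with
mask_ull(32, 0) leaves 0x180000000 (bit 32 survives), whereas
keep_low_bits(val, 32) gives the intended 0x80000000.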

2023-04-17 07:17:18

by Horatiu Vultur

Subject: Re: [PATCH net-next] net: lan966x: Fix lan966x_ifh_get

The 04/14/2023 19:00, Alexander Lobakin wrote:
>
> From: Horatiu Vultur <[email protected]>
> Date: Fri, 14 Apr 2023 10:20:47 +0200

Hi Olek,

>
> > From time to time, it was observed that the nanosecond part of the
> > received timestamp, which is extracted from the IFH, it was actually
> > bigger than 1 second. So then when actually calculating the full
> > received timestamp, based on the nanosecond part from IFH and the second
> > part which is read from HW, it was actually wrong.
> >
> > The issue seems to be inside the function lan966x_ifh_get, which
> > extracts information from an IFH(which is an byte array) and returns the
> > value in a u64. When extracting the timestamp value from the IFH, which
> > starts at bit 192 and have the size of 32 bits, then if the most
> > significant bit was set in the timestamp, then this bit was extended
> > then the return value became 0xffffffff... . To fix this, make sure to
> > clear all the other bits before returning the value.
>
> Ooooh, I remember I was having the same issue with sign extension :s
> Pls see below.
>
> >
> > Fixes: fd7627833ddf ("net: lan966x: Stop using packing library")
> > Signed-off-by: Horatiu Vultur <[email protected]>
> > ---
> > drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > index 80e2ea7e6ce8a..508e494dcc342 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > @@ -608,6 +608,7 @@ static u64 lan966x_ifh_get(u8 *ifh, size_t pos, size_t length)
> > val |= (1 << i);
>
> Alternatively, you can change that to (pick one that you like the most):
>
> val |= 1ULL << i;
> // or
> val |= BIT_ULL(i);
>
> The thing is that constants without any suffix (U, UL etc.) are treated
> as signed ints, that's why `1 << 31` becomes 0xffffffff80000000 once it
> is widened to u64. 1U / 1UL / 1ULL don't.
>
> Adding the ULL suffix may also make it safer for wider fields, as
> `1 << i` is a 32-bit value, so `1 << 48` may go wrong and/or even
> trigger compiler warnings.

Thanks for the suggestion and the explanation, they were really helpful.
I will update this in the next version.

>
> > }
> >
> > + val &= GENMASK(length, 0);
> > return val;
> > }
> >
>
> (now blah not directly related to the fix)

I think the improvement of lan966x_ifh_get should not be part of the
next version of this patch, as these are 2 different things. But I
would still like to know how to do this!
>
> I'm wondering a bit if lan966x_ifh_get() can be improved in general to
> work with words rather than bits. You read one byte per bit on each
> iteration there.

Actually, I am not reading 1 byte on each bit iteration. I am reading 1
byte the first time when entering the loop and then each time the bit
index (j variable) is a multiple of 8.
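
The byte caching described above can be sketched in userspace as follows.
This is an illustration under assumptions, not the driver source: the
IFH_LEN_BYTES value, the byte layout (bit 0 in the last byte of the array)
and the helper name ifh_get_bits() are assumed, and the accumulation
already uses 1ULL as suggested earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IFH_LEN_BYTES 28	/* assumed IFH size for this sketch */

static uint64_t ifh_get_bits(const uint8_t *ifh, size_t pos, size_t length)
{
	uint64_t val = 0;
	uint8_t v = 0;

	assert(length <= 64 && pos + length <= IFH_LEN_BYTES * 8);

	for (size_t i = 0; i < length; i++) {
		size_t j = pos + i;	/* absolute bit index in the IFH */
		size_t k = j % 8;	/* bit index inside the current byte */

		/* Fetch a new byte only on entry or at a byte boundary. */
		if (i == 0 || k == 0)
			v = ifh[IFH_LEN_BYTES - j / 8 - 1];

		if (v & (1u << k))
			val |= 1ULL << i;	/* unsigned 64-bit: no sign extension */
	}

	return val;
}
```

For a 32-bit field this loads four bytes in total, one per byte boundary,
rather than one per bit.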

> For example, byte arrays could be cast to __be{32,64} and you'd get
> native byteorder for 32/64 bits via one __be*_to_cpu*() call.

>
> Thanks,
> Olek

--
/Horatiu