LinuxLists.cc - [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/23/24 14:11, Charlie Jenkins wrote:
> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> aligning the IP header, which was causing failures on architectures
> that do not support misaligned accesses, like some ARM platforms. To
> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> standard alignment of an IP header and must be supported by the
> architecture.
>
> Furthermore, all architectures except the m68k pad "struct
> csum_ipv6_magic_data" to 44 bytes. To make it compatible with the m68k,
> manually pad this structure to 44 bytes.
>
> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
> Signed-off-by: Charlie Jenkins <[email protected]>
> Reviewed-by: Guenter Roeck <[email protected]>
> Acked-by: Palmer Dabbelt <[email protected]>

Tested-by: Guenter Roeck <[email protected]>
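
The padding described in the commit message can be checked with a small userspace sketch. The struct below is a stand-in, not the kernel definition: the name and the fixed-width field types are assumptions replacing struct in6_addr, __be32 and __wsum. It shows that the fields add up to 16 + 16 + 4 + 4 + 1 + 3 = 44 bytes once the trailing padding is made explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the test's input record. */
struct csum_ipv6_magic_data_sketch {
	uint8_t  saddr[16];  /* IPv6 source address */
	uint8_t  daddr[16];  /* IPv6 destination address */
	uint32_t len;        /* payload length */
	uint32_t csum;       /* initial checksum */
	uint8_t  proto;      /* next-header value */
	uint8_t  pad[3];     /* explicit padding instead of relying on the ABI */
};

/*
 * With pad[3] spelled out, the size no longer depends on how strictly the
 * ABI aligns 32-bit members. An ABI that aligns them to 2 bytes would
 * otherwise round the 41 bytes of payload up to 42, not 44.
 */
static_assert(sizeof(struct csum_ipv6_magic_data_sketch) == 44,
	      "record must be 44 bytes on every ABI");
```

Making the padding explicit keeps the offsets of every field identical across architectures, which matters when the same random buffer is reinterpreted as this record.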

2024-02-26 11:36:47

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> aligning the IP header, which was causing failures on architectures
> that do not support misaligned accesses, like some ARM platforms. To
> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> standard alignment of an IP header and must be supported by the
> architecture.

I'm still wondering what we are really trying to fix here.

All other tests are explicitly testing that it works with any alignment.

Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
well? I would expect so; I see no comment in the arm code that states
that assumption for those functions.

Isn't the problem only the following line, because csum_offset is
unaligned?

csum = *(__wsum *)(random_buf + i + csum_offset);

Otherwise, if there really is an alignment issue for the IPv6 source or
destination address, isn't it enough to perform a 32-bit alignment?

I guess we should involve ARM people in this discussion.

Christophe
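
The concern about dereferencing a cast pointer at an arbitrary byte offset can be illustrated with a minimal userspace sketch. The helper name below is hypothetical; inside the kernel, the get_unaligned() helper serves the same purpose. The point is only that copying bytes avoids a 32-bit load from an address that may not be 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value from any byte address. memcpy() lets the compiler
 * pick byte loads or an unaligned-capable instruction as the target
 * allows, unlike *(uint32_t *)p, which assumes p is suitably aligned.
 */
static uint32_t load_u32_any_align(const uint8_t *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```

A test that reads csum this way would no longer depend on csum_offset being a multiple of 4, at the cost of not exercising the aligned fast path.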

2024-02-26 11:48:36

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 11:34:51AM +0000, Christophe Leroy wrote:
> On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > aligning the IP header, which was causing failures on architectures
> > that do not support misaligned accesses, like some ARM platforms. To
> > solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> > standard alignment of an IP header and must be supported by the
> > architecture.
>
> I'm still wondering what we are really trying to fix here.
>
> All other tests are explicitly testing that it works with any alignment.
>
> Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> well? I would expect so; I see no comment in the arm code that states
> that assumption for those functions.

No, these functions are explicitly *not* designed to be used with any
alignment. They are for 16-bit alignment only.

I'm not sure where the idea that "any alignment" has come from, but it's
never been the case AFAIK that we've supported that - or if we do now,
that's something which has crept in under the radar.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-26 12:04:10

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 11:57:24AM +0000, Christophe Leroy wrote:
>
>
> On 26/02/2024 at 12:47, Russell King (Oracle) wrote:
> > On Mon, Feb 26, 2024 at 11:34:51AM +0000, Christophe Leroy wrote:
> >> On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> >>> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> >>> aligning the IP header, which was causing failures on architectures
> >>> that do not support misaligned accesses, like some ARM platforms. To
> >>> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> >>> standard alignment of an IP header and must be supported by the
> >>> architecture.
> >>
> >> I'm still wondering what we are really trying to fix here.
> >>
> >> All other tests are explicitly testing that it works with any alignment.
> >>
> >> Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> >> well? I would expect so; I see no comment in the arm code that states
> >> that assumption for those functions.
> >
> > No, these functions are explicitly *not* designed to be used with any
> > alignment. They are for 16-bit alignment only.
> >
> > I'm not sure where the idea that "any alignment" has come from, but it's
> > never been the case AFAIK that we've supported that - or if we do now,
> > that's something which has crept in under the radar.
> >
>
> OK, 16-bit is fine with me; then there is no need to require a (14 +
> NET_IP_ALIGN), i.e. a 16-byte (128-bit), alignment as this patch is doing.

Looking again at these two functions, I'm mistaken - this was written for
optimal use with 32-bit alignment, not 16-bit. However, the entire IP
layer is written with the assumption that for maximum performance, the IP
header will be 32-bit aligned.

However, that may not always be the case for incoming packets, and what
saves 32-bit Arm is the ability to do unaligned loads in later revisions
of the architecture, or the alignment fault handler (slow) on older
revisions.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-26 12:20:52

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 26/02/2024 at 12:47, Russell King (Oracle) wrote:
> On Mon, Feb 26, 2024 at 11:34:51AM +0000, Christophe Leroy wrote:
>> On 23/02/2024 at 23:11, Charlie Jenkins wrote:
>>> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
>>> aligning the IP header, which was causing failures on architectures
>>> that do not support misaligned accesses, like some ARM platforms. To
>>> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
>>> standard alignment of an IP header and must be supported by the
>>> architecture.
>>
>> I'm still wondering what we are really trying to fix here.
>>
>> All other tests are explicitly testing that it works with any alignment.
>>
>> Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
>> well? I would expect so; I see no comment in the arm code that states
>> that assumption for those functions.
>
> No, these functions are explicitly *not* designed to be used with any
> alignment. They are for 16-bit alignment only.
>
> I'm not sure where the idea that "any alignment" has come from, but it's
> never been the case AFAIK that we've supported that - or if we do now,
> that's something which has crept in under the radar.
>

OK, 16-bit is fine with me; then there is no need to require a (14 +
NET_IP_ALIGN), i.e. a 16-byte (128-bit), alignment as this patch is doing.

Christophe

2024-02-26 16:44:42

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/26/24 03:34, Christophe Leroy wrote:
>
>
> On 23/02/2024 at 23:11, Charlie Jenkins wrote:
>> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
>> aligning the IP header, which was causing failures on architectures
>> that do not support misaligned accesses, like some ARM platforms. To
>> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
>> standard alignment of an IP header and must be supported by the
>> architecture.
>
> I'm still wondering what we are really trying to fix here.
>
> All other tests are explicitly testing that it works with any alignment.
>
> Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> well? I would expect so; I see no comment in the arm code that states
> that assumption for those functions.
>
> Isn't the problem only the following line, because csum_offset is
> unaligned?
>
> csum = *(__wsum *)(random_buf + i + csum_offset);
>
> Otherwise, if there really is an alignment issue for the IPv6 source or
> destination address, isn't it enough to perform a 32-bit alignment?
>

It isn't just arm.

Question should be what alignments the functions are supposed to be able
to handle, not what they are optimized for. If byte and/or half word alignments
are expected to be supported, there is still architecture code which would
have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
and on sh4, for example. If unaligned accesses are expected to be handled,
it would probably make sense to add a separate test case, though, to clarify
that the test fails due to alignment issues, not due to input parameters.

Thanks,
Guenter

2024-02-26 17:56:00

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 08:44:29AM -0800, Guenter Roeck wrote:
> On 2/26/24 03:34, Christophe Leroy wrote:
> >
> >
> > On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> > > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > > aligning the IP header, which was causing failures on architectures
> > > that do not support misaligned accesses, like some ARM platforms. To
> > > solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> > > standard alignment of an IP header and must be supported by the
> > > architecture.
> >
> > I'm still wondering what we are really trying to fix here.
> >
> > All other tests are explicitly testing that it works with any alignment.
> >
> > Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> > well? I would expect so; I see no comment in the arm code that states
> > that assumption for those functions.
> >
> > Isn't the problem only the following line, because csum_offset is
> > unaligned?
> >
> > csum = *(__wsum *)(random_buf + i + csum_offset);
> >
> > Otherwise, if there really is an alignment issue for the IPv6 source or
> > destination address, isn't it enough to perform a 32-bit alignment?
> >
>
> It isn't just arm.
>
> Question should be what alignments the functions are supposed to be able
> to handle, not what they are optimized for. If byte and/or half word alignments
> are expected to be supported, there is still architecture code which would
> have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
> and on sh4, for example. If unaligned accesses are expected to be handled,
> it would probably make sense to add a separate test case, though, to clarify
> that the test fails due to alignment issues, not due to input parameters.

It's network driver dependent. Most network drivers receive packets
to the offset defined by NET_IP_ALIGN (which is normally 2) which
has the effect of "mis-aligning" the ethernet header, but aligning
the IP header.

Whether drivers do that is up to drivers (and their capabilities).
Some network drivers can not do this kind of alignment, so there are
cases where the received packets aren't offset by two bytes, leading
to the IP header being aligned to an odd 16-bit word rather than an
even 16-bit word (and thus 32-bit aligned.)

Then you have the possibility of other headers between the ethernet
and IP header - not only things like VLANs, but also possibly DSA
headers (for switches) and how big those are.

There's a lot to be researched here!

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-26 18:35:32

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 05:50:57PM +0000, Russell King (Oracle) wrote:
> On Mon, Feb 26, 2024 at 08:44:29AM -0800, Guenter Roeck wrote:
> > On 2/26/24 03:34, Christophe Leroy wrote:
> > >
> > >
> > > On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> > > > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > > > aligning the IP header, which was causing failures on architectures
> > > > that do not support misaligned accesses, like some ARM platforms. To
> > > > solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> > > > standard alignment of an IP header and must be supported by the
> > > > architecture.
> > >
> > > I'm still wondering what we are really trying to fix here.
> > >
> > > All other tests are explicitly testing that it works with any alignment.
> > >
> > > Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> > > well? I would expect so; I see no comment in the arm code that states
> > > that assumption for those functions.
> > >
> > > Isn't the problem only the following line, because csum_offset is
> > > unaligned?
> > >
> > > csum = *(__wsum *)(random_buf + i + csum_offset);
> > >
> > > Otherwise, if there really is an alignment issue for the IPv6 source or
> > > destination address, isn't it enough to perform a 32-bit alignment?
> > >
> >
> > It isn't just arm.
> >
> > Question should be what alignments the functions are supposed to be able
> > to handle, not what they are optimized for. If byte and/or half word alignments
> > are expected to be supported, there is still architecture code which would
> > have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
> > and on sh4, for example. If unaligned accesses are expected to be handled,
> > it would probably make sense to add a separate test case, though, to clarify
> > that the test fails due to alignment issues, not due to input parameters.
>
> It's network driver dependent. Most network drivers receive packets
> to the offset defined by NET_IP_ALIGN (which is normally 2) which
> has the effect of "mis-aligning" the ethernet header, but aligning
> the IP header.
>
> Whether drivers do that is up to drivers (and their capabilities).
> Some network drivers can not do this kind of alignment, so there are
> cases where the received packets aren't offset by two bytes, leading
> to the IP header being aligned to an odd 16-bit word rather than an
> even 16-bit word (and thus 32-bit aligned.)
>
> Then you have the possibility of other headers between the ethernet
> and IP header - not only things like VLANs, but also possibly DSA
> headers (for switches) and how big those are.

Those additional combinations can be supported by future test cases,
but the goal of this patch was simply to have basic testing for these
functions. The NET_IP_ALIGN offset is what the kernel defines to be
supported, so that is the test case I went for.

- Charlie

>
> There's a lot to be researched here!
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-26 19:10:58

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 10:35:18AM -0800, Charlie Jenkins wrote:
> On Mon, Feb 26, 2024 at 05:50:57PM +0000, Russell King (Oracle) wrote:
> > On Mon, Feb 26, 2024 at 08:44:29AM -0800, Guenter Roeck wrote:
> > > On 2/26/24 03:34, Christophe Leroy wrote:
> > > >
> > > >
> > > > On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> > > > > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > > > > aligning the IP header, which was causing failures on architectures
> > > > > that do not support misaligned accesses, like some ARM platforms. To
> > > > > solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> > > > > standard alignment of an IP header and must be supported by the
> > > > > architecture.
> > > >
> > > > I'm still wondering what we are really trying to fix here.
> > > >
> > > > All other tests are explicitly testing that it works with any alignment.
> > > >
> > > > Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> > > > well? I would expect so; I see no comment in the arm code that states
> > > > that assumption for those functions.
> > > >
> > > > Isn't the problem only the following line, because csum_offset is
> > > > unaligned?
> > > >
> > > > csum = *(__wsum *)(random_buf + i + csum_offset);
> > > >
> > > > Otherwise, if there really is an alignment issue for the IPv6 source or
> > > > destination address, isn't it enough to perform a 32-bit alignment?
> > > >
> > >
> > > It isn't just arm.
> > >
> > > Question should be what alignments the functions are supposed to be able
> > > to handle, not what they are optimized for. If byte and/or half word alignments
> > > are expected to be supported, there is still architecture code which would
> > > have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
> > > and on sh4, for example. If unaligned accesses are expected to be handled,
> > > it would probably make sense to add a separate test case, though, to clarify
> > > that the test fails due to alignment issues, not due to input parameters.
> >
> > It's network driver dependent. Most network drivers receive packets
> > to the offset defined by NET_IP_ALIGN (which is normally 2) which
> > has the effect of "mis-aligning" the ethernet header, but aligning
> > the IP header.
> >
> > Whether drivers do that is up to drivers (and their capabilities).
> > Some network drivers can not do this kind of alignment, so there are
> > cases where the received packets aren't offset by two bytes, leading
> > to the IP header being aligned to an odd 16-bit word rather than an
> > even 16-bit word (and thus 32-bit aligned.)
> >
> > Then you have the possibility of other headers between the ethernet
> > and IP header - not only things like VLANs, but also possibly DSA
> > headers (for switches) and how big those are.
>
> Those additional combinations can be supported by future test cases,
> but the goal of this patch was simply to have basic testing for these
> functions. The NET_IP_ALIGN offset is what the kernel defines to be
> supported, so that is the test case I went for.

I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
defines to be supported" is a gross misinterpretation. It is not
"defined to be supported" at all. It is the _preferred_ alignment
nothing more, nothing less.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-26 19:25:11

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 07:06:46PM +0000, Russell King (Oracle) wrote:
> On Mon, Feb 26, 2024 at 10:35:18AM -0800, Charlie Jenkins wrote:
> > On Mon, Feb 26, 2024 at 05:50:57PM +0000, Russell King (Oracle) wrote:
> > > On Mon, Feb 26, 2024 at 08:44:29AM -0800, Guenter Roeck wrote:
> > > > On 2/26/24 03:34, Christophe Leroy wrote:
> > > > >
> > > > >
> > > > > On 23/02/2024 at 23:11, Charlie Jenkins wrote:
> > > > > > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > > > > > aligning the IP header, which was causing failures on architectures
> > > > > > that do not support misaligned accesses, like some ARM platforms. To
> > > > > > solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> > > > > > standard alignment of an IP header and must be supported by the
> > > > > > architecture.
> > > > >
> > > > > I'm still wondering what we are really trying to fix here.
> > > > >
> > > > > All other tests are explicitly testing that it works with any alignment.
> > > > >
> > > > > Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
> > > > > well? I would expect so; I see no comment in the arm code that states
> > > > > that assumption for those functions.
> > > > >
> > > > > Isn't the problem only the following line, because csum_offset is
> > > > > unaligned?
> > > > >
> > > > > csum = *(__wsum *)(random_buf + i + csum_offset);
> > > > >
> > > > > Otherwise, if there really is an alignment issue for the IPv6 source or
> > > > > destination address, isn't it enough to perform a 32-bit alignment?
> > > > >
> > > >
> > > > It isn't just arm.
> > > >
> > > > Question should be what alignments the functions are supposed to be able
> > > > to handle, not what they are optimized for. If byte and/or half word alignments
> > > > are expected to be supported, there is still architecture code which would
> > > > have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
> > > > and on sh4, for example. If unaligned accesses are expected to be handled,
> > > > it would probably make sense to add a separate test case, though, to clarify
> > > > that the test fails due to alignment issues, not due to input parameters.
> > >
> > > It's network driver dependent. Most network drivers receive packets
> > > to the offset defined by NET_IP_ALIGN (which is normally 2) which
> > > has the effect of "mis-aligning" the ethernet header, but aligning
> > > the IP header.
> > >
> > > Whether drivers do that is up to drivers (and their capabilities).
> > > Some network drivers can not do this kind of alignment, so there are
> > > cases where the received packets aren't offset by two bytes, leading
> > > to the IP header being aligned to an odd 16-bit word rather than an
> > > even 16-bit word (and thus 32-bit aligned.)
> > >
> > > Then you have the possibility of other headers between the ethernet
> > > and IP header - not only things like VLANs, but also possibly DSA
> > > headers (for switches) and how big those are.
> >
> > Those additional combinations can be supported by future test cases,
> > but the goal of this patch was simply to have basic testing for these
> > functions. The NET_IP_ALIGN offset is what the kernel defines to be
> > supported, so that is the test case I went for.
>
> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> defines to be supported" is a gross misinterpretation. It is not
> "defined to be supported" at all. It is the _preferred_ alignment
> nothing more, nothing less.

What alignment can be relied on by a test case?

- Charlie

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-26 22:36:34

by David Laight

Subject: RE: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

...
> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> defines to be supported" is a gross misinterpretation. It is not
> "defined to be supported" at all. It is the _preferred_ alignment
> nothing more, nothing less.

I'm sure I've seen code that would realign IP headers to a 4 byte
boundary before processing them - but that might not have been in
Linux.

I'm also sure there are cpu which will fault double length misaligned
memory transfers - which might be used to marginally speed up code.
Assuming more than 4 byte alignment for the IP header is likely
'wishful thinking'.

There is plenty of ethernet hardware that can only write frames
to even boundaries and plenty of cpu that fault misaligned accesses.
There are even cases of both on the same silicon die.

You also pretty much never want a fault handler to fixup misaligned
ethernet frames (or really anything else for that matter).
It is always going to be better to check in the code itself.

x86 has just made people 'sloppy' :-)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
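
One way to read "check in the code itself" is a checksum routine that never issues a wide load from a possibly misaligned address. The following is a minimal userspace sketch under that assumption; the function name is hypothetical, it returns the checksum as a host-order number rather than the kernel's __sum16, and it is a plain reference loop, not any architecture's ip_fast_csum() implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Internet checksum over an IPv4 header of ihl 32-bit words, using only
 * byte loads so that the header may start at any address.
 */
static uint16_t ref_ip_csum(const void *iph, unsigned int ihl)
{
	const uint8_t *p = iph;
	uint32_t sum = 0;

	assert(ihl >= 5 && ihl <= 15);  /* valid IPv4 header lengths */

	/* Add the header as big-endian 16-bit words. */
	for (unsigned int i = 0; i < ihl * 4; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Such a loop is slower than an optimized routine, but its result does not depend on alignment, which makes it usable as an oracle when testing a fast implementation at different offsets.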

2024-02-26 23:17:18

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
> ...
> > I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> > defines to be supported" is a gross misinterpretation. It is not
> > "defined to be supported" at all. It is the _preferred_ alignment
> > nothing more, nothing less.

This distinction is arbitrary in practice, but I am open to being proven
wrong if you have data to back up this statement. If the driver chooses
to not follow this, then the driver might not work. ARM defines the
NET_IP_ALIGN to be 2 to pad out the header to be on the supported
alignment. If the driver chooses to pad with one byte instead of 2
bytes, the driver may fail to work as the CPU may stall after the
misaligned access.

>
> I'm sure I've seen code that would realign IP headers to a 4 byte
> boundary before processing them - but that might not have been in
> Linux.
>
> I'm also sure there are cpu which will fault double length misaligned
> memory transfers - which might be used to marginally speed up code.
> Assuming more than 4 byte alignment for the IP header is likely
> 'wishful thinking'.
>
> There is plenty of ethernet hardware that can only write frames
> to even boundaries and plenty of cpu that fault misaligned accesses.
> There are even cases of both on the same silicon die.
>
> You also pretty much never want a fault handler to fixup misaligned
> ethernet frames (or really anything else for that matter).
> It is always going to be better to check in the code itself.
>
> x86 has just made people 'sloppy' :-)
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

If somebody has a solution they deem to be better, I am happy to change
this test case. Otherwise, I would appreciate a maintainer resolving
this discussion and applying this fix.

- Charlie

2024-02-26 23:49:49

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/26/24 15:17, Charlie Jenkins wrote:
> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
>> ...
>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
>>> defines to be supported" is a gross misinterpretation. It is not
>>> "defined to be supported" at all. It is the _preferred_ alignment
>>> nothing more, nothing less.
>
> This distinction is arbitrary in practice, but I am open to being proven
> wrong if you have data to back up this statement. If the driver chooses
> to not follow this, then the driver might not work. ARM defines the
> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
> alignment. If the driver chooses to pad with one byte instead of 2
> bytes, the driver may fail to work as the CPU may stall after the
> misaligned access.
>
>>
>> I'm sure I've seen code that would realign IP headers to a 4 byte
>> boundary before processing them - but that might not have been in
>> Linux.
>>
>> I'm also sure there are cpu which will fault double length misaligned
>> memory transfers - which might be used to marginally speed up code.
>> Assuming more than 4 byte alignment for the IP header is likely
>> 'wishful thinking'.
>>
>> There is plenty of ethernet hardware that can only write frames
>> to even boundaries and plenty of cpu that fault misaligned accesses.
>> There are even cases of both on the same silicon die.
>>
>> You also pretty much never want a fault handler to fixup misaligned
>> ethernet frames (or really anything else for that matter).
>> It is always going to be better to check in the code itself.
>>
>> x86 has just made people 'sloppy' :-)
>>
>> David
>>
>> -
>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>> Registration No: 1397386 (Wales)
>>
>
> If somebody has a solution they deem to be better, I am happy to change
> this test case. Otherwise, I would appreciate a maintainer resolving
> this discussion and applying this fix.
>
Agreed.

I do have a couple of patches which add explicit unaligned tests as well as
corner case tests (which are intended to trigger as many carry overflows
as possible). Once I get those working reliably, I'll be happy to submit
them as additional tests.

Thanks,
Guenter

2024-02-27 06:47:58

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 27/02/2024 at 00:48, Guenter Roeck wrote:
> On 2/26/24 15:17, Charlie Jenkins wrote:
>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
>>> ...
>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
>>>> defines to be supported" is a gross misinterpretation. It is not
>>>> "defined to be supported" at all. It is the _preferred_ alignment
>>>> nothing more, nothing less.
>>
>> This distinction is arbitrary in practice, but I am open to being proven
>> wrong if you have data to back up this statement. If the driver chooses
>> to not follow this, then the driver might not work. ARM defines the
>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
>> alignment. If the driver chooses to pad with one byte instead of 2
>> bytes, the driver may fail to work as the CPU may stall after the
>> misaligned access.
>>
>>>
>>> I'm sure I've seen code that would realign IP headers to a 4 byte
>>> boundary before processing them - but that might not have been in
>>> Linux.
>>>
>>> I'm also sure there are cpu which will fault double length misaligned
>>> memory transfers - which might be used to marginally speed up code.
>>> Assuming more than 4 byte alignment for the IP header is likely
>>> 'wishful thinking'.
>>>
>>> There is plenty of ethernet hardware that can only write frames
>>> to even boundaries and plenty of cpu that fault misaligned accesses.
>>> There are even cases of both on the same silicon die.
>>>
>>> You also pretty much never want a fault handler to fixup misaligned
>>> ethernet frames (or really anything else for that matter).
>>> It is always going to be better to check in the code itself.
>>>
>>> x86 has just made people 'sloppy' :-)
>>>
>>> David
>>>
>>> -
>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>>> MK1 1PT, UK
>>> Registration No: 1397386 (Wales)
>>>
>>
>> If somebody has a solution they deem to be better, I am happy to change
>> this test case. Otherwise, I would appreciate a maintainer resolving
>> this discussion and applying this fix.
>>
> Agreed.
>
> I do have a couple of patches which add explicit unaligned tests as well as
> corner case tests (which are intended to trigger as many carry overflows
> as possible). Once I get those working reliably, I'll be happy to submit
> them as additional tests.
>

The functions definitely have to work at least with and without VLAN,
which means the alignment cannot be greater than 4 bytes. That's also
the outcome of the discussion.

Therefore, we can easily fix the tests with for instance the following
changes. For the IPv6 test I switched proto and csum to keep csum
aligned. (In addition expected values need to be recalculated for the
IPv6 case).

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index bf70850035c7..26b0dbc5b8fd 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -581,7 +581,7 @@ static void test_ip_fast_csum(struct kunit *test)
u16 expected;

for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
- for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
+ for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index += 4) {
csum_result = ip_fast_csum(random_buf + index, len);
expected =
expected_fast_csum[(len - IPv4_MIN_WORDS) *
@@ -603,12 +603,10 @@ static void test_csum_ipv6_magic(struct kunit *test)

const int daddr_offset = sizeof(struct in6_addr);
const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
- const int proto_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
- sizeof(int);
- const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
- sizeof(int) + sizeof(char);
+ const int csum_offset = len_offset + sizeof(int);
+ const int proto_offset = csum_offset + sizeof(int);

- for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+ for (int i = 0; i < NUM_IPv6_TESTS; i += 4) {
saddr = (const struct in6_addr *)(random_buf + i);
daddr = (const struct in6_addr *)(random_buf + i +
daddr_offset);
---
We could do even better by taking into account
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and do +1 when it is selected and
+4 when it is not selected.
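
A rough userspace sketch of that idea follows. It is not a tested kernel patch: CSUM_TEST_STEP and offsets_visited() are hypothetical names, and the preprocessor check stands in for however the real test would consult the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS Kconfig symbol.

```c
#include <assert.h>

/*
 * Userspace stand-in: in the kernel this choice would be driven by the
 * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS Kconfig symbol.
 */
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
#define CSUM_TEST_STEP 1	/* hypothetical name: exercise every byte offset */
#else
#define CSUM_TEST_STEP 4	/* hypothetical name: 4-byte aligned offsets only */
#endif

/* Count how many start offsets a loop over [0, num_tests) would visit. */
static int offsets_visited(int num_tests, int step)
{
	int count = 0;

	assert(step > 0);
	for (int index = 0; index < num_tests; index += step)
		count++;
	return count;
}
```

With a step of 4 only every fourth offset in random_buf is exercised, so the coverage lost on strict-alignment platforms would still need to be made up by the dedicated unaligned tests mentioned earlier in the thread.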

Christophe

2024-02-27 10:29:29

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
>
>
> Le 27/02/2024 à 00:48, Guenter Roeck a écrit :
> > On 2/26/24 15:17, Charlie Jenkins wrote:
> >> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
> >>> ...
> >>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> >>>> defines to be supported" is a gross misinterpretation. It is not
> >>>> "defined to be supported" at all. It is the _preferred_ alignment
> >>>> nothing more, nothing less.
> >>
> >> This distinction is arbitrary in practice, but I am open to being proven
> >> wrong if you have data to back up this statement. If the driver chooses
> >> to not follow this, then the driver might not work. ARM defines the
> >> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
> >> alignment. If the driver chooses to pad with one byte instead of 2
> >> bytes, the driver may fail to work as the CPU may stall after the
> >> misaligned access.
> >>
> >>>
> >>> I'm sure I've seen code that would realign IP headers to a 4 byte
> >>> boundary before processing them - but that might not have been in
> >>> Linux.
> >>>
> >>> I'm also sure there are cpu which will fault double length misaligned
> >>> memory transfers - which might be used to marginally speed up code.
> >>> Assuming more than 4 byte alignment for the IP header is likely
> >>> 'wishful thinking'.
> >>>
> >>> There is plenty of ethernet hardware that can only write frames
> >>> to even boundaries and plenty of cpu that fault misaligned accesses.
> >>> There are even cases of both on the same silicon die.
> >>>
> >>> You also pretty much never want a fault handler to fixup misaligned
> >>> ethernet frames (or really anything else for that matter).
> >>> It is always going to be better to check in the code itself.
> >>>
> >>> x86 has just made people 'sloppy' :-)
> >>>
> >>> David
> >>>
> >>> -
> >>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> >>> MK1 1PT, UK
> >>> Registration No: 1397386 (Wales)
> >>>
> >>
> >> If somebody has a solution they deem to be better, I am happy to change
> >> this test case. Otherwise, I would appreciate a maintainer resolving
> >> this discussion and apply this fix.
> >>
> > Agreed.
> >
> > I do have a couple of patches which add explicit unaligned tests as well as
> > corner case tests (which are intended to trigger as many carry overflows
> > as possible). Once I get those working reliably, I'll be happy to submit
> > them as additional tests.
> >
>
> The functions definitely have to work at least with and without VLAN,
> which means the alignment cannot be greater than 4 bytes. That's also
> the outcome of the discussion.

Thanks for completely ignoring what I've said. No. The alignment ends up
being commonly 2 bytes.

As I've said several times, network drivers do _not_ have to respect
NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
them which can only DMA to a 32-bit aligned address. This means that
the start of the ethernet header is placed at a 32-bit aligned address
making the IP header misaligned to 32-bit.
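
The arithmetic behind this can be sketched in a few lines of userspace C. This is an illustration only, assuming an untagged 14-byte Ethernet header; SKETCH_ETH_HLEN and ip_hdr_mod4() are hypothetical names, and the `pad` parameter models the bytes a driver is able to skip before the frame.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14	/* untagged Ethernet header length */

/*
 * Address of the IP header modulo 4 when the receive buffer is 32-bit
 * aligned and the hardware writes the frame `pad` bytes into it.
 */
static unsigned int ip_hdr_mod4(uintptr_t buf, unsigned int pad)
{
	/* Model the constraint from the text: the buffer is 32-bit aligned. */
	assert(buf % 4 == 0);
	return (unsigned int)((buf + pad + SKETCH_ETH_HLEN) % 4);
}
```

A DMA engine that cannot apply any padding corresponds to `pad == 0`, which leaves the IP header 2 bytes past a 32-bit boundary; a pad of 2, the usual NET_IP_ALIGN value, would bring it back onto one.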

I don't see what is so difficult to understand about this... but it
seems that my comments on this are being ignored time and time again,
and I can only think that those who are ignoring my comments have
some ulterior motive here.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-02-27 11:18:24

by Geert Uytterhoeven

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Hi Charlie,

Thanks for your patch!

On Fri, Feb 23, 2024 at 11:12 PM Charlie Jenkins <[email protected]> wrote:
> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> aligning the IP header, which were causing failures on architectures
> that do not support misaligned accesses like some ARM platforms. To
> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> standard alignment of an IP header and must be supported by the
> architecture.
>
> Furthermore, all architectures except the m68k pad "struct
> csum_ipv6_magic_data" to 44 bits. To make compatible with the m68k,
> manually pad this structure to 44 bits.

s/bits/bytes/ everywhere

>
> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
> Signed-off-by: Charlie Jenkins <[email protected]>
> Reviewed-by: Guenter Roeck <[email protected]>
> Acked-by: Palmer Dabbelt <[email protected]>
> ---
> The ip_fast_csum and csum_ipv6_magic tests did not work on all
> architectures due to differences in misaligned access support.
> Fix those issues by changing endianness of data and aligning the data.
>
> This patch relies upon a patch from Christophe:
>
> [PATCH net] kunit: Fix again checksum tests on big endian CPUs
>
> https://lore.kernel.org/lkml/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu/t/
> ---
> Changes in v10:
> - Christophe Leroy graciously decided to re-write my patch to fit his
> style so I have dropped my endianness+sparse changes and have based by
> alignment fixes on his patch. The link to his patch can be seen above.
> - I dropped Guenter's tested-by but kept his reviewed-by since only the base
> was changed.
> - Link to v9: https://lore.kernel.org/r/20240221-fix_sparse_errors_checksum_tests-v9-0-bff4d73ab9d1@rivosinc.com

> --- a/lib/checksum_kunit.c
> +++ b/lib/checksum_kunit.c

> @@ -595,28 +473,31 @@ static void test_ip_fast_csum(struct kunit *test)
> static void test_csum_ipv6_magic(struct kunit *test)
> {
> #if defined(CONFIG_NET)
> - const struct in6_addr *saddr;
> - const struct in6_addr *daddr;
> + struct csum_ipv6_magic_data {
> + const struct in6_addr saddr;
> + const struct in6_addr daddr;
> + __le32 len;
> + __wsum csum;
> + unsigned char proto;
> + unsigned char pad[3];
> + } *data;

If having a size of 44 bytes is critical, you really want to add a
BUILD_BUG_ON() check for that.
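
As a rough illustration of such a check, the userspace sketch below uses C11 static_assert as a stand-in for BUILD_BUG_ON(). The sketch_in6_addr type and the uint32_t fields are assumptions that replace the kernel's struct in6_addr, __le32 and __wsum so that the example compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct in6_addr (16 bytes). */
struct sketch_in6_addr {
	uint8_t s6_addr[16];
};

/* Same field order as the quoted struct; uint32_t replaces __le32 and __wsum. */
struct csum_ipv6_magic_data {
	struct sketch_in6_addr saddr;
	struct sketch_in6_addr daddr;
	uint32_t len;
	uint32_t csum;
	unsigned char proto;
	unsigned char pad[3];
};

/*
 * Compile-time size check. In kernel code the equivalent would be
 * BUILD_BUG_ON(sizeof(struct csum_ipv6_magic_data) != 44) inside a function.
 */
static_assert(sizeof(struct csum_ipv6_magic_data) == 44,
	      "csum_ipv6_magic_data must be exactly 44 bytes");
```

If a compiler ever laid the structure out differently, the build would fail at this line instead of the test silently reading the wrong bytes.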

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2024-02-27 11:32:34

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Le 27/02/2024 à 11:28, Russell King (Oracle) a écrit :
> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
>>
>>
>> Le 27/02/2024 à 00:48, Guenter Roeck a écrit :
>>> On 2/26/24 15:17, Charlie Jenkins wrote:
>>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
>>>>> ...
>>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
>>>>>> defines to be supported" is a gross misinterpretation. It is not
>>>>>> "defined to be supported" at all. It is the _preferred_ alignment
>>>>>> nothing more, nothing less.
>>>>
>>>> This distinction is arbitrary in practice, but I am open to being proven
>>>> wrong if you have data to back up this statement. If the driver chooses
>>>> to not follow this, then the driver might not work. ARM defines the
>>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
>>>> alignment. If the driver chooses to pad with one byte instead of 2
>>>> bytes, the driver may fail to work as the CPU may stall after the
>>>> misaligned access.
>>>>
>>>>>
>>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
>>>>> boundary before processing them - but that might not have been in
>>>>> Linux.
>>>>>
>>>>> I'm also sure there are cpu which will fault double length misaligned
>>>>> memory transfers - which might be used to marginally speed up code.
>>>>> Assuming more than 4 byte alignment for the IP header is likely
>>>>> 'wishful thinking'.
>>>>>
>>>>> There is plenty of ethernet hardware that can only write frames
>>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
>>>>> There are even cases of both on the same silicon die.
>>>>>
>>>>> You also pretty much never want a fault handler to fixup misaligned
>>>>> ethernet frames (or really anything else for that matter).
>>>>> It is always going to be better to check in the code itself.
>>>>>
>>>>> x86 has just made people 'sloppy' :-)
>>>>>
>>>>> David
>>>>>
>>>>> -
>>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>>>>> MK1 1PT, UK
>>>>> Registration No: 1397386 (Wales)
>>>>>
>>>>
>>>> If somebody has a solution they deem to be better, I am happy to change
>>>> this test case. Otherwise, I would appreciate a maintainer resolving
>>>> this discussion and apply this fix.
>>>>
>>> Agreed.
>>>
>>> I do have a couple of patches which add explicit unaligned tests as well as
>>> corner case tests (which are intended to trigger as many carry overflows
>>> as possible). Once I get those working reliably, I'll be happy to submit
>>> them as additional tests.
>>>
>>
>> The functions definitely have to work at least with and without VLAN,
>> which means the alignment cannot be greater than 4 bytes. That's also
>> the outcome of the discussion.
>
> Thanks for completely ignoring what I've said. No. The alignment ends up
> being commonly 2 bytes.
>
> As I've said several times, network drivers do _not_ have to respect
> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
> them which can only DMA to a 32-bit aligned address. This means that
> the start of the ethernet header is placed at a 32-bit aligned address
> making the IP header misaligned to 32-bit.
>
> I don't see what is so difficult to understand about this... but it
> seems that my comments on this are being ignored time and time again,
> and I can only think that those who are ignoring my comments have
> some ulterior motive here.
>

I'm sorry for this misunderstanding. I'm not ignoring what you said at
all. I understood that ARM can handle unaligned accesses, with
exception handlers in the worst case, and that DMA constraints may leave
the IP header with only 2-byte alignment.

However, I also understood from others that some architectures can't
handle such 2-byte-only alignment.

It's been suggested during the discussion that alignment tests should be
added later in a follow-up patch. So for the time being I'm trying to
find a compromise that gets the existing tests working on all platforms
but with a smaller alignment than the 16-byte alignment brought by
Charlie's v10 patch. A 4-byte alignment seemed to me to be a good
compromise for this fix. The idea is also to make the fix as minimal as
possible, unlike Charlie's patch, which churns up the tests quite
heavily.

But maybe I misunderstood some of the discussion, and 2-byte alignment
would indeed work on all platforms, with only an odd alignment being
problematic?

2024-02-27 17:55:03

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 11:32:19AM +0000, Christophe Leroy wrote:
>
>
> Le 27/02/2024 à 11:28, Russell King (Oracle) a écrit :
> > On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
> >>
> >>
> >> Le 27/02/2024 à 00:48, Guenter Roeck a écrit :
> >>> On 2/26/24 15:17, Charlie Jenkins wrote:
> >>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
> >>>>> ...
> >>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> >>>>>> defines to be supported" is a gross misinterpretation. It is not
> >>>>>> "defined to be supported" at all. It is the _preferred_ alignment
> >>>>>> nothing more, nothing less.
> >>>>
> >>>> This distinction is arbitrary in practice, but I am open to being proven
> >>>> wrong if you have data to back up this statement. If the driver chooses
> >>>> to not follow this, then the driver might not work. ARM defines the
> >>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
> >>>> alignment. If the driver chooses to pad with one byte instead of 2
> >>>> bytes, the driver may fail to work as the CPU may stall after the
> >>>> misaligned access.
> >>>>
> >>>>>
> >>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
> >>>>> boundary before processing them - but that might not have been in
> >>>>> Linux.
> >>>>>
> >>>>> I'm also sure there are cpu which will fault double length misaligned
> >>>>> memory transfers - which might be used to marginally speed up code.
> >>>>> Assuming more than 4 byte alignment for the IP header is likely
> >>>>> 'wishful thinking'.
> >>>>>
> >>>>> There is plenty of ethernet hardware that can only write frames
> >>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
> >>>>> There are even cases of both on the same silicon die.
> >>>>>
> >>>>> You also pretty much never want a fault handler to fixup misaligned
> >>>>> ethernet frames (or really anything else for that matter).
> >>>>> It is always going to be better to check in the code itself.
> >>>>>
> >>>>> x86 has just made people 'sloppy' :-)
> >>>>>
> >>>>> David
> >>>>>
> >>>>> -
> >>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> >>>>> MK1 1PT, UK
> >>>>> Registration No: 1397386 (Wales)
> >>>>>
> >>>>
> >>>> If somebody has a solution they deem to be better, I am happy to change
> >>>> this test case. Otherwise, I would appreciate a maintainer resolving
> >>>> this discussion and apply this fix.
> >>>>
> >>> Agreed.
> >>>
> >>> I do have a couple of patches which add explicit unaligned tests as well as
> >>> corner case tests (which are intended to trigger as many carry overflows
> >>> as possible). Once I get those working reliably, I'll be happy to submit
> >>> them as additional tests.
> >>>
> >>
> >> The functions definitely have to work at least with and without VLAN,
> >> which means the alignment cannot be greater than 4 bytes. That's also
> >> the outcome of the discussion.
> >
> > Thanks for completely ignoring what I've said. No. The alignment ends up
> > being commonly 2 bytes.
> >
> > As I've said several times, network drivers do _not_ have to respect
> > NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
> > them which can only DMA to a 32-bit aligned address. This means that
> > the start of the ethernet header is placed at a 32-bit aligned address
> > making the IP header misaligned to 32-bit.
> >
> > I don't see what is so difficult to understand about this... but it
> > seems that my comments on this are being ignored time and time again,
> > and I can only think that those who are ignoring my comments have
> > some ulterior motive here.
> >
>
> I'm sorry for this misunderstanding. I'm not ignoring what you said at
> all. I understood that ARM is able to handle unaligned accesses with
> some exception handlers at worst case and that DMA constraints may lead
> to the IP header being on a 2-byte alignment only.
>
> However I also understood from others that some architectures can't
> handle such a 2-byte-only alignment.
>
> It's been suggested during the discussion that alignment tests should be
> added later in a follow-up patch. So for the time being I'm trying to
> find a compromise and get the existing tests working on all platforms
> but with a smaller alignment than the 16-bytes alignment brought by
> Charlie's v10 patch. And a 4 bytes alignment seemed to me to be a good
> compromise for this fix. The idea is also to make the fix as minimal as
> possible, unlike Charlie's patch that is churning up the tests quite
> heavily.

Do you have a list of platforms this is failing on? I haven't seen any
reports that haven't been fixed.

- Charlie

>
> But maybe I misunderstood some of the discussion and indeed 2 bytes
> alignment would work on all platforms and only an odd alignment is
> problematic ?

2024-02-27 17:56:54

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 12:17:58PM +0100, Geert Uytterhoeven wrote:
> Hi Charlie,
>
> Thanks for your patch!
>
> On Fri, Feb 23, 2024 at 11:12 PM Charlie Jenkins <[email protected]> wrote:
> > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > aligning the IP header, which were causing failures on architectures
> > that do not support misaligned accesses like some ARM platforms. To
> > solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
> > standard alignment of an IP header and must be supported by the
> > architecture.
> >
> > Furthermore, all architectures except the m68k pad "struct
> > csum_ipv6_magic_data" to 44 bits. To make compatible with the m68k,
> > manually pad this structure to 44 bits.
>
> s/bits/bytes/ everywhere

Whoops, thanks!

>
> >
> > Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum")
> > Signed-off-by: Charlie Jenkins <[email protected]>
> > Reviewed-by: Guenter Roeck <[email protected]>
> > Acked-by: Palmer Dabbelt <[email protected]>
> > ---
> > The ip_fast_csum and csum_ipv6_magic tests did not work on all
> > architectures due to differences in misaligned access support.
> > Fix those issues by changing endianness of data and aligning the data.
> >
> > This patch relies upon a patch from Christophe:
> >
> > [PATCH net] kunit: Fix again checksum tests on big endian CPUs
> >
> > https://lore.kernel.org/lkml/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu/t/
> > ---
> > Changes in v10:
> > - Christophe Leroy graciously decided to re-write my patch to fit his
> > style so I have dropped my endianness+sparse changes and have based by
> > alignment fixes on his patch. The link to his patch can be seen above.
> > - I dropped Guenter's tested-by but kept his reviewed-by since only the base
> > was changed.
> > - Link to v9: https://lore.kernel.org/r/20240221-fix_sparse_errors_checksum_tests-v9-0-bff4d73ab9d1@rivosinc.com
>
> > --- a/lib/checksum_kunit.c
> > +++ b/lib/checksum_kunit.c
>
> > @@ -595,28 +473,31 @@ static void test_ip_fast_csum(struct kunit *test)
> > static void test_csum_ipv6_magic(struct kunit *test)
> > {
> > #if defined(CONFIG_NET)
> > - const struct in6_addr *saddr;
> > - const struct in6_addr *daddr;
> > + struct csum_ipv6_magic_data {
> > + const struct in6_addr saddr;
> > + const struct in6_addr daddr;
> > + __le32 len;
> > + __wsum csum;
> > + unsigned char proto;
> > + unsigned char pad[3];
> > + } *data;
>
> If having a size of 44 bytes is critical, you really want to add a
> BUILD_BUG_ON() check for that.

Good idea, I will add that.

- Charlie

>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds

2024-02-27 18:23:46

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 06:11:24PM +0000, Christophe Leroy wrote:
>
>
> Le 27/02/2024 à 18:54, Charlie Jenkins a écrit :
> > On Tue, Feb 27, 2024 at 11:32:19AM +0000, Christophe Leroy wrote:
> >>
> >>
> >> Le 27/02/2024 à 11:28, Russell King (Oracle) a écrit :
> >>> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> Le 27/02/2024 à 00:48, Guenter Roeck a écrit :
> >>>>> On 2/26/24 15:17, Charlie Jenkins wrote:
> >>>>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
> >>>>>>> ...
> >>>>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> >>>>>>>> defines to be supported" is a gross misinterpretation. It is not
> >>>>>>>> "defined to be supported" at all. It is the _preferred_ alignment
> >>>>>>>> nothing more, nothing less.
> >>>>>>
> >>>>>> This distinction is arbitrary in practice, but I am open to being proven
> >>>>>> wrong if you have data to back up this statement. If the driver chooses
> >>>>>> to not follow this, then the driver might not work. ARM defines the
> >>>>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
> >>>>>> alignment. If the driver chooses to pad with one byte instead of 2
> >>>>>> bytes, the driver may fail to work as the CPU may stall after the
> >>>>>> misaligned access.
> >>>>>>
> >>>>>>>
> >>>>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
> >>>>>>> boundary before processing them - but that might not have been in
> >>>>>>> Linux.
> >>>>>>>
> >>>>>>> I'm also sure there are cpu which will fault double length misaligned
> >>>>>>> memory transfers - which might be used to marginally speed up code.
> >>>>>>> Assuming more than 4 byte alignment for the IP header is likely
> >>>>>>> 'wishful thinking'.
> >>>>>>>
> >>>>>>> There is plenty of ethernet hardware that can only write frames
> >>>>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
> >>>>>>> There are even cases of both on the same silicon die.
> >>>>>>>
> >>>>>>> You also pretty much never want a fault handler to fixup misaligned
> >>>>>>> ethernet frames (or really anything else for that matter).
> >>>>>>> It is always going to be better to check in the code itself.
> >>>>>>>
> >>>>>>> x86 has just made people 'sloppy' :-)
> >>>>>>>
> >>>>>>> David
> >>>>>>>
> >>>>>>> -
> >>>>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> >>>>>>> MK1 1PT, UK
> >>>>>>> Registration No: 1397386 (Wales)
> >>>>>>>
> >>>>>>
> >>>>>> If somebody has a solution they deem to be better, I am happy to change
> >>>>>> this test case. Otherwise, I would appreciate a maintainer resolving
> >>>>>> this discussion and apply this fix.
> >>>>>>
> >>>>> Agreed.
> >>>>>
> >>>>> I do have a couple of patches which add explicit unaligned tests as well as
> >>>>> corner case tests (which are intended to trigger as many carry overflows
> >>>>> as possible). Once I get those working reliably, I'll be happy to submit
> >>>>> them as additional tests.
> >>>>>
> >>>>
> >>>> The functions definitely have to work at least with and without VLAN,
> >>>> which means the alignment cannot be greater than 4 bytes. That's also
> >>>> the outcome of the discussion.
> >>>
> >>> Thanks for completely ignoring what I've said. No. The alignment ends up
> >>> being commonly 2 bytes.
> >>>
> >>> As I've said several times, network drivers do _not_ have to respect
> >>> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
> >>> them which can only DMA to a 32-bit aligned address. This means that
> >>> the start of the ethernet header is placed at a 32-bit aligned address
> >>> making the IP header misaligned to 32-bit.
> >>>
> >>> I don't see what is so difficult to understand about this... but it
> >>> seems that my comments on this are being ignored time and time again,
> >>> and I can only think that those who are ignoring my comments have
> >>> some ulterior motive here.
> >>>
> >>
> >> I'm sorry for this misunderstanding. I'm not ignoring what you said at
> >> all. I understood that ARM is able to handle unaligned accesses with
> >> some exception handlers at worst case and that DMA constraints may lead
> >> to the IP header being on a 2-byte alignment only.
> >>
> >> However I also understood from others that some architectures can't
> >> handle such a 2-byte-only alignment.
> >>
> >> It's been suggested during the discussion that alignment tests should be
> >> added later in a follow-up patch. So for the time being I'm trying to
> >> find a compromise and get the existing tests working on all platforms
> >> but with a smaller alignment than the 16-bytes alignment brought by
> >> Charlie's v10 patch. And a 4 bytes alignment seemed to me to be a good
> >> compromise for this fix. The idea is also to make the fix as minimal as
> >> possible, unlike Charlie's patch that is churning up the tests quite
> >> heavily.
> >
> > Do you have a list of platforms this is failing on? I haven't seen any
> > reports that haven't been fixed.
>
> I don't have such a list, but I guess you do ? If all platforms have
> already been fixed, why are you sending this patch at all ?

This patch is what is doing the "fixing". Over the course of 10 versions
I have "fixed" the test cases to work on platforms that have various
alignment and endianness constraints. The endianness changes were picked
off of these patches and spun out into a different patch by you.

I originally introduced these two new test cases since I wrote the riscv
checksum function implementations and these tests were helpful for me
and I figured they may be helpful for somebody else too.

- Charlie

>
> Christophe

2024-02-27 18:37:02

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Le 27/02/2024 à 19:21, Charlie Jenkins a écrit :
> On Tue, Feb 27, 2024 at 06:11:24PM +0000, Christophe Leroy wrote:
>>
>>
>> Le 27/02/2024 à 18:54, Charlie Jenkins a écrit :
>>> On Tue, Feb 27, 2024 at 11:32:19AM +0000, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 27/02/2024 à 11:28, Russell King (Oracle) a écrit :
>>>>> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> Le 27/02/2024 à 00:48, Guenter Roeck a écrit :
>>>>>>> On 2/26/24 15:17, Charlie Jenkins wrote:
>>>>>>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
>>>>>>>>> ...
>>>>>>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
>>>>>>>>>> defines to be supported" is a gross misinterpretation. It is not
>>>>>>>>>> "defined to be supported" at all. It is the _preferred_ alignment
>>>>>>>>>> nothing more, nothing less.
>>>>>>>>
>>>>>>>> This distinction is arbitrary in practice, but I am open to being proven
>>>>>>>> wrong if you have data to back up this statement. If the driver chooses
>>>>>>>> to not follow this, then the driver might not work. ARM defines the
>>>>>>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
>>>>>>>> alignment. If the driver chooses to pad with one byte instead of 2
>>>>>>>> bytes, the driver may fail to work as the CPU may stall after the
>>>>>>>> misaligned access.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
>>>>>>>>> boundary before processing them - but that might not have been in
>>>>>>>>> Linux.
>>>>>>>>>
>>>>>>>>> I'm also sure there are cpu which will fault double length misaligned
>>>>>>>>> memory transfers - which might be used to marginally speed up code.
>>>>>>>>> Assuming more than 4 byte alignment for the IP header is likely
>>>>>>>>> 'wishful thinking'.
>>>>>>>>>
>>>>>>>>> There is plenty of ethernet hardware that can only write frames
>>>>>>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
>>>>>>>>> There are even cases of both on the same silicon die.
>>>>>>>>>
>>>>>>>>> You also pretty much never want a fault handler to fixup misaligned
>>>>>>>>> ethernet frames (or really anything else for that matter).
>>>>>>>>> It is always going to be better to check in the code itself.
>>>>>>>>>
>>>>>>>>> x86 has just made people 'sloppy' :-)
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> -
>>>>>>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>>>>>>>>> MK1 1PT, UK
>>>>>>>>> Registration No: 1397386 (Wales)
>>>>>>>>>
>>>>>>>>
>>>>>>>> If somebody has a solution they deem to be better, I am happy to change
>>>>>>>> this test case. Otherwise, I would appreciate a maintainer resolving
>>>>>>>> this discussion and apply this fix.
>>>>>>>>
>>>>>>> Agreed.
>>>>>>>
>>>>>>> I do have a couple of patches which add explicit unaligned tests as well as
>>>>>>> corner case tests (which are intended to trigger as many carry overflows
>>>>>>> as possible). Once I get those working reliably, I'll be happy to submit
>>>>>>> them as additional tests.
>>>>>>>
>>>>>>
>>>>>> The functions definitely have to work at least with and without VLAN,
>>>>>> which means the alignment cannot be greater than 4 bytes. That's also
>>>>>> the outcome of the discussion.
>>>>>
>>>>> Thanks for completely ignoring what I've said. No. The alignment ends up
>>>>> being commonly 2 bytes.
>>>>>
>>>>> As I've said several times, network drivers do _not_ have to respect
>>>>> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
>>>>> them which can only DMA to a 32-bit aligned address. This means that
>>>>> the start of the ethernet header is placed at a 32-bit aligned address
>>>>> making the IP header misaligned to 32-bit.
>>>>>
>>>>> I don't see what is so difficult to understand about this... but it
>>>>> seems that my comments on this are being ignored time and time again,
>>>>> and I can only think that those who are ignoring my comments have
>>>>> some ulterior motive here.
>>>>>
>>>>
>>>> I'm sorry for this misunderstanding. I'm not ignoring what you said at
>>>> all. I understood that ARM is able to handle unaligned accesses with
>>>> some exception handlers at worst case and that DMA constraints may lead
>>>> to the IP header being on a 2-byte alignment only.
>>>>
>>>> However I also understood from others that some architectures can't
>>>> handle such a 2-byte-only alignment.
>>>>
>>>> It's been suggested during the discussion that alignment tests should be
>>>> added later in a follow-up patch. So for the time being I'm trying to
>>>> find a compromise and get the existing tests working on all platforms
>>>> but with a smaller alignment than the 16-bytes alignment brought by
>>>> Charlie's v10 patch. And a 4 bytes alignment seemed to me to be a good
>>>> compromise for this fix. The idea is also to make the fix as minimal as
>>>> possible, unlike Charlie's patch that is churning up the tests quite
>>>> heavily.
>>>
>>> Do you have a list of platforms this is failing on? I haven't seen any
>>> reports that haven't been fixed.
>>
>> I don't have such a list, but I guess you do ? If all platforms have
>> already been fixed, why are you sending this patch at all ?
>
> This patch is what is doing the "fixing". Over the course of 10 versions
> I have "fixed" the test cases to work on platforms that have various
> alignment and endianness constraints. The endianness changes were picked
> off of these patches and spun out into a different patch by you.
>
> I originally introduced these two new test cases since I wrote the riscv
> checksum function implementations and these tests were helpful for me
> and I figured they may be helpful for somebody else too.

I see.

Then you misunderstood. I am not saying your patch leaves any platform
unfixed. I am saying that your patch seems bigger than required; it is
churn. In addition, your patch assumes a 16-byte alignment which, as
explained by Russell, is just wrong. At a minimum, a 4-byte alignment
must work on every platform because of VLANs.

Christophe

2024-02-27 18:41:25

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 27/02/2024 at 18:54, Charlie Jenkins wrote:
> On Tue, Feb 27, 2024 at 11:32:19AM +0000, Christophe Leroy wrote:
>>
>>
>> On 27/02/2024 at 11:28, Russell King (Oracle) wrote:
>>> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
>>>>
>>>>
>>>> On 27/02/2024 at 00:48, Guenter Roeck wrote:
>>>>> On 2/26/24 15:17, Charlie Jenkins wrote:
>>>>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
>>>>>>> ...
>>>>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
>>>>>>>> defines to be supported" is a gross misinterpretation. It is not
>>>>>>>> "defined to be supported" at all. It is the _preferred_ alignment
>>>>>>>> nothing more, nothing less.
>>>>>>
>>>>>> This distinction is arbitrary in practice, but I am open to being proven
>>>>>> wrong if you have data to back up this statement. If the driver chooses
>>>>>> to not follow this, then the driver might not work. ARM defines the
>>>>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
>>>>>> alignment. If the driver chooses to pad with one byte instead of 2
>>>>>> bytes, the driver may fail to work as the CPU may stall after the
>>>>>> misaligned access.
>>>>>>
>>>>>>>
>>>>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
>>>>>>> boundary before processing them - but that might not have been in
>>>>>>> Linux.
>>>>>>>
>>>>>>> I'm also sure there are cpu which will fault double length misaligned
>>>>>>> memory transfers - which might be used to marginally speed up code.
>>>>>>> Assuming more than 4 byte alignment for the IP header is likely
>>>>>>> 'wishful thinking'.
>>>>>>>
>>>>>>> There is plenty of ethernet hardware that can only write frames
>>>>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
>>>>>>> There are even cases of both on the same silicon die.
>>>>>>>
>>>>>>> You also pretty much never want a fault handler to fixup misaligned
>>>>>>> ethernet frames (or really anything else for that matter).
>>>>>>> It is always going to be better to check in the code itself.
>>>>>>>
>>>>>>> x86 has just made people 'sloppy' :-)
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> -
>>>>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>>>>>>> MK1 1PT, UK
>>>>>>> Registration No: 1397386 (Wales)
>>>>>>>
>>>>>>
>>>>>> If somebody has a solution they deem to be better, I am happy to change
>>>>>> this test case. Otherwise, I would appreciate a maintainer resolving
>>>>>> this discussion and apply this fix.
>>>>>>
>>>>> Agreed.
>>>>>
>>>>> I do have a couple of patches which add explicit unaligned tests as well as
>>>>> corner case tests (which are intended to trigger as many carry overflows
>>>>> as possible). Once I get those working reliably, I'll be happy to submit
>>>>> them as additional tests.
>>>>>
>>>>
>>>> The functions definitely have to work at least with and without VLAN,
>>>> which means the alignment cannot be greater than 4 bytes. That's also
>>>> the outcome of the discussion.
>>>
>>> Thanks for completely ignoring what I've said. No. The alignment ends up
>>> being commonly 2 bytes.
>>>
>>> As I've said several times, network drivers do _not_ have to respect
>>> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
>>> them which can only DMA to a 32-bit aligned address. This means that
>>> the start of the ethernet header is placed at a 32-bit aligned address
>>> making the IP header misaligned to 32-bit.
>>>
>>> I don't see what is so difficult to understand about this... but it
>>> seems that my comments on this are being ignored time and time again,
>>> and I can only think that those who are ignoring my comments have
>>> some ulterior motive here.
>>>
>>
>> I'm sorry for this misunderstanding. I'm not ignoring what you said at
>> all. I understood that ARM is able to handle unaligned accesses with
>> some exception handlers at worst case and that DMA constraints may lead
>> to the IP header being on a 2-byte alignment only.
>>
>> However I also understood from others that some architectures can't
>> handle such a 2-byte-only alignment.
>>
>> It's been suggested during the discussion that alignment tests should be
>> added later in a follow-up patch. So for the time being I'm trying to
>> find a compromise and get the existing tests working on all platforms
>> but with a smaller alignment than the 16-byte alignment brought by
>> Charlie's v10 patch. And a 4-byte alignment seemed to me to be a good
>> compromise for this fix. The idea is also to make the fix as minimal as
>> possible, unlike Charlie's patch that is churning up the tests quite
>> heavily.
>
> Do you have a list of platforms this is failing on? I haven't seen any
> reports that haven't been fixed.

I don't have such a list, but I guess you do? If all platforms have
already been fixed, why are you sending this patch at all?

Christophe

2024-02-27 19:07:17

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 06:35:04PM +0000, Christophe Leroy wrote:
>
>
> On 27/02/2024 at 19:21, Charlie Jenkins wrote:
> > On Tue, Feb 27, 2024 at 06:11:24PM +0000, Christophe Leroy wrote:
> >>
> >>
> >> On 27/02/2024 at 18:54, Charlie Jenkins wrote:
> >>> On Tue, Feb 27, 2024 at 11:32:19AM +0000, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> On 27/02/2024 at 11:28, Russell King (Oracle) wrote:
> >>>>> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 27/02/2024 at 00:48, Guenter Roeck wrote:
> >>>>>>> On 2/26/24 15:17, Charlie Jenkins wrote:
> >>>>>>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
> >>>>>>>>> ...
> >>>>>>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> >>>>>>>>>> defines to be supported" is a gross misinterpretation. It is not
> >>>>>>>>>> "defined to be supported" at all. It is the _preferred_ alignment
> >>>>>>>>>> nothing more, nothing less.
> >>>>>>>>
> >>>>>>>> This distinction is arbitrary in practice, but I am open to being proven
> >>>>>>>> wrong if you have data to back up this statement. If the driver chooses
> >>>>>>>> to not follow this, then the driver might not work. ARM defines the
> >>>>>>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
> >>>>>>>> alignment. If the driver chooses to pad with one byte instead of 2
> >>>>>>>> bytes, the driver may fail to work as the CPU may stall after the
> >>>>>>>> misaligned access.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
> >>>>>>>>> boundary before processing them - but that might not have been in
> >>>>>>>>> Linux.
> >>>>>>>>>
> >>>>>>>>> I'm also sure there are cpu which will fault double length misaligned
> >>>>>>>>> memory transfers - which might be used to marginally speed up code.
> >>>>>>>>> Assuming more than 4 byte alignment for the IP header is likely
> >>>>>>>>> 'wishful thinking'.
> >>>>>>>>>
> >>>>>>>>> There is plenty of ethernet hardware that can only write frames
> >>>>>>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
> >>>>>>>>> There are even cases of both on the same silicon die.
> >>>>>>>>>
> >>>>>>>>> You also pretty much never want a fault handler to fixup misaligned
> >>>>>>>>> ethernet frames (or really anything else for that matter).
> >>>>>>>>> It is always going to be better to check in the code itself.
> >>>>>>>>>
> >>>>>>>>> x86 has just made people 'sloppy' :-)
> >>>>>>>>>
> >>>>>>>>> David
> >>>>>>>>>
> >>>>>>>>> -
> >>>>>>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> >>>>>>>>> MK1 1PT, UK
> >>>>>>>>> Registration No: 1397386 (Wales)
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> If somebody has a solution they deem to be better, I am happy to change
> >>>>>>>> this test case. Otherwise, I would appreciate a maintainer resolving
> >>>>>>>> this discussion and apply this fix.
> >>>>>>>>
> >>>>>>> Agreed.
> >>>>>>>
> >>>>>>> I do have a couple of patches which add explicit unaligned tests as well as
> >>>>>>> corner case tests (which are intended to trigger as many carry overflows
> >>>>>>> as possible). Once I get those working reliably, I'll be happy to submit
> >>>>>>> them as additional tests.
> >>>>>>>
> >>>>>>
> >>>>>> The functions definitely have to work at least with and without VLAN,
> >>>>>> which means the alignment cannot be greater than 4 bytes. That's also
> >>>>>> the outcome of the discussion.
> >>>>>
> >>>>> Thanks for completely ignoring what I've said. No. The alignment ends up
> >>>>> being commonly 2 bytes.
> >>>>>
> >>>>> As I've said several times, network drivers do _not_ have to respect
> >>>>> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
> >>>>> them which can only DMA to a 32-bit aligned address. This means that
> >>>>> the start of the ethernet header is placed at a 32-bit aligned address
> >>>>> making the IP header misaligned to 32-bit.
> >>>>>
> >>>>> I don't see what is so difficult to understand about this... but it
> >>>>> seems that my comments on this are being ignored time and time again,
> >>>>> and I can only think that those who are ignoring my comments have
> >>>>> some ulterior motive here.
> >>>>>
> >>>>
> >>>> I'm sorry for this misunderstanding. I'm not ignoring what you said at
> >>>> all. I understood that ARM is able to handle unaligned accesses with
> >>>> some exception handlers at worst case and that DMA constraints may lead
> >>>> to the IP header being on a 2-byte alignment only.
> >>>>
> >>>> However I also understood from others that some architectures can't
> >>>> handle such a 2-byte-only alignment.
> >>>>
> >>>> It's been suggested during the discussion that alignment tests should be
> >>>> added later in a follow-up patch. So for the time being I'm trying to
> >>>> find a compromise and get the existing tests working on all platforms
> >>>> but with a smaller alignment than the 16-byte alignment brought by
> >>>> Charlie's v10 patch. And a 4-byte alignment seemed to me to be a good
> >>>> compromise for this fix. The idea is also to make the fix as minimal as
> >>>> possible, unlike Charlie's patch that is churning up the tests quite
> >>>> heavily.
> >>>
> >>> Do you have a list of platforms this is failing on? I haven't seen any
> >>> reports that haven't been fixed.
> >>
> >> I don't have such a list, but I guess you do? If all platforms have
> >> already been fixed, why are you sending this patch at all?
> >
> > This patch is what is doing the "fixing". Over the course of 10 versions
> > I have "fixed" the test cases to work on platforms that have various
> > alignment and endianness constraints. The endianness changes were picked
> > off of these patches and spun out into a different patch by you.
> >
> > I originally introduced these two new test cases since I wrote the riscv
> > checksum function implementations and these tests were helpful for me
> > and I figured they may be helpful for somebody else too.
>
> I see.
>
> Then you misunderstood. I am not saying your patch leaves any platform
> unfixed. I am saying that your patch seems bigger than required; it is
> churn. In addition, your patch assumes an alignment of 16 bytes which, as
> explained by Russell, is just wrong. At least an alignment of 4 bytes
> must work on any platform because of VLANs.

Pardon my ignorance but I do not understand why VLANs cause this test
case to be incorrect/introduce churn. The VLAN tag is a 4-byte field
that is optionally included in an ethernet header. This causes the
header to change from 14 bytes to 18 bytes. If the architecture defines
NET_IP_ALIGN to 2, this pads the ethernet header by 2 bytes, causing the
payload to be aligned along 16 bytes without VLAN and 20 bytes with
VLAN. Another test case can be added that aligns along 18 + NET_IP_ALIGN
but that does not achieve the goal of reducing churn and I would not
expect those additional 4 bytes to highlight bugs in any
implementation.
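
To make the offset arithmetic above concrete, here is a small userspace
sketch. It is not kernel code: the SKETCH_* constants and both helper names
are made up for illustration, and NET_IP_ALIGN is assumed to be 2 as in the
ARM case discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants; they mirror kernel macro names but are hard-coded here. */
#define SKETCH_ETH_HLEN      14	/* untagged Ethernet header */
#define SKETCH_VLAN_HLEN      4	/* one 802.1Q tag */
#define SKETCH_NET_IP_ALIGN   2	/* assumed value, as on ARM */

/* Offset of the IP header from the start of an aligned receive buffer. */
static size_t ip_header_offset(int vlan_tags)
{
	assert(vlan_tags >= 0);
	return SKETCH_NET_IP_ALIGN + SKETCH_ETH_HLEN +
	       (size_t)vlan_tags * SKETCH_VLAN_HLEN;
}

/* Largest power of two that divides a non-zero offset. */
static size_t natural_alignment(size_t offset)
{
	assert(offset != 0);
	return offset & (~offset + 1);
}
```

Under these assumptions ip_header_offset(0) is 16 and ip_header_offset(1)
is 20, so natural_alignment() gives 16 for the untagged case and 4 for the
tagged one.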

- Charlie

>
> Christophe

2024-02-27 19:31:33

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/27/24 09:54, Charlie Jenkins wrote:

>> It's been suggested during the discussion that alignment tests should be
>> added later in a follow-up patch. So for the time being I'm trying to
>> find a compromise and get the existing tests working on all platforms
>> but with a smaller alignment than the 16-byte alignment brought by
>> Charlie's v10 patch. And a 4-byte alignment seemed to me to be a good
>> compromise for this fix. The idea is also to make the fix as minimal as
>> possible, unlike Charlie's patch that is churning up the tests quite
>> heavily.
>
> Do you have a list of platforms this is failing on? I haven't seen any
> reports that haven't been fixed.
>

This is what I carry locally on top of v6.8-rc6:

097b149e4acb parisc: More csum_ipv6_magic fixes
15bf67a115eb kunit: Fix again checksum tests on big endian CPUs
bebe776d36ea parisc: Fix csum_ipv6_magic on 64-bit systems
523208f03063 parisc: Fix csum_ipv6_magic on 32-bit systems
a9dda1971c72 parisc: Fix ip_fast_csum
2ad0a6850b64 Revert "sh: Handle calling csum_partial with misaligned data"
7113cc414860 lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

I also have
0dd01a364cb7 lib: checksum: Add some corner cases to IPv6 checksum tests
e767cce6598b lib: checksum: Add tests for unaligned IPv6 addresses

which I may submit or not depending on the outcome of this discussion.

In other words, parisc and sh4 are currently known to be broken in the
upstream kernel, with fixes pending. On top of that, arm:mps2-an385
(probably all arm:nommu systems) crashes hard if csum_ipv6_magic()
is called with an unaligned address.

This is the "known" list of failures. I don't currently run kunit tests
on nios2 or riscv32, for example, nor on any architectures with no qemu
support.

On a side note, most architectures don't handle "len + proto" overflows.
While 'len' is a 32-bit parameter, IPv6 only allows for a 16-bit length
field. Many implementations of csum_ipv6_magic() specifically do
not handle such overflows because that would be pointless and require
extra code for no good reason. The current test code doesn't generate
such overflows, but its 'len' parameter is almost always larger than
16 bit and thus not realistic. Maybe it would make sense to limit
the range of 'len' to 16 bit when calling csum_ipv6_magic().
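
As a rough userspace sketch of that idea (the helper names are hypothetical
and this is not the test code itself), limiting 'len' to 16 bits rules out
a 32-bit wrap of "len + proto":

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the 16 bits an IPv6 length field can carry. */
static uint32_t limit_len_to_16_bits(uint32_t len)
{
	return len & 0xffffu;
}

/* Returns 1 if len + proto does not fit in 32 bits, 0 otherwise. */
static int len_plus_proto_overflows(uint32_t len, uint8_t proto)
{
	/* Do the addition in 64 bits so the carry out of bit 31 is visible. */
	uint64_t wide = (uint64_t)len + proto;

	return wide > UINT32_MAX;
}
```

With len = 0xffffffff and proto = 6 the sum wraps; after
limit_len_to_16_bits() the largest possible sum is 0xffff + 0xff, which
fits easily.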

Thanks,
Guenter

2024-02-27 22:44:29

by David Laight

Subject: RE: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

..
> This is the "known" list of failures. I don't currently run kunit tests
> on nios2 or riscv32, for example, nor on any architectures with no qemu
> support.

nios2 is definitely going to 'crash and burn' if you do a misaligned access.

Although Intel (aka the Altera bit) are claiming the current version
of their Quartus fpga build software is the last one that will
support the nios2.
They are expecting everyone to move to a risc-v soft cpu instead.
We aren't happy about that, I doubt some of the big telco's are
either - I believe some mobile base stations have fpga with a
lot of nios2 in them - almost certainly running with a few kB
of code and data memory and running small control tasks.
If you want to run Linux, find an fpga with an ARM core.

There are some solutions - like writing a compatible soft cpu.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2024-02-28 00:24:17

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 11:31:01AM -0800, Guenter Roeck wrote:
> On 2/27/24 09:54, Charlie Jenkins wrote:
>
> > > It's been suggested during the discussion that alignment tests should be
> > > added later in a follow-up patch. So for the time being I'm trying to
> > > find a compromise and get the existing tests working on all platforms
> > > but with a smaller alignment than the 16-byte alignment brought by
> > > Charlie's v10 patch. And a 4-byte alignment seemed to me to be a good
> > > compromise for this fix. The idea is also to make the fix as minimal as
> > > possible, unlike Charlie's patch that is churning up the tests quite
> > > heavily.
> >
> > Do you have a list of platforms this is failing on? I haven't seen any
> > reports that haven't been fixed.
> >
>
> This is what I carry locally on top of v6.8-rc6:
>
> 097b149e4acb parisc: More csum_ipv6_magic fixes
> 15bf67a115eb kunit: Fix again checksum tests on big endian CPUs
> bebe776d36ea parisc: Fix csum_ipv6_magic on 64-bit systems
> 523208f03063 parisc: Fix csum_ipv6_magic on 32-bit systems
> a9dda1971c72 parisc: Fix ip_fast_csum
> 2ad0a6850b64 Revert "sh: Handle calling csum_partial with misaligned data"
> 7113cc414860 lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
>
> I also have
> 0dd01a364cb7 lib: checksum: Add some corner cases to IPv6 checksum tests
> e767cce6598b lib: checksum: Add tests for unaligned IPv6 addresses
>
> which I may submit or not depending on the outcome of this discussion.
>
> In other words, parisc and sh4 are currently known to be broken in the
> upstream kernel, with fixes pending. On top of that, arm:mps2-an385
> (probably all arm:nommu systems) crashes hard if csum_ipv6_magic()
> is called with an unaligned address.
>
> This is the "known" list of failures. I don't currently run kunit tests
> on nios2 or riscv32, for example, nor on any architectures with no qemu
> support.
>
> On a side note, most architectures don't handle "len + proto" overflows.
> While 'len' is a 32-bit parameter, IPv6 only allows for a 16-bit length
> field. Many implementations of csum_ipv6_magic() specifically do
> not handle such overflows because that would be pointless and require
> extra code for no good reason. The current test code doesn't generate
> such overflows, but its 'len' parameter is almost always larger than
> 16 bit and thus not realistic. Maybe it would make sense to limit
> the range of 'len' to 16 bit when calling csum_ipv6_magic().

Thank you for the suggestion, I can limit len to 16-bit.

- Charlie

>
> Thanks,
> Guenter
>

2024-02-28 00:32:47

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Tue, Feb 27, 2024 at 10:28:45AM +0000, Russell King (Oracle) wrote:
> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
> >
> >
> > On 27/02/2024 at 00:48, Guenter Roeck wrote:
> > > On 2/26/24 15:17, Charlie Jenkins wrote:
> > >> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
> > >>> ...
> > >>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
> > >>>> defines to be supported" is a gross misinterpretation. It is not
> > >>>> "defined to be supported" at all. It is the _preferred_ alignment
> > >>>> nothing more, nothing less.
> > >>
> > >> This distinction is arbitrary in practice, but I am open to being proven
> > >> wrong if you have data to back up this statement. If the driver chooses
> > >> to not follow this, then the driver might not work. ARM defines the
> > >> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
> > >> alignment. If the driver chooses to pad with one byte instead of 2
> > >> bytes, the driver may fail to work as the CPU may stall after the
> > >> misaligned access.
> > >>
> > >>>
> > >>> I'm sure I've seen code that would realign IP headers to a 4 byte
> > >>> boundary before processing them - but that might not have been in
> > >>> Linux.
> > >>>
> > >>> I'm also sure there are cpu which will fault double length misaligned
> > >>> memory transfers - which might be used to marginally speed up code.
> > >>> Assuming more than 4 byte alignment for the IP header is likely
> > >>> 'wishful thinking'.
> > >>>
> > >>> There is plenty of ethernet hardware that can only write frames
> > >>> to even boundaries and plenty of cpu that fault misaligned accesses.
> > >>> There are even cases of both on the same silicon die.
> > >>>
> > >>> You also pretty much never want a fault handler to fixup misaligned
> > >>> ethernet frames (or really anything else for that matter).
> > >>> It is always going to be better to check in the code itself.
> > >>>
> > >>> x86 has just made people 'sloppy' :-)
> > >>>
> > >>> David
> > >>>
> > >>> -
> > >>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> > >>> MK1 1PT, UK
> > >>> Registration No: 1397386 (Wales)
> > >>>
> > >>
> > >> If somebody has a solution they deem to be better, I am happy to change
> > >> this test case. Otherwise, I would appreciate a maintainer resolving
> > >> this discussion and apply this fix.
> > >>
> > > Agreed.
> > >
> > > I do have a couple of patches which add explicit unaligned tests as well as
> > > corner case tests (which are intended to trigger as many carry overflows
> > > as possible). Once I get those working reliably, I'll be happy to submit
> > > them as additional tests.
> > >
> >
> > The functions definitely have to work at least with and without VLAN,
> > which means the alignment cannot be greater than 4 bytes. That's also
> > the outcome of the discussion.
>
> Thanks for completely ignoring what I've said. No. The alignment ends up
> being commonly 2 bytes.
>
> As I've said several times, network drivers do _not_ have to respect
> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
> them which can only DMA to a 32-bit aligned address. This means that
> the start of the ethernet header is placed at a 32-bit aligned address
> making the IP header misaligned to 32-bit.
>
> I don't see what is so difficult to understand about this... but it
> seems that my comments on this are being ignored time and time again,
> and I can only think that those who are ignoring my comments have
> some ulterior motive here.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

I don't understand how the capabilities of some ARM drivers factor in
here. It appears that a common case for calling this function is to pass
in an IP header that is aligned along an ethernet header + NET_IP_ALIGN.
It is perfectly acceptable that some drivers don't align along
NET_IP_ALIGN, but that does not seem relevant here.

This test case is supposed to be as true to the "general case" as
possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
does not follow this may not be appropriately tested by this test case,
but anyone is welcome to submit additional test cases that address this
additional alignment concern.
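
For illustration only, the placement described above could look roughly
like this in userspace C. This is not the code from the patch: the buffer,
the SKETCH_* constants and the helper name are hypothetical, and
NET_IP_ALIGN is assumed to be 2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HLEN      14	/* untagged Ethernet header */
#define SKETCH_NET_IP_ALIGN   2	/* assumed value */
#define SKETCH_HDR_LEN       20	/* minimal IPv4 header */

/* Backing buffer, aligned so that offsets inside it are meaningful. */
static _Alignas(16) uint8_t sketch_buf[SKETCH_ETH_HLEN + SKETCH_NET_IP_ALIGN +
				       SKETCH_HDR_LEN];

/* Copy a header to where it would sit behind an Ethernet header. */
static uint8_t *sketch_place_header(const uint8_t *hdr)
{
	uint8_t *dst = sketch_buf + SKETCH_ETH_HLEN + SKETCH_NET_IP_ALIGN;

	memcpy(dst, hdr, SKETCH_HDR_LEN);
	return dst;
}
```

The returned pointer is 16 bytes into the buffer, so with these values it
is at least 4-byte aligned; a driver that skips NET_IP_ALIGN would land at
offset 14 instead, which this sketch does not cover.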

- Charlie

2024-02-28 05:19:19

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/27/24 14:44, David Laight wrote:
> ..
>> This is the "known" list of failures. I don't currently run kunit tests
>> on nios2 or riscv32, for example, nor on any architectures with no qemu
>> support.
>
> nios2 is definitely going to 'crash and burn' if you do a misaligned access.
>

Curiously enough, it doesn't. I get lots of

kernel unaligned access @ 0xc848eb78; BADADDR 0xc86f1d01; cause=6, isn=0x20800017

but a checksum test with unaligned data does pass, so the kernel
somehow handles it. It does crash, later, though, if CONFIG_NET_TEST
is enabled. Apparently the gso tests trigger lots of unaligned
accesses, and those are just too much for the kernel to handle.
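
As an aside, the usual way to avoid depending on such fixups is to do the
load in a way that is valid at any address. A minimal userspace sketch,
similar in spirit to the kernel's get_unaligned() (the helper name here is
hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from an address with no
 * alignment guarantee. memcpy() lets the compiler pick byte or word
 * loads as appropriate for the target, so no alignment trap is needed.
 */
static uint32_t load_u32_any_alignment(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Reading through this helper at an odd address returns the same value that
was stored there, in host byte order.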

Guenter

> Although Intel (aka the Altera bit) are claiming the current version
> of their Quartus fpga build software is the last one that will
> support the nios2.
> They are expecting everyone to move to a risc-v soft cpu instead.
> We aren't happy about that, I doubt some of the big telco's are
> either - I believe some mobile base stations have fpga with a
> lot of nios2 in them - almost certainly running with a few kB
> of code and data memory and running small control tasks.
> If you want to run Linux, find an fpga with an ARM core.
>
> There are some solutions - like writing a compatible soft cpu.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2024-02-28 07:26:11

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 28/02/2024 at 01:21, Charlie Jenkins wrote:
> On Tue, Feb 27, 2024 at 10:28:45AM +0000, Russell King (Oracle) wrote:
>> On Tue, Feb 27, 2024 at 06:47:38AM +0000, Christophe Leroy wrote:
>>>
>>>
>>> On 27/02/2024 at 00:48, Guenter Roeck wrote:
>>>> On 2/26/24 15:17, Charlie Jenkins wrote:
>>>>> On Mon, Feb 26, 2024 at 10:33:56PM +0000, David Laight wrote:
>>>>>> ...
>>>>>>> I think you misunderstand. "NET_IP_ALIGN offset is what the kernel
>>>>>>> defines to be supported" is a gross misinterpretation. It is not
>>>>>>> "defined to be supported" at all. It is the _preferred_ alignment
>>>>>>> nothing more, nothing less.
>>>>>
>>>>> This distinction is arbitrary in practice, but I am open to being proven
>>>>> wrong if you have data to back up this statement. If the driver chooses
>>>>> to not follow this, then the driver might not work. ARM defines the
>>>>> NET_IP_ALIGN to be 2 to pad out the header to be on the supported
>>>>> alignment. If the driver chooses to pad with one byte instead of 2
>>>>> bytes, the driver may fail to work as the CPU may stall after the
>>>>> misaligned access.
>>>>>
>>>>>>
>>>>>> I'm sure I've seen code that would realign IP headers to a 4 byte
>>>>>> boundary before processing them - but that might not have been in
>>>>>> Linux.
>>>>>>
>>>>>> I'm also sure there are cpu which will fault double length misaligned
>>>>>> memory transfers - which might be used to marginally speed up code.
>>>>>> Assuming more than 4 byte alignment for the IP header is likely
>>>>>> 'wishful thinking'.
>>>>>>
>>>>>> There is plenty of ethernet hardware that can only write frames
>>>>>> to even boundaries and plenty of cpu that fault misaligned accesses.
>>>>>> There are even cases of both on the same silicon die.
>>>>>>
>>>>>> You also pretty much never want a fault handler to fixup misaligned
>>>>>> ethernet frames (or really anything else for that matter).
>>>>>> It is always going to be better to check in the code itself.
>>>>>>
>>>>>> x86 has just made people 'sloppy' :-)
>>>>>>
>>>>>> David
>>>>>>
>>>>>> -
>>>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>>>>>> MK1 1PT, UK
>>>>>> Registration No: 1397386 (Wales)
>>>>>>
>>>>>
>>>>> If somebody has a solution they deem to be better, I am happy to change
>>>>> this test case. Otherwise, I would appreciate a maintainer resolving
>>>>> this discussion and apply this fix.
>>>>>
>>>> Agreed.
>>>>
>>>> I do have a couple of patches which add explicit unaligned tests as well as
>>>> corner case tests (which are intended to trigger as many carry overflows
>>>> as possible). Once I get those working reliably, I'll be happy to submit
>>>> them as additional tests.
>>>>
>>>
>>> The functions definitely have to work at least with and without VLAN,
>>> which means the alignment cannot be greater than 4 bytes. That's also
>>> the outcome of the discussion.
>>
>> Thanks for completely ignoring what I've said. No. The alignment ends up
>> being commonly 2 bytes.
>>
>> As I've said several times, network drivers do _not_ have to respect
>> NET_IP_ALIGN. There are 32-bit ARM drivers which have a DMA engine in
>> them which can only DMA to a 32-bit aligned address. This means that
>> the start of the ethernet header is placed at a 32-bit aligned address
>> making the IP header misaligned to 32-bit.
>>
>> I don't see what is so difficult to understand about this... but it
>> seems that my comments on this are being ignored time and time again,
>> and I can only think that those who are ignoring my comments have
>> some ulterior motive here.
>>
>> --
>> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>
> I don't understand how the capabilities of some ARM drivers factor in
> here. It appears that a common case for calling this function is to pass
> in an IP header that is aligned along an ethernet header + NET_IP_ALIGN.
> It is perfectly acceptable that some drivers don't align along
> NET_IP_ALIGN, but that does not seem relevant here.
>
> This test case is supposed to be as true to the "general case" as
> possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
> this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
> does not follow this may not be appropriately tested by this test case,
> but anyone is welcome to submit additional test cases that address this
> additional alignment concern.

But then this test case is becoming less and less true to the "general
case" with this patch, whereas your initial implementation was almost
perfect as it was covering most cases, a lot more than what we get with
that patch applied.

2024-02-28 07:59:40

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/27/24 23:25, Christophe Leroy wrote:
[ ... ]
>>
>> This test case is supposed to be as true to the "general case" as
>> possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
>> this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
>> does not follow this may not be appropriately tested by this test case,
>> but anyone is welcome to submit additional test cases that address this
>> additional alignment concern.
>
> But then this test case is becoming less and less true to the "general
> case" with this patch, whereas your initial implementation was almost
> perfect as it was covering most cases, a lot more than what we get with
> that patch applied.
>
NP with me if that is where people want to go. I'll simply disable checksum
tests on all architectures which don't support unaligned accesses (so far
it looks like that is only arm with thumb instructions, and possibly nios2).
I personally find that less desirable and would have preferred a second
configurable set of tests for unaligned accesses, but I have no problem
with it.
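
To sketch what such a separate unaligned test could check (userspace C, not
the kunit code; all names are hypothetical): compute a byte-wise reference
checksum once, then run the function under test on the same data copied to
every offset and compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_LEN 64
#define SKETCH_MAX_OFF 16

typedef uint16_t (*csum_fn)(const uint8_t *data, size_t len);

/* Byte-wise Internet checksum (RFC 1071 style), valid at any alignment. */
static uint16_t ref_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;	/* wide enough for inputs up to 64 KiB */
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries back in */
	return (uint16_t)~sum;
}

/* Returns 1 if fn agrees with ref_csum at every offset 0..max_off. */
static int csum_matches_at_offsets(csum_fn fn, const uint8_t *data,
				   size_t len, size_t max_off)
{
	uint8_t buf[SKETCH_MAX_LEN + SKETCH_MAX_OFF];
	uint16_t want = ref_csum(data, len);
	size_t off;

	assert(len <= SKETCH_MAX_LEN && max_off <= SKETCH_MAX_OFF);
	for (off = 0; off <= max_off; off++) {
		memcpy(buf + off, data, len);
		if (fn(buf + off, len) != want)
			return 0;
	}
	return 1;
}
```

In a real test the csum_fn argument would wrap the architecture's
implementation; on an architecture that traps on misaligned loads the
failure would show up as a crash rather than a mismatch.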

Guenter

2024-02-28 10:15:56

by Geert Uytterhoeven

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

CC testing

On Wed, Feb 28, 2024 at 8:59 AM Guenter Roeck <[email protected]> wrote:
> On 2/27/24 23:25, Christophe Leroy wrote:
> [ ... ]
> >>
> >> This test case is supposed to be as true to the "general case" as
> >> possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
> >> this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
> >> does not follow this may not be appropriately tested by this test case,
> >> but anyone is welcome to submit additional test cases that address this
> >> additional alignment concern.
> >
> > But then this test case is becoming less and less true to the "general
> > case" with this patch, whereas your initial implementation was almost
> > perfect as it was covering most cases, a lot more than what we get with
> > that patch applied.
> >
> NP with me if that is where people want to go. I'll simply disable checksum
> tests on all architectures which don't support unaligned accesses (so far
> it looks like that is only arm with thumb instructions, and possibly nios2).
> I personally find that less desirable and would have preferred a second
> configurable set of tests for unaligned accesses, but I have no problem
> with it.

IMHO the tests should validate the expected functionality. If a test
fails, either functionality is missing or behaves wrong, or the test
is wrong.

What is the point of writing tests for a core functionality like network
checksumming that do not match the expected functionality?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2024-02-28 15:42:13

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/28/24 02:15, Geert Uytterhoeven wrote:
> CC testing
>
> On Wed, Feb 28, 2024 at 8:59 AM Guenter Roeck <[email protected]> wrote:
>> On 2/27/24 23:25, Christophe Leroy wrote:
>> [ ... ]
>>>>
>>>> This test case is supposed to be as true to the "general case" as
>>>> possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
>>>> this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
>>>> does not follow this may not be appropriately tested by this test case,
>>>> but anyone is welcome to submit additional test cases that address this
>>>> additional alignment concern.
>>>
>>> But then this test case is becoming less and less true to the "general
>>> case" with this patch, whereas your initial implementation was almost
>>> perfect as it was covering most cases, a lot more than what we get with
>>> that patch applied.
>>>
>> NP with me if that is where people want to go. I'll simply disable checksum
>> tests on all architectures which don't support unaligned accesses (so far
>> it looks like that is only arm with thumb instructions, and possibly nios2).
>> I personally find that less desirable and would have preferred a second
>> configurable set of tests for unaligned accesses, but I have no problem
>> with it.
>
> IMHO the tests should validate the expected functionality. If a test
> fails, either functionality is missing or behaves wrong, or the test
> is wrong.
>
> What is the point of writing tests for a core functionality like network
> checksumming that do not match the expected functionality?
>

Tough one. I can't enable CONFIG_NET_TEST on nios2, parisc, and arm with THUMB
enabled due to crashes or hangs in gso tests. I accept that. Downside is that I
have to disable CONFIG_NET_TEST on those architectures/platforms entirely,
meaning a whole class of tests are missing for those architectures. I would
prefer to have a configuration option such as CONFIG_NET_GSO_TEST to let me
disable the problematic tests for the affected platforms so I can run all
the other network unit tests. Yes, obviously something is wrong either with
the affected tests or with the implementation of the tested functionality
on the affected systems, but that could be handled separately if a separate
configuration option existed, and new regressions in other tests on the affected
architectures could be identified as they happen.

This case is similar. I'd prefer to have a separate configuration option,
say, CONFIG_CHECKSUM_MISALIGNED_KUNIT, which I can disable to be able to
run the common checksum tests on platforms / architectures which don't
support unaligned accesses.

However, as I said, if the community wants to take a harsh stance, I have no
problem with just disabling groups of tests entirely on platforms which have
a problem with part of it.

Guenter

2024-02-29 08:07:30

by David Gow

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Wed, 28 Feb 2024 at 23:40, Guenter Roeck <[email protected]> wrote:
>
> On 2/28/24 02:15, Geert Uytterhoeven wrote:
> > CC testing
> >
> > On Wed, Feb 28, 2024 at 8:59 AM Guenter Roeck <[email protected]> wrote:
> >> On 2/27/24 23:25, Christophe Leroy wrote:
> >> [ ... ]
> >>>>
> >>>> This test case is supposed to be as true to the "general case" as
> >>>> possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
> >>>> this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
> >>>> does not follow this may not be appropriately tested by this test case,
> >>>> but anyone is welcome to submit additional test cases that address this
> >>>> additional alignment concern.
> >>>
> >>> But then this test case is becoming less and less true to the "general
> >>> case" with this patch, whereas your initial implementation was almost
> >>> perfect as it was covering most cases, a lot more than what we get with
> >>> that patch applied.
> >>>
> >> NP with me if that is where people want to go. I'll simply disable checksum
> >> tests on all architectures which don't support unaligned accesses (so far
> >> it looks like that is only arm with thumb instructions, and possibly nios2).
> >> I personally find that less desirable and would have preferred a second
> >> configurable set of tests for unaligned accesses, but I have no problem
> >> with it.
> >
> > IMHO the tests should validate the expected functionality. If a test
> > fails, either functionality is missing or behaves wrong, or the test
> > is wrong.
> >
> > What is the point of writing tests for a core functionality like network
> > checksumming that do not match the expected functionality?
> >
>
> Tough one. I can't enable CONFIG_NET_TEST on nios2, parisc, and arm with THUMB
> enabled due to crashes or hangs in gso tests. I accept that. Downside is that I
> have to disable CONFIG_NET_TEST on those architectures/platforms entirely,
> meaning a whole class of tests are missing for those architectures. I would
> prefer to have a configuration option such as CONFIG_NET_GSO_TEST to let me
> disable the problematic tests for the affected platforms so I can run all
> the other network unit tests. Yes, obviously something is wrong either with
> the affected tests or with the implementation of the tested functionality
> on the affected systems, but that could be handled separately if a separate
> configuration option existed, and new regressions in other tests on the affected
> architectures could be identified as they happen.
>
> This case is similar. I'd prefer to have a separate configuration option,
> say, CONFIG_CHECKSUM_MISALIGNED_KUNIT, which I can disable to be able to
> run the common checksum tests on platforms / architectures which don't
> support unaligned accesses.
>
> However, as I said, if the community wants to take a harsh stance, I have no
> problem with just disabling groups of tests entirely on platforms which have
> a problem with part of it.
>
> Guenter
>

I think the ideal solution is for there to be some official stance on
the required alignment, for every architecture to support that, and
for the tests to exercise it. Now, judging from the sheer number of
replies in this thread, it seems like there isn't any real agreement
on that. (From my quick reading of some of the checksum code, my
assumption was that either 1- or 2-byte alignment is required, with
4-byte alignment being ideal for performance reasons in most setups.)

If different architectures have different alignment requirements
(ouch!), my feeling is that the test suite should be written to the
maximum such alignment (as any non-architecture-specific code will
need to align things anyway), and architectures/drivers with
non-aligned buffers can have their own tests. If it turns out there
are a lot of such drivers/architectures, then we can add the extra
config option.

I'd rather, if there is a config option to disable these tests, it be
of the form ARCH_HAS_UNALIGNED_CHECKSUM to enable it, or similar.
There's also the option of having the test 'skip' itself on a
configuration which doesn't support it. That way it'll still show up
in the list of tests, but with a description, like "Disabled due to
checksum alignment requirements" or something, which may be more
obvious to people debugging it later.
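
To make the 'skip' idea concrete, here is a small userspace model of it. This is
not KUnit code: in the kernel the equivalent step would be a kunit_skip() call
with a reason string. The types, the unaligned_checksum_ok flag and the function
name below are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum test_status { TEST_PASSED, TEST_FAILED, TEST_SKIPPED };

struct test_result {
	enum test_status status;
	const char *reason;	/* only set when the test is skipped */
};

/*
 * Hypothetical capability flag. In the kernel this would be derived from
 * a HAS_xxx-style config symbol rather than passed in at run time.
 */
struct test_env {
	bool unaligned_checksum_ok;
};

/*
 * The test always runs and always reports a result, so it still shows up
 * in the list of tests. On a configuration that cannot handle unaligned
 * buffers it reports "skipped" together with a human-readable reason.
 */
struct test_result run_unaligned_csum_test(const struct test_env *env)
{
	struct test_result res = { TEST_PASSED, NULL };

	assert(env != NULL);

	if (!env->unaligned_checksum_ok) {
		res.status = TEST_SKIPPED;
		res.reason = "Disabled due to checksum alignment requirements";
		return res;
	}

	/* The actual unaligned-buffer checksum checks would go here. */
	return res;
}
```

The point of the design is that the decision is keyed on a feature of the
platform, not on the test being known-broken.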

For the gso test hangs, I think it's probably quite sensible to have a
config option for the GSO tests generally. I'd be more hesitant to
have a separate CONFIG_NET_GSO_FREQUENTLY_BROKEN_TESTS, which is
selected automatically by a bunch of architectures. At that point, I
think we need to either just fix the bugs, or start thinking about a
better solution for these tests / architectures.

One of the things I'm hoping to work on this year is some improvements
to KUnit tooling to automatically run tests across a wider set of
architectures and configs, so test authors can catch this sort of
thing before even sending patches out. We can do a bit of this with
the manual --arch <arch> option to kunit.py, but very few people will
test things across more than a couple of architectures, and rarely
will we get good testing on the less common architectures, like 32-bit
ones, big endian ones, or ones with stricter alignment requirements.
So we can do better there.

tl;dr: I think it's a good idea for tests to sit behind config
options. Obviously they shouldn't be either too broad, or too
granular, but common sense usually prevails here. I'd rather not have
config options explicitly for "broken" tests, though: if you have to,
try to make the config option for the missing/broken feature (HAS_xxx)
rather than the test if possible. Otherwise, 'skip' the test, with a
suitable reason string if you can.

-- David

2024-02-29 19:38:59

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On Wed, Feb 28, 2024 at 07:40:43AM -0800, Guenter Roeck wrote:
> On 2/28/24 02:15, Geert Uytterhoeven wrote:
> > CC testing
> >
> > On Wed, Feb 28, 2024 at 8:59 AM Guenter Roeck <[email protected]> wrote:
> > > On 2/27/24 23:25, Christophe Leroy wrote:
> > > [ ... ]
> > > > >
> > > > > This test case is supposed to be as true to the "general case" as
> > > > > possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
> > > > > this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
> > > > > does not follow this may not be appropriately tested by this test case,
> > > > > but anyone is welcome to submit additional test cases that address this
> > > > > additional alignment concern.
> > > >
> > > > But then this test case is becoming less and less true to the "general
> > > > case" with this patch, whereas your initial implementation was almost
> > > > perfect as it was covering most cases, a lot more than what we get with
> > > > that patch applied.
> > > >
> > > NP with me if that is where people want to go. I'll simply disable checksum
> > > tests on all architectures which don't support unaligned accesses (so far
> > > it looks like that is only arm with thumb instructions, and possibly nios2).
> > > I personally find that less desirable and would have preferred a second
> > > configurable set of tests for unaligned accesses, but I have no problem
> > > with it.
> >
> > IMHO the tests should validate the expected functionality. If a test
> > fails, either functionality is missing or behaves wrong, or the test
> > is wrong.
> >
> > What is the point of writing tests for a core functionality like network
> > checksumming that do not match the expected functionality?
> >
>
> Tough one. I can't enable CONFIG_NET_TEST on nios2, parisc, and arm with THUMB
> enabled due to crashes or hangs in gso tests. I accept that. Downside is that I
> have to disable CONFIG_NET_TEST on those architectures/platforms entirely,
> meaning a whole class of tests are missing for those architectures. I would
> prefer to have a configuration option such as CONFIG_NET_GSO_TEST to let me
> disable the problematic tests for the affected platforms so I can run all
> the other network unit tests. Yes, obviously something is wrong either with
> the affected tests or with the implementation of the tested functionality
> on the affected systems, but that could be handled separately if a separate
> configuration option existed, and new regressions in other tests on the affected
> architectures could be identified as they happen.

I think I got confused here. Is this an issue with the tests included in
this patch, or is it unrelated?

- Charlie

>
> This case is similar. I'd prefer to have a separate configuration option,
> say, CONFIG_CHECKSUM_MISALIGNED_KUNIT, which I can disable to be able to
> run the common checksum tests on platforms / architectures which don't
> support unaligned accesses.
>
> However, as I said, if the community wants to take a harsh stance, I have no
> problem with just disabling groups of tests entirely on platforms which have
> a problem with part of it.
>
> Guenter
>

2024-02-29 20:22:43

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/29/24 11:38, Charlie Jenkins wrote:
[ ... ]
>> Tough one. I can't enable CONFIG_NET_TEST on nios2, parisc, and arm with THUMB
>> enabled due to crashes or hangs in gso tests. I accept that. Downside is that I
>> have to disable CONFIG_NET_TEST on those architectures/platforms entirely,
>> meaning a whole class of tests are missing for those architectures. I would
>> prefer to have a configuration option such as CONFIG_NET_GSO_TEST to let me
>> disable the problematic tests for the affected platforms so I can run all
>> the other network unit tests. Yes, obviously something is wrong either with
>> the affected tests or with the implementation of the tested functionality
>> on the affected systems, but that could be handled separately if a separate
>> configuration option existed, and new regressions in other tests on the affected
>> architectures could be identified as they happen.
>
> I think I got confused here, is this an issue with the tests included in
> this patch or is it unrelated?
>

Unrelated. It was intended to be an example of another set of tests which
suffer from a similar problem (crash on certain architectures if enabled).
Sorry for the confusion.

Guenter

2024-03-01 06:46:43

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Le 26/02/2024 à 17:44, Guenter Roeck a écrit :
> On 2/26/24 03:34, Christophe Leroy wrote:
>>
>>
>> Le 23/02/2024 à 23:11, Charlie Jenkins a écrit :
>>> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
>>> aligning the IP header, which were causing failures on architectures
>>> that do not support misaligned accesses like some ARM platforms. To
>>> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
>>> standard alignment of an IP header and must be supported by the
>>> architecture.
>>
>> I'm still wondering what we are really trying to fix here.
>>
>> All other tests are explicitely testing that it works with any alignment.
>>
>> Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
>> well ? I would expect it, I see no comment in arm code which explicits
>> that assumption around those functions.
>>
>> Isn't the problem only the following line, because csum_offset is
>> unaligned ?
>>
>> csum = *(__wsum *)(random_buf + i + csum_offset);
>>
>> Otherwise, if there really is an alignment issue for the IPv6 source or
>> destination address, isn't it enough to perform a 32 bits alignment ?
>>
>
> It isn't just arm.
>
> Question should be what alignments the functions are supposed to be able
> to handle, not what they are optimized for. If byte and/or half word
> alignments
> are expected to be supported, there is still architecture code which would
> have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
> and on sh4, for example. If unaligned accesses are expected to be handled,
> it would probably make sense to add a separate test case, though, to
> clarify
> that the test fails due to alignment issues, not due to input parameters.
>

When you say "Unaligned accesses are known to fail on hppa64/parisc64
and on sh4", do you mean unaligned accesses in general or do you mean
ip_fast_csum() with unaligned ip header and csum_ipv6_magic() with
unaligned source and dest addresses ?

Because later in this thread it is said that only ARM and NIOS2
potentially have an issue.

And when you say "unaligned", to what level is that? Is it 4-byte
alignment, or more, or less?

Christophe

2024-03-01 07:00:40

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Le 28/02/2024 à 08:59, Guenter Roeck a écrit :
> On 2/27/24 23:25, Christophe Leroy wrote:
> [ ... ]
>>>
>>> This test case is supposed to be as true to the "general case" as
>>> possible, so I have aligned the data along 14 + NET_IP_ALIGN. On ARM
>>> this will be a 16-byte boundary since NET_IP_ALIGN is 2. A driver that
>>> does not follow this may not be appropriately tested by this test case,
>>> but anyone is welcome to submit additional test cases that address this
>>> additional alignment concern.
>>
>> But then this test case is becoming less and less true to the "general
>> case" with this patch, whereas your initial implementation was almost
>> perfect as it was covering most cases, a lot more than what we get with
>> that patch applied.
>>
> NP with me if that is where people want to go. I'll simply disable checksum
> tests on all architectures which don't support unaligned accesses (so far
> it looks like that is only arm with thumb instructions, and possibly
> nios2).
> I personally find that less desirable and would have preferred a second
> configurable set of tests for unaligned accesses, but I have no problem
> with it.
>

Can you tell us more about the symptoms you encounter on ARM? According to
Russell (ARM maintainer) it should work; quoting him below:

However, that may not always be the case for incoming packets, and what
saves 32-bit Arm is the ability to do unaligned loads in later revisions
of the architecture, or the alignment fault handler (slow) on older
revisions.

NIOS2 doesn't have its own functions and relies on CONFIG_GENERIC_CSUM,
which means that ip_fast_csum() comes from lib/checksum.c and is
implemented using do_csum(), which handles unaligned accesses by
splitting accesses into smaller aligned accesses.
Therefore, ip_fast_csum() shouldn't be an issue for NIOS2.
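
As a rough userspace illustration of that splitting approach (this is not the
code from lib/checksum.c; csum_ref() and csum_aligned() are names made up for
this sketch), the function below uses a byte load for an odd leading or
trailing byte and halfword loads only at 2-byte-aligned addresses. csum_ref()
is a byte-wise reference to compare against. Both return the folded ones'
complement sum of the buffer read as big-endian 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold an accumulator down to 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Reference: byte loads only, so alignment never matters. */
uint16_t csum_ref(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return fold16(sum);
}

/* Same result, but halfword loads happen only at even addresses. */
uint16_t csum_aligned(const uint8_t *buf, size_t len)
{
	const int le = host_is_little_endian();
	const int odd = (int)((uintptr_t)buf & 1);
	uint64_t acc = 0;
	uint16_t r;

	assert(buf != NULL || len == 0);
	if (len == 0)
		return 0;

	if (odd) {
		/* Leading byte goes in the second byte lane of a halfword. */
		acc += le ? ((uint64_t)*buf << 8) : *buf;
		buf++;
		len--;
	}
	while (len >= 2) {
		uint16_t w;

		/* buf is 2-byte aligned here; memcpy keeps the load valid C. */
		memcpy(&w, buf, sizeof(w));
		acc += w;
		buf += 2;
		len -= 2;
	}
	if (len)	/* trailing byte goes in the first byte lane */
		acc += le ? *buf : ((uint64_t)*buf << 8);

	r = fold16(acc);
	if (odd)	/* undo the one-byte shift of the whole stream */
		r = swab16(r);
	if (le)		/* native halfwords were byte-swapped on little endian */
		r = swab16(r);
	return r;
}
```

The byte swap for an odd start works because shifting the whole stream by one
byte swaps the two bytes of the ones' complement sum.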

Regarding csum_ipv6_magic(), NIOS2 uses the function in
net/ipv6/ip6_checksum.c.
This function dereferences saddr and daddr with 32-bit accesses such as
saddr->s6_addr32[0]. Is that a problem when saddr and daddr are not
32-bit aligned? Does it Oops?
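
For comparison, a userspace sketch of a load that does not depend on the
address being 32-bit aligned, in the spirit of the kernel's get_unaligned()
helper. The names load_u32_any() and sum_addr_words() and the 16-byte stand-in
for an IPv6 address are assumptions made for this example only; this is not
what net/ipv6/ip6_checksum.c does today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the size of struct in6_addr: 16 raw bytes. */
#define ADDR_LEN 16

/*
 * Load a 32-bit word from any address. memcpy() avoids dereferencing a
 * possibly misaligned uint32_t pointer; the compiler is free to turn it
 * into a single load where the architecture allows that.
 */
uint32_t load_u32_any(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sum the four 32-bit words of an address, wherever it sits in memory. */
uint64_t sum_addr_words(const uint8_t *addr)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < ADDR_LEN; i += 4)
		sum += load_u32_any(addr + i);
	return sum;
}
```

Placing the same 16 bytes at any byte offset gives the same sum, which is the
property a test with unaligned source and destination addresses would rely on.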

2024-03-01 16:27:01

Subject: Re: [PATCH v10] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

On 2/29/24 22:46, Christophe Leroy wrote:
>
>
> Le 26/02/2024 à 17:44, Guenter Roeck a écrit :
>> On 2/26/24 03:34, Christophe Leroy wrote:
>>>
>>>
>>> Le 23/02/2024 à 23:11, Charlie Jenkins a écrit :
>>>> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
>>>> aligning the IP header, which were causing failures on architectures
>>>> that do not support misaligned accesses like some ARM platforms. To
>>>> solve this, align the data along (14 + NET_IP_ALIGN) bytes which is the
>>>> standard alignment of an IP header and must be supported by the
>>>> architecture.
>>>
>>> I'm still wondering what we are really trying to fix here.
>>>
>>> All other tests are explicitely testing that it works with any alignment.
>>>
>>> Shouldn't ip_fast_csum() and csum_ipv6_magic() work for any alignment as
>>> well ? I would expect it, I see no comment in arm code which explicits
>>> that assumption around those functions.
>>>
>>> Isn't the problem only the following line, because csum_offset is
>>> unaligned ?
>>>
>>> csum = *(__wsum *)(random_buf + i + csum_offset);
>>>
>>> Otherwise, if there really is an alignment issue for the IPv6 source or
>>> destination address, isn't it enough to perform a 32 bits alignment ?
>>>
>>
>> It isn't just arm.
>>
>> Question should be what alignments the functions are supposed to be able
>> to handle, not what they are optimized for. If byte and/or half word
>> alignments
>> are expected to be supported, there is still architecture code which would
>> have to be fixed. Unaligned accesses are known to fail on hppa64/parisc64
>> and on sh4, for example. If unaligned accesses are expected to be handled,
>> it would probably make sense to add a separate test case, though, to
>> clarify
>> that the test fails due to alignment issues, not due to input parameters.
>>
>
> When you say "Unaligned accesses are known to fail on hppa64/parisc64
> and on sh4", do you mean unaligned accesses in general or do you mean
> ip_fast_csum() with unaligned ip header and csum_ipv6_magic() with
> unaligned source and dest addresses ?
>
> Because later in this thread it is said that only ARM and NIOS2
> potentially have an issue.
>
> And when you say "unaligned", to what level is that ? Is it 4-bytes
> alignment or more or less ?
>

This e-mail chain is getting too long. Here is an attempt at a quick summary.

- Someone else suggested that unaligned accesses with nios2 should fail.
  I have since tested and found that they pass, at least for the checksum tests,
  while dumping "unaligned access" messages into the kernel log. Other tests
  (specifically gso) cause crashes, but that is unrelated.

- checksum tests on sh4 fail for unaligned data because of a bug introduced
to the architecture's checksum core with commit cadc4e1a2b4d ("sh: Handle
calling csum_partial with misaligned data"). The tests pass after reverting
that patch. I reported this, but that revert (or a fix of the problem it
introduced) has not been applied to linux-next.

- Checksum tests on unaligned data fail on parisc in mainline due to a number
of bugs in checksum assembler code (and with upstream qemu due to a bug in
qemu's hppa64 emulation). All those issues should by now be fixed in linux-next.
For reference, the following patches (SHAs from next-20240301) are needed to fix
the known problems:
0568b6f0d863 parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
4b75b12d7050 parisc: Fix csum_ipv6_magic on 64-bit systems
4408ba75e4ba parisc: Fix csum_ipv6_magic on 32-bit systems
a2abae8f0b63 parisc: Fix ip_fast_csum
qemu (v8.2 and later) needs
https://lore.kernel.org/all/[email protected]/T/
for the hppa64/parisc64 tests to work with qemu.

- Checksum tests on unaligned data cause a crash on arm systems with "thumb"
instruction set enabled (such as mps2_defconfig and an385). I didn't bother
checking if the crash is with 1-byte or 2-byte alignment.

- There used to be a crash with checksum tests on m68k because of word alignment
which the implementation of the unit tests for csum_ipv6_magic() did not take
into account (this is fixed by the padding in struct csum_ipv6_magic_data).
  I don't know if this patch is needed to fix that problem or if it has since
  been fixed differently.
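
The m68k point can be sketched in userspace as follows. The field layout is
modelled on the description of struct csum_ipv6_magic_data in this thread (two
16-byte addresses, two 32-bit words, one protocol byte), but the struct and
function names here are stand-ins and not the definitions from
lib/checksum_kunit.c. The fields add up to 41 bytes, so the compiler rounds the
size up to 44 with 4-byte alignment but only to 42 with 2-byte alignment;
explicit padding removes that difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct in6_addr: 16 bytes, byte aligned. */
struct addr16 {
	uint8_t b[16];
};

/* Size depends on the ABI: the 41 bytes of fields get tail padding. */
struct magic_data_implicit {
	struct addr16 saddr;
	struct addr16 daddr;
	uint32_t len;
	uint32_t csum;
	uint8_t proto;
};

/* Explicit padding: 44 bytes whether the ABI aligns to 4 or to 2 bytes. */
struct magic_data_padded {
	struct addr16 saddr;
	struct addr16 daddr;
	uint32_t len;
	uint32_t csum;
	uint8_t proto;
	uint8_t pad[3];
};

_Static_assert(sizeof(struct magic_data_padded) == 44,
	       "explicitly padded layout must be 44 bytes");

/* Round a field total up to the struct alignment, as the compiler does. */
size_t padded_size(size_t fields, size_t align)
{
	assert(align != 0);
	return (fields + align - 1) / align * align;
}
```

With these definitions, padded_size(41, 4) is 44 and padded_size(41, 2) is 42,
which is the mismatch that code stepping through a buffer in fixed 44-byte
records would trip over.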

I hope that covers everything. As I said above, the chain is getting long
and I may have missed something.

I am currently re-testing on all platforms/architectures available in qemu
with the known bugs outside lib/checksum_kunit.c fixed and with the sh4 patch
reverted, but without this patch. I'll send an update in response to the v11
patch as soon as I have the results.

Guenter

2024-03-01 20:47:53