2023-12-18 08:57:41

by Suman Ghosh

Subject: [net PATCH] octeontx2-af: Fix marking couple of structure as __packed

A couple of structures were not marked as __packed, which may have some
performance implications. This patch fixes that and marks them as
__packed.

Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data")
Signed-off-by: Suman Ghosh <[email protected]>
---
drivers/net/ethernet/marvell/octeontx2/af/npc.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
index ab3e39eef2eb..8c0732c9a7ee 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
@@ -528,7 +528,7 @@ struct npc_lt_def {
u8 ltype_mask;
u8 ltype_match;
u8 lid;
-};
+} __packed;

struct npc_lt_def_ipsec {
u8 ltype_mask;
@@ -536,7 +536,7 @@ struct npc_lt_def_ipsec {
u8 lid;
u8 spi_offset;
u8 spi_nz;
-};
+} __packed;

struct npc_lt_def_apad {
u8 ltype_mask;
--
2.25.1



2023-12-18 20:45:22

by Jacob Keller

Subject: Re: [net PATCH] octeontx2-af: Fix marking couple of structure as __packed



On 12/18/2023 12:27 AM, Suman Ghosh wrote:
> A couple of structures were not marked as __packed, which may have some
> performance implications. This patch fixes that and marks them as
> __packed.

Not sure I follow why lack of __packed would have performance
implications? I get that __packed is important to ensure layout is
correct or to ensure the whole structure has the right size rather than
unexpected gaps. I'd guess maybe because the structure's size would
include padding without __packed, leading to a lot of gaps when
combining several structures together...

I did test on my system with pahole, and even without __packed, I don't
get any gaps in the npc_lt_def_cfg structure:


> struct npc_lt_def_cfg {
> struct npc_lt_def rx_ol2; /* 0 3 */
> struct npc_lt_def rx_oip4; /* 3 3 */
> struct npc_lt_def rx_iip4; /* 6 3 */
> struct npc_lt_def rx_oip6; /* 9 3 */
> struct npc_lt_def rx_iip6; /* 12 3 */
> struct npc_lt_def rx_otcp; /* 15 3 */
> struct npc_lt_def rx_itcp; /* 18 3 */
> struct npc_lt_def rx_oudp; /* 21 3 */
> struct npc_lt_def rx_iudp; /* 24 3 */
> struct npc_lt_def rx_osctp; /* 27 3 */
> struct npc_lt_def rx_isctp; /* 30 3 */
> struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 10 */
> struct npc_lt_def pck_ol2; /* 43 3 */
> struct npc_lt_def pck_oip4; /* 46 3 */
> struct npc_lt_def pck_oip6; /* 49 3 */
> struct npc_lt_def pck_iip4; /* 52 3 */
> struct npc_lt_def_apad rx_apad0; /* 55 4 */
> struct npc_lt_def_apad rx_apad1; /* 59 4 */
> struct npc_lt_def_color ovlan; /* 63 5 */
> /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */
> struct npc_lt_def_color ivlan; /* 68 5 */
> struct npc_lt_def_color rx_gen0_color; /* 73 5 */
> struct npc_lt_def_color rx_gen1_color; /* 78 5 */
> struct npc_lt_def_et rx_et[2]; /* 83 10 */
>
> /* size: 93, cachelines: 2, members: 23 */
> /* last cacheline: 29 bytes */
> };


However that may not be true across all compilers etc. Also all the
other structures are __packed. Makes sense.
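
A minimal C sketch of the pahole observation above, assuming GCC or Clang on a common ABI such as x86-64 or AArch64. The struct names, the DEMO_PACKED macro and the u8/u32 typedefs are local stand-ins, not the driver's definitions; only the three field names come from the patch. It illustrates that packing changes nothing for an all-u8 structure and only matters once a wider member introduces padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t u8;
typedef uint32_t u32;

/* Spelled out here instead of relying on the kernel's __packed macro. */
#define DEMO_PACKED __attribute__((__packed__))

/* Same three u8 fields as struct npc_lt_def in the patch. */
struct lt_def_plain {
    u8 ltype_mask;
    u8 ltype_match;
    u8 lid;
};

struct lt_def_packed {
    u8 ltype_mask;
    u8 ltype_match;
    u8 lid;
} DEMO_PACKED;

/* Hypothetical struct with a wider member, where padding can appear. */
struct mixed_plain {
    u8 tag;
    u32 value;
};

struct mixed_packed {
    u8 tag;
    u32 value;
} DEMO_PACKED;

/* Bytes of padding present in the unpacked all-u8 struct. */
static inline size_t lt_def_padding(void)
{
    return sizeof(struct lt_def_plain) - sizeof(struct lt_def_packed);
}

/* Bytes of padding present in the unpacked mixed-width struct. */
static inline size_t mixed_padding(void)
{
    return sizeof(struct mixed_plain) - sizeof(struct mixed_packed);
}
```

On such a target lt_def_padding() is 0 while mixed_padding() is 3, which matches the "no gaps" result for structures built only from u8 members.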

Reviewed-by: Jacob Keller <[email protected]>

>
> Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data")
> Signed-off-by: Suman Ghosh <[email protected]>
> ---
> drivers/net/ethernet/marvell/octeontx2/af/npc.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
> index ab3e39eef2eb..8c0732c9a7ee 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
> @@ -528,7 +528,7 @@ struct npc_lt_def {
> u8 ltype_mask;
> u8 ltype_match;
> u8 lid;
> -};
> +} __packed;
>
> struct npc_lt_def_ipsec {
> u8 ltype_mask;
> @@ -536,7 +536,7 @@ struct npc_lt_def_ipsec {
> u8 lid;
> u8 spi_offset;
> u8 spi_nz;
> -};
> +} __packed;
>
> struct npc_lt_def_apad {
> u8 ltype_mask;

2023-12-19 14:23:46

by Suman Ghosh

Subject: RE: [EXT] Re: [net PATCH] octeontx2-af: Fix marking couple of structure as __packed

>Not sure I follow why lack of __packed would have performance
>implications? I get that __packed is important to ensure layout is
>correct or to ensure the whole structure has the right size rather than
>unexpected gaps. I'd guess maybe because the structure's size would
>include padding without __packed, leading to a lot of gaps when
>combining several structures together...
>
>I did test on my system with pahole, and even without __packed, I don't
>get any gaps in the npc_lt_def_cfg structure:
>
>
>> struct npc_lt_def_cfg {
>> struct npc_lt_def rx_ol2; /* 0
>3 */
>> struct npc_lt_def rx_oip4; /* 3
>3 */
>> struct npc_lt_def rx_iip4; /* 6
>3 */
>> struct npc_lt_def rx_oip6; /* 9
>3 */
>> struct npc_lt_def rx_iip6; /* 12
>3 */
>> struct npc_lt_def rx_otcp; /* 15
>3 */
>> struct npc_lt_def rx_itcp; /* 18
>3 */
>> struct npc_lt_def rx_oudp; /* 21
>3 */
>> struct npc_lt_def rx_iudp; /* 24
>3 */
>> struct npc_lt_def rx_osctp; /* 27
>3 */
>> struct npc_lt_def rx_isctp; /* 30
>3 */
>> struct npc_lt_def_ipsec rx_ipsec[2]; /* 33
>10 */
>> struct npc_lt_def pck_ol2; /* 43
>3 */
>> struct npc_lt_def pck_oip4; /* 46
>3 */
>> struct npc_lt_def pck_oip6; /* 49
>3 */
>> struct npc_lt_def pck_iip4; /* 52
>3 */
>> struct npc_lt_def_apad rx_apad0; /* 55
>4 */
>> struct npc_lt_def_apad rx_apad1; /* 59
>4 */
>> struct npc_lt_def_color ovlan; /* 63
>5 */
>> /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */
>> struct npc_lt_def_color ivlan; /* 68
>5 */
>> struct npc_lt_def_color rx_gen0_color; /* 73
>5 */
>> struct npc_lt_def_color rx_gen1_color; /* 78
>5 */
>> struct npc_lt_def_et rx_et[2]; /* 83
>10 */
>>
>> /* size: 93, cachelines: 2, members: 23 */
>> /* last cacheline: 29 bytes */ };
>
>
>However that may not be true across all compilers etc. Also all the
>other structures are __packed. Makes sense.
>
>Reviewed-by: Jacob Keller <[email protected]>
>
[Suman] I agree, "having performance impact" is the wrong statement. I will update the commit message in v2.

2023-12-19 15:26:55

by David Laight

Subject: RE: [net PATCH] octeontx2-af: Fix marking couple of structure as __packed

From: Jacob Keller
> Sent: 18 December 2023 20:44
>
> On 12/18/2023 12:27 AM, Suman Ghosh wrote:
> > A couple of structures were not marked as __packed, which may have some
> > performance implications. This patch fixes that and marks them as
> > __packed.
>
> Not sure I follow why lack of __packed would have performance
> implications? I get that __packed is important to ensure layout is
> correct or to ensure the whole structure has the right size rather than
> unexpected gaps. I'd guess maybe because the structure's size would
> include padding without __packed, leading to a lot of gaps when
> combining several structures together...
>
> I did test on my system with pahole, and even without __packed, I don't
> get any gaps in the npc_lt_def_cfg structure:
>
>
> > struct npc_lt_def_cfg {
> > struct npc_lt_def rx_ol2; /* 0 3 */
> > struct npc_lt_def rx_oip4; /* 3 3 */
> > struct npc_lt_def rx_iip4; /* 6 3 */
> > struct npc_lt_def rx_oip6; /* 9 3 */
> > struct npc_lt_def rx_iip6; /* 12 3 */
> > struct npc_lt_def rx_otcp; /* 15 3 */
> > struct npc_lt_def rx_itcp; /* 18 3 */
> > struct npc_lt_def rx_oudp; /* 21 3 */
> > struct npc_lt_def rx_iudp; /* 24 3 */
> > struct npc_lt_def rx_osctp; /* 27 3 */
> > struct npc_lt_def rx_isctp; /* 30 3 */
> > struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 10 */
> > struct npc_lt_def pck_ol2; /* 43 3 */
> > struct npc_lt_def pck_oip4; /* 46 3 */
> > struct npc_lt_def pck_oip6; /* 49 3 */
> > struct npc_lt_def pck_iip4; /* 52 3 */
> > struct npc_lt_def_apad rx_apad0; /* 55 4 */
> > struct npc_lt_def_apad rx_apad1; /* 59 4 */
> > struct npc_lt_def_color ovlan; /* 63 5 */
> > /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */
> > struct npc_lt_def_color ivlan; /* 68 5 */
> > struct npc_lt_def_color rx_gen0_color; /* 73 5 */
> > struct npc_lt_def_color rx_gen1_color; /* 78 5 */
> > struct npc_lt_def_et rx_et[2]; /* 83 10 */
> >
> > /* size: 93, cachelines: 2, members: 23 */
> > /* last cacheline: 29 bytes */
> > };
>
>
> However that may not be true across all compilers etc. Also all the
> other structures are __packed. Makes sense.

Or not - maybe all the __packed should be removed instead!

Unless these structures (or any others) appear in 'messages' which
get transferred between systems they really shouldn't be __packed.
And a 93 byte 'message' with all those fields seems rather odd.

The above breakdown seems to imply everything is 'unsigned char'
so the __packed makes no difference.

Using __packed requires the compiler generate byte loads/store
with shifts (etc) on many architectures and should really be avoided
unless it is absolutely needed for binary compatibility.

Even then if the problem is a 64bit field that only needs to be
32bit aligned (as is common for some compat32 code) then the 64bit
fields should be marked as being 32bit aligned.
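
A small C sketch of that last point, assuming GCC or Clang. The struct names and the u64_align4 typedef are hypothetical and appear nowhere in the driver. It contrasts packing the whole structure with lowering the alignment of just the 64-bit member: both give the same 12-byte layout, but only the packed variant drops the structure's alignment to 1, which is what can push the compiler toward bytewise accesses on strict-alignment architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel's fixed-width types (assumption). */
typedef uint32_t u32;
typedef uint64_t u64;

/* Packed: the whole struct, and every member, is assumed 1-byte aligned. */
struct msg_packed {
    u32 id;
    u64 addr;
} __attribute__((__packed__));

/* A 64-bit type that only needs 32-bit alignment (hypothetical name). */
typedef u64 __attribute__((aligned(4))) u64_align4;

/* Not packed: same offsets, but the struct keeps 4-byte alignment. */
struct msg_aligned {
    u32 id;
    u64_align4 addr;
};

/* Alignment the compiler may assume for each variant. */
static inline size_t msg_packed_align(void)
{
    return _Alignof(struct msg_packed);
}

static inline size_t msg_aligned_align(void)
{
    return _Alignof(struct msg_aligned);
}
```

Both variants place addr at offset 4 and have size 12; msg_packed_align() is 1 and msg_aligned_align() is 4, so the second form keeps word-sized loads and stores available.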

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2023-12-19 21:06:45

by Jacob Keller

Subject: Re: [net PATCH] octeontx2-af: Fix marking couple of structure as __packed



On 12/19/2023 7:26 AM, David Laight wrote:
> From: Jacob Keller
>> Sent: 18 December 2023 20:44
>>
>> On 12/18/2023 12:27 AM, Suman Ghosh wrote:
>>> A couple of structures were not marked as __packed, which may have some
>>> performance implications. This patch fixes that and marks them as
>>> __packed.
>>
>> Not sure I follow why lack of __packed would have performance
>> implications? I get that __packed is important to ensure layout is
>> correct or to ensure the whole structure has the right size rather than
>> unexpected gaps. I'd guess maybe because the structure's size would
>> include padding without __packed, leading to a lot of gaps when
>> combining several structures together...
>>
>> I did test on my system with pahole, and even without __packed, I don't
>> get any gaps in the npc_lt_def_cfg structure:
>>
>>
>>> struct npc_lt_def_cfg {
>>> struct npc_lt_def rx_ol2; /* 0 3 */
>>> struct npc_lt_def rx_oip4; /* 3 3 */
>>> struct npc_lt_def rx_iip4; /* 6 3 */
>>> struct npc_lt_def rx_oip6; /* 9 3 */
>>> struct npc_lt_def rx_iip6; /* 12 3 */
>>> struct npc_lt_def rx_otcp; /* 15 3 */
>>> struct npc_lt_def rx_itcp; /* 18 3 */
>>> struct npc_lt_def rx_oudp; /* 21 3 */
>>> struct npc_lt_def rx_iudp; /* 24 3 */
>>> struct npc_lt_def rx_osctp; /* 27 3 */
>>> struct npc_lt_def rx_isctp; /* 30 3 */
>>> struct npc_lt_def_ipsec rx_ipsec[2]; /* 33 10 */
>>> struct npc_lt_def pck_ol2; /* 43 3 */
>>> struct npc_lt_def pck_oip4; /* 46 3 */
>>> struct npc_lt_def pck_oip6; /* 49 3 */
>>> struct npc_lt_def pck_iip4; /* 52 3 */
>>> struct npc_lt_def_apad rx_apad0; /* 55 4 */
>>> struct npc_lt_def_apad rx_apad1; /* 59 4 */
>>> struct npc_lt_def_color ovlan; /* 63 5 */
>>> /* --- cacheline 1 boundary (64 bytes) was 4 bytes ago --- */
>>> struct npc_lt_def_color ivlan; /* 68 5 */
>>> struct npc_lt_def_color rx_gen0_color; /* 73 5 */
>>> struct npc_lt_def_color rx_gen1_color; /* 78 5 */
>>> struct npc_lt_def_et rx_et[2]; /* 83 10 */
>>>
>>> /* size: 93, cachelines: 2, members: 23 */
>>> /* last cacheline: 29 bytes */
>>> };
>>
>>
>> However that may not be true across all compilers etc. Also all the
>> other structures are __packed. Makes sense.
>
> Or not - maybe all the __packed should be removed instead!
>
> Unless these structures (or any others) appear in 'messages' which
> get transferred between systems they really shouldn't be __packed.
> And a 93 byte 'message' with all those fields seems rather odd.
>
> The above breakdown seems to imply everything is 'unsigned char'
> so the __packed makes no difference.
>
> Using __packed requires the compiler generate byte loads/store
> with shifts (etc) on many architectures and should really be avoided
> unless it is absolutely needed for binary compatibility.
>
> Even then if the problem is a 64bit field that only needs to be
> 32bit aligned (as is common for some compat32 code) then the 64bit
> fields should be marked as being 32bit aligned.
>
> David
>
Right. Typically packed is only required when dealing with something
where the exact binary layout matters (i.e. copying to/from hardware or
across systems in such a way that the layout might change with different
compilers/arch).

If that isn't how this structure is used, then ya, removing __packed
seems reasonable. And at least for one system I see no difference in the
actual generated layout, making __packed redundant.

However, it's not clear to me at a glance how this structure is used and
whether it really is copied between places where binary compatibility is
a requirement.
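
When the binary layout does matter, one way to make that requirement explicit is to pin it with compile-time checks. The sketch below is only an illustration under assumptions: struct wire_hdr and its fields are hypothetical, and it uses C11 static_assert from <assert.h> instead of anything from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record whose byte layout is dictated by an external party. */
struct wire_hdr {
    uint8_t type;
    uint16_t len;
    uint32_t seq;
} __attribute__((__packed__));

/* The build fails if any compiler or architecture lays this out differently. */
static_assert(sizeof(struct wire_hdr) == 7, "wire_hdr must be 7 bytes");
static_assert(offsetof(struct wire_hdr, len) == 1, "len must follow type");
static_assert(offsetof(struct wire_hdr, seq) == 3, "seq must follow len");

/* Size a caller would use when copying the record to or from a buffer. */
static inline size_t wire_hdr_size(void)
{
    return sizeof(struct wire_hdr);
}
```

Checks like these document why a structure is packed, and they turn a silent layout change into a build error.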


2023-12-20 13:05:14

by Suman Ghosh

Subject: RE: [EXT] Re: [net PATCH] octeontx2-af: Fix marking couple of structure as __packed

>>>
>>> However that may not be true across all compilers etc. Also all the
>>> other structures are __packed. Makes sense.
>>
>> Or not - maybe all the __packed should be removed instead!
>>
>> Unless these structures (or any others) appear in 'messages' which get
>> transferred between systems they really shouldn't be __packed.
>> And a 93 byte 'message' with all those fields seems rather odd.
>>
>> The above breakdown seems to imply everything is 'unsigned char'
>> so the __packed makes no difference.
>>
>> Using __packed requires the compiler generate byte loads/store with
>> shifts (etc) on many architectures and should really be avoided unless
>> it is absolutely needed for binary compatibility.
>>
>> Even then if the problem is a 64bit field that only needs to be 32bit
>> aligned (as is common for some compat32 code) then the 64bit fields
>> should be marked as being 32bit aligned.
>>
>> David
>>
>Right. Typically packed is only required when dealing with something
>where the exact binary layout matters (i.e. copying to/from hardware or
>across systems in such a way that the layout might change with different
>compilers/arch).
>
>If that isn't how this structure is used, then ya, removing __packed
>seems reasonable. And at least for one system I see no difference in the
>actual generated layout, making __packed redundant.
>
>However, it's not clear to me at a glance how this structure is used and
>whether it really is copied between places where binary compatibility is
>a requirement.
>
[Suman] Yes, these structures are copied from firmware. They are needed to inform the kernel about parsing information required by the hardware. That is why the structures are packed, and these two were missed.
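
A rough C sketch of what "copied from firmware" implies for layout. Everything here is an assumption for illustration: the fw_lt_def name, the fw_read_lt_def() helper and the idea of a flat byte blob are hypothetical and do not describe how the octeontx2 driver actually loads its profile. Only the three field names are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the kernel's u8 (assumption). */
typedef uint8_t u8;

/* Mirrors the three u8 fields of struct npc_lt_def from the patch. */
struct fw_lt_def {
    u8 ltype_mask;
    u8 ltype_match;
    u8 lid;
} __attribute__((__packed__));

/* The record size is part of the contract with the firmware image. */
static_assert(sizeof(struct fw_lt_def) == 3, "firmware record must be 3 bytes");

/*
 * Copy one record out of a firmware blob (hypothetical helper).
 * Returns 0 on success, -1 if the record would run past the blob.
 */
static inline int fw_read_lt_def(struct fw_lt_def *dst, const u8 *blob,
                                 size_t blob_len, size_t offset)
{
    if (offset > blob_len || blob_len - offset < sizeof(*dst))
        return -1;
    memcpy(dst, blob + offset, sizeof(*dst));
    return 0;
}
```

Because the copy is a plain memcpy of sizeof(*dst) bytes, any padding the compiler added would shift every later record, which is the situation __packed guards against.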