2023-11-24 10:25:16

by Joey Gouly

[permalink] [raw]
Subject: [BUG] Boot crash on v6.7-rc2

Hi all,

I just hit a boot crash on v6.7-rc2 (arm64, FVP model):

[ 1.418845] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000517
[ 1.418855] Mem abort info:
[ 1.418860] ESR = 0x0000000096000004
[ 1.418867] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1.418876] SET = 0, FnV = 0
[ 1.418882] EA = 0, S1PTW = 0
[ 1.418889] FSC = 0x04: level 0 translation fault
[ 1.418897] Data abort info:
[ 1.418902] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 1.418910] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 1.418919] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 1.418928] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000881835000
[ 1.418938] [0000000000000517] pgd=0000000000000000, p4d=0000000000000000
[ 1.418952] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 1.418961] Modules linked in:
[ 1.418969] CPU: 0 PID: 8 Comm: kworker/0:0 Tainted: G T 6.7.0-rc2-dirty #4191 40d10cdc812c74fd5dc5d91e2452ff6f1e5f4b4a
[ 1.418984] Hardware name: FVP Base RevC (DT)
[ 1.418992] Workqueue: mld mld_ifc_work
[ 1.419003] pstate: 101402005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1.419016] pc : ___neigh_create+0x790/0x9c8
[ 1.419028] lr : ___neigh_create+0x270/0x9c8
[ 1.419041] sp : ffff8000800c3a20
[ 1.419048] x29: ffff8000800c3a20 x28: ffffd7c64c921078 x27: ffff00080188bd50
[ 1.419066] x26: ffff00080183a30c x25: ffff00080188bda0 x24: ffff00080183a300
[ 1.419084] x23: 0000000000000000 x22: 0000000000000010 x21: ffff00080188bcc0
[ 1.419102] x20: 0000000000000000 x19: ffff0008003ef000 x18: 0000000000000014
[ 1.419119] x17: 00000000cf0f2572 x16: 0000000080faa78d x15: 00000000b79921ac
[ 1.419137] x14: ffff00087ff332c0 x13: 1600000000000000 x12: 00000000000002ff
[ 1.419155] x11: 000000007c2c4dbd x10: 0000000000000003 x9 : 0000000000000000
[ 1.419172] x8 : ffff00080188bd80 x7 : 00000000be3df655 x6 : 00000000f1691d6f
[ 1.419190] x5 : 000000007c2c4dbd x4 : 0000000000000000 x3 : 000000008eb8ab5b
[ 1.419207] x2 : 000000000000050f x1 : 000000000000001d x0 : 00000000000002ff
[ 1.419225] Call trace:
[ 1.419230] ___neigh_create+0x790/0x9c8
[ 1.419243] __neigh_create+0x18/0x20
[ 1.419255] ip6_finish_output2+0x5f8/0x8c4
[ 1.419267] ip6_finish_output+0x1f0/0x258
[ 1.419279] ip6_output+0x70/0x1cc
[ 1.419291] NF_HOOK.constprop.0+0x4c/0xd8
[ 1.419302] mld_sendpack+0x1b4/0x394
[ 1.419313] mld_ifc_work+0x1d4/0x4b4

I tracked it down to the following line in net/core/neighbour.c ___neigh_create:
memcpy(n->primary_key, pkey, key_len);

I did this by surrounding the memcpy with BUG():
BUG_ON(n->tbl != tbl);
memcpy(n->primary_key, pkey, key_len);
BUG_ON(n->tbl != tbl);

And it was crashing on the second one.

Checking `struct neighbour`:

struct neighbour {
struct neighbour __rcu *next;
struct neigh_table *tbl;
.. fields ..
u8 primary_key[0];
} __randomize_layout;

Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.

I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.

I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
However I tried changing the zero-length-array to a flexible one:

+ DECLARE_FLEX_ARRAY(u8, primary_key);
+ u8 primary_key[0];

Then the field offsets ended up overlapping, and I also got the same crash on v6.6.

Thanks,
Joey


2023-11-24 15:29:39

by Gustavo A. R. Silva

[permalink] [raw]
Subject: Re: [BUG] Boot crash on v6.7-rc2



On 11/24/23 04:24, Joey Gouly wrote:
> Hi all,
>
> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):

[..]

> Checking `struct neighbour`:
>
> struct neighbour {
> struct neighbour __rcu *next;
> struct neigh_table *tbl;
> .. fields ..
> u8 primary_key[0];
> } __randomize_layout;
>
> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.
>
> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.

It seems the issue is caused by this change that was recently added to -rc2:

commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")

Previously, one-element and zero-length arrays were treated as true flexible arrays
(however, they are "fake" flex arrays), and __randomize_layout would leave them
untouched at the end of the struct; the same for proper C99 flex-array members. But
after the commit above, that's no longer the case: Only C99 flex-array members will
behave correctly (remaining untouched at end of the struct), and the other two types
of arrays will be randomized.

>
> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
> However I tried changing the zero-length-array to a flexible one:
>
> + DECLARE_FLEX_ARRAY(u8, primary_key);
> + u8 primary_key[0];
>
> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.

The right approach is to transform the zero-length array into a C99 flex-array member,
like this:

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 07022bb0d44d..0d28172193fa 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -162,7 +162,7 @@ struct neighbour {
struct rcu_head rcu;
struct net_device *dev;
netdevice_tracker dev_tracker;
- u8 primary_key[0];
+ u8 primary_key[];
} __randomize_layout;

struct neigh_ops {

--
Gustavo

2023-11-24 15:54:10

by Gustavo A. R. Silva

[permalink] [raw]
Subject: Re: [BUG] Boot crash on v6.7-rc2



On 11/24/23 09:28, Gustavo A. R. Silva wrote:
>
>
> On 11/24/23 04:24, Joey Gouly wrote:
>> Hi all,
>>
>> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):
>
> [..]
>
>> Checking `struct neighbour`:
>>
>>     struct neighbour {
>>         struct neighbour __rcu    *next;
>>         struct neigh_table    *tbl;
>>     .. fields ..
>>         u8            primary_key[0];
>>     } __randomize_layout;
>>
>> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the
>> memcpy() corrupts the tbl pointer.
>>
>> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.
>
> It seems the issue is caused by this change that was recently added to -rc2:
>
> commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")
>
> Previously, one-element and zero-length arrays were treated as true flexible arrays
> (however, they are "fake" flex arrays), and __randomize_layout would leave them
> untouched at the end of the struct; the same for proper C99 flex-array members. But
> after the commit above, that's no longer the case: Only C99 flex-array members will
> behave correctly (remaining untouched at end of the struct), and the other two types
> of arrays will be randomized.

Kees,

I think we should complement the changes in commit 1ee60356c2dc with the following
update:

diff --git a/scripts/gcc-plugins/randomize_layout_plugin.c b/scripts/gcc-plugins/randomize_layout_plugin.c
index 910bd21d08f4..746ff2d272f2 100644
--- a/scripts/gcc-plugins/randomize_layout_plugin.c
+++ b/scripts/gcc-plugins/randomize_layout_plugin.c
@@ -339,8 +339,7 @@ static int relayout_struct(tree type)

/*
* enforce that we don't randomize the layout of the last
- * element of a struct if it's a 0 or 1-length array
- * or a proper flexible array
+ * element of a struct if it's a proper flexible array
*/
if (is_flexible_array(newtree[num_fields - 1])) {
has_flexarray = true;

--
Gustavo

>
>>
>> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
>> However I tried changing the zero-length-array to a flexible one:
>>
>>     +    DECLARE_FLEX_ARRAY(u8, primary_key);
>>     +    u8        primary_key[0];
>>
>> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.
>
> The right approach is to transform the zero-length array into a C99 flex-array member,
> like this:
>
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 07022bb0d44d..0d28172193fa 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -162,7 +162,7 @@ struct neighbour {
>         struct rcu_head         rcu;
>         struct net_device       *dev;
>         netdevice_tracker       dev_tracker;
> -       u8                      primary_key[0];
> +       u8                      primary_key[];
>  } __randomize_layout;
>
>  struct neigh_ops {
>
> --
> Gustavo

2023-11-25 17:54:55

by Gustavo A. R. Silva

[permalink] [raw]
Subject: Re: [BUG] Boot crash on v6.7-rc2



On 11/24/23 09:28, Gustavo A. R. Silva wrote:
>
>
> On 11/24/23 04:24, Joey Gouly wrote:
>> Hi all,
>>
>> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):
>
> [..]
>
>> Checking `struct neighbour`:
>>
>>     struct neighbour {
>>         struct neighbour __rcu    *next;
>>         struct neigh_table    *tbl;
>>     .. fields ..
>>         u8            primary_key[0];
>>     } __randomize_layout;
>>
>> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the
>> memcpy() corrupts the tbl pointer.
>>
>> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.
>
> It seems the issue is caused by this change that was recently added to -rc2:
>
> commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")
>
> Previously, one-element and zero-length arrays were treated as true flexible arrays
> (however, they are "fake" flex arrays), and __randomize_layout would leave them
> untouched at the end of the struct; the same for proper C99 flex-array members. But
> after the commit above, that's no longer the case: Only C99 flex-array members will
> behave correctly (remaining untouched at end of the struct), and the other two types
> of arrays will be randomized.

mmh... it seems that commit 1ee60356c2dc only prevents one-element arrays from being
treated as flex arrays, while the code should still keep zero-length arrays untouched:

if (typesize == NULL_TREE && TYPE_DOMAIN(fieldtype) != NULL_TREE &&
TYPE_MAX_VALUE(TYPE_DOMAIN(fieldtype)) == NULL_TREE)
return true;

- if (typesize != NULL_TREE &&
- (TREE_CONSTANT(typesize) && (!tree_to_uhwi(typesize) ||
- tree_to_uhwi(typesize) == tree_to_uhwi(elemsize))))
- return true;
-

Sorry about the confusion.

>
>>
>> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
>> However I tried changing the zero-length-array to a flexible one:
>>
>>     +    DECLARE_FLEX_ARRAY(u8, primary_key);
>>     +    u8        primary_key[0];
>>
>> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.
>
> The right approach is to transform the zero-length array into a C99 flex-array member,
> like this:
>
> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
> index 07022bb0d44d..0d28172193fa 100644
> --- a/include/net/neighbour.h
> +++ b/include/net/neighbour.h
> @@ -162,7 +162,7 @@ struct neighbour {
>         struct rcu_head         rcu;
>         struct net_device       *dev;
>         netdevice_tracker       dev_tracker;
> -       u8                      primary_key[0];
> +       u8                      primary_key[];
>  } __randomize_layout;
>
>  struct neigh_ops {

In any case, I think we still should convert [0] to [ ].

--
Gustavo

2023-11-25 18:32:24

by Kees Cook

[permalink] [raw]
Subject: Re: [BUG] Boot crash on v6.7-rc2



On November 25, 2023 9:54:28 AM PST, "Gustavo A. R. Silva" <[email protected]> wrote:
>
>
>On 11/24/23 09:28, Gustavo A. R. Silva wrote:
>>
>>
>> On 11/24/23 04:24, Joey Gouly wrote:
>>> Hi all,
>>>
>>> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):
>>
>> [..]
>>
>>> Checking `struct neighbour`:
>>>
>>>     struct neighbour {
>>>         struct neighbour __rcu    *next;
>>>         struct neigh_table    *tbl;
>>>     .. fields ..
>>>         u8            primary_key[0];
>>>     } __randomize_layout;
>>>
>>> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.
>>>
>>> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.
>>
>> It seems the issue is caused by this change that was recently added to -rc2:
>>
>> commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")
>>
>> Previously, one-element and zero-length arrays were treated as true flexible arrays
>> (however, they are "fake" flex arrays), and __randomize_layout would leave them
>> untouched at the end of the struct; the same for proper C99 flex-array members. But
>> after the commit above, that's no longer the case: Only C99 flex-array members will
>> behave correctly (remaining untouched at end of the struct), and the other two types
>> of arrays will be randomized.
>
>mmh... it seems that commit 1ee60356c2dc only prevents one-element arrays from being
>treated as flex arrays, while the code should still keep zero-length arrays untouched:
>
> if (typesize == NULL_TREE && TYPE_DOMAIN(fieldtype) != NULL_TREE &&
> TYPE_MAX_VALUE(TYPE_DOMAIN(fieldtype)) == NULL_TREE)
> return true;
>
>- if (typesize != NULL_TREE &&
>- (TREE_CONSTANT(typesize) && (!tree_to_uhwi(typesize) ||
>- tree_to_uhwi(typesize) == tree_to_uhwi(elemsize))))
>- return true;
>-

This should be both the 0 and 1 checks. I think the original fix is correct: switch to a true flex array.

>
>Sorry about the confusion.
>
>>
>>>
>>> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
>>> However I tried changing the zero-length-array to a flexible one:
>>>
>>>     +    DECLARE_FLEX_ARRAY(u8, primary_key);
>>>     +    u8        primary_key[0];

Was this line supposed to be "-"?

>>>
>>> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.
>>
>> The right approach is to transform the zero-length array into a C99 flex-array member,
>> like this:
>>
>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>> index 07022bb0d44d..0d28172193fa 100644
>> --- a/include/net/neighbour.h
>> +++ b/include/net/neighbour.h
>> @@ -162,7 +162,7 @@ struct neighbour {
>>         struct rcu_head         rcu;
>>         struct net_device       *dev;
>>         netdevice_tracker       dev_tracker;
>> -       u8                      primary_key[0];
>> +       u8                      primary_key[];
>>  } __randomize_layout;
>>
>>  struct neigh_ops {
>
>In any case, I think we still should convert [0] to [ ].

I would expect the above to fix the problem. If it doesn't I'll need to take a closer look at the plugin...

-Kees

--
Kees Cook

2023-11-25 18:41:42

by Gustavo A. R. Silva

[permalink] [raw]
Subject: Re: [BUG] Boot crash on v6.7-rc2



On 11/25/23 12:31, Kees Cook wrote:
>
>
> On November 25, 2023 9:54:28 AM PST, "Gustavo A. R. Silva" <[email protected]> wrote:
>>
>>
>> On 11/24/23 09:28, Gustavo A. R. Silva wrote:
>>>
>>>
>>> On 11/24/23 04:24, Joey Gouly wrote:
>>>> Hi all,
>>>>
>>>> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):
>>>
>>> [..]
>>>
>>>> Checking `struct neighbour`:
>>>>
>>>>     struct neighbour {
>>>>         struct neighbour __rcu    *next;
>>>>         struct neigh_table    *tbl;
>>>>     .. fields ..
>>>>         u8            primary_key[0];
>>>>     } __randomize_layout;
>>>>
>>>> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.
>>>>
>>>> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.
>>>
>>> It seems the issue is caused by this change that was recently added to -rc2:
>>>
>>> commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays")
>>>
>>> Previously, one-element and zero-length arrays were treated as true flexible arrays
>>> (however, they are "fake" flex arrays), and __randomize_layout would leave them
>>> untouched at the end of the struct; the same for proper C99 flex-array members. But
>>> after the commit above, that's no longer the case: Only C99 flex-array members will
>>> behave correctly (remaining untouched at end of the struct), and the other two types
>>> of arrays will be randomized.
>>
>> mmh... it seems that commit 1ee60356c2dc only prevents one-element arrays from being
>> treated as flex arrays, while the code should still keep zero-length arrays untouched:
>>
>> if (typesize == NULL_TREE && TYPE_DOMAIN(fieldtype) != NULL_TREE &&
>> TYPE_MAX_VALUE(TYPE_DOMAIN(fieldtype)) == NULL_TREE)
>> return true;
>>
>> - if (typesize != NULL_TREE &&
>> - (TREE_CONSTANT(typesize) && (!tree_to_uhwi(typesize) ||
>> - tree_to_uhwi(typesize) == tree_to_uhwi(elemsize))))
>> - return true;
>> -
>
> This should be both the 0 and 1 checks. I think the original fix is correct: switch to a true flex array.

This code is new to me and I got a bit confused. Thanks for the clarification. :)

So, it'd be nice to apply this change:

https://lore.kernel.org/linux-hardening/[email protected]/

>
>>
>> Sorry about the confusion.
>>
>>>
>>>>
>>>> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
>>>> However I tried changing the zero-length-array to a flexible one:
>>>>
>>>>     +    DECLARE_FLEX_ARRAY(u8, primary_key);
>>>>     +    u8        primary_key[0];
>
> Was this line supposed to be "-"?
>
>>>>
>>>> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.
>>>
>>> The right approach is to transform the zero-length array into a C99 flex-array member,
>>> like this:
>>>
>>> diff --git a/include/net/neighbour.h b/include/net/neighbour.h
>>> index 07022bb0d44d..0d28172193fa 100644
>>> --- a/include/net/neighbour.h
>>> +++ b/include/net/neighbour.h
>>> @@ -162,7 +162,7 @@ struct neighbour {
>>>         struct rcu_head         rcu;
>>>         struct net_device       *dev;
>>>         netdevice_tracker       dev_tracker;
>>> -       u8                      primary_key[0];
>>> +       u8                      primary_key[];
>>>  } __randomize_layout;
>>>
>>>  struct neigh_ops {
>>
>> In any case, I think we still should convert [0] to [ ].
>
> I would expect the above to fix the problem. If it doesn't I'll need to take a closer look at the plugin...

I think this should fix the issue. Let me go create a proper patch for this.
I'll send it out, shortly.

--
Gustavo

2023-11-26 01:14:29

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [BUG] Boot crash on v6.7-rc2

On Fri, Nov 24, 2023 at 10:24:58AM +0000, Joey Gouly wrote:
> Hi all,
>
> I just hit a boot crash on v6.7-rc2 (arm64, FVP model):
>
> [ 1.418845] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000517
> [ 1.418855] Mem abort info:
> [ 1.418860] ESR = 0x0000000096000004
> [ 1.418867] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 1.418876] SET = 0, FnV = 0
> [ 1.418882] EA = 0, S1PTW = 0
> [ 1.418889] FSC = 0x04: level 0 translation fault
> [ 1.418897] Data abort info:
> [ 1.418902] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [ 1.418910] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 1.418919] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 1.418928] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000881835000
> [ 1.418938] [0000000000000517] pgd=0000000000000000, p4d=0000000000000000
> [ 1.418952] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [ 1.418961] Modules linked in:
> [ 1.418969] CPU: 0 PID: 8 Comm: kworker/0:0 Tainted: G T 6.7.0-rc2-dirty #4191 40d10cdc812c74fd5dc5d91e2452ff6f1e5f4b4a
> [ 1.418984] Hardware name: FVP Base RevC (DT)
> [ 1.418992] Workqueue: mld mld_ifc_work
> [ 1.419003] pstate: 101402005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [ 1.419016] pc : ___neigh_create+0x790/0x9c8
> [ 1.419028] lr : ___neigh_create+0x270/0x9c8
> [ 1.419041] sp : ffff8000800c3a20
> [ 1.419048] x29: ffff8000800c3a20 x28: ffffd7c64c921078 x27: ffff00080188bd50
> [ 1.419066] x26: ffff00080183a30c x25: ffff00080188bda0 x24: ffff00080183a300
> [ 1.419084] x23: 0000000000000000 x22: 0000000000000010 x21: ffff00080188bcc0
> [ 1.419102] x20: 0000000000000000 x19: ffff0008003ef000 x18: 0000000000000014
> [ 1.419119] x17: 00000000cf0f2572 x16: 0000000080faa78d x15: 00000000b79921ac
> [ 1.419137] x14: ffff00087ff332c0 x13: 1600000000000000 x12: 00000000000002ff
> [ 1.419155] x11: 000000007c2c4dbd x10: 0000000000000003 x9 : 0000000000000000
> [ 1.419172] x8 : ffff00080188bd80 x7 : 00000000be3df655 x6 : 00000000f1691d6f
> [ 1.419190] x5 : 000000007c2c4dbd x4 : 0000000000000000 x3 : 000000008eb8ab5b
> [ 1.419207] x2 : 000000000000050f x1 : 000000000000001d x0 : 00000000000002ff
> [ 1.419225] Call trace:
> [ 1.419230] ___neigh_create+0x790/0x9c8
> [ 1.419243] __neigh_create+0x18/0x20
> [ 1.419255] ip6_finish_output2+0x5f8/0x8c4
> [ 1.419267] ip6_finish_output+0x1f0/0x258
> [ 1.419279] ip6_output+0x70/0x1cc
> [ 1.419291] NF_HOOK.constprop.0+0x4c/0xd8
> [ 1.419302] mld_sendpack+0x1b4/0x394
> [ 1.419313] mld_ifc_work+0x1d4/0x4b4
>
> I tracked it down to the following line in net/core/neighbour.c ___neigh_create:
> memcpy(n->primary_key, pkey, key_len);
>
> I did this by surrounding the memcpy with BUG():
> BUG_ON(n->tbl != tbl);
> memcpy(n->primary_key, pkey, key_len);
> BUG_ON(n->tbl != tbl);
>
> And it was crashing on the second one.
>
> Checking `struct neighbour`:
>
> struct neighbour {
> struct neighbour __rcu *next;
> struct neigh_table *tbl;
> .. fields ..
> u8 primary_key[0];
> } __randomize_layout;
>
> Due to the `__randomize_layout`, `primary_key` field is being placed before `tbl` (actually it's the same address since it's a 0 length array). That means the memcpy() corrupts the tbl pointer.
>
> I think I just got unlucky with my CONFIG_RANDSTRUCT seed (I can provide it if needed), it doesn't look as if it's a new issue.
>
> I couldn't reproduce directly on v6.6 (the offsets for `tbl` and `primary_key` didn't overlap).
> However I tried changing the zero-length-array to a flexible one:
>
> + DECLARE_FLEX_ARRAY(u8, primary_key);
> + u8 primary_key[0];
>
> Then the field offsets ended up overlapping, and I also got the same crash on v6.6.
>

Thanks for the well-handled regression report. I'm adding it to regzbot
for tracking:

#regzbot ^introduced: 1ee60356c2dca9
#regzbot title: Boot crash caused by true flexible array warning
#regzbot fix: neighbour: Fix __randomize_layout crash in struct neighbour

--
An old man doll... just what I always wanted! - Clara


Attachments:
(No filename) (4.15 kB)
signature.asc (235.00 B)
Download all attachments