2023-10-27 11:29:12

by Lena Wang (王娜)

[permalink] [raw]
Subject: [PATCH net-next] net:prevent shared skb corruption on rx-gro-list segmentation

From 1d4aea846dd08d824e1097a933ed8618a6da03da Mon Sep 17 00:00:00 2001
From: lena wang <[email protected]>
Date: Fri, 27 Oct 2023 17:52:50 +0800
Subject: [PATCH net-next] net:prevent shared skb corruption on rx-gro-
list segmentation

shared skb triggers corruptions on rx-gro-list segmentation.

[42438.749474] [ T1425] Unable to handle kernel NULL pointer
dereference at virtual address 00000000000000d8
[42438.749530] [ T1425] Mem abort info:
[42438.749547] [ T1425] ESR = 0x0000000096000006
[42438.749566] [ T1425] EC = 0x25: DABT (current EL), IL = 32 bits
[42438.749588] [ T1425] SET = 0, FnV = 0
[42438.749606] [ T1425] EA = 0, S1PTW = 0
[42438.749623] [ T1425] FSC = 0x06: level 2 translation fault
[42438.749642] [ T1425] Data abort info:
[42438.749656] [ T1425] ISV = 0, ISS = 0x00000006
[42438.749674] [ T1425] CM = 0, WnR = 0
[42438.749691] [ T1425] user pgtable: 4k pages, 39-bit VAs,
pgdp=00000001f96b9000
[42438.749714] [ T1425] [00000000000000d8] pgd=08000001f96ba003,
p4d=08000001f96ba003, pud=08000001f96ba003, pmd=0000000000000000
[42438.749769] [ T1425] Internal error: Oops: 0000000096000006 [#1]
PREEMPT SMP
[42438.749796] [ T1425] cpufreq stop DVFS log done
[42438.797144] [ T1425] Kernel Offset: 0x29eb000000 from
0xffffffc008000000
[42438.797185] [ T1425] PHYS_OFFSET: 0x40000000
[42438.797203] [ T1425] pstate: 80400005 (Nzcv daif +PAN -UAO)
[42438.797219] [ T1425] pc : [0xffffffe9f3e38c34]
__udp_gso_segment+0x24c/0x48c
[42438.797248] [ T1425] lr : [0xffffffe9f3e38bf4]
__udp_gso_segment+0x20c/0x48c
[42438.797269] [ T1425] sp : ffffffc017acb5c0
[42438.797280] [ T1425] x29: ffffffc017acb5d0 x28: 000000000000c85f
......
[42438.799172] [ T1425] Call trace:
[42438.799184] [ T1425] dump_backtrace+0xf4/0x118
[42438.799210] [ T1425] show_stack+0x18/0x24
[42438.799227] [ T1425] dump_stack_lvl+0x60/0x7c
[42438.799250] [ T1425] dump_stack+0x18/0x3c
[42438.799270] [ T1425] mrdump_common_die+0x24c/0x398 [mrdump]
[42438.799354] [ T1425] ipanic_die+0x20/0x34 [mrdump]
[42438.799425] [ T1425] notify_die+0x80/0xd8
[42438.799447] [ T1425] die+0x94/0x2b8
[42438.799463] [ T1425] __do_kernel_fault+0x264/0x298
[42438.799485] [ T1425] do_page_fault+0x98/0x4a0
[42438.799503] [ T1425] do_translation_fault+0x38/0x54
[42438.799521] [ T1425] do_mem_abort+0x58/0x118
[42438.799540] [ T1425] el1_abort+0x3c/0x5c
[42438.799555] [ T1425] el1h_64_sync_handler+0x54/0x90
[42438.799575] [ T1425] el1h_64_sync+0x68/0x6c
[42438.799591] [ T1425] __udp_gso_segment+0x24c/0x48c
[42438.799611] [ T1425] udp4_ufo_fragment+0x118/0x15c
[42438.799628] [ T1425] inet_gso_segment+0x164/0x338
[42438.799644] [ T1425] skb_mac_gso_segment+0xc4/0x13c
[42438.799666] [ T1425] __skb_gso_segment+0xc4/0x124
[42438.799686] [ T1425] validate_xmit_skb+0x9c/0x2e0
[42438.799703] [ T1425] validate_xmit_skb_list+0x4c/0x80
[42438.799722] [ T1425] sch_direct_xmit+0x70/0x3d0
[42438.799739] [ T1425] __dev_queue_xmit+0x5f0/0xd40
[42438.799757] [ T1425] ip_finish_output2+0x3f8/0x460
[42438.799774] [ T1425] __ip_finish_output+0x194/0x240
[42438.799790] [ T1425] ip_finish_output+0x20/0xf4
[42438.799810] [ T1425] ip_output+0x100/0x1a0
[42438.799830] [ T1425] NF_HOOK+0xac/0x154
[42438.799848] [ T1425] ip_forward+0x308/0x320
[42438.799868] [ T1425] ip_sublist_rcv+0x1f0/0x25c
[42438.799887] [ T1425] ip_list_rcv+0x138/0x174
[42438.799905] [ T1425] __netif_receive_skb_list_core+0x1e8/0x28c
[42438.799923] [ T1425] netif_receive_skb_list_internal+0x1ec/0x2b4
[42438.799942] [ T1425] netif_receive_skb_list+0x2c/0x144
[42438.799960] [ T1425] ccmni_queue_state_callback+0x98/0x304 [ccmni]
[42438.800009] [ T1425] port_net_queue_state_notify+0x178/0x1c0
[ccci_md_all]
[42438.800257] [ T1425] ccci_port_queue_status_notify+0xf0/0x2a8
[ccci_md_all]
[42438.800496] [ T1425] dpmaif_rxq_push_thread+0xb0/0x1d4
[ccci_dpmaif]
[42438.800602] [ T1425] kthread+0x104/0x1d4
[42438.800621] [ T1425] ret_from_fork+0x10/0x20

In some scenarios the GRO-ed skb shared with multi users. This
segmentation touches the shared heads which sets frag_list to null.
After linearization the skb->next is null which results the corruption.

So for shared skb, it needs to clone first than unclone with header and
data separated for different devices.

Signed-off-by: lena wang <[email protected]>
---
net/core/skbuff.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b157efea5dea..adeb3ad9697b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4351,6 +4351,12 @@ struct sk_buff *skb_segment_list(struct sk_buff
*skb,

skb_push(skb, -skb_network_offset(skb) + offset);

+ if (skb_shared(skb)) {
+ skb = skb_share_check(skb, GFP_ATOMIC);
+ if (!skb)
+ goto err_linearize;
+ }
+
/* Ensure the head is writeable before touching the shared info
*/
err = skb_unclone(skb, GFP_ATOMIC);
if (err)
--
2.18.0


2023-10-27 12:41:38

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net-next] net:prevent shared skb corruption on rx-gro-list segmentation

On Fri, Oct 27, 2023 at 1:28 PM Lena Wang (王娜) <[email protected]> wrote:
>
> From 1d4aea846dd08d824e1097a933ed8618a6da03da Mon Sep 17 00:00:00 2001
> From: lena wang <[email protected]>
> Date: Fri, 27 Oct 2023 17:52:50 +0800
> Subject: [PATCH net-next] net:prevent shared skb corruption on rx-gro-
> list segmentation
>
> shared skb triggers corruptions on rx-gro-list segmentation.
>
> [42438.749474] [ T1425] Unable to handle kernel NULL pointer
> dereference at virtual address 00000000000000d8
> [42438.749530] [ T1425] Mem abort info:
> [42438.749547] [ T1425] ESR = 0x0000000096000006
> [42438.749566] [ T1425] EC = 0x25: DABT (current EL), IL = 32 bits
> [42438.749588] [ T1425] SET = 0, FnV = 0
> [42438.749606] [ T1425] EA = 0, S1PTW = 0
> [42438.749623] [ T1425] FSC = 0x06: level 2 translation fault
> [42438.749642] [ T1425] Data abort info:
> [42438.749656] [ T1425] ISV = 0, ISS = 0x00000006
> [42438.749674] [ T1425] CM = 0, WnR = 0
> [42438.749691] [ T1425] user pgtable: 4k pages, 39-bit VAs,
> pgdp=00000001f96b9000
> [42438.749714] [ T1425] [00000000000000d8] pgd=08000001f96ba003,
> p4d=08000001f96ba003, pud=08000001f96ba003, pmd=0000000000000000
> [42438.749769] [ T1425] Internal error: Oops: 0000000096000006 [#1]
> PREEMPT SMP
> [42438.749796] [ T1425] cpufreq stop DVFS log done
> [42438.797144] [ T1425] Kernel Offset: 0x29eb000000 from
> 0xffffffc008000000
> [42438.797185] [ T1425] PHYS_OFFSET: 0x40000000
> [42438.797203] [ T1425] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [42438.797219] [ T1425] pc : [0xffffffe9f3e38c34]
> __udp_gso_segment+0x24c/0x48c
> [42438.797248] [ T1425] lr : [0xffffffe9f3e38bf4]
> __udp_gso_segment+0x20c/0x48c
> [42438.797269] [ T1425] sp : ffffffc017acb5c0
> [42438.797280] [ T1425] x29: ffffffc017acb5d0 x28: 000000000000c85f
> ......
> [42438.799172] [ T1425] Call trace:
> [42438.799184] [ T1425] dump_backtrace+0xf4/0x118
> [42438.799210] [ T1425] show_stack+0x18/0x24
> [42438.799227] [ T1425] dump_stack_lvl+0x60/0x7c
> [42438.799250] [ T1425] dump_stack+0x18/0x3c
> [42438.799270] [ T1425] mrdump_common_die+0x24c/0x398 [mrdump]
> [42438.799354] [ T1425] ipanic_die+0x20/0x34 [mrdump]
> [42438.799425] [ T1425] notify_die+0x80/0xd8
> [42438.799447] [ T1425] die+0x94/0x2b8
> [42438.799463] [ T1425] __do_kernel_fault+0x264/0x298
> [42438.799485] [ T1425] do_page_fault+0x98/0x4a0
> [42438.799503] [ T1425] do_translation_fault+0x38/0x54
> [42438.799521] [ T1425] do_mem_abort+0x58/0x118
> [42438.799540] [ T1425] el1_abort+0x3c/0x5c
> [42438.799555] [ T1425] el1h_64_sync_handler+0x54/0x90
> [42438.799575] [ T1425] el1h_64_sync+0x68/0x6c
> [42438.799591] [ T1425] __udp_gso_segment+0x24c/0x48c
> [42438.799611] [ T1425] udp4_ufo_fragment+0x118/0x15c
> [42438.799628] [ T1425] inet_gso_segment+0x164/0x338
> [42438.799644] [ T1425] skb_mac_gso_segment+0xc4/0x13c
> [42438.799666] [ T1425] __skb_gso_segment+0xc4/0x124
> [42438.799686] [ T1425] validate_xmit_skb+0x9c/0x2e0
> [42438.799703] [ T1425] validate_xmit_skb_list+0x4c/0x80
> [42438.799722] [ T1425] sch_direct_xmit+0x70/0x3d0
> [42438.799739] [ T1425] __dev_queue_xmit+0x5f0/0xd40
> [42438.799757] [ T1425] ip_finish_output2+0x3f8/0x460
> [42438.799774] [ T1425] __ip_finish_output+0x194/0x240
> [42438.799790] [ T1425] ip_finish_output+0x20/0xf4
> [42438.799810] [ T1425] ip_output+0x100/0x1a0
> [42438.799830] [ T1425] NF_HOOK+0xac/0x154
> [42438.799848] [ T1425] ip_forward+0x308/0x320
> [42438.799868] [ T1425] ip_sublist_rcv+0x1f0/0x25c
> [42438.799887] [ T1425] ip_list_rcv+0x138/0x174
> [42438.799905] [ T1425] __netif_receive_skb_list_core+0x1e8/0x28c
> [42438.799923] [ T1425] netif_receive_skb_list_internal+0x1ec/0x2b4
> [42438.799942] [ T1425] netif_receive_skb_list+0x2c/0x144
> [42438.799960] [ T1425] ccmni_queue_state_callback+0x98/0x304 [ccmni]
> [42438.800009] [ T1425] port_net_queue_state_notify+0x178/0x1c0
> [ccci_md_all]
> [42438.800257] [ T1425] ccci_port_queue_status_notify+0xf0/0x2a8
> [ccci_md_all]
> [42438.800496] [ T1425] dpmaif_rxq_push_thread+0xb0/0x1d4
> [ccci_dpmaif]
> [42438.800602] [ T1425] kthread+0x104/0x1d4
> [42438.800621] [ T1425] ret_from_fork+0x10/0x20
>
> In some scenarios the GRO-ed skb shared with multi users. This
> segmentation touches the shared heads which sets frag_list to null.
> After linearization the skb->next is null which results the corruption.

What scenarios would do that ?

Packets sent to ndo_start_xmit() are not supposed to be shared. (with
pktgen exception, which is well understood)

A driver wants to be able to overwrite parts of skb (skb->next and
many other fields), this is not specific to GSO

A real fix (if needed) would probably be elsewhere.

>
> So for shared skb, it needs to clone first than unclone with header and
> data separated for different devices.
>
> Signed-off-by: lena wang <[email protected]>

Why targeting net-next ?


> ---
> net/core/skbuff.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b157efea5dea..adeb3ad9697b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4351,6 +4351,12 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>
> skb_push(skb, -skb_network_offset(skb) + offset);
>
> + if (skb_shared(skb)) {
> + skb = skb_share_check(skb, GFP_ATOMIC);
> + if (!skb)
> + goto err_linearize;
> + }
> +
> /* Ensure the head is writeable before touching the shared info
> */
> err = skb_unclone(skb, GFP_ATOMIC);
> if (err)
> --
> 2.18.0

2023-10-30 09:57:09

by Lena Wang (王娜)

[permalink] [raw]
Subject: Re: [PATCH net-next] net:prevent shared skb corruption on rx-gro-list segmentation

On Fri, 2023-10-27 at 14:39 +0200, Eric Dumazet wrote:
> > In some scenarios the GRO-ed skb shared with multi users. This
> > segmentation touches the shared heads which sets frag_list to null.
> > After linearization the skb->next is null which results the
> corruption.
>
> What scenarios would do that ?
>
> Packets sent to ndo_start_xmit() are not supposed to be shared. (with
> pktgen exception, which is well understood)
>
> A driver wants to be able to overwrite parts of skb (skb->next and
> many other fields), this is not specific to GSO
>
> A real fix (if needed) would probably be elsewhere.
>
This crashed skb's useer count=2 is added in the last line of
skb_segment_list: skb_get(skb). As it is crashed before it runs to
consume_skb, it is wrong about my shared_skb guess.


> >
> > So for shared skb, it needs to clone first than unclone with header
> and
> > data separated for different devices.
> >
> > Signed-off-by: lena wang <[email protected]>
>
> Why targeting net-next ?
>
I am not familiar with tag used. sorry.

This issue is related with patch: [PATCH] net: fix NULL pointer in
skb_segment_list
https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/

ZhaiYan told me in mail the problem she encountered is releated to BPF
decapsulation and misconfiguration, where UDP packets from a GUE tunnel
triggered the crash:GRO merges the GUE packets as data and a BPF
program pulls off the GUE header at tc. Since in this case the inner IP
header's total length is not touched by GRO, after decapsulation the
head skb becomes inconsistent with the real total length of GRO chain.
ip_rcv_core then thinks this packet is too long and trim it to fit,
which inherently drops frag_list.
Furthermore there are several locations that may alter the packet
layout, like pskb_pull_tail. It can happen when skb->len mismatches
iphdr->tot_len, for example.


With zhaiyan's patch when in this scenario skb's frag_list is null, it
will be a skb with skb->next=null when skb_segment_list is finished.

Then crash moves into __udp_gso_sement_list -> skb_segment_list(finish)
-> __udpv4_gso_segment_list_csum, it uses skb->next without check then
crash.

Whether do we need to implement here to ignere it?


Thanks
Lena