2024-03-13 13:48:25

by Shiming Cheng

[permalink] [raw]
Subject: [PATCH net] udp: fix segmentation crash for untrusted source packet

Kernel exception is reported when making udp frag list segmentation.
Backtrace is as below:
at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229
at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262
features=features@entry=19, is_ipv6=false)
at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289
features=19)
at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399
features=19)
at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418
skb@entry=0x0, features=19, features@entry=0)
at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
tx_path=<optimized out>)
at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124

This packet's frag list is null while gso_type is not 0. Then it is treated
as a GRO-ed packet and sent to segment frag list. Function call path is
udp_rcv_segment => config features value
__udpv4_gso_segment => skb_gso_ok returns false. Here it should be
true. Failed reason is features doesn't match
gso_type.
__udp_gso_segment_list
skb_segment_list => packet is linear with skb->next = NULL
__udpv4_gso_segment_list_csum => use skb->next directly and
crash happens

In rx-gro-list GRO-ed packet is set gso type as
NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso
flow the features should also set them to match with gso_type. Or else it
will always return false in skb_gso_ok. Then it can't discover the
untrusted source packet and result crash in following function.

Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum recalculation")
Signed-off-by: Shiming Cheng <[email protected]>
Signed-off-by: Lena Wang <[email protected]>
---
include/net/udp.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 488a6d2babcc..c87baa23b9da 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -464,7 +464,7 @@ void udpv6_encap_enable(void);
static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
struct sk_buff *skb, bool ipv4)
{
- netdev_features_t features = NETIF_F_SG;
+ netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST;
struct sk_buff *segs;

/* Avoid csum recalculation by skb_segment unless userspace explicitly
--
2.18.0



2024-03-13 15:41:43

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH net] udp: fix segmentation crash for untrusted source packet

On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> Kernel exception is reported when making udp frag list segmentation.
> Backtrace is as below:
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:229
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:262
> features=features@entry=19, is_ipv6=false)
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:289
> features=19)
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/udp_offload.c:399
> features=19)
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/ipv4/af_inet.c:1418
> skb@entry=0x0, features=19, features@entry=0)
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> tx_path=<optimized out>)
> at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124

A full backtrace would help better understanding the issue.

> This packet's frag list is null while gso_type is not 0. Then it is treated
> as a GRO-ed packet and sent to segment frag list. Function call path is
> udp_rcv_segment => config features value
> __udpv4_gso_segment => skb_gso_ok returns false. Here it should be
> true. 

Why? If I read correctly the above, this is GSO packet landing in an
UDP socket with no UDP_GRO sockopt. The packet is expected to be
segmented again.

> Failed reason is features doesn't
match
> gso_type.
> __udp_gso_segment_list
> skb_segment_list => packet is linear with skb->next = NULL
> __udpv4_gso_segment_list_csum => use skb->next directly and
> crash happens
>
> In rx-gro-list GRO-ed packet is set gso type as
> NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In gso
> flow the features should also set them to match with gso_type. Or else it
> will always return false in skb_gso_ok. Then it can't discover the
> untrusted source packet and result crash in following function.

What is the 'untrusted source' here? I read the above as the packet
aggregation happened in the GRO engine???

Could you please give a complete description of the relevant scenario?

> Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum recalculation")
> Signed-off-by: Shiming Cheng <[email protected]>
> Signed-off-by: Lena Wang <[email protected]>
> ---
> include/net/udp.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 488a6d2babcc..c87baa23b9da 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -464,7 +464,7 @@ void udpv6_encap_enable(void);
> static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> struct sk_buff *skb, bool ipv4)
> {
> - netdev_features_t features = NETIF_F_SG;
> + netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST;

This looks wrong: real UDP_L4 GSO packets will not segmented anymore
and should be dropped (?!?)



2024-03-15 08:13:16

by Lena Wang (王娜)

[permalink] [raw]
Subject: Re: [PATCH net] udp: fix segmentation crash for untrusted source packet

On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > Kernel exception is reported when making udp frag list
> segmentation.
> > Backtrace is as below:
> > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:229
> > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:262
> > features=features@entry=19, is_ipv6=false)
> > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:289
> > features=19)
> > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/udp_offload.c:399
> > features=19)
> > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/ipv4/af_inet.c:1418
> > skb@entry=0x0, features=19, features@entry=0)
> > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> > tx_path=<optimized out>)
> > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124
>
> A full backtrace would help better understanding the issue.

Below is full backtrace:
[ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
G W OE 6.6.17-android15-0-g380371ea9bf1 #1
[ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT)
[ 1100.812215][ C3] Call trace:
[ 1100.812218][ C3] dump_backtrace+0xec/0x138
[ 1100.812222][ C3] show_stack+0x18/0x24
[ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c
[ 1100.812232][ C3] dump_stack+0x18/0x24
[ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump]
[ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump]
[ 1100.812269][ C3] notifier_call_chain+0x90/0x174
[ 1100.812275][ C3] notify_die+0x50/0x8c
[ 1100.812279][ C3] die+0x94/0x308
[ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c
[ 1100.812288][ C3] do_page_fault+0xa0/0x48c
[ 1100.812293][ C3] do_translation_fault+0x38/0x54
[ 1100.812297][ C3] do_mem_abort+0x58/0x104
[ 1100.812302][ C3] el1_abort+0x3c/0x5c
[ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90
[ 1100.812313][ C3] el1h_64_sync+0x68/0x6c
[ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4
[ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174
[ 1100.812326][ C3] inet_gso_segment+0x164/0x330
[ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c
[ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120
[ 1100.812339][ C3] udp_rcv_segment+0x50/0x134
[ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114
[ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac
[ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818
[ 1100.812358][ C3] udp_rcv+0x20/0x30
[ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368
[ 1100.812368][ C3] ip_local_deliver+0xe4/0x184
[ 1100.812373][ C3] ip_rcv+0x90/0x118
[ 1100.812378][ C3] __netif_receive_skb+0x74/0x124
[ 1100.812383][ C3] process_backlog+0xd8/0x18c
[ 1100.812388][ C3] __napi_poll+0x5c/0x1fc
[ 1100.812392][ C3] net_rx_action+0x150/0x334
[ 1100.812397][ C3] __do_softirq+0x120/0x3f4
[ 1100.812401][ C3] ____do_softirq+0x10/0x20
[ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74
[ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c
[ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4
[ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c
[ 1100.812422][ C3] el1_interrupt+0x38/0x58
[ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24
[ 1100.812434][ C3] el1h_64_irq+0x68/0x6c
[ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8
[ 1100.812443][ C3] cpuidle_enter+0x38/0x54
[ 1100.812449][ C3] do_idle+0x198/0x294
[ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c
[ 1100.812459][ C3] secondary_start_kernel+0x138/0x158
[ 1100.812465][ C3] __secondary_switched+0xc0/0xc4

> > This packet's frag list is null while gso_type is not 0. Then it is
> treated
> > as a GRO-ed packet and sent to segment frag list. Function call
> path is
> > udp_rcv_segment => config features value
> > __udpv4_gso_segment => skb_gso_ok returns false. Here it
> should be
> > true.
>
> Why? If I read correctly the above, this is GSO packet landing in an
> UDP socket with no UDP_GRO sockopt. The packet is expected to be
> segmented again.
>
Yes, it is GSO packet, however the fragment list of this GSO packet
becomes NULL. As the occurrence rate is very low, we really don’t know
why and when it becomes to be NULL. It happens both in cellular and
wlan network and seems an unknown kernel issue.

To avoid crash the packet should skip to be segmented when fraglist is
null.

> >Failed reason is features doesn't
> match
> > gso_type.
> > __udp_gso_segment_list
> > skb_segment_list => packet is linear with skb->next =
> NULL
> > __udpv4_gso_segment_list_csum => use skb->next directly
> and
> > crash happens
> >
> > In rx-gro-list GRO-ed packet is set gso type as
> > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In
> gso
> > flow the features should also set them to match with gso_type. Or
> else it
> > will always return false in skb_gso_ok. Then it can't discover the
> > untrusted source packet and result crash in following function.
>
> What is the 'untrusted source' here? I read the above as the packet
> aggregation happened in the GRO engine???
>
> Could you please give a complete description of the relevant
> scenario?
>

According to the backtrace info, we infer it is a rx-frag_list GRO
packet. Before sending into the UDP socket with no UDP_GRO sockopt, it
seems enter "skb_condense" to trim it and loose his frag list. However
it still keeps gso_type and gso_size. Then it continues to do
skb_segment_list.

First crash happens in skb_segment_list.
This patch resolves the crash and lets the packet becomes a skb without
skb->next:
https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
Then crash moves to __udp_gso_sement_list -> skb_segment_list(finish)
-> __udpv4_gso_segment_list_csum, it uses skb->next without check then
crash.


What we want to do is to drop this abnormal packet. So we set features
NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: f2696099c6c6
condation then drop it.

Another solution is we can check the frag_list directly in
skb_segment_lsit:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 011d690..5b054f0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4336,6 +4336,8 @@
int len_diff, err;

skb_push(skb, -skb_network_offset(skb) + offset);
+ if (!list_skb)
+ goto err_linearize;

/* Ensure the head is writeable before touching the shared info */
err = skb_unclone(skb, GFP_ATOMIC);

> > Fixes: f2696099c6c6 ("udp: Avoid post-GRO UDP checksum
> recalculation")
> > Signed-off-by: Shiming Cheng <[email protected]>
> > Signed-off-by: Lena Wang <[email protected]>
> > ---
> > include/net/udp.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/net/udp.h b/include/net/udp.h
> > index 488a6d2babcc..c87baa23b9da 100644
> > --- a/include/net/udp.h
> > +++ b/include/net/udp.h
> > @@ -464,7 +464,7 @@ void udpv6_encap_enable(void);
> > static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> > struct sk_buff *skb, bool ipv4)
> > {
> > -netdev_features_t features = NETIF_F_SG;
> > +netdev_features_t features = NETIF_F_SG | NETIF_F_GSO_UDP_L4 |
> NETIF_F_GSO_FRAGLIST;
>
> This looks wrong: real UDP_L4 GSO packets will not segmented anymore
> and should be dropped (?!?)
>
>

2024-03-16 13:47:31

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH net] udp: fix segmentation crash for untrusted source packet

Lena Wang (王娜) wrote:
> On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > > Kernel exception is reported when making udp frag list
> > segmentation.
> > > Backtrace is as below:
> > > at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:229
> > > at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:262
> > > features=features@entry=19, is_ipv6=false)
> > > at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:289
> > > features=19)
> > > at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/udp_offload.c:399
> > > features=19)
> > > at out/android15-6.6/kernel-6.6/kernel-
> > 6.6/net/ipv4/af_inet.c:1418
> > > skb@entry=0x0, features=19, features@entry=0)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:53
> > > tx_path=<optimized out>)
> > > at out/android15-6.6/kernel-6.6/kernel-6.6/net/core/gso.c:124
> >
> > A full backtrace would help better understanding the issue.
>
> Below is full backtrace:
> [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> G W OE 6.6.17-android15-0-g380371ea9bf1 #1
> [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT)
> [ 1100.812215][ C3] Call trace:
> [ 1100.812218][ C3] dump_backtrace+0xec/0x138
> [ 1100.812222][ C3] show_stack+0x18/0x24
> [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c
> [ 1100.812232][ C3] dump_stack+0x18/0x24
> [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump]
> [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump]
> [ 1100.812269][ C3] notifier_call_chain+0x90/0x174
> [ 1100.812275][ C3] notify_die+0x50/0x8c
> [ 1100.812279][ C3] die+0x94/0x308
> [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c
> [ 1100.812288][ C3] do_page_fault+0xa0/0x48c
> [ 1100.812293][ C3] do_translation_fault+0x38/0x54
> [ 1100.812297][ C3] do_mem_abort+0x58/0x104
> [ 1100.812302][ C3] el1_abort+0x3c/0x5c
> [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90
> [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c
> [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4
> [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174
> [ 1100.812326][ C3] inet_gso_segment+0x164/0x330
> [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c
> [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120
> [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134
> [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114
> [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac
> [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818
> [ 1100.812358][ C3] udp_rcv+0x20/0x30
> [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368
> [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184
> [ 1100.812373][ C3] ip_rcv+0x90/0x118
> [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124
> [ 1100.812383][ C3] process_backlog+0xd8/0x18c
> [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc
> [ 1100.812392][ C3] net_rx_action+0x150/0x334
> [ 1100.812397][ C3] __do_softirq+0x120/0x3f4
> [ 1100.812401][ C3] ____do_softirq+0x10/0x20
> [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74
> [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c
> [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4
> [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c
> [ 1100.812422][ C3] el1_interrupt+0x38/0x58
> [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24
> [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c
> [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8
> [ 1100.812443][ C3] cpuidle_enter+0x38/0x54
> [ 1100.812449][ C3] do_idle+0x198/0x294
> [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c
> [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158
> [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4
>
> > > This packet's frag list is null while gso_type is not 0. Then it is
> > treated
> > > as a GRO-ed packet and sent to segment frag list. Function call
> > path is
> > > udp_rcv_segment => config features value
> > > __udpv4_gso_segment => skb_gso_ok returns false. Here it
> > should be
> > > true.
> >
> > Why? If I read correctly the above, this is GSO packet landing in an
> > UDP socket with no UDP_GRO sockopt. The packet is expected to be
> > segmented again.
> >
> Yes, it is GSO packet, however the fragment list of this GSO packet
> becomes NULL. As the occurrence rate is very low, we really don’t know
> why and when it becomes to be NULL. It happens both in cellular and
> wlan network and seems an unknown kernel issue.
>
> To avoid crash the packet should skip to be segmented when fraglist is
> null.
>
> > >Failed reason is features doesn't
> > match
> > > gso_type.
> > > __udp_gso_segment_list
> > > skb_segment_list => packet is linear with skb->next =
> > NULL
> > > __udpv4_gso_segment_list_csum => use skb->next directly
> > and
> > > crash happens
> > >
> > > In rx-gro-list GRO-ed packet is set gso type as
> > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete. In
> > gso
> > > flow the features should also set them to match with gso_type. Or
> > else it
> > > will always return false in skb_gso_ok. Then it can't discover the
> > > untrusted source packet and result crash in following function.
> >
> > What is the 'untrusted source' here? I read the above as the packet
> > aggregation happened in the GRO engine???
> >
> > Could you please give a complete description of the relevant
> > scenario?
> >
>
> According to the backtrace info, we infer it is a rx-frag_list GRO

It would be helpful to see an skb_dump. But if this happens rarely in
production, understood if that is not feasible.

The packet arrives on process_backlog, so still not sure how it is
produced.

> packet. Before sending into the UDP socket with no UDP_GRO sockopt, it
> seems enter "skb_condense" to trim it and loose his frag list. However
> it still keeps gso_type and gso_size. Then it continues to do
> skb_segment_list.
>
> First crash happens in skb_segment_list.
> This patch resolves the crash and lets the packet becomes a skb without
> skb->next:
> https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
> Then crash moves to __udp_gso_sement_list -> skb_segment_list(finish)
> -> __udpv4_gso_segment_list_csum, it uses skb->next without check then
> crash.
>
>
> What we want to do is to drop this abnormal packet.

I think we want to deliver this packet if possible.

Thanks for the added context. So this is assumed to be a GSO skb with
SKB_GSO_FRAGLIST that somewhere lots its fraglist? That is the bug
if true.

You are suggesting that this happens in the skb_condense in
__udp_enqueue_schedule_skb?

If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST set.
So one workaround (not fix) is to disable that.

> So we set features
> NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes: f2696099c6c6
> condation then drop it.

2024-03-26 13:18:42

by Lena Wang (王娜)

[permalink] [raw]
Subject: Re: [PATCH net] udp: fix segmentation crash for untrusted source packet

On Sat, 2024-03-16 at 09:47 -0400, Willem de Bruijn wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> Lena Wang (王娜) wrote:
> > On Wed, 2024-03-13 at 16:41 +0100, Paolo Abeni wrote:
> > >
> > > External email : Please do not click links or open attachments
> until
> > > you have verified the sender or the content.
> > > On Wed, 2024-03-13 at 21:34 +0800, Shiming Cheng wrote:
> > > > Kernel exception is reported when making udp frag list
> > > segmentation.
> > > > Backtrace is as below:
> > > > at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:229
> > > > at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:262
> > > > features=features@entry=19, is_ipv6=false)
> > > > at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:289
> > > > features=19)
> > > > at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/udp_offload.c:399
> > > > features=19)
> > > > at out/android15-6.6/kernel-6.6/kernel-
> > > 6.6/net/ipv4/af_inet.c:1418
> > > > skb@entry=0x0, features=19, features@entry=0)
> > > > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/core/gso.c:53
> > > > tx_path=<optimized out>)
> > > > at out/android15-6.6/kernel-6.6/kernel-
> 6.6/net/core/gso.c:124
> > >
> > > A full backtrace would help better understanding the issue.
> >
> > Below is full backtrace:
> > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > G W OE 6.6.17-android15-0-g380371ea9bf1 #1
> > [ 1100.812211][ C3] Hardware name: MT6991(ENG) (DT)
> > [ 1100.812215][ C3] Call trace:
> > [ 1100.812218][ C3] dump_backtrace+0xec/0x138
> > [ 1100.812222][ C3] show_stack+0x18/0x24
> > [ 1100.812226][ C3] dump_stack_lvl+0x50/0x6c
> > [ 1100.812232][ C3] dump_stack+0x18/0x24
> > [ 1100.812237][ C3] mrdump_common_die+0x24c/0x388 [mrdump]
> > [ 1100.812259][ C3] ipanic_die+0x20/0x34 [mrdump]
> > [ 1100.812269][ C3] notifier_call_chain+0x90/0x174
> > [ 1100.812275][ C3] notify_die+0x50/0x8c
> > [ 1100.812279][ C3] die+0x94/0x308
> > [ 1100.812283][ C3] __do_kernel_fault+0x240/0x26c
> > [ 1100.812288][ C3] do_page_fault+0xa0/0x48c
> > [ 1100.812293][ C3] do_translation_fault+0x38/0x54
> > [ 1100.812297][ C3] do_mem_abort+0x58/0x104
> > [ 1100.812302][ C3] el1_abort+0x3c/0x5c
> > [ 1100.812307][ C3] el1h_64_sync_handler+0x54/0x90
> > [ 1100.812313][ C3] el1h_64_sync+0x68/0x6c
> > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4
> > [ 1100.812322][ C3] udp4_ufo_fragment+0x130/0x174
> > [ 1100.812326][ C3] inet_gso_segment+0x164/0x330
> > [ 1100.812330][ C3] skb_mac_gso_segment+0xc4/0x13c
> > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120
> > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134
> > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114
> > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac
> > [ 1100.812353][ C3] __udp4_lib_rcv+0x3e0/0x818
> > [ 1100.812358][ C3] udp_rcv+0x20/0x30
> > [ 1100.812362][ C3] ip_protocol_deliver_rcu+0x194/0x368
> > [ 1100.812368][ C3] ip_local_deliver+0xe4/0x184
> > [ 1100.812373][ C3] ip_rcv+0x90/0x118
> > [ 1100.812378][ C3] __netif_receive_skb+0x74/0x124
> > [ 1100.812383][ C3] process_backlog+0xd8/0x18c
> > [ 1100.812388][ C3] __napi_poll+0x5c/0x1fc
> > [ 1100.812392][ C3] net_rx_action+0x150/0x334
> > [ 1100.812397][ C3] __do_softirq+0x120/0x3f4
> > [ 1100.812401][ C3] ____do_softirq+0x10/0x20
> > [ 1100.812405][ C3] call_on_irq_stack+0x3c/0x74
> > [ 1100.812410][ C3] do_softirq_own_stack+0x1c/0x2c
> > [ 1100.812414][ C3] __irq_exit_rcu+0x5c/0xd4
> > [ 1100.812418][ C3] irq_exit_rcu+0x10/0x1c
> > [ 1100.812422][ C3] el1_interrupt+0x38/0x58
> > [ 1100.812428][ C3] el1h_64_irq_handler+0x18/0x24
> > [ 1100.812434][ C3] el1h_64_irq+0x68/0x6c
> > [ 1100.812437][ C3] arch_local_irq_enable+0x4/0x8
> > [ 1100.812443][ C3] cpuidle_enter+0x38/0x54
> > [ 1100.812449][ C3] do_idle+0x198/0x294
> > [ 1100.812454][ C3] cpu_startup_entry+0x34/0x3c
> > [ 1100.812459][ C3] secondary_start_kernel+0x138/0x158
> > [ 1100.812465][ C3] __secondary_switched+0xc0/0xc4
> >
> > > > This packet's frag list is null while gso_type is not 0. Then
> it is
> > > treated
> > > > as a GRO-ed packet and sent to segment frag list. Function call
> > > path is
> > > > udp_rcv_segment => config features value
> > > > __udpv4_gso_segment => skb_gso_ok returns false. Here it
> > > should be
> > > > true.
> > >
> > > Why? If I read correctly the above, this is GSO packet landing in
> an
> > > UDP socket with no UDP_GRO sockopt. The packet is expected to be
> > > segmented again.
> > >
> > Yes, it is GSO packet, however the fragment list of this GSO packet
> > becomes NULL. As the occurrence rate is very low, we really don’t
> know
> > why and when it becomes to be NULL. It happens both in cellular and
> > wlan network and seems an unknown kernel issue.
> >
> > To avoid crash the packet should skip to be segmented when fraglist
> is
> > null.
> >
> > > >Failed reason is features doesn't
> > > match
> > > > gso_type.
> > > > __udp_gso_segment_list
> > > > skb_segment_list => packet is linear with skb->next
> =
> > > NULL
> > > > __udpv4_gso_segment_list_csum => use skb->next
> directly
> > > and
> > > > crash happens
> > > >
> > > > In rx-gro-list GRO-ed packet is set gso type as
> > > > NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST in napi_gro_complete.
> In
> > > gso
> > > > flow the features should also set them to match with gso_type.
> Or
> > > else it
> > > > will always return false in skb_gso_ok. Then it can't discover
> the
> > > > untrusted source packet and result crash in following function.
> > >
> > > What is the 'untrusted source' here? I read the above as the
> packet
> > > aggregation happened in the GRO engine???
> > >
> > > Could you please give a complete description of the relevant
> > > scenario?
> > >
> >
> > According to the backtrace info, we infer it is a rx-frag_list GRO
>
> It would be helpful to see an skb_dump. But if this happens rarely in
> production, understood if that is not feasible.
>
> The packet arrives on process_backlog, so still not sure how it is
> produced.
>
Yes, it rarely happens. It is very hard to debug it and not sure its
produced path.

> > packet. Before sending into the UDP socket with no UDP_GRO sockopt,
> it
> > seems enter "skb_condense" to trim it and loose his frag list.
> However
> > it still keeps gso_type and gso_size. Then it continues to do
> > skb_segment_list.
> >
> > First crash happens in skb_segment_list.
> > This patch resolves the crash and lets the packet becomes a skb
> without
> > skb->next:
> > https://lore.kernel.org/all/Y9gt5EUizK1UImEP@debian/
> > Then crash moves to __udp_gso_sement_list ->
> skb_segment_list(finish)
> > -> __udpv4_gso_segment_list_csum, it uses skb->next without check
> then
> > crash.
> >
> >
> > What we want to do is to drop this abnormal packet.
>
> I think we want to deliver this packet if possible.
>
> Thanks for the added context. So this is assumed to be a GSO skb with
> SKB_GSO_FRAGLIST that somewhere lots its fraglist? That is the bug
> if true.
>
> You are suggesting that this happens in the skb_condense in
> __udp_enqueue_schedule_skb?
>
We try to add a skb_condense directly before skb_segment_list then get
a similar KE backtrace and skb dump value same with this issue skb
dump.
However we don't know which condition trigger the flow runinto
skb_condense.

> If generated by GRO then on a device that has NETIF_F_GRO_FRAGLIST
> set.
> So one workaround (not fix) is to disable that.
>
As we met other issue previously with GRO in
skb_segment(disalbe NETIF_F_GRO_FRAGLIST flow), it is still not safe to
disable GRO fraglist.

We hope current patch could be applied to drop the invalid packet.

> > So we set features
> > NETIF_F_GSO_UDP_L4 |NETIF_F_GSO_FRAGLIST to match fixes:
> f2696099c6c6
> > condation then drop it.