From: Vasiliy Kovalev <[email protected]>
After unloading the module, an instance continues to exist that accesses
outdated memory addresses.
To prevent this, the dump_pdp_en flag has been added, which blocks the
dump of pdp contexts by a false value. And only after these checks can
the net_generic() function be called.
These errors were found using the syzkaller program:
Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug.
gtp: GTP module loaded (pdp ctx size 104 bytes)
gtp: GTP module unloaded
general protection fault, probably for non-canonical address
0xdffffc0000000001:0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 2782 Comm: syz-executor139 Not tainted 5.10.200-std-def-alt1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-alt1
RIP: 0010:gtp_genl_dump_pdp+0x1b1/0x790 [gtp]
..
Call Trace:
genl_lock_dumpit+0x6b/0xa0 net/netlink/genetlink.c:623
netlink_dump+0x575/0xc70 net/netlink/af_netlink.c:2271
__netlink_dump_start+0x64e/0x910 net/netlink/af_netlink.c:2376
genl_family_rcv_msg_dumpit+0x2b8/0x310 net/netlink/genetlink.c:686
genl_family_rcv_msg net/netlink/genetlink.c:780 [inline]
genl_rcv_msg+0x450/0x5a0 net/netlink/genetlink.c:800
netlink_rcv_skb+0x150/0x440 net/netlink/af_netlink.c:2497
genl_rcv+0x29/0x40 net/netlink/genetlink.c:811
netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1348
netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916
sock_sendmsg_nosec net/socket.c:651 [inline]
__sock_sendmsg+0x159/0x190 net/socket.c:663
____sys_sendmsg+0x712/0x870 net/socket.c:2376
___sys_sendmsg+0xf8/0x170 net/socket.c:2430
__sys_sendmsg+0xea/0x1b0 net/socket.c:2459
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x62/0xc7
RIP: 0033:0x7f2ea16c2d49
Fixes: 94a6d9fb88df ("gtp: fix wrong condition in gtp_genl_dump_pdp()")
Signed-off-by: Vasiliy Kovalev <[email protected]>
---
drivers/net/gtp.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 477b4d4f860bd3..3fc4639711cd83 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1675,6 +1675,8 @@ static int gtp_genl_get_pdp(struct sk_buff *skb, struct genl_info *info)
return err;
}
+static bool dump_pdp_en;
+
static int gtp_genl_dump_pdp(struct sk_buff *skb,
struct netlink_callback *cb)
{
@@ -1684,12 +1686,19 @@ static int gtp_genl_dump_pdp(struct sk_buff *skb,
struct pdp_ctx *pctx;
struct gtp_net *gn;
- gn = net_generic(net, gtp_net_id);
-
- if (cb->args[4])
+ /* Do not allow further operations if the module is
+ * unloaded before or after the process is blocked.
+ */
+ if (!dump_pdp_en)
return 0;
rcu_read_lock();
+ if (!dump_pdp_en || cb->args[4]) {
+ rcu_read_unlock();
+ return 0;
+ }
+ gn = net_generic(net, gtp_net_id);
+
list_for_each_entry_rcu(gtp, &gn->gtp_dev_list, list) {
if (last_gtp && last_gtp != gtp)
continue;
@@ -1914,6 +1923,8 @@ static int __init gtp_init(void)
if (err < 0)
goto unreg_genl_family;
+ dump_pdp_en = true;
+
pr_info("GTP module loaded (pdp ctx size %zd bytes)\n",
sizeof(struct pdp_ctx));
return 0;
@@ -1930,6 +1941,7 @@ late_initcall(gtp_init);
static void __exit gtp_fini(void)
{
+ dump_pdp_en = false;
genl_unregister_family(>p_genl_family);
rtnl_link_unregister(>p_link_ops);
unregister_pernet_subsys(>p_net_ops);
--
2.33.8
On Wed, Jan 24, 2024 at 11:14 AM <[email protected]> wrote:
>
> From: Vasiliy Kovalev <[email protected]>
>
> After unloading the module, an instance continues to exist that accesses
> outdated memory addresses.
>
> To prevent this, the dump_pdp_en flag has been added, which blocks the
> dump of pdp contexts by a false value. And only after these checks can
> the net_generic() function be called.
>
> These errors were found using the syzkaller program:
>
> Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug.
> gtp: GTP module loaded (pdp ctx size 104 bytes)
> gtp: GTP module unloaded
> general protection fault, probably for non-canonical address
> 0xdffffc0000000001:0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 0 PID: 2782 Comm: syz-executor139 Not tainted 5.10.200-std-def-alt1 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-alt1
> RIP: 0010:gtp_genl_dump_pdp+0x1b1/0x790 [gtp]
> ...
> Call Trace:
> genl_lock_dumpit+0x6b/0xa0 net/netlink/genetlink.c:623
> netlink_dump+0x575/0xc70 net/netlink/af_netlink.c:2271
> __netlink_dump_start+0x64e/0x910 net/netlink/af_netlink.c:2376
> genl_family_rcv_msg_dumpit+0x2b8/0x310 net/netlink/genetlink.c:686
> genl_family_rcv_msg net/netlink/genetlink.c:780 [inline]
> genl_rcv_msg+0x450/0x5a0 net/netlink/genetlink.c:800
> netlink_rcv_skb+0x150/0x440 net/netlink/af_netlink.c:2497
> genl_rcv+0x29/0x40 net/netlink/genetlink.c:811
> netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
> netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1348
> netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916
> sock_sendmsg_nosec net/socket.c:651 [inline]
> __sock_sendmsg+0x159/0x190 net/socket.c:663
> ____sys_sendmsg+0x712/0x870 net/socket.c:2376
> ___sys_sendmsg+0xf8/0x170 net/socket.c:2430
> __sys_sendmsg+0xea/0x1b0 net/socket.c:2459
> do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x62/0xc7
> RIP: 0033:0x7f2ea16c2d49
>
> Fixes: 94a6d9fb88df ("gtp: fix wrong condition in gtp_genl_dump_pdp()")
> Signed-off-by: Vasiliy Kovalev <[email protected]>
> ---
> drivers/net/gtp.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 477b4d4f860bd3..3fc4639711cd83 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -1675,6 +1675,8 @@ static int gtp_genl_get_pdp(struct sk_buff *skb, struct genl_info *info)
> return err;
> }
>
> +static bool dump_pdp_en;
> +
Hmm, it seems there is a missing try_module_get() somewhere...
__netlink_dump_start() does one, so perhaps we reach __netlink_dump_start()
with a NULL in control->module ?
On Wed, Jan 24, 2024 at 11:14 AM <[email protected]> wrote:
>
> From: Vasiliy Kovalev <[email protected]>
>
> After unloading the module, an instance continues to exist that accesses
> outdated memory addresses.
>
> To prevent this, the dump_pdp_en flag has been added, which blocks the
> dump of pdp contexts by a false value. And only after these checks can
> the net_generic() function be called.
>
> These errors were found using the syzkaller program:
>
> Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug.
> gtp: GTP module loaded (pdp ctx size 104 bytes)
> gtp: GTP module unloaded
> general protection fault, probably for non-canonical address
> 0xdffffc0000000001:0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 0 PID: 2782 Comm: syz-executor139 Not tainted 5.10.200-std-def-alt1 #1
Oh wait, this is a 5.10 kernel ?
Please generate a stack trace using a recent tree, it is possible the
bug has been fixed already.
24.01.2024 13:57, Eric Dumazet wrote:
> Oh wait, this is a 5.10 kernel ?
Yes, but the bug is reproduced on the latest stable kernels.
> Please generate a stack trace using a recent tree, it is possible the
> bug has been fixed already.
See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
bottom of the message.
[ 523.915255] Call Trace:
[ 523.915255] <TASK>
[ 523.915255] ? __die+0x1f/0x70
[ 523.915255] ? page_fault_oops+0x14d/0x4a0
[ 523.915255] ? exc_page_fault+0x7b/0x180
[ 523.915255] ? asm_exc_page_fault+0x22/0x30
[ 523.915255] ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
[ 523.915255] ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
[ 523.915255] genl_dumpit+0x2f/0x90
[ 523.915255] netlink_dump+0x126/0x320
[ 523.915255] __netlink_dump_start+0x1da/0x2a0
[ 523.915255] genl_family_rcv_msg_dumpit+0x93/0x100
[ 523.915255] ? __pfx_genl_start+0x10/0x10
[ 523.915255] ? __pfx_genl_dumpit+0x10/0x10
[ 523.915255] ? __pfx_genl_done+0x10/0x10
[ 523.915255] genl_rcv_msg+0x112/0x2a0
[ 523.915255] ? __pfx_gtp_genl_dump_pdp+0x10/0x10 [gtp]
[ 523.915255] ? __pfx_genl_rcv_msg+0x10/0x10
[ 523.915255] netlink_rcv_skb+0x54/0x110
[ 523.915255] genl_rcv+0x24/0x40
[ 523.915255] netlink_unicast+0x19f/0x290
[ 523.915255] netlink_sendmsg+0x250/0x4e0
[ 523.915255] ____sys_sendmsg+0x376/0x3b0
[ 523.915255] ? copy_msghdr_from_user+0x6d/0xb0
[ 523.915255] ___sys_sendmsg+0x86/0xe0
[ 523.915255] ? do_fault+0x296/0x470
[ 523.915255] ? __handle_mm_fault+0x771/0xda0
[ 523.915255] __sys_sendmsg+0x57/0xb0
[ 523.915255] do_syscall_64+0x59/0x90
[ 523.915255] ? ct_kernel_exit.isra.0+0x71/0x90
[ 523.915255] ? __ct_user_enter+0x5a/0xd0
[ 523.915255] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 523.915255] RIP: 0033:0x7f2bcb93cd49
--
Regards,
Vasiliy Kovalev
On Wed, Jan 24, 2024 at 12:20 PM <[email protected]> wrote:
>
> 24.01.2024 13:57, Eric Dumazet wrote:
> > Oh wait, this is a 5.10 kernel ?
> Yes, but the bug is reproduced on the latest stable kernels.
> > Please generate a stack trace using a recent tree, it is possible the
> > bug has been fixed already.
>
> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
> bottom of the message.
Ah, ok. Not sure why you sent a cover letter for a single patch...
Setting a boolean, in a module that can disappear will not prevent the
module from disappearing.
This work around might work, or might not work, depending on timing,
preemptions, ....
Thanks.
Hi,
On Wed, Jan 24, 2024 at 02:20:04PM +0300, [email protected] wrote:
> 24.01.2024 13:57, Eric Dumazet wrote:
> > Oh wait, this is a 5.10 kernel ?
>
> Yes, but the bug is reproduced on the latest stable kernels.
>
> > Please generate a stack trace using a recent tree, it is possible the
> > bug has been fixed already.
__netlink_dump_start() is called at the beginning of the dump, which is
grabbing a reference on this module.
do you have a reproducer?
> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the bottom
> of the message.
>
> [ 523.915255] Call Trace:
> [ 523.915255] <TASK>
> [ 523.915255] ? __die+0x1f/0x70
> [ 523.915255] ? page_fault_oops+0x14d/0x4a0
> [ 523.915255] ? exc_page_fault+0x7b/0x180
> [ 523.915255] ? asm_exc_page_fault+0x22/0x30
> [ 523.915255] ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
> [ 523.915255] ? gtp_genl_dump_pdp+0x82/0x190 [gtp]
> [ 523.915255] genl_dumpit+0x2f/0x90
> [ 523.915255] netlink_dump+0x126/0x320
> [ 523.915255] __netlink_dump_start+0x1da/0x2a0
> [ 523.915255] genl_family_rcv_msg_dumpit+0x93/0x100
> [ 523.915255] ? __pfx_genl_start+0x10/0x10
> [ 523.915255] ? __pfx_genl_dumpit+0x10/0x10
> [ 523.915255] ? __pfx_genl_done+0x10/0x10
> [ 523.915255] genl_rcv_msg+0x112/0x2a0
> [ 523.915255] ? __pfx_gtp_genl_dump_pdp+0x10/0x10 [gtp]
> [ 523.915255] ? __pfx_genl_rcv_msg+0x10/0x10
> [ 523.915255] netlink_rcv_skb+0x54/0x110
> [ 523.915255] genl_rcv+0x24/0x40
> [ 523.915255] netlink_unicast+0x19f/0x290
> [ 523.915255] netlink_sendmsg+0x250/0x4e0
> [ 523.915255] ____sys_sendmsg+0x376/0x3b0
> [ 523.915255] ? copy_msghdr_from_user+0x6d/0xb0
> [ 523.915255] ___sys_sendmsg+0x86/0xe0
> [ 523.915255] ? do_fault+0x296/0x470
> [ 523.915255] ? __handle_mm_fault+0x771/0xda0
> [ 523.915255] __sys_sendmsg+0x57/0xb0
> [ 523.915255] do_syscall_64+0x59/0x90
> [ 523.915255] ? ct_kernel_exit.isra.0+0x71/0x90
> [ 523.915255] ? __ct_user_enter+0x5a/0xd0
> [ 523.915255] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 523.915255] RIP: 0033:0x7f2bcb93cd49
>
> --
> Regards,
> Vasiliy Kovalev
>
29.01.2024 15:02, Pablo Neira Ayuso wrote:
> __netlink_dump_start() is called at the beginning of the dump, which is
> grabbing a reference on this module.
>
> do you have a reproducer?
Hi, yes, in the cover letter [PATCH 0/1]
https://lore.kernel.org/all/[email protected]/
--
Regards,
Vasiliy Kovalev
Hi,
24.01.2024 14:52, Eric Dumazet wrote:
> On Wed, Jan 24, 2024 at 12:20 PM <[email protected]> wrote:
>> 24.01.2024 13:57, Eric Dumazet wrote:
>>> Oh wait, this is a 5.10 kernel ?
>> Yes, but the bug is reproduced on the latest stable kernels.
>>> Please generate a stack trace using a recent tree, it is possible the
>>> bug has been fixed already.
>> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
>> bottom of the message.
> Ah, ok. Not sure why you sent a cover letter for a single patch...
>
> Setting a boolean, in a module that can disappear will not prevent the
> module from disappearing.
>
> This work around might work, or might not work, depending on timing,
> preemptions, ....
>
> Thanks.
I tested running the reproducer [1] on the 6.8-rc3 kernel, the crash
occurs in less than 10 seconds and the qemu VM restarts:
dmesg -w:
[ 106.941736] gtp: GTP module unloaded
[ 106.962548] gtp: GTP module loaded (pdp ctx size 104 bytes)
[ 107.014691] gtp: GTP module unloaded
[ 107.041554] gtp: GTP module loaded (pdp ctx size 104 bytes)
[ 107.082283] gtp: GTP module unloaded
[ 107.123268] general protection fault, probably for non-canonical
address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 107.124050] KASAN: null-ptr-deref in range
[0x0000000000000010-0x0000000000000017]
[ 107.124339] CPU: 1 PID: 5826 Comm: gtp Not tainted
6.8.0-rc3-std-def-alt1 #1
[ 107.124604] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.0-alt1 04/01/2014
[ 107.124916] RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp]
[ 107.125141] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00
00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea
03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74
[ 107.125960] RSP: 0018:ffff888014107220 EFLAGS: 00010202
[ 107.126164] RAX: dffffc0000000000 RBX: 0000000000000000 RCX:
0000000000000000
[ 107.126434] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
0000000000000000
[ 107.126707] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[ 107.126976] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 107.127245] R13: ffff88800fcda588 R14: 0000000000000001 R15:
0000000000000000
[ 107.127515] FS: 00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000)
knlGS:0000000000000000
[ 107.127955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 107.128177] CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4:
0000000000750ef0
[ 107.128450] PKRU: 55555554
[ 107.128577] Call Trace:
[ 107.128699] <TASK>
[ 107.128790] ? show_regs+0x90/0xa0
[ 107.128935] ? die_addr+0x50/0xd0
[ 107.129075] ? exc_general_protection+0x148/0x220
[ 107.129267] ? asm_exc_general_protection+0x22/0x30
[ 107.129469] ? gtp_genl_dump_pdp+0x1be/0x800 [gtp]
[ 107.129677] ? __alloc_skb+0x1dd/0x350
[ 107.129831] ? __pfx___alloc_skb+0x10/0x10
[ 107.129999] genl_dumpit+0x11d/0x230
[ 107.130150] netlink_dump+0x5b9/0xce0
[ 107.130301] ? lockdep_hardirqs_on_prepare+0x253/0x430
[ 107.130503] ? __pfx_netlink_dump+0x10/0x10
[ 107.130686] ? kasan_save_track+0x10/0x40
[ 107.130849] ? __kasan_kmalloc+0x9b/0xa0
[ 107.131009] ? genl_start+0x675/0x970
[ 107.131162] __netlink_dump_start+0x6fc/0x9f0
[ 107.131341] genl_family_rcv_msg_dumpit+0x1bb/0x2d0
[ 107.131538] ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10
[ 107.131754] ? genl_op_from_small+0x2a/0x440
[ 107.131972] ? cap_capable+0x1d0/0x240
[ 107.132127] ? __pfx_genl_start+0x10/0x10
[ 107.132292] ? __pfx_genl_dumpit+0x10/0x10
[ 107.132461] ? __pfx_genl_done+0x10/0x10
[ 107.132645] ? security_capable+0x9d/0xe0
With the proposed patch applied, such a crash is not observed during
long-term testing.
[1]
https://lore.kernel.org/lkml/[email protected]/T/#mf9b411baec52858b1c9118c671f26a6dc424e7b4
--
Regards,
Vasiliy Kovalev
On Fri, Feb 9, 2024 at 7:16 PM <[email protected]> wrote:
>
> Hi,
>
> 24.01.2024 14:52, Eric Dumazet wrote:
> > On Wed, Jan 24, 2024 at 12:20 PM <[email protected]> wrote:
> >> 24.01.2024 13:57, Eric Dumazet wrote:
> >>> Oh wait, this is a 5.10 kernel ?
> >> Yes, but the bug is reproduced on the latest stable kernels.
> >>> Please generate a stack trace using a recent tree, it is possible the
> >>> bug has been fixed already.
> >> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the
> >> bottom of the message.
> > Ah, ok. Not sure why you sent a cover letter for a single patch...
> >
> > Setting a boolean, in a module that can disappear will not prevent the
> > module from disappearing.
> >
> > This work around might work, or might not work, depending on timing,
> > preemptions, ....
> >
> > Thanks.
>
> I tested running the reproducer [1] on the 6.8-rc3 kernel, the crash
> occurs in less than 10 seconds and the qemu VM restarts:
>
> dmesg -w:
>
> [ 106.941736] gtp: GTP module unloaded
> [ 106.962548] gtp: GTP module loaded (pdp ctx size 104 bytes)
> [ 107.014691] gtp: GTP module unloaded
> [ 107.041554] gtp: GTP module loaded (pdp ctx size 104 bytes)
> [ 107.082283] gtp: GTP module unloaded
> [ 107.123268] general protection fault, probably for non-canonical
> address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [ 107.124050] KASAN: null-ptr-deref in range
> [0x0000000000000010-0x0000000000000017]
> [ 107.124339] CPU: 1 PID: 5826 Comm: gtp Not tainted
> 6.8.0-rc3-std-def-alt1 #1
> [ 107.124604] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.0-alt1 04/01/2014
> [ 107.124916] RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp]
> [ 107.125141] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00
> 00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea
> 03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74
> [ 107.125960] RSP: 0018:ffff888014107220 EFLAGS: 00010202
> [ 107.126164] RAX: dffffc0000000000 RBX: 0000000000000000 RCX:
> 0000000000000000
> [ 107.126434] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
> 0000000000000000
> [ 107.126707] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 107.126976] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [ 107.127245] R13: ffff88800fcda588 R14: 0000000000000001 R15:
> 0000000000000000
> [ 107.127515] FS: 00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000)
> knlGS:0000000000000000
> [ 107.127955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 107.128177] CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4:
> 0000000000750ef0
> [ 107.128450] PKRU: 55555554
> [ 107.128577] Call Trace:
> [ 107.128699] <TASK>
> [ 107.128790] ? show_regs+0x90/0xa0
> [ 107.128935] ? die_addr+0x50/0xd0
> [ 107.129075] ? exc_general_protection+0x148/0x220
> [ 107.129267] ? asm_exc_general_protection+0x22/0x30
> [ 107.129469] ? gtp_genl_dump_pdp+0x1be/0x800 [gtp]
> [ 107.129677] ? __alloc_skb+0x1dd/0x350
> [ 107.129831] ? __pfx___alloc_skb+0x10/0x10
> [ 107.129999] genl_dumpit+0x11d/0x230
> [ 107.130150] netlink_dump+0x5b9/0xce0
> [ 107.130301] ? lockdep_hardirqs_on_prepare+0x253/0x430
> [ 107.130503] ? __pfx_netlink_dump+0x10/0x10
> [ 107.130686] ? kasan_save_track+0x10/0x40
> [ 107.130849] ? __kasan_kmalloc+0x9b/0xa0
> [ 107.131009] ? genl_start+0x675/0x970
> [ 107.131162] __netlink_dump_start+0x6fc/0x9f0
> [ 107.131341] genl_family_rcv_msg_dumpit+0x1bb/0x2d0
> [ 107.131538] ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10
> [ 107.131754] ? genl_op_from_small+0x2a/0x440
> [ 107.131972] ? cap_capable+0x1d0/0x240
> [ 107.132127] ? __pfx_genl_start+0x10/0x10
> [ 107.132292] ? __pfx_genl_dumpit+0x10/0x10
> [ 107.132461] ? __pfx_genl_done+0x10/0x10
> [ 107.132645] ? security_capable+0x9d/0xe0
>
> With the proposed patch applied, such a crash is not observed during
> long-term testing.
Maybe, but the patch is not good, I think I and Pablo gave feedback on this ?
Please trace __netlink_dump_start() content of control->module
gtp_genl_family.module should be set, and we should get it.
Otherwise, if the bug is in the core, we would need a dozen of 'work
arounds because it is better than nothing'
Thank you.
09.02.2024 22:21, Eric Dumazet wrote:
> Maybe, but the patch is not good, I think I and Pablo gave feedback on this ?
>
> Please trace __netlink_dump_start() content of control->module
>
> gtp_genl_family.module should be set, and we should get it.
>
> Otherwise, if the bug is in the core, we would need a dozen of 'work
> arounds because it is better than nothing'
>
> Thank you.
Thanks.
I tracked the moment when the __netlink_dump_start() function was
called, it turned out that in the gtp_init() initialization function
before registering pernet subsystem (gtp_net_ops), therefore, outdated
data is used, which leads to a crash.
The documentation says that ops structure must be assigned before
registering a generic netlink family [1].
I have fixed and sent a new patch [2].
[1]
https://elixir.bootlin.com/linux/v6.8-rc4/source/net/netlink/genetlink.c#L773
[2]
https://lore.kernel.org/netdev/[email protected]/T/#u
--
Regards,
Vasiliy Kovalev
On Wed, Feb 14, 2024 at 5:50 PM <[email protected]> wrote:
>
> 09.02.2024 22:21, Eric Dumazet wrote:
>
> > Maybe, but the patch is not good, I think I and Pablo gave feedback on this ?
> >
> > Please trace __netlink_dump_start() content of control->module
> >
> > gtp_genl_family.module should be set, and we should get it.
> >
> > Otherwise, if the bug is in the core, we would need a dozen of 'work
> > arounds because it is better than nothing'
> >
> > Thank you.
>
> Thanks.
>
> I tracked the moment when the __netlink_dump_start() function was
> called, it turned out that in the gtp_init() initialization function
> before registering pernet subsystem (gtp_net_ops), therefore, outdated
> data is used, which leads to a crash.
>
> The documentation says that ops structure must be assigned before
> registering a generic netlink family [1].
>
> I have fixed and sent a new patch [2].
>
> [1]
> https://elixir.bootlin.com/linux/v6.8-rc4/source/net/netlink/genetlink.c#L773
>
> [2]
> https://lore.kernel.org/netdev/[email protected]/T/#u
>
Excellent detective work, thanks a lot !