2022-09-13 21:25:57

by Roberto Ricci

Subject: BUG: unable to handle page fault for address, with ipv6.disable=1

Executing the `ss` command on a system with kernel 5.19.8, booted with
the "ipv6.disable=1" parameter, causes this oops:


[ 74.952477] BUG: unable to handle page fault for address: ffffffffffffffc8
[ 74.952568] #PF: supervisor read access in kernel mode
[ 74.952632] #PF: error_code(0x0000) - not-present page
[ 74.952695] PGD 25814067 P4D 25814067 PUD 25816067 PMD 0
[ 74.952770] Oops: 0000 [#1] PREEMPT SMP PTI
[ 74.952816] CPU: 0 PID: 704 Comm: ss Not tainted 5.19.8_1 #1
[ 74.952869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
WARNING! Modules path isn't set, but is needed to parse this symbol
[ 74.953292] RIP: 0010:raw_diag_dump+0xea/0x1d0 raw_diag
[ 74.953379] Code: c3 89 44 24 24 4c 89 e8 41 89 ed 49 89 c7 48 8b 04 24 48 8b 58 08 89 dd 83 e5 01 74 0d e9 d5 00 00 00 48 8b 1b f6 c3 01 75 7f <4c> 3b 63 c8 75 f2 44 39 ed 7c 69 41 0f b6 07 66 39 43 a8 75 5f 41
All code
========
0: c3 retq
1: 89 44 24 24 mov %eax,0x24(%rsp)
5: 4c 89 e8 mov %r13,%rax
8: 41 89 ed mov %ebp,%r13d
b: 49 89 c7 mov %rax,%r15
e: 48 8b 04 24 mov (%rsp),%rax
12: 48 8b 58 08 mov 0x8(%rax),%rbx
16: 89 dd mov %ebx,%ebp
18: 83 e5 01 and $0x1,%ebp
1b: 74 0d je 0x2a
1d: e9 d5 00 00 00 jmpq 0xf7
22: 48 8b 1b mov (%rbx),%rbx
25: f6 c3 01 test $0x1,%bl
28: 75 7f jne 0xa9
2a:* 4c 3b 63 c8 cmp -0x38(%rbx),%r12 <-- trapping instruction
2e: 75 f2 jne 0x22
30: 44 39 ed cmp %r13d,%ebp
33: 7c 69 jl 0x9e
35: 41 0f b6 07 movzbl (%r15),%eax
39: 66 39 43 a8 cmp %ax,-0x58(%rbx)
3d: 75 5f jne 0x9e
3f: 41 rex.B

Code starting with the faulting instruction
===========================================
0: 4c 3b 63 c8 cmp -0x38(%rbx),%r12
4: 75 f2 jne 0xfffffffffffffff8
6: 44 39 ed cmp %r13d,%ebp
9: 7c 69 jl 0x74
b: 41 0f b6 07 movzbl (%r15),%eax
f: 66 39 43 a8 cmp %ax,-0x58(%rbx)
13: 75 5f jne 0x74
15: 41 rex.B
[ 74.953617] RSP: 0018:ffffbb8740af7908 EFLAGS: 00010246
[ 74.953668] RAX: ffffffff9d22e680 RBX: 0000000000000000 RCX: 000000000000000c
[ 74.953729] RDX: 0000000000000000 RSI: ffffffff9d22e680 RDI: 0000000000000000
[ 74.953788] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000034f2b
[ 74.953848] R10: ffff9f1bcb598000 R11: 0000000000000000 R12: ffffffff9d225a40
[ 74.953907] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9f1bc6474610
[ 74.953967] FS: 00007f66aab55740(0000) GS:ffff9f1bfec00000(0000) knlGS:0000000000000000
[ 74.954069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 74.954120] CR2: ffffffffffffffc8 CR3: 000000002fe1e000 CR4: 00000000000006f0
[ 74.954188] Call Trace:
[ 74.954221] <TASK>
[ 74.954248] __inet_diag_dump (net/ipv4/inet_diag.c:1179)
[ 74.954462] netlink_dump (net/netlink/af_netlink.c:2276)
[ 74.954549] __netlink_dump_start (net/netlink/af_netlink.c:2380)
[ 74.954613] inet_diag_handler_cmd (net/ipv4/inet_diag.c:1347)
[ 74.954672] ? inet_diag_dump_start_compat (net/ipv4/inet_diag.c:1244)
[ 74.954725] ? inet_diag_dump_compat (net/ipv4/inet_diag.c:1197)
[ 74.954768] ? inet_diag_unregister (net/ipv4/inet_diag.c:1254)
[ 74.954811] sock_diag_rcv_msg (net/core/sock_diag.c:235 net/core/sock_diag.c:266)
[ 74.954905] ? sock_diag_bind (net/core/sock_diag.c:247)
[ 74.954950] netlink_rcv_skb (net/netlink/af_netlink.c:2501)
[ 74.954993] sock_diag_rcv (net/core/sock_diag.c:278)
[ 74.955032] netlink_unicast (net/netlink/af_netlink.c:1320 net/netlink/af_netlink.c:1345)
[ 74.955074] netlink_sendmsg (net/netlink/af_netlink.c:1921)
[ 74.955116] sock_sendmsg (net/socket.c:714 net/socket.c:734)
[ 74.955199] ____sys_sendmsg (net/socket.c:2488)
[ 74.955245] ? import_iovec (lib/iov_iter.c:2008)
[ 74.955302] ? sendmsg_copy_msghdr (net/socket.c:2429 net/socket.c:2519)
[ 74.955348] ___sys_sendmsg (net/socket.c:2544)
[ 74.955447] ? __schedule (kernel/sched/core.c:6476)
[ 74.955522] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
[ 74.955583] ? do_notify_parent_cldstop (kernel/signal.c:2191)
[ 74.955656] ? preempt_count_add (./include/linux/ftrace.h:910 kernel/sched/core.c:5598 kernel/sched/core.c:5595 kernel/sched/core.c:5623)
[ 74.955712] ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:202 ./include/linux/atomic/atomic-instrumented.h:543 ./include/asm-generic/qspinlock.h:111 ./include/linux/spinlock.h:185 ./include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
[ 74.955752] ? ptrace_stop.part.0 (kernel/signal.c:2331)
[ 74.955795] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2573)
[ 74.955835] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 74.955914] ? syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:382 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
[ 74.955965] ? do_syscall_64 (arch/x86/entry/common.c:87)
[ 74.957786] ? do_syscall_64 (arch/x86/entry/common.c:87)
[ 74.959896] ? handle_mm_fault (mm/memory.c:5144)
[ 74.961184] ? do_user_addr_fault (arch/x86/mm/fault.c:1422)
[ 74.962609] ? fpregs_assert_state_consistent (arch/x86/kernel/fpu/context.h:39 arch/x86/kernel/fpu/core.c:772)
[ 74.964171] ? exit_to_user_mode_prepare (./arch/x86/include/asm/entry-common.h:57 kernel/entry/common.c:203)
[ 74.965968] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 74.967266] RIP: 0033:0x7f66aac577d3
[ 74.968499] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
All code
========
0: 64 89 02 mov %eax,%fs:(%rdx)
3: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
a: eb b7 jmp 0xffffffffffffffc3
c: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
13: 00 00 00
16: 90 nop
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 14 jne 0x37
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 55 ja 0x87
32: c3 retq
33: 0f 1f 40 00 nopl 0x0(%rax)
37: 48 83 ec 28 sub $0x28,%rsp
3b: 89 54 24 1c mov %edx,0x1c(%rsp)
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 55 ja 0x5d
8: c3 retq
9: 0f 1f 40 00 nopl 0x0(%rax)
d: 48 83 ec 28 sub $0x28,%rsp
11: 89 54 24 1c mov %edx,0x1c(%rsp)
15: 48 rex.W
[ 74.970741] RSP: 002b:00007ffc6132da68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 74.971846] RAX: ffffffffffffffda RBX: 00007ffc6132dbe0 RCX: 00007f66aac577d3
[ 74.972950] RDX: 0000000000000000 RSI: 00007ffc6132db60 RDI: 0000000000000003
[ 74.974039] RBP: 00000000000000ff R08: 0000000000000014 R09: 0000000000000000
[ 74.975120] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000a
[ 74.976883] R13: 00007ffc6132dae0 R14: 0000000000000003 R15: 00007ffc6132db60
[ 74.978531] </TASK>
[ 74.980707] Modules linked in: raw_diag unix_diag netlink_diag cfg80211 8021q garp mrp stp llc joydev ppdev intel_agp psmouse intel_gtt i2c_piix4 input_leds pcspkr parport_pc evdev floppy mac_hid parport tiny_power_button snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap uhid hid hci_vhci bluetooth ecdh_generic rfkill ecc vfio_iommu_type1 vfio uinput userio ppp_generic slhc tun loop nvram cuse fuse ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod ata_generic pata_acpi ata_piix serio_raw e1000 bochs drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm drm agpgart libata scsi_mod scsi_common qemu_fw_cfg button dm_mirror dm_region_hash dm_log dm_mod btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic
[ 74.980838] Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 crc32c_intel():1 crc32c_intel():1
[ 74.992495] CR2: ffffffffffffffc8
[ 74.994263] ---[ end trace 0000000000000000 ]---
WARNING! Modules path isn't set, but is needed to parse this symbol
[ 74.995844] RIP: 0010:raw_diag_dump+0xea/0x1d0 raw_diag
[ 74.997239] Code: c3 89 44 24 24 4c 89 e8 41 89 ed 49 89 c7 48 8b 04 24 48 8b 58 08 89 dd 83 e5 01 74 0d e9 d5 00 00 00 48 8b 1b f6 c3 01 75 7f <4c> 3b 63 c8 75 f2 44 39 ed 7c 69 41 0f b6 07 66 39 43 a8 75 5f 41
All code
========
0: c3 retq
1: 89 44 24 24 mov %eax,0x24(%rsp)
5: 4c 89 e8 mov %r13,%rax
8: 41 89 ed mov %ebp,%r13d
b: 49 89 c7 mov %rax,%r15
e: 48 8b 04 24 mov (%rsp),%rax
12: 48 8b 58 08 mov 0x8(%rax),%rbx
16: 89 dd mov %ebx,%ebp
18: 83 e5 01 and $0x1,%ebp
1b: 74 0d je 0x2a
1d: e9 d5 00 00 00 jmpq 0xf7
22: 48 8b 1b mov (%rbx),%rbx
25: f6 c3 01 test $0x1,%bl
28: 75 7f jne 0xa9
2a:* 4c 3b 63 c8 cmp -0x38(%rbx),%r12 <-- trapping instruction
2e: 75 f2 jne 0x22
30: 44 39 ed cmp %r13d,%ebp
33: 7c 69 jl 0x9e
35: 41 0f b6 07 movzbl (%r15),%eax
39: 66 39 43 a8 cmp %ax,-0x58(%rbx)
3d: 75 5f jne 0x9e
3f: 41 rex.B

Code starting with the faulting instruction
===========================================
0: 4c 3b 63 c8 cmp -0x38(%rbx),%r12
4: 75 f2 jne 0xfffffffffffffff8
6: 44 39 ed cmp %r13d,%ebp
9: 7c 69 jl 0x74
b: 41 0f b6 07 movzbl (%r15),%eax
f: 66 39 43 a8 cmp %ax,-0x58(%rbx)
13: 75 5f jne 0x74
15: 41 rex.B
[ 75.000699] RSP: 0018:ffffbb8740af7908 EFLAGS: 00010246
[ 75.001981] RAX: ffffffff9d22e680 RBX: 0000000000000000 RCX: 000000000000000c
[ 75.003420] RDX: 0000000000000000 RSI: ffffffff9d22e680 RDI: 0000000000000000
[ 75.004824] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000034f2b
[ 75.006250] R10: ffff9f1bcb598000 R11: 0000000000000000 R12: ffffffff9d225a40
[ 75.007656] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9f1bc6474610
[ 75.009757] FS: 00007f66aab55740(0000) GS:ffff9f1bfec00000(0000) knlGS:0000000000000000
[ 75.012209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.014095] CR2: ffffffffffffffc8 CR3: 000000002fe1e000 CR4: 00000000000006f0
[ 75.015497] note: ss[704] exited with preempt_count 1


I reproduced this with Void Linux x86_64 in a virtual machine. The kernels are
those provided by the distribution (Void uses vanilla kernels with a few very
small patches, which I don't believe make any difference:
https://github.com/void-linux/void-packages/tree/0a87c670f35e01a3ac1d850f628fe1bab5d3c433/srcpkgs/linux5.19/patches).

Kernels 5.19.8 and 5.18.19 are affected; 5.16.20 is not.
I don't know about 5.17.x because Void doesn't package it.
The iproute2 version is 5.16.0 (but this also happens with 5.19.0).

I attach the kernel config, the full dmesg and the output of `strace ss`.

Sorry if you received a duplicate of this email. I sent another one before,
and the server didn't like it.


Attachments:
(No filename) (12.25 kB)
config (259.65 kB)
dmesg (34.97 kB)
strace_ss (11.54 kB)

2022-09-14 16:08:12

by Ido Schimmel

Subject: Re: BUG: unable to handle page fault for address, with ipv6.disable=1

+ Eric

Original report:
https://lore.kernel.org/netdev/YyD0kMC7qIBNOE3j@riccipc/T/#u

On Tue, Sep 13, 2022 at 11:22:24PM +0200, Roberto Ricci wrote:
> Executing the `ss` command on a system with kernel 5.19.8, booted with
> the "ipv6.disable=1" parameter, causes this oops:
>
>
> [ 74.952477] BUG: unable to handle page fault for address: ffffffffffffffc8
> [ 74.952568] #PF: supervisor read access in kernel mode
> [ 74.952632] #PF: error_code(0x0000) - not-present page
> [ 74.952695] PGD 25814067 P4D 25814067 PUD 25816067 PMD 0
> [ 74.952770] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 74.952816] CPU: 0 PID: 704 Comm: ss Not tainted 5.19.8_1 #1
> [ 74.952869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> WARNING! Modules path isn't set, but is needed to parse this symbol
> [ 74.953292] RIP: 0010:raw_diag_dump+0xea/0x1d0 raw_diag
[...]
> [ 74.954188] Call Trace:
> [ 74.954221] <TASK>
> [ 74.954248] __inet_diag_dump (net/ipv4/inet_diag.c:1179)
> [ 74.954462] netlink_dump (net/netlink/af_netlink.c:2276)
> [ 74.954549] __netlink_dump_start (net/netlink/af_netlink.c:2380)
> [ 74.954613] inet_diag_handler_cmd (net/ipv4/inet_diag.c:1347)
> [ 74.954672] ? inet_diag_dump_start_compat (net/ipv4/inet_diag.c:1244)
> [ 74.954725] ? inet_diag_dump_compat (net/ipv4/inet_diag.c:1197)
> [ 74.954768] ? inet_diag_unregister (net/ipv4/inet_diag.c:1254)
> [ 74.954811] sock_diag_rcv_msg (net/core/sock_diag.c:235 net/core/sock_diag.c:266)
> [ 74.954905] ? sock_diag_bind (net/core/sock_diag.c:247)
> [ 74.954950] netlink_rcv_skb (net/netlink/af_netlink.c:2501)
> [ 74.954993] sock_diag_rcv (net/core/sock_diag.c:278)
> [ 74.955032] netlink_unicast (net/netlink/af_netlink.c:1320 net/netlink/af_netlink.c:1345)
> [ 74.955074] netlink_sendmsg (net/netlink/af_netlink.c:1921)
> [ 74.955116] sock_sendmsg (net/socket.c:714 net/socket.c:734)
> [ 74.955199] ____sys_sendmsg (net/socket.c:2488)
> [ 74.955245] ? import_iovec (lib/iov_iter.c:2008)
> [ 74.955302] ? sendmsg_copy_msghdr (net/socket.c:2429 net/socket.c:2519)
> [ 74.955348] ___sys_sendmsg (net/socket.c:2544)
> [ 74.955447] ? __schedule (kernel/sched/core.c:6476)
> [ 74.955522] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
> [ 74.955583] ? do_notify_parent_cldstop (kernel/signal.c:2191)
> [ 74.955656] ? preempt_count_add (./include/linux/ftrace.h:910 kernel/sched/core.c:5598 kernel/sched/core.c:5595 kernel/sched/core.c:5623)
> [ 74.955712] ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:202 ./include/linux/atomic/atomic-instrumented.h:543 ./include/asm-generic/qspinlock.h:111 ./include/linux/spinlock.h:185 ./include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
> [ 74.955752] ? ptrace_stop.part.0 (kernel/signal.c:2331)
> [ 74.955795] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2573)
> [ 74.955835] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 74.955914] ? syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:382 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
> [ 74.955965] ? do_syscall_64 (arch/x86/entry/common.c:87)
> [ 74.957786] ? do_syscall_64 (arch/x86/entry/common.c:87)
> [ 74.959896] ? handle_mm_fault (mm/memory.c:5144)
> [ 74.961184] ? do_user_addr_fault (arch/x86/mm/fault.c:1422)
> [ 74.962609] ? fpregs_assert_state_consistent (arch/x86/kernel/fpu/context.h:39 arch/x86/kernel/fpu/core.c:772)
> [ 74.964171] ? exit_to_user_mode_prepare (./arch/x86/include/asm/entry-common.h:57 kernel/entry/common.c:203)
> [ 74.965968] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [ 74.967266] RIP: 0033:0x7f66aac577d3

[...]

> I reproduced this with Void Linux x86_64 in a virtual machine. The kernels are
> those provided by the distribution (Void uses vanilla kernels with a few very
> small patches, which I don't believe make any difference:
> https://github.com/void-linux/void-packages/tree/0a87c670f35e01a3ac1d850f628fe1bab5d3c433/srcpkgs/linux5.19/patches).
>
> Kernels 5.19.8 and 5.18.19 are affected; 5.16.20 is not.
> I don't know about 5.17.x because Void doesn't package it.
> The iproute2 version is 5.16.0 (but this also happens with 5.19.0).

This is most likely caused by commit 0daf07e52709 ("raw: convert raw
sockets to RCU"), which is being backported to stable kernels.

It made the initialization of 'raw_v6_hashinfo' conditional on IPv6
being enabled. Can you try the following patch (works on my end)?

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 19732b5dce23..d40b7d60e00e 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -1072,13 +1072,13 @@ static int __init inet6_init(void)
 	for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
 		INIT_LIST_HEAD(r);
 
+	raw_hashinfo_init(&raw_v6_hashinfo);
+
 	if (disable_ipv6_mod) {
 		pr_info("Loaded, but administratively disabled, reboot required to enable\n");
 		goto out;
 	}
 
-	raw_hashinfo_init(&raw_v6_hashinfo);
-
 	err = proto_register(&tcpv6_prot, 1);
 	if (err)
 		goto out;

Another approach is the following, but I prefer the first:

diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index 999321834b94..4fbdd69a2be8 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -20,7 +20,7 @@ raw_get_hashinfo(const struct inet_diag_req_v2 *r)
 	if (r->sdiag_family == AF_INET) {
 		return &raw_v4_hashinfo;
 #if IS_ENABLED(CONFIG_IPV6)
-	} else if (r->sdiag_family == AF_INET6) {
+	} else if (r->sdiag_family == AF_INET6 && ipv6_mod_enabled()) {
 		return &raw_v6_hashinfo;
 #endif
 	} else {
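
One plausible reading of the oops, assuming that after commit 0daf07e52709 each bucket of a raw hash table is an hlist_nulls head that raw_hashinfo_init() fills with an odd "nulls" end marker: a walk of a bucket stops only when it reaches a pointer with the low bit set. If raw_hashinfo_init() never runs for raw_v6_hashinfo, the table stays zero-filled, its buckets hold NULL, and the walk treats NULL as a real node. The register dump fits this: RBX is 0 at the trapping `cmp -0x38(%rbx),%r12`, and CR2 is 0 - 0x38 = ffffffffffffffc8. The C sketch below is a simplified userspace model of that logic, modeled on (not copied from) include/linux/list_nulls.h; the sketch_* names and the bucket count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's hlist_nulls types.
 * Only the end-marker logic is modeled; there is no RCU or locking here.
 */
struct hlist_nulls_node {
	struct hlist_nulls_node *next;
};

struct hlist_nulls_head {
	struct hlist_nulls_node *first;
};

/* An odd value stored in place of a pointer marks the end of a chain. */
#define NULLS_MARKER(v) (1UL | (((unsigned long)(v)) << 1))

/* Hypothetical bucket count, chosen only for this sketch. */
#define SKETCH_HTABLE_SIZE 4

/* Hypothetical stand-in for struct raw_hashinfo (buckets only). */
struct sketch_hashinfo {
	struct hlist_nulls_head ht[SKETCH_HTABLE_SIZE];
};

static int is_a_nulls(const struct hlist_nulls_node *ptr)
{
	return ((uintptr_t)ptr & 1) != 0;
}

/* Models what raw_hashinfo_init() is assumed to do: give every bucket an end marker. */
static void sketch_hashinfo_init(struct sketch_hashinfo *h)
{
	for (unsigned long i = 0; i < SKETCH_HTABLE_SIZE; i++)
		h->ht[i].first = (struct hlist_nulls_node *)NULLS_MARKER(i);
}

/*
 * Returns 1 if a walk of bucket i would stop at once (bucket seen as empty),
 * 0 if the walk would treat head->first as a real node and dereference it.
 */
static int walk_stops_immediately(const struct sketch_hashinfo *h, size_t i)
{
	assert(i < SKETCH_HTABLE_SIZE);
	return is_a_nulls(h->ht[i].first);
}

/* Address touched by a load at (node - offset), with 64-bit wraparound. */
static uint64_t fault_address(uint64_t node, uint64_t offset)
{
	return node - offset;
}
```

Under that assumption, a zero-filled table (like a never-initialized global) makes walk_stops_immediately() return 0, an initialized one makes it return 1, and fault_address(0, 0x38) gives the ffffffffffffffc8 reported in CR2. The first patch therefore ensures every raw_v6_hashinfo bucket carries an end marker even when IPv6 is administratively disabled, while the second leaves the table uninitialized but stops raw_get_hashinfo() from returning it.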

2022-09-14 18:16:57

by Eric Dumazet

Subject: Re: BUG: unable to handle page fault for address, with ipv6.disable=1

On Wed, Sep 14, 2022 at 8:47 AM Ido Schimmel wrote:
>
> + Eric
>
> Original report:
> https://lore.kernel.org/netdev/YyD0kMC7qIBNOE3j@riccipc/T/#u
>
> On Tue, Sep 13, 2022 at 11:22:24PM +0200, Roberto Ricci wrote:
> > Executing the `ss` command on a system with kernel 5.19.8, booted with
> > the "ipv6.disable=1" parameter, causes this oops:
> >
> >
> > [ 74.952477] BUG: unable to handle page fault for address: ffffffffffffffc8
> > [ 74.952568] #PF: supervisor read access in kernel mode
> > [ 74.952632] #PF: error_code(0x0000) - not-present page
> > [ 74.952695] PGD 25814067 P4D 25814067 PUD 25816067 PMD 0
> > [ 74.952770] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 74.952816] CPU: 0 PID: 704 Comm: ss Not tainted 5.19.8_1 #1
> > [ 74.952869] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > WARNING! Modules path isn't set, but is needed to parse this symbol
> > [ 74.953292] RIP: 0010:raw_diag_dump+0xea/0x1d0 raw_diag
> [...]
> > [ 74.954188] Call Trace:
> > [ 74.954221] <TASK>
> > [ 74.954248] __inet_diag_dump (net/ipv4/inet_diag.c:1179)
> > [ 74.954462] netlink_dump (net/netlink/af_netlink.c:2276)
> > [ 74.954549] __netlink_dump_start (net/netlink/af_netlink.c:2380)
> > [ 74.954613] inet_diag_handler_cmd (net/ipv4/inet_diag.c:1347)
> > [ 74.954672] ? inet_diag_dump_start_compat (net/ipv4/inet_diag.c:1244)
> > [ 74.954725] ? inet_diag_dump_compat (net/ipv4/inet_diag.c:1197)
> > [ 74.954768] ? inet_diag_unregister (net/ipv4/inet_diag.c:1254)
> > [ 74.954811] sock_diag_rcv_msg (net/core/sock_diag.c:235 net/core/sock_diag.c:266)
> > [ 74.954905] ? sock_diag_bind (net/core/sock_diag.c:247)
> > [ 74.954950] netlink_rcv_skb (net/netlink/af_netlink.c:2501)
> > [ 74.954993] sock_diag_rcv (net/core/sock_diag.c:278)
> > [ 74.955032] netlink_unicast (net/netlink/af_netlink.c:1320 net/netlink/af_netlink.c:1345)
> > [ 74.955074] netlink_sendmsg (net/netlink/af_netlink.c:1921)
> > [ 74.955116] sock_sendmsg (net/socket.c:714 net/socket.c:734)
> > [ 74.955199] ____sys_sendmsg (net/socket.c:2488)
> > [ 74.955245] ? import_iovec (lib/iov_iter.c:2008)
> > [ 74.955302] ? sendmsg_copy_msghdr (net/socket.c:2429 net/socket.c:2519)
> > [ 74.955348] ___sys_sendmsg (net/socket.c:2544)
> > [ 74.955447] ? __schedule (kernel/sched/core.c:6476)
> > [ 74.955522] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
> > [ 74.955583] ? do_notify_parent_cldstop (kernel/signal.c:2191)
> > [ 74.955656] ? preempt_count_add (./include/linux/ftrace.h:910 kernel/sched/core.c:5598 kernel/sched/core.c:5595 kernel/sched/core.c:5623)
> > [ 74.955712] ? _raw_spin_lock_irq (./arch/x86/include/asm/atomic.h:202 ./include/linux/atomic/atomic-instrumented.h:543 ./include/asm-generic/qspinlock.h:111 ./include/linux/spinlock.h:185 ./include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
> > [ 74.955752] ? ptrace_stop.part.0 (kernel/signal.c:2331)
> > [ 74.955795] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2573)
> > [ 74.955835] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> > [ 74.955914] ? syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:382 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
> > [ 74.955965] ? do_syscall_64 (arch/x86/entry/common.c:87)
> > [ 74.957786] ? do_syscall_64 (arch/x86/entry/common.c:87)
> > [ 74.959896] ? handle_mm_fault (mm/memory.c:5144)
> > [ 74.961184] ? do_user_addr_fault (arch/x86/mm/fault.c:1422)
> > [ 74.962609] ? fpregs_assert_state_consistent (arch/x86/kernel/fpu/context.h:39 arch/x86/kernel/fpu/core.c:772)
> > [ 74.964171] ? exit_to_user_mode_prepare (./arch/x86/include/asm/entry-common.h:57 kernel/entry/common.c:203)
> > [ 74.965968] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> > [ 74.967266] RIP: 0033:0x7f66aac577d3
>
> [...]
>
> > I reproduced this with Void Linux x86_64 in a virtual machine. The kernels are
> > those provided by the distribution (Void uses vanilla kernels with a few very
> > small patches, which I don't believe make any difference:
> > https://github.com/void-linux/void-packages/tree/0a87c670f35e01a3ac1d850f628fe1bab5d3c433/srcpkgs/linux5.19/patches).
> >
> > Kernels 5.19.8 and 5.18.19 are affected; 5.16.20 is not.
> > I don't know about 5.17.x because Void doesn't package it.
> > The iproute2 version is 5.16.0 (but this also happens with 5.19.0).
>
> This is most likely caused by commit 0daf07e52709 ("raw: convert raw
> sockets to RCU"), which is being backported to stable kernels.
>
> It made the initialization of 'raw_v6_hashinfo' conditional on IPv6
> being enabled. Can you try the following patch (works on my end)?
>
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 19732b5dce23..d40b7d60e00e 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -1072,13 +1072,13 @@ static int __init inet6_init(void)
>  	for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
>  		INIT_LIST_HEAD(r);
>
> +	raw_hashinfo_init(&raw_v6_hashinfo);
> +
>  	if (disable_ipv6_mod) {
>  		pr_info("Loaded, but administratively disabled, reboot required to enable\n");
>  		goto out;
>  	}
>
> -	raw_hashinfo_init(&raw_v6_hashinfo);
> -
>  	err = proto_register(&tcpv6_prot, 1);
>  	if (err)
>  		goto out;
>
> Another approach is the following, but I prefer the first:

+1, thanks for looking at this, Ido!

>
> diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
> index 999321834b94..4fbdd69a2be8 100644
> --- a/net/ipv4/raw_diag.c
> +++ b/net/ipv4/raw_diag.c
> @@ -20,7 +20,7 @@ raw_get_hashinfo(const struct inet_diag_req_v2 *r)
>  	if (r->sdiag_family == AF_INET) {
>  		return &raw_v4_hashinfo;
>  #if IS_ENABLED(CONFIG_IPV6)
> -	} else if (r->sdiag_family == AF_INET6) {
> +	} else if (r->sdiag_family == AF_INET6 && ipv6_mod_enabled()) {
>  		return &raw_v6_hashinfo;
>  #endif
>  	} else {

2022-09-15 23:57:46

by Roberto Ricci

Subject: Re: BUG: unable to handle page fault for address, with ipv6.disable=1

On 2022-09-14 Wed 18:47:12 +0300, Ido Schimmel wrote:
> This is most likely caused by commit 0daf07e52709 ("raw: convert raw
> sockets to RCU"), which is being backported to stable kernels.
>
> It made the initialization of 'raw_v6_hashinfo' conditional on IPv6
> being enabled. Can you try the following patch (works on my end)?
>
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 19732b5dce23..d40b7d60e00e 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -1072,13 +1072,13 @@ static int __init inet6_init(void)
>  	for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
>  		INIT_LIST_HEAD(r);
>
> +	raw_hashinfo_init(&raw_v6_hashinfo);
> +
>  	if (disable_ipv6_mod) {
>  		pr_info("Loaded, but administratively disabled, reboot required to enable\n");
>  		goto out;
>  	}
>
> -	raw_hashinfo_init(&raw_v6_hashinfo);
> -
>  	err = proto_register(&tcpv6_prot, 1);
>  	if (err)
>  		goto out;
>
> Another approach is the following, but I prefer the first:
>
> diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
> index 999321834b94..4fbdd69a2be8 100644
> --- a/net/ipv4/raw_diag.c
> +++ b/net/ipv4/raw_diag.c
> @@ -20,7 +20,7 @@ raw_get_hashinfo(const struct inet_diag_req_v2 *r)
>  	if (r->sdiag_family == AF_INET) {
>  		return &raw_v4_hashinfo;
>  #if IS_ENABLED(CONFIG_IPV6)
> -	} else if (r->sdiag_family == AF_INET6) {
> +	} else if (r->sdiag_family == AF_INET6 && ipv6_mod_enabled()) {
>  		return &raw_v6_hashinfo;
>  #endif
>  	} else {

Both the solutions you proposed work for me. Thanks.