2018-04-29 02:15:19

by Fengguang Wu

Subject: [llc_ui_release] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004

Hello,

FYI this happens in mainline kernel 4.17.0-rc2.
It looks like a new regression.

It occurs in 5 out of 5 boots.

[main] 375 sockets created based on info from socket cachefile.
[main] Generating file descriptors
[main] Added 83 filenames from /dev
udevd[507]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:regulatory': No such file or directory
[ 372.057947] caif:caif_disconnect_client(): nothing to disconnect
[ 372.082415] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 372.083866] PGD 16e5b067 P4D 16e5b067 PUD 16cce067 PMD 0
[ 372.085033] Oops: 0000 [#1] SMP
[ 372.085654] CPU: 1 PID: 494 Comm: trinity-main Not tainted 4.17.0-rc2 #171
[ 372.086910] RIP: 0010:refcount_inc_not_zero+0x25/0x2f0:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) refcount_inc_not_zero at lib/refcount.c:120
[ 372.087918] RSP: 0018:ffff880016fb7d08 EFLAGS: 00010206
[ 372.089279] RAX: ffff880016bd5f00 RBX: 0000000000000004 RCX: ffffffff818317ed
[ 372.090142] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
[ 372.091443] RBP: ffff880016f46878 R08: 0000000000000002 R09: 0000000000000000
[ 372.093103] R10: 0000000001a0ae15 R11: 0000000011e52352 R12: 0000000000000004
[ 372.094480] R13: 0000000000000000 R14: ffffffff84655070 R15: ffff88001f2f93c0
[ 372.095848] FS: 00007fc0f65dc700(0000) GS:ffff88001d600000(0000) knlGS:0000000000000000
[ 372.097643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 372.098736] CR2: 0000000000000004 CR3: 0000000016e5a000 CR4: 00000000000006a0
[ 372.099978] DR0: 0000000000693000 DR1: 0000000000000000 DR2: 0000000000000000
[ 372.101483] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 372.102856] Call Trace:
[ 372.103434] refcount_inc+0x19/0x190:
refcount_inc at lib/refcount.c:153
[ 372.104200] llc_ui_release+0xf5/0x270:
constant_test_bit at arch/x86/include/asm/bitops.h:328
(inlined by) sock_flag at include/net/sock.h:802
(inlined by) llc_ui_release at net/llc/af_llc.c:208
[ 372.105122] sock_release+0x56/0x120:
sock_release at net/socket.c:594
[ 372.105773] ? sock_release+0x120/0x120:
sock_close at net/socket.c:1148
[ 372.106490] sock_close+0x1f/0x30:
sock_close at net/socket.c:1151
[ 372.107176] __fput+0x2e9/0x620:
__fput at fs/file_table.c:209
[ 372.107858] ____fput+0x1e/0x30:
____fput at fs/file_table.c:243
[ 372.108526] task_work_run+0x11a/0x180:
task_work_run at kernel/task_work.c:115 (discriminator 1)
[ 372.109491] do_exit+0xda4/0x2210:
do_exit at kernel/exit.c:866
[ 372.110171] ? __do_page_fault+0xffe/0x1150:
__do_page_fault at arch/x86/mm/fault.c:1444 (discriminator 1)
[ 372.110677] do_group_exit+0x1ce/0x1f0:
do_group_exit at kernel/exit.c:957
[ 372.111130] __do_sys_exit_group+0x1b/0x20:
__do_sys_exit_group at kernel/exit.c:979
[ 372.111878] __x64_sys_exit_group+0x1f/0x20:
__x64_sys_exit_group at kernel/exit.c:977
[ 372.112655] do_syscall_64+0x3c8/0x940:
do_syscall_64 at arch/x86/entry/common.c:287
[ 372.113612] entry_SYSCALL_64_after_hwframe+0x49/0xbe:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:240
[ 372.114638] RIP: 0033:0x7fc0f60c1408
[ 372.115389] RSP: 002b:00007fff057791b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[ 372.117042] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fc0f60c1408
[ 372.118462] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 372.119892] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffa0
[ 372.121479] R10: 00007fff05778f50 R11: 0000000000000206 R12: 0000000000000005
[ 372.122831] R13: 00007fff057793a0 R14: 0000000000000000 R15: 0000000000000000
[ 372.124211] Code: 84 00 00 00 00 00 41 57 41 56 49 c7 c6 70 50 65 84 41 55 41 54 49 89 fc 55 53 48 83 ec 08 e8 a3 53 a4 ff 48 83 05 4b 01 9d 04 01 <45> 8b 2c 24 e8 92 53 a4 ff 31 c0 45 85 ed 41 8d 6d 01 0f 94 c0
[ 372.129128] RIP: refcount_inc_not_zero+0x25/0x2f0:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) refcount_inc_not_zero at lib/refcount.c:120 RSP: ffff880016fb7d08
[ 372.130205] CR2: 0000000000000004
[ 372.130604] ---[ end trace a6d858cc768df5f2 ]---
[ 372.131122] Kernel panic - not syncing: Fatal exception
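
A note on reading the fault address: CR2, RDI, RBX and R12 all hold 0x4, so refcount_inc_not_zero() was handed the address 4 rather than a pointer to a real object. That pattern usually means a member at byte offset 4 was accessed through a NULL structure pointer. The sketch below illustrates the arithmetic in userspace C. The struct demo_obj layout and the helper name are hypothetical, chosen only so that the counter lands at offset 4; they are not taken from the kernel sources.

```c
#include <assert.h>   /* assert(), for the check mentioned in the note below */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical layout, for illustration only: three one-byte fields
 * followed by an int. On common ABIs the int is 4-byte aligned, so one
 * byte of padding is inserted and the counter sits at offset 4. */
struct demo_obj {
    unsigned char a;
    unsigned char b;
    unsigned char c;
    int refcnt;
};

/* Address the CPU would touch when reading `refcnt` through a structure
 * located at `base`. With base == 0 (a NULL pointer) this is just the
 * member offset, which is the value that shows up in CR2. */
static uintptr_t refcnt_fault_address(uintptr_t base)
{
    return base + offsetof(struct demo_obj, refcnt);
}
```

Under these assumptions, assert(refcnt_fault_address(0) == 4) holds, matching the "NULL pointer dereference at 0000000000000004" line in the oops.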

Attached the full dmesg, kconfig and reproduce scripts.

Thanks,
Fengguang


Attachments:
(No filename) (4.69 kB)
dmesg-quantal-ivb41-78:20180425191426:x86_64-randconfig-s3-04251452:4.17.0-rc2:171 (83.76 kB)
.config (105.42 kB)
reproduce-quantal-ivb41-78:20180425191426:x86_64-randconfig-s3-04251452:4.17.0-rc2:171 (964.00 B)

2018-04-29 03:33:11

by Linus Torvalds

Subject: Re: [llc_ui_release] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004

On Sat, Apr 28, 2018 at 7:12 PM Fengguang Wu <[email protected]> wrote:

> FYI this happens in mainline kernel 4.17.0-rc2.
> It looks like a new regression.

> It occurs in 5 out of 5 boots.

> [main] 375 sockets created based on info from socket cachefile.
> [main] Generating file descriptors
> [main] Added 83 filenames from /dev
> udevd[507]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:regulatory': No such file or directory
> [ 372.057947] caif:caif_disconnect_client(): nothing to disconnect
> [ 372.082415] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004

I think this is fixed by commit 3a04ce7130a7 ("llc: fix NULL pointer deref
for SOCK_ZAPPED")

Linus
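
For readers unfamiliar with this class of bug, the following is a minimal userspace sketch of the guard pattern that the commit title suggests: skip the reference-count operations when the socket is still in the zapped state and therefore has no object attached. It is not the actual patch. Every name here (demo_sap, demo_sock, demo_release and the helpers) is hypothetical, and the assumption that a zapped socket carries a NULL pointer is inferred from the oops above, not from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects involved. */
struct demo_sap  { int refcnt; };
struct demo_sock { bool zapped; struct demo_sap *sap; };

static void demo_sap_hold(struct demo_sap *sap) { sap->refcnt++; }
static void demo_sap_put(struct demo_sap *sap)  { sap->refcnt--; }

/* Release path with the guard in place. Returns true if a temporary
 * reference was taken and dropped, false if the step was skipped
 * because the socket was never bound to a sap. */
static bool demo_release(struct demo_sock *sk)
{
    if (sk->zapped)
        return false;           /* sk->sap may be NULL: do not touch it */

    assert(sk->sap != NULL);    /* a bound socket must have a sap */
    demo_sap_hold(sk->sap);     /* keep the sap alive during teardown */
    /* ... teardown work would go here ... */
    demo_sap_put(sk->sap);      /* drop the temporary reference */
    return true;
}
```

Without the early return, calling demo_release() on a zapped socket would dereference a NULL sap, which is the userspace analogue of the crash reported above.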

2018-04-29 12:17:01

by Fengguang Wu

Subject: Re: [llc_ui_release] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004

On Sun, Apr 29, 2018 at 03:30:48AM +0000, Linus Torvalds wrote:
>On Sat, Apr 28, 2018 at 7:12 PM Fengguang Wu <[email protected]> wrote:
>
>> FYI this happens in mainline kernel 4.17.0-rc2.
>> It looks like a new regression.
>
>> It occurs in 5 out of 5 boots.
>
>> [main] 375 sockets created based on info from socket cachefile.
>> [main] Generating file descriptors
>> [main] Added 83 filenames from /dev
>> udevd[507]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:regulatory': No such file or directory
>> [ 372.057947] caif:caif_disconnect_client(): nothing to disconnect
>> [ 372.082415] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>
>I think this is fixed by commit 3a04ce7130a7 ("llc: fix NULL pointer deref
>for SOCK_ZAPPED")

Confirmed. Sorry for the late report!

Regards,
Fengguang