2023-09-20 10:57:08

by kernel test robot

Subject: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address



Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 57d88e8a5974644039fbc47806bac7bb12025636]
[test failed on linux-next/master dfa449a58323de195773cf928d99db4130702bf7]

in testcase: stress-ng
version: stress-ng-x86_64-0.15.04-1_20230912
with following parameters:

nr_threads: 100%
testtime: 60s
sc_pid_max: 4194304
class: scheduler
test: netlink-proc
cpufreq_governor: performance



compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 37.396174][ T4144] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 37.419771][ T4144] #PF: supervisor read access in kernel mode
[ 37.425772][ T4144] #PF: error_code(0x0000) - not-present page
[ 37.431771][ T4144] PGD 184255067 P4D 0
[ 37.435867][ T4144] Oops: 0000 [#1] SMP NOPTI
[ 37.440388][ T4144] CPU: 45 PID: 4144 Comm: stress-ng Not tainted 6.5.0-rc2-00552-g2aa1f7a1f47c #1
[ 37.449509][ T4144] Hardware name: Inspur NF5180M6/NF5180M6, BIOS 06.00.04 04/12/2022
[ 37.457502][ T4144] RIP: 0010:cn_filter (drivers/connector/cn_proc.c:60)
[ 37.462384][ T4144] Code: 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 85 ff 74 15 48 8b 87 78 02 00 00 <83> 38 02 0f 94 c0 0f b6 c0 c3 cc cc cc cc 31 c0 c3 cc cc cc cc 66
All code
========
0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: f3 0f 1e fa endbr64
19: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
1e: 48 85 ff test %rdi,%rdi
21: 74 15 je 0x38
23: 48 8b 87 78 02 00 00 mov 0x278(%rdi),%rax
2a:* 83 38 02 cmpl $0x2,(%rax) <-- trapping instruction
2d: 0f 94 c0 sete %al
30: 0f b6 c0 movzbl %al,%eax
33: c3 retq
34: cc int3
35: cc int3
36: cc int3
37: cc int3
38: 31 c0 xor %eax,%eax
3a: c3 retq
3b: cc int3
3c: cc int3
3d: cc int3
3e: cc int3
3f: 66 data16

Code starting with the faulting instruction
===========================================
0: 83 38 02 cmpl $0x2,(%rax)
3: 0f 94 c0 sete %al
6: 0f b6 c0 movzbl %al,%eax
9: c3 retq
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: 31 c0 xor %eax,%eax
10: c3 retq
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: 66 data16
[ 37.482194][ T4144] RSP: 0018:ffa000002efcfc78 EFLAGS: 00010286
[ 37.488305][ T4144] RAX: 0000000000000000 RBX: ff1100014764c000 RCX: 0000000000000000
[ 37.496325][ T4144] RDX: 0000000000000000 RSI: ff110001005e4c00 RDI: ff1100014764c000
[ 37.504340][ T4144] RBP: ffa000002efcfcc0 R08: 0000000000000000 R09: ffffffff83b2cd80
[ 37.512358][ T4144] R10: ff110001005e4c00 R11: 0000000000000000 R12: ff110001005e4c00
[ 37.520375][ T4144] R13: ff1100014764c080 R14: ffffffff81971d50 R15: 0000000000000001
[ 37.528391][ T4144] FS: 00007f06a096e740(0000) GS:ff11002000140000(0000) knlGS:0000000000000000
[ 37.537365][ T4144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.543997][ T4144] CR2: 0000000000000000 CR3: 0000000148042001 CR4: 0000000000771ee0
[ 37.552020][ T4144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 37.560047][ T4144] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 37.568073][ T4144] PKRU: 55555554
[ 37.571676][ T4144] Call Trace:
[ 37.575021][ T4144] <TASK>
[ 37.578016][ T4144] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 37.581966][ T4144] ? page_fault_oops (arch/x86/mm/fault.c:707)
[ 37.586875][ T4144] ? exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542)
[ 37.591689][ T4144] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
[ 37.596765][ T4144] ? __pfx_cn_filter (drivers/connector/cn_proc.c:52)
[ 37.601575][ T4144] ? cn_filter (drivers/connector/cn_proc.c:60)
[ 37.605865][ T4144] ? kmalloc_reserve (net/core/skbuff.c:562)
[ 37.610671][ T4144] do_one_broadcast (net/netlink/af_netlink.c:1496 (discriminator 1))
[ 37.615481][ T4144] netlink_broadcast_filtered (net/netlink/af_netlink.c:1555 (discriminator 11))
[ 37.621246][ T4144] ? __pfx_cn_filter (drivers/connector/cn_proc.c:52)
[ 37.626053][ T4144] proc_fork_connector (drivers/connector/cn_proc.c:82)
[ 37.631118][ T4144] copy_process (kernel/fork.c:2728)
[ 37.635844][ T4144] kernel_clone (include/linux/random.h:26 kernel/fork.c:2913)
[ 37.640301][ T4144] __do_sys_clone (kernel/fork.c:3056)
[ 37.644848][ T4144] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 37.649309][ T4144] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 37.655250][ T4144] RIP: 0033:0x7f06a0ad89fb
[ 37.659713][ T4144] Code: ed 0f 85 f8 00 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 41 89 c5 85 c0 0f 85 9e 00 00
All code
========
0: ed in (%dx),%eax
1: 0f 85 f8 00 00 00 jne 0xff
7: 64 4c 8b 0c 25 10 00 mov %fs:0x10,%r9
e: 00 00
10: 45 31 c0 xor %r8d,%r8d
13: 4d 8d 91 d0 02 00 00 lea 0x2d0(%r9),%r10
1a: 31 d2 xor %edx,%edx
1c: 31 f6 xor %esi,%esi
1e: bf 11 00 20 01 mov $0x1200011,%edi
23: b8 38 00 00 00 mov $0x38,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 0f 87 91 00 00 00 ja 0xc7
36: 41 89 c5 mov %eax,%r13d
39: 85 c0 test %eax,%eax
3b: 0f .byte 0xf
3c: 85 .byte 0x85
3d: 9e sahf
...

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 0f 87 91 00 00 00 ja 0x9d
c: 41 89 c5 mov %eax,%r13d
f: 85 c0 test %eax,%eax
11: 0f .byte 0xf
12: 85 .byte 0x85
13: 9e sahf


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230920/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2023-10-04 15:40:23

by Jakub Kicinski

Subject: Re: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address

On Wed, 20 Sep 2023 14:51:32 +0800 kernel test robot wrote:
> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>
> commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Anjali, have you had the chance to look into this?

2023-10-04 16:39:55

by Anjali Kulkarni

Subject: Re: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address



> On Oct 4, 2023, at 8:40 AM, Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 20 Sep 2023 14:51:32 +0800 kernel test robot wrote:
>> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>>
>> commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> Anjali, have you had the chance to look into this?

Sorry, not yet, will look at it this week.
Anjali

2023-10-10 21:37:51

by Anjali Kulkarni

Subject: Re: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address

I just sent out a patch which I think will fix this issue. Can you take a look?

Thanks
Anjali

> On Oct 4, 2023, at 9:39 AM, Anjali Kulkarni <[email protected]> wrote:
>
>
>
>> On Oct 4, 2023, at 8:40 AM, Jakub Kicinski <[email protected]> wrote:
>>
>> On Wed, 20 Sep 2023 14:51:32 +0800 kernel test robot wrote:
>>> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>>>
>>> commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> Anjali, have you had the chance to look into this?
>
> Sorry, not yet, will look at it this week.
> Anjali
>

2023-10-13 23:01:57

by Anjali Kulkarni

Subject: Re: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address



> On Oct 4, 2023, at 8:40 AM, Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 20 Sep 2023 14:51:32 +0800 kernel test robot wrote:
>> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>>
>> commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> Anjali, have you had the chance to look into this?

Hi,
I was unable to reproduce the issue with the steps given (many packages are missing, etc.), though I am still trying. However, the stack trace shows that it is a NULL pointer dereference, apparently in the cn_filter() function, and I found a likely suspect: a missing NULL check on dsk->sk_user_data. I’ve sent out a patch for this. Could someone please test with the fix and let me know whether the issue is resolved? The fix looks like:

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 05d562e9c8b1..a8e55569e4f5 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -54,7 +54,7 @@ static int cn_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 	enum proc_cn_mcast_op mc_op;
 	uintptr_t val;

-	if (!dsk || !data)
+	if (!dsk || !data || !dsk->sk_user_data)
 		return 0;

 	ptr = (__u32 *)data;
--
2.42.0
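
For readers following the thread, here is a minimal user-space sketch of the failure mode and of the guard the patch adds. It is an illustration under assumptions, not the kernel code: the `model_*` types and functions are hypothetical stand-ins, and the constant 2 is taken only from the `cmpl $0x2,(%rax)` instruction in the oops above. The unguarded variant mirrors the shape seen in the disassembly (a check on the socket pointer, then a load through a second pointer that turned out to be NULL); the guarded variant adds the extra check from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_proc_input {
	int mcast_op;
};

struct model_sock {
	void *sk_user_data;	/* NULL when the listener never attached any data */
};

/* Value compared by "cmpl $0x2,(%rax)" in the oops; the name is an assumption. */
#define MODEL_MCAST_IGNORE 2

/*
 * Unguarded shape: only the socket pointer is checked, so a socket whose
 * sk_user_data is NULL leads to a NULL dereference on the next line.
 */
int model_filter_unguarded(const struct model_sock *dsk)
{
	const struct model_proc_input *in;

	if (!dsk)
		return 0;
	in = dsk->sk_user_data;
	return in->mcast_op == MODEL_MCAST_IGNORE;
}

/*
 * Guarded shape, following the condition in the patch: bail out with 0
 * when the socket, the event data, or sk_user_data is missing.
 */
int model_filter_guarded(const struct model_sock *dsk, const void *data)
{
	const struct model_proc_input *in;

	if (!dsk || !data || !dsk->sk_user_data)
		return 0;
	in = dsk->sk_user_data;
	return in->mcast_op == MODEL_MCAST_IGNORE;
}
```

In this model, calling `model_filter_guarded()` on a socket with a NULL `sk_user_data` returns 0 instead of faulting, while `model_filter_unguarded()` on the same socket would dereference NULL, which is the behaviour the register dump (RAX and CR2 both zero) suggests.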


2023-10-17 07:13:03

by kernel test robot

Subject: Re: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address

hi, Anjali Kulkarni,

On Fri, Oct 13, 2023 at 11:00:31PM +0000, Anjali Kulkarni wrote:
>
>
> > On Oct 4, 2023, at 8:40 AM, Jakub Kicinski <[email protected]> wrote:
> >
> > On Wed, 20 Sep 2023 14:51:32 +0800 kernel test robot wrote:
> >> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
> >>
> >> commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > Anjali, have you had the chance to look into this?
>
> Hi,
> I was unable to reproduce the issues with the steps given - many packages are missing, etc. - I am still trying though - however, the stack trace of this issue shows it is a NULL pointer de-reference (it looks like in cn_filter() function) - and I found a potential suspect where a check for NULL pointer was missing. So I’ve sent out the patch fix for this - is it possible for someone to please test with this fix and let me know if the issue is resolved? The fix looks like:

I applied the patch below on top of v6.6-rc6, and the issue from the original
report was gone.

(I also confirmed that the issue can still be reproduced on v6.6-rc6 without
the patch.)

Tested-by: kernel test robot <[email protected]>

>
> diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
> index 05d562e9c8b1..a8e55569e4f5 100644
> --- a/drivers/connector/cn_proc.c
> +++ b/drivers/connector/cn_proc.c
> @@ -54,7 +54,7 @@ static int cn_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> enum proc_cn_mcast_op mc_op;
> uintptr_t val;
>
> - if (!dsk || !data)
> + if (!dsk || !data || !dsk->sk_user_data)
> return 0;
>
> ptr = (__u32 *)data;
> -- 2.42.0
>
>

2023-10-17 18:24:35

by Anjali Kulkarni

Subject: Re: [linus:master] [connector/cn_proc] 2aa1f7a1f4: BUG:kernel_NULL_pointer_dereference,address



> On Oct 17, 2023, at 12:12 AM, Oliver Sang <[email protected]> wrote:
>
> hi, Anjali Kulkarni,
>
> On Fri, Oct 13, 2023 at 11:00:31PM +0000, Anjali Kulkarni wrote:
>>
>>
>>> On Oct 4, 2023, at 8:40 AM, Jakub Kicinski <[email protected]> wrote:
>>>
>>> On Wed, 20 Sep 2023 14:51:32 +0800 kernel test robot wrote:
>>>> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>>>>
>>>> commit: 2aa1f7a1f47ce8dac7593af605aaa859b3cf3bb1 ("connector/cn_proc: Add filtering to fix some bugs")
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> Anjali, have you had the chance to look into this?
>>
>> Hi,
>> I was unable to reproduce the issues with the steps given - many packages are missing, etc. - I am still trying though - however, the stack trace of this issue shows it is a NULL pointer de-reference (it looks like in cn_filter() function) - and I found a potential suspect where a check for NULL pointer was missing. So I’ve sent out the patch fix for this - is it possible for someone to please test with this fix and let me know if the issue is resolved? The fix looks like:
>
> I applied below patch upon v6.6-rc6, the issue reported by original report was
> gone.
>

Thank you very much for testing!

> (and I confirmed the issue still can be reproduced on v6.6-rc6)
>
> Tested-by: kernel test robot <[email protected]>
>
>>
>> diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
>> index 05d562e9c8b1..a8e55569e4f5 100644
>> --- a/drivers/connector/cn_proc.c
>> +++ b/drivers/connector/cn_proc.c
>> @@ -54,7 +54,7 @@ static int cn_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>> enum proc_cn_mcast_op mc_op;
>> uintptr_t val;
>>
>> - if (!dsk || !data)
>> + if (!dsk || !data || !dsk->sk_user_data)
>> return 0;
>>
>> ptr = (__u32 *)data;
>> -- 2.42.0