2023-06-07 11:40:12

by Ayush Jain

[permalink] [raw]
Subject: Null pointer dereference during rcutorture test on linux-next from next-20230602

Hello All,

Observed null pointer dereference during rcutorture test on linux-next tree
from next-20230602.

Commit ID: commit bc708bbd8260ee4eb3428b0109f5f3be661fae46 (HEAD, tag: next-20230602)

Here I am attaching log trace

[12133.344278] rcu-torture: rcu_torture_read_exit: Start of test
[12133.344282] rcu-torture: rcu_torture_read_exit: Start of episode
[12138.350637] rcu-torture: rcu_torture_read_exit: End of episode
[12143.419412] smpboot: CPU 1 is now offline
[12143.427996] BUG: kernel NULL pointer dereference, address: 0000000000000128
[12143.435777] #PF: supervisor read access in kernel mode
[12143.441517] #PF: error_code(0x0000) - not-present page
[12143.447256] PGD 0 P4D 0
[12143.450087] Oops: 0000 [#1] PREEMPT SMP NOPTI
[12143.454955] CPU: 68 PID: 978653 Comm: rcu_torture_rea Kdump: loaded Not tainted 6.4.0-rc5-next-20230606-1686061107994 #1
[12143.467095] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[12143.475934] RIP: 0010:__bitmap_and+0x18/0x70
[12143.480713] Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 89 cb 49 89 f9 41 c1 eb 06 74 51 45 89 da 31 c0 45 31 c0 <48> 8b 3c c6 48 23 3c c2 49 89 3c c1 48 83 c0 01 49 09 f8 49 39 c2
[12143.501675] RSP: 0018:ffffa3a90db70d90 EFLAGS: 00010046
[12143.507510] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
[12143.515468] RDX: ffff8a1ec17a1d68 RSI: 0000000000000128 RDI: ffff8a1e800429c0
[12143.523425] RBP: ffff8a1ec17a1980 R08: 0000000000000000 R09: ffff8a1e800429c0
[12143.531385] R10: 0000000000000002 R11: 0000000000000002 R12: ffff8a1e800429c0
[12143.539352] R13: 0000000000000000 R14: 0000000000032580 R15: 0000000000000000
[12143.547320] FS: 0000000000000000(0000) GS:ffff8a2dbf100000(0000) knlGS:0000000000000000
[12143.556354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12143.562770] CR2: 0000000000000128 CR3: 0000003089e50000 CR4: 00000000003506e0
[12143.570729] Call Trace:
[12143.573463] <IRQ>
[12143.575714] ? __die+0x24/0x70
[12143.579130] ? page_fault_oops+0x82/0x150
[12143.583615] ? exc_page_fault+0x69/0x150
[12143.588001] ? asm_exc_page_fault+0x26/0x30
[12143.592678] ? __bitmap_and+0x18/0x70
[12143.596768] select_idle_cpu+0x84/0x3d0
[12143.601059] select_idle_sibling+0x1b7/0x500
[12143.605831] select_task_rq_fair+0x1b2/0x2e0
[12143.610603] select_task_rq+0x7a/0xc0
[12143.614696] try_to_wake_up+0xe8/0x550
[12143.618885] ? update_process_times+0x83/0x90
[12143.623747] ? __pfx_hrtimer_wakeup+0x10/0x10
[12143.628615] hrtimer_wakeup+0x22/0x30
[12143.632706] __hrtimer_run_queues+0x112/0x2b0
[12143.637574] hrtimer_interrupt+0x100/0x240
[12143.642152] __sysvec_apic_timer_interrupt+0x63/0x130
[12143.647796] sysvec_apic_timer_interrupt+0x71/0x90
[12143.653149] </IRQ>
[12143.655493] <TASK>
[12143.657834] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[12143.663573] RIP: 0010:rcu_torture_read_lock+0x4/0x20 [rcutorture]
[12143.670389] Code: 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 e9 12 5a f6 d7 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <0f> 1f 44 00 00 e8 b2 12 f6 d7 31 c0 e9 7b 9c a0 d8 66 66 2e 0f 1f
[12143.691351] RSP: 0018:ffffa3a90e3ebc40 EFLAGS: 00000202
[12143.697187] RAX: ffffffffc1dff0b0 RBX: 00000000ffffffff RCX: 0000000000000008
[12143.705155] RDX: 0000000000000060 RSI: 0000000000000060 RDI: ffffa3a90e3ebccc
[12143.713124] RBP: 0000000000000060 R08: 0000000000000060 R09: 0000000000000000
[12143.721091] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000060
[12143.729057] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa3a90e3ebe20
[12143.737027] ? __pfx_rcu_torture_read_lock+0x10/0x10 [rcutorture]
[12143.743845] rcutorture_one_extend+0xc2/0x3a0 [rcutorture]
[12143.749980] rcu_torture_one_read+0x192/0x550 [rcutorture]
[12143.756115] ? update_load_avg+0x7e/0x740
[12143.760593] ? raw_spin_rq_lock_nested+0x15/0x30
[12143.765749] ? newidle_balance+0x26e/0x400
[12143.770330] ? __pfx_rcu_torture_reader+0x10/0x10 [rcutorture]
[12143.776854] rcu_torture_reader+0xcd/0x280 [rcutorture]
[12143.782698] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture]
[12143.789124] kthread+0xe6/0x120
[12143.792636] ? __pfx_kthread+0x10/0x10
[12143.796823] ret_from_fork+0x2c/0x50
[12143.800824] </TASK>

Please let me know if any extra information is required.

Thanks & Regards
Ayush Jain


2023-06-07 19:33:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Null pointer dereference during rcutorture test on linux-next from next-20230602

On Wed, Jun 07, 2023 at 04:48:19PM +0530, Jain, Ayush wrote:
> Hello All,
>
> Observed null pointer dereference during rcutorture test on linux-next tree
> from next-20230602.
>
> Commit ID: commit bc708bbd8260ee4eb3428b0109f5f3be661fae46 (HEAD, tag: next-20230602)
>
> Here I am attaching log trace
>
> [12133.344278] rcu-torture: rcu_torture_read_exit: Start of test
> [12133.344282] rcu-torture: rcu_torture_read_exit: Start of episode
> [12138.350637] rcu-torture: rcu_torture_read_exit: End of episode
> [12143.419412] smpboot: CPU 1 is now offline
> [12143.427996] BUG: kernel NULL pointer dereference, address: 0000000000000128
> [12143.435777] #PF: supervisor read access in kernel mode
> [12143.441517] #PF: error_code(0x0000) - not-present page
> [12143.447256] PGD 0 P4D 0
> [12143.450087] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [12143.454955] CPU: 68 PID: 978653 Comm: rcu_torture_rea Kdump: loaded Not tainted 6.4.0-rc5-next-20230606-1686061107994 #1
> [12143.467095] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [12143.475934] RIP: 0010:__bitmap_and+0x18/0x70
> [12143.480713] Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 89 cb 49 89 f9 41 c1 eb 06 74 51 45 89 da 31 c0 45 31 c0 <48> 8b 3c c6 48 23 3c c2 49 89 3c c1 48 83 c0 01 49 09 f8 49 39 c2
> [12143.501675] RSP: 0018:ffffa3a90db70d90 EFLAGS: 00010046
> [12143.507510] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
> [12143.515468] RDX: ffff8a1ec17a1d68 RSI: 0000000000000128 RDI: ffff8a1e800429c0
> [12143.523425] RBP: ffff8a1ec17a1980 R08: 0000000000000000 R09: ffff8a1e800429c0
> [12143.531385] R10: 0000000000000002 R11: 0000000000000002 R12: ffff8a1e800429c0
> [12143.539352] R13: 0000000000000000 R14: 0000000000032580 R15: 0000000000000000
> [12143.547320] FS: 0000000000000000(0000) GS:ffff8a2dbf100000(0000) knlGS:0000000000000000
> [12143.556354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [12143.562770] CR2: 0000000000000128 CR3: 0000003089e50000 CR4: 00000000003506e0
> [12143.570729] Call Trace:
> [12143.573463] <IRQ>
> [12143.575714] ? __die+0x24/0x70
> [12143.579130] ? page_fault_oops+0x82/0x150
> [12143.583615] ? exc_page_fault+0x69/0x150
> [12143.588001] ? asm_exc_page_fault+0x26/0x30
> [12143.592678] ? __bitmap_and+0x18/0x70
> [12143.596768] select_idle_cpu+0x84/0x3d0
> [12143.601059] select_idle_sibling+0x1b7/0x500
> [12143.605831] select_task_rq_fair+0x1b2/0x2e0
> [12143.610603] select_task_rq+0x7a/0xc0
> [12143.614696] try_to_wake_up+0xe8/0x550
> [12143.618885] ? update_process_times+0x83/0x90
> [12143.623747] ? __pfx_hrtimer_wakeup+0x10/0x10
> [12143.628615] hrtimer_wakeup+0x22/0x30
> [12143.632706] __hrtimer_run_queues+0x112/0x2b0
> [12143.637574] hrtimer_interrupt+0x100/0x240
> [12143.642152] __sysvec_apic_timer_interrupt+0x63/0x130
> [12143.647796] sysvec_apic_timer_interrupt+0x71/0x90
> [12143.653149] </IRQ>
> [12143.655493] <TASK>
> [12143.657834] asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [12143.663573] RIP: 0010:rcu_torture_read_lock+0x4/0x20 [rcutorture]
> [12143.670389] Code: 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 e9 12 5a f6 d7 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <0f> 1f 44 00 00 e8 b2 12 f6 d7 31 c0 e9 7b 9c a0 d8 66 66 2e 0f 1f
> [12143.691351] RSP: 0018:ffffa3a90e3ebc40 EFLAGS: 00000202
> [12143.697187] RAX: ffffffffc1dff0b0 RBX: 00000000ffffffff RCX: 0000000000000008
> [12143.705155] RDX: 0000000000000060 RSI: 0000000000000060 RDI: ffffa3a90e3ebccc
> [12143.713124] RBP: 0000000000000060 R08: 0000000000000060 R09: 0000000000000000
> [12143.721091] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000060
> [12143.729057] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa3a90e3ebe20
> [12143.737027] ? __pfx_rcu_torture_read_lock+0x10/0x10 [rcutorture]
> [12143.743845] rcutorture_one_extend+0xc2/0x3a0 [rcutorture]
> [12143.749980] rcu_torture_one_read+0x192/0x550 [rcutorture]
> [12143.756115] ? update_load_avg+0x7e/0x740
> [12143.760593] ? raw_spin_rq_lock_nested+0x15/0x30
> [12143.765749] ? newidle_balance+0x26e/0x400
> [12143.770330] ? __pfx_rcu_torture_reader+0x10/0x10 [rcutorture]
> [12143.776854] rcu_torture_reader+0xcd/0x280 [rcutorture]
> [12143.782698] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture]
> [12143.789124] kthread+0xe6/0x120
> [12143.792636] ? __pfx_kthread+0x10/0x10
> [12143.796823] ret_from_fork+0x2c/0x50
> [12143.800824] </TASK>
>
> Please let me know if any extra information is required.

Could you please let us know which rcutorture scenario this is and/or
what module parameters were supplied to modprobe (depending on how you
initiated the test)? One way to do this is to post the full console
log somewhere, along with the .config.

At first glance, the wakeup path got a NULL cpumask pointer somehow.

Is bisection a possibility? It would be very helpful.

Thanx, Paul

2023-06-08 04:42:46

by Ayush Jain

[permalink] [raw]
Subject: Re: Null pointer dereference during rcutorture test on linux-next from next-20230602

Hello Paul,

On 6/8/2023 12:54 AM, Paul E. McKenney wrote:
> On Wed, Jun 07, 2023 at 04:48:19PM +0530, Jain, Ayush wrote:
>> Hello All,
>>
>> Observed null pointer dereference during rcutorture test on linux-next tree
>> from next-20230602.
>>
>> Commit ID: commit bc708bbd8260ee4eb3428b0109f5f3be661fae46 (HEAD, tag: next-20230602)
>>
>> Here I am attaching log trace
>>
>> [12133.344278] rcu-torture: rcu_torture_read_exit: Start of test
>> [12133.344282] rcu-torture: rcu_torture_read_exit: Start of episode
>> [12138.350637] rcu-torture: rcu_torture_read_exit: End of episode
>> [12143.419412] smpboot: CPU 1 is now offline
>> [12143.427996] BUG: kernel NULL pointer dereference, address: 0000000000000128
>> [12143.435777] #PF: supervisor read access in kernel mode
>> [12143.441517] #PF: error_code(0x0000) - not-present page
>> [12143.447256] PGD 0 P4D 0
>> [12143.450087] Oops: 0000 [#1] PREEMPT SMP NOPTI
>> [12143.454955] CPU: 68 PID: 978653 Comm: rcu_torture_rea Kdump: loaded Not tainted 6.4.0-rc5-next-20230606-1686061107994 #1
>> [12143.467095] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
>> [12143.475934] RIP: 0010:__bitmap_and+0x18/0x70
>> [12143.480713] Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 89 cb 49 89 f9 41 c1 eb 06 74 51 45 89 da 31 c0 45 31 c0 <48> 8b 3c c6 48 23 3c c2 49 89 3c c1 48 83 c0 01 49 09 f8 49 39 c2
>> [12143.501675] RSP: 0018:ffffa3a90db70d90 EFLAGS: 00010046
>> [12143.507510] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
>> [12143.515468] RDX: ffff8a1ec17a1d68 RSI: 0000000000000128 RDI: ffff8a1e800429c0
>> [12143.523425] RBP: ffff8a1ec17a1980 R08: 0000000000000000 R09: ffff8a1e800429c0
>> [12143.531385] R10: 0000000000000002 R11: 0000000000000002 R12: ffff8a1e800429c0
>> [12143.539352] R13: 0000000000000000 R14: 0000000000032580 R15: 0000000000000000
>> [12143.547320] FS: 0000000000000000(0000) GS:ffff8a2dbf100000(0000) knlGS:0000000000000000
>> [12143.556354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [12143.562770] CR2: 0000000000000128 CR3: 0000003089e50000 CR4: 00000000003506e0
>> [12143.570729] Call Trace:
>> [12143.573463] <IRQ>
>> [12143.575714] ? __die+0x24/0x70
>> [12143.579130] ? page_fault_oops+0x82/0x150
>> [12143.583615] ? exc_page_fault+0x69/0x150
>> [12143.588001] ? asm_exc_page_fault+0x26/0x30
>> [12143.592678] ? __bitmap_and+0x18/0x70
>> [12143.596768] select_idle_cpu+0x84/0x3d0
>> [12143.601059] select_idle_sibling+0x1b7/0x500
>> [12143.605831] select_task_rq_fair+0x1b2/0x2e0
>> [12143.610603] select_task_rq+0x7a/0xc0
>> [12143.614696] try_to_wake_up+0xe8/0x550
>> [12143.618885] ? update_process_times+0x83/0x90
>> [12143.623747] ? __pfx_hrtimer_wakeup+0x10/0x10
>> [12143.628615] hrtimer_wakeup+0x22/0x30
>> [12143.632706] __hrtimer_run_queues+0x112/0x2b0
>> [12143.637574] hrtimer_interrupt+0x100/0x240
>> [12143.642152] __sysvec_apic_timer_interrupt+0x63/0x130
>> [12143.647796] sysvec_apic_timer_interrupt+0x71/0x90
>> [12143.653149] </IRQ>
>> [12143.655493] <TASK>
>> [12143.657834] asm_sysvec_apic_timer_interrupt+0x1a/0x20
>> [12143.663573] RIP: 0010:rcu_torture_read_lock+0x4/0x20 [rcutorture]
>> [12143.670389] Code: 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 e9 12 5a f6 d7 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <0f> 1f 44 00 00 e8 b2 12 f6 d7 31 c0 e9 7b 9c a0 d8 66 66 2e 0f 1f
>> [12143.691351] RSP: 0018:ffffa3a90e3ebc40 EFLAGS: 00000202
>> [12143.697187] RAX: ffffffffc1dff0b0 RBX: 00000000ffffffff RCX: 0000000000000008
>> [12143.705155] RDX: 0000000000000060 RSI: 0000000000000060 RDI: ffffa3a90e3ebccc
>> [12143.713124] RBP: 0000000000000060 R08: 0000000000000060 R09: 0000000000000000
>> [12143.721091] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000060
>> [12143.729057] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa3a90e3ebe20
>> [12143.737027] ? __pfx_rcu_torture_read_lock+0x10/0x10 [rcutorture]
>> [12143.743845] rcutorture_one_extend+0xc2/0x3a0 [rcutorture]
>> [12143.749980] rcu_torture_one_read+0x192/0x550 [rcutorture]
>> [12143.756115] ? update_load_avg+0x7e/0x740
>> [12143.760593] ? raw_spin_rq_lock_nested+0x15/0x30
>> [12143.765749] ? newidle_balance+0x26e/0x400
>> [12143.770330] ? __pfx_rcu_torture_reader+0x10/0x10 [rcutorture]
>> [12143.776854] rcu_torture_reader+0xcd/0x280 [rcutorture]
>> [12143.782698] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture]
>> [12143.789124] kthread+0xe6/0x120
>> [12143.792636] ? __pfx_kthread+0x10/0x10
>> [12143.796823] ret_from_fork+0x2c/0x50
>> [12143.800824] </TASK>
>>
>> Please let me know if any extra information is required.
>
> Could you please let us know which rcutorture scenario this is and/or
> what module parameters were supplied to modprobe (depending on how you
> initiated the test)? One way to do this is to post the full console
> log somewhere, along with the .config.
>

This is a testcase from avocado-misc-tests which can be triggered by
following these steps:

git clone https://github.com/avocado-framework/avocado
cd avocado
python3 setup.py install
python3 optional_plugins/varianter_yaml_to_mux/setup.py install
cd ..
git clone https://github.com/avocado-framework-tests/avocado-misc-tests
cd avocado-misc-tests/generic/
avocado run rcutorture.py

and here is the kernel .config attached

> At first glance, the wakeup path got a NULL cpumask pointer somehow.
>
> Is bisection a possibility? It would be very helpful.

Yes I am on it, as i get finished will loop here as well.

>
> Thanx, Paul

Thanks,
Ayush Jain


Attachments:
kernel_config_next_rcu_issue (227.42 kB)

2023-06-08 08:49:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Null pointer dereference during rcutorture test on linux-next from next-20230602

On Wed, Jun 07, 2023 at 12:24:12PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 07, 2023 at 04:48:19PM +0530, Jain, Ayush wrote:
> > Hello All,
> >
> > Observed null pointer dereference during rcutorture test on linux-next tree
> > from next-20230602.
> >
> > Commit ID: commit bc708bbd8260ee4eb3428b0109f5f3be661fae46 (HEAD, tag: next-20230602)
> >
> > Here I am attaching log trace
> >
> > [12133.344278] rcu-torture: rcu_torture_read_exit: Start of test
> > [12133.344282] rcu-torture: rcu_torture_read_exit: Start of episode
> > [12138.350637] rcu-torture: rcu_torture_read_exit: End of episode
> > [12143.419412] smpboot: CPU 1 is now offline
> > [12143.427996] BUG: kernel NULL pointer dereference, address: 0000000000000128
> > [12143.435777] #PF: supervisor read access in kernel mode
> > [12143.441517] #PF: error_code(0x0000) - not-present page
> > [12143.447256] PGD 0 P4D 0
> > [12143.450087] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [12143.454955] CPU: 68 PID: 978653 Comm: rcu_torture_rea Kdump: loaded Not tainted 6.4.0-rc5-next-20230606-1686061107994 #1
> > [12143.467095] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> > [12143.475934] RIP: 0010:__bitmap_and+0x18/0x70
> > [12143.480713] Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 89 cb 49 89 f9 41 c1 eb 06 74 51 45 89 da 31 c0 45 31 c0 <48> 8b 3c c6 48 23 3c c2 49 89 3c c1 48 83 c0 01 49 09 f8 49 39 c2
> > [12143.501675] RSP: 0018:ffffa3a90db70d90 EFLAGS: 00010046
> > [12143.507510] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
> > [12143.515468] RDX: ffff8a1ec17a1d68 RSI: 0000000000000128 RDI: ffff8a1e800429c0
> > [12143.523425] RBP: ffff8a1ec17a1980 R08: 0000000000000000 R09: ffff8a1e800429c0
> > [12143.531385] R10: 0000000000000002 R11: 0000000000000002 R12: ffff8a1e800429c0
> > [12143.539352] R13: 0000000000000000 R14: 0000000000032580 R15: 0000000000000000
> > [12143.547320] FS: 0000000000000000(0000) GS:ffff8a2dbf100000(0000) knlGS:0000000000000000
> > [12143.556354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [12143.562770] CR2: 0000000000000128 CR3: 0000003089e50000 CR4: 00000000003506e0
> > [12143.570729] Call Trace:
> > [12143.573463] <IRQ>
> > [12143.575714] ? __die+0x24/0x70
> > [12143.579130] ? page_fault_oops+0x82/0x150
> > [12143.583615] ? exc_page_fault+0x69/0x150
> > [12143.588001] ? asm_exc_page_fault+0x26/0x30
> > [12143.592678] ? __bitmap_and+0x18/0x70
> > [12143.596768] select_idle_cpu+0x84/0x3d0
> > [12143.601059] select_idle_sibling+0x1b7/0x500
> > [12143.605831] select_task_rq_fair+0x1b2/0x2e0
> > [12143.610603] select_task_rq+0x7a/0xc0
> > [12143.614696] try_to_wake_up+0xe8/0x550
> > [12143.618885] ? update_process_times+0x83/0x90
> > [12143.623747] ? __pfx_hrtimer_wakeup+0x10/0x10
> > [12143.628615] hrtimer_wakeup+0x22/0x30
> > [12143.632706] __hrtimer_run_queues+0x112/0x2b0
> > [12143.637574] hrtimer_interrupt+0x100/0x240
> > [12143.642152] __sysvec_apic_timer_interrupt+0x63/0x130
> > [12143.647796] sysvec_apic_timer_interrupt+0x71/0x90
> > [12143.653149] </IRQ>
> > [12143.655493] <TASK>
> > [12143.657834] asm_sysvec_apic_timer_interrupt+0x1a/0x20

I'm thikning this is because of ("sched/fair: Multi-LLC
select_idle_sibling()") which I've already dropped from tip/sched/core
(and should hopefully also dissapear from -next if it hasn't already).

Also see:

https://lkml.kernel.org/r/[email protected]

2023-06-08 16:34:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Null pointer dereference during rcutorture test on linux-next from next-20230602

On Thu, Jun 08, 2023 at 10:33:18AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 07, 2023 at 12:24:12PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 07, 2023 at 04:48:19PM +0530, Jain, Ayush wrote:
> > > Hello All,
> > >
> > > Observed null pointer dereference during rcutorture test on linux-next tree
> > > from next-20230602.
> > >
> > > Commit ID: commit bc708bbd8260ee4eb3428b0109f5f3be661fae46 (HEAD, tag: next-20230602)
> > >
> > > Here I am attaching log trace
> > >
> > > [12133.344278] rcu-torture: rcu_torture_read_exit: Start of test
> > > [12133.344282] rcu-torture: rcu_torture_read_exit: Start of episode
> > > [12138.350637] rcu-torture: rcu_torture_read_exit: End of episode
> > > [12143.419412] smpboot: CPU 1 is now offline
> > > [12143.427996] BUG: kernel NULL pointer dereference, address: 0000000000000128
> > > [12143.435777] #PF: supervisor read access in kernel mode
> > > [12143.441517] #PF: error_code(0x0000) - not-present page
> > > [12143.447256] PGD 0 P4D 0
> > > [12143.450087] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [12143.454955] CPU: 68 PID: 978653 Comm: rcu_torture_rea Kdump: loaded Not tainted 6.4.0-rc5-next-20230606-1686061107994 #1
> > > [12143.467095] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> > > [12143.475934] RIP: 0010:__bitmap_and+0x18/0x70
> > > [12143.480713] Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 89 cb 49 89 f9 41 c1 eb 06 74 51 45 89 da 31 c0 45 31 c0 <48> 8b 3c c6 48 23 3c c2 49 89 3c c1 48 83 c0 01 49 09 f8 49 39 c2
> > > [12143.501675] RSP: 0018:ffffa3a90db70d90 EFLAGS: 00010046
> > > [12143.507510] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
> > > [12143.515468] RDX: ffff8a1ec17a1d68 RSI: 0000000000000128 RDI: ffff8a1e800429c0
> > > [12143.523425] RBP: ffff8a1ec17a1980 R08: 0000000000000000 R09: ffff8a1e800429c0
> > > [12143.531385] R10: 0000000000000002 R11: 0000000000000002 R12: ffff8a1e800429c0
> > > [12143.539352] R13: 0000000000000000 R14: 0000000000032580 R15: 0000000000000000
> > > [12143.547320] FS: 0000000000000000(0000) GS:ffff8a2dbf100000(0000) knlGS:0000000000000000
> > > [12143.556354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [12143.562770] CR2: 0000000000000128 CR3: 0000003089e50000 CR4: 00000000003506e0
> > > [12143.570729] Call Trace:
> > > [12143.573463] <IRQ>
> > > [12143.575714] ? __die+0x24/0x70
> > > [12143.579130] ? page_fault_oops+0x82/0x150
> > > [12143.583615] ? exc_page_fault+0x69/0x150
> > > [12143.588001] ? asm_exc_page_fault+0x26/0x30
> > > [12143.592678] ? __bitmap_and+0x18/0x70
> > > [12143.596768] select_idle_cpu+0x84/0x3d0
> > > [12143.601059] select_idle_sibling+0x1b7/0x500
> > > [12143.605831] select_task_rq_fair+0x1b2/0x2e0
> > > [12143.610603] select_task_rq+0x7a/0xc0
> > > [12143.614696] try_to_wake_up+0xe8/0x550
> > > [12143.618885] ? update_process_times+0x83/0x90
> > > [12143.623747] ? __pfx_hrtimer_wakeup+0x10/0x10
> > > [12143.628615] hrtimer_wakeup+0x22/0x30
> > > [12143.632706] __hrtimer_run_queues+0x112/0x2b0
> > > [12143.637574] hrtimer_interrupt+0x100/0x240
> > > [12143.642152] __sysvec_apic_timer_interrupt+0x63/0x130
> > > [12143.647796] sysvec_apic_timer_interrupt+0x71/0x90
> > > [12143.653149] </IRQ>
> > > [12143.655493] <TASK>
> > > [12143.657834] asm_sysvec_apic_timer_interrupt+0x1a/0x20
>
> I'm thikning this is because of ("sched/fair: Multi-LLC
> select_idle_sibling()") which I've already dropped from tip/sched/core
> (and should hopefully also dissapear from -next if it hasn't already).
>
> Also see:
>
> https://lkml.kernel.org/r/[email protected]

That sounds a lot easier than bisecting, thank you!

Thanx, Paul

2023-06-09 11:17:40

by Ayush Jain

[permalink] [raw]
Subject: Re: Null pointer dereference during rcutorture test on linux-next from next-20230602

Hello,

On 6/8/2023 9:55 PM, Paul E. McKenney wrote:
> On Thu, Jun 08, 2023 at 10:33:18AM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 07, 2023 at 12:24:12PM -0700, Paul E. McKenney wrote:
>>> On Wed, Jun 07, 2023 at 04:48:19PM +0530, Jain, Ayush wrote:
>>>> Hello All,
>>>>
>>>> Observed null pointer dereference during rcutorture test on linux-next tree
>>>> from next-20230602.
>>>>
>>>> Commit ID: commit bc708bbd8260ee4eb3428b0109f5f3be661fae46 (HEAD, tag: next-20230602)
>>>>
>>>> Here I am attaching log trace
>>>>
>>>> [12133.344278] rcu-torture: rcu_torture_read_exit: Start of test
>>>> [12133.344282] rcu-torture: rcu_torture_read_exit: Start of episode
>>>> [12138.350637] rcu-torture: rcu_torture_read_exit: End of episode
>>>> [12143.419412] smpboot: CPU 1 is now offline
>>>> [12143.427996] BUG: kernel NULL pointer dereference, address: 0000000000000128
>>>> [12143.435777] #PF: supervisor read access in kernel mode
>>>> [12143.441517] #PF: error_code(0x0000) - not-present page
>>>> [12143.447256] PGD 0 P4D 0
>>>> [12143.450087] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> [12143.454955] CPU: 68 PID: 978653 Comm: rcu_torture_rea Kdump: loaded Not tainted 6.4.0-rc5-next-20230606-1686061107994 #1
>>>> [12143.467095] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
>>>> [12143.475934] RIP: 0010:__bitmap_and+0x18/0x70
>>>> [12143.480713] Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 89 cb 49 89 f9 41 c1 eb 06 74 51 45 89 da 31 c0 45 31 c0 <48> 8b 3c c6 48 23 3c c2 49 89 3c c1 48 83 c0 01 49 09 f8 49 39 c2
>>>> [12143.501675] RSP: 0018:ffffa3a90db70d90 EFLAGS: 00010046
>>>> [12143.507510] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
>>>> [12143.515468] RDX: ffff8a1ec17a1d68 RSI: 0000000000000128 RDI: ffff8a1e800429c0
>>>> [12143.523425] RBP: ffff8a1ec17a1980 R08: 0000000000000000 R09: ffff8a1e800429c0
>>>> [12143.531385] R10: 0000000000000002 R11: 0000000000000002 R12: ffff8a1e800429c0
>>>> [12143.539352] R13: 0000000000000000 R14: 0000000000032580 R15: 0000000000000000
>>>> [12143.547320] FS: 0000000000000000(0000) GS:ffff8a2dbf100000(0000) knlGS:0000000000000000
>>>> [12143.556354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [12143.562770] CR2: 0000000000000128 CR3: 0000003089e50000 CR4: 00000000003506e0
>>>> [12143.570729] Call Trace:
>>>> [12143.573463] <IRQ>
>>>> [12143.575714] ? __die+0x24/0x70
>>>> [12143.579130] ? page_fault_oops+0x82/0x150
>>>> [12143.583615] ? exc_page_fault+0x69/0x150
>>>> [12143.588001] ? asm_exc_page_fault+0x26/0x30
>>>> [12143.592678] ? __bitmap_and+0x18/0x70
>>>> [12143.596768] select_idle_cpu+0x84/0x3d0
>>>> [12143.601059] select_idle_sibling+0x1b7/0x500
>>>> [12143.605831] select_task_rq_fair+0x1b2/0x2e0
>>>> [12143.610603] select_task_rq+0x7a/0xc0
>>>> [12143.614696] try_to_wake_up+0xe8/0x550
>>>> [12143.618885] ? update_process_times+0x83/0x90
>>>> [12143.623747] ? __pfx_hrtimer_wakeup+0x10/0x10
>>>> [12143.628615] hrtimer_wakeup+0x22/0x30
>>>> [12143.632706] __hrtimer_run_queues+0x112/0x2b0
>>>> [12143.637574] hrtimer_interrupt+0x100/0x240
>>>> [12143.642152] __sysvec_apic_timer_interrupt+0x63/0x130
>>>> [12143.647796] sysvec_apic_timer_interrupt+0x71/0x90
>>>> [12143.653149] </IRQ>
>>>> [12143.655493] <TASK>
>>>> [12143.657834] asm_sysvec_apic_timer_interrupt+0x1a/0x20
>>
>> I'm thikning this is because of ("sched/fair: Multi-LLC
>> select_idle_sibling()") which I've already dropped from tip/sched/core
>> (and should hopefully also dissapear from -next if it hasn't already).
>>

Yes, that was very likely it, i don't see any rcutorture test failure in
today's next(next-20230609) build.

>> Also see:
>>
>> https://lkml.kernel.org/r/[email protected]
>
> That sounds a lot easier than bisecting, thank you!
>
> Thanx, Paul

Thank you both for your help.

Thanks & Regards,
Ayush Jain