Dear Linux folks,
Running rcutorture on the POWER8 system IBM S822LC with Ubuntu 20.10, it
found the bug below. I more or less used rcu/dev (0ba8896d2fd7
(lib/irq_poll: Declare IRQ_POLL softirq vector as ksoftirqd-parking
safe)) [1]. The bug manifested for the four configurations below.
1. results-rcutorture-kasan/SRCU-T
2. results-rcutorture-kasan/TINY02
3. results-rcutorture/SRCU-T
4. results-rcutorture/TINY02
For example, the attached
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.11-22.00.51-torture/results-rcutorture-kasan/SRCU-T/console.log
contains:
```
[ 0.012154][ T1] BUG: sleeping function called from invalid
context at include/linux/sched/mm.h:256
[ 0.013128][ T1] in_atomic(): 0, irqs_disabled(): 1, non_block: 0,
pid: 1, name: swapper/0
[ 0.014015][ T1] preempt_count: 0, expected: 0
[ 0.014505][ T1] 2 locks held by swapper/0/1:
[ 0.014987][ T1] #0: c0000000026108a0
(cpu_hotplug_lock){.+.+}-{0:0}, at: static_key_enable+0x24/0x50
[ 0.015995][ T1] #1: c0000000027416c8
(jump_label_mutex){+.+.}-{3:3}, at: static_key_enable_cpuslocked+0x88/0x120
[ 0.017107][ T1] irq event stamp: 46
[ 0.017507][ T1] hardirqs last enabled at (45):
[<c0000000010c1054>] _raw_spin_unlock_irqrestore+0x94/0xd0
[ 0.018549][ T1] hardirqs last disabled at (46):
[<c0000000000a9bc4>] do_patch_instruction+0x3b4/0x4a0
[ 0.019549][ T1] softirqs last enabled at (0):
[<c000000000149540>] copy_process+0x8d0/0x1df0
[ 0.020474][ T1] softirqs last disabled at (0):
[<0000000000000000>] 0x0
[ 0.021200][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.17.0-rc3-00349-gd3a9fd9fed88 #34
[ 0.022115][ T1] Call Trace:
[ 0.022443][ T1] [c0000000084837d0] [c000000000961aac]
dump_stack_lvl+0xa0/0xec (unreliable)
[ 0.023356][ T1] [c000000008483820] [c00000000019b314]
__might_resched+0x2f4/0x310
[ 0.024174][ T1] [c0000000084838b0] [c0000000004c0c70]
kmem_cache_alloc+0x220/0x4b0
[ 0.025000][ T1] [c000000008483920] [c000000000448af4]
__pud_alloc+0x74/0x1d0
[ 0.025772][ T1] [c000000008483970] [c00000000008fe3c]
hash__map_kernel_page+0x2cc/0x390
[ 0.026643][ T1] [c000000008483a20] [c0000000000a9944]
do_patch_instruction+0x134/0x4a0
[ 0.027511][ T1] [c000000008483a70] [c0000000000559d4]
arch_jump_label_transform+0x64/0x78
[ 0.028401][ T1] [c000000008483a90] [c0000000003d6288]
__jump_label_update+0x148/0x180
[ 0.029254][ T1] [c000000008483b30] [c0000000003d6800]
static_key_enable_cpuslocked+0xd0/0x120
[ 0.030179][ T1] [c000000008483ba0] [c0000000003d6880]
static_key_enable+0x30/0x50
[ 0.030996][ T1] [c000000008483bd0] [c00000000200a8f4]
check_kvm_guest+0x60/0x88
[ 0.031799][ T1] [c000000008483c00] [c000000002027744]
pSeries_smp_probe+0x54/0xb0
[ 0.032617][ T1] [c000000008483c30] [c000000002011db8]
smp_prepare_cpus+0x3e0/0x430
[ 0.033444][ T1] [c000000008483cd0] [c000000002004908]
kernel_init_freeable+0x20c/0x43c
[ 0.034307][ T1] [c000000008483db0] [c000000000012c00]
kernel_init+0x30/0x1a0
[ 0.035078][ T1] [c000000008483e10] [c00000000000cd64]
ret_from_kernel_thread+0x5c/0x64
```
Kind regards,
Paul
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
Le 14/02/2022 à 10:35, Peter Zijlstra a écrit :
> On Sun, Feb 13, 2022 at 12:05:50AM +0100, Paul Menzel wrote:
>
>> [ 0.012154][ T1] BUG: sleeping function called from invalid context at
>
>> [ 0.022443][ T1] [c0000000084837d0] [c000000000961aac] > dump_stack_lvl+0xa0/0xec (unreliable)
>> [ 0.023356][ T1] [c000000008483820] [c00000000019b314] > __might_resched+0x2f4/0x310
>> [ 0.024174][ T1] [c0000000084838b0] [c0000000004c0c70] > kmem_cache_alloc+0x220/0x4b0
>> [ 0.025000][ T1] [c000000008483920] [c000000000448af4] > __pud_alloc+0x74/0x1d0
>> [ 0.025772][ T1] [c000000008483970] [c00000000008fe3c] > hash__map_kernel_page+0x2cc/0x390
>> [ 0.026643][ T1] [c000000008483a20] [c0000000000a9944] > do_patch_instruction+0x134/0x4a0
>
> do_patch_instruction() rightfully disables IRQs, but then it goes and
> tries a memory alloc, which seems a bit daft.
>
> I'm thinking Christophe might know more... he's recently been poking at
> Power text poking..
I don't know all details about PPC64, but here it seems like
hash__map_kernel_page() allocates intermediate page directories when
there is not one yet. So if that's the first time
hash__map_kernel_page() is called for the text_poke_addr() it allocated
PUD and PMD if necessary.
As it is kernel memory, once PUD and PMD are allocated they will remain
forever I think. So maybe the only thing to do it to perform a dummy
mapping/unmapping in text_area_cpu_up() to ensure the page directory are
created in advance.
Christophe
On Sun, Feb 13, 2022 at 12:05:50AM +0100, Paul Menzel wrote:
> [ 0.012154][ T1] BUG: sleeping function called from invalid context at
> [ 0.022443][ T1] [c0000000084837d0] [c000000000961aac] > dump_stack_lvl+0xa0/0xec (unreliable)
> [ 0.023356][ T1] [c000000008483820] [c00000000019b314] > __might_resched+0x2f4/0x310
> [ 0.024174][ T1] [c0000000084838b0] [c0000000004c0c70] > kmem_cache_alloc+0x220/0x4b0
> [ 0.025000][ T1] [c000000008483920] [c000000000448af4] > __pud_alloc+0x74/0x1d0
> [ 0.025772][ T1] [c000000008483970] [c00000000008fe3c] > hash__map_kernel_page+0x2cc/0x390
> [ 0.026643][ T1] [c000000008483a20] [c0000000000a9944] > do_patch_instruction+0x134/0x4a0
do_patch_instruction() rightfully disables IRQs, but then it goes and
tries a memory alloc, which seems a bit daft.
I'm thinking Christophe might know more... he's recently been poking at
Power text poking..