2020-12-01 07:52:05

by Sven Schnelle

Subject: Re: [GIT pull] locking/urgent for v5.10-rc6

Hi Peter,

Peter Zijlstra <[email protected]> writes:

> On Mon, Nov 30, 2020 at 01:52:11PM +0100, Peter Zijlstra wrote:
>> On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
>> > [ 0.670280] ------------[ cut here ]------------
>> > [ 0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 rcu_irq_enter+0x7e/0xa8
>> > [ 0.670293] Modules linked in:
>> > [ 0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.10.0-rc6 #2263
>> > [ 0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
>> > [ 0.670309] Krnl PSW : 0404d00180000000 0000000000d8a8da (rcu_irq_enter+0x82/0xa8)
>> > [ 0.670318] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
>> > [ 0.670325] Krnl GPRS: 0000000000000000 0000000080000002 0000000000000001 000000000101fcee
>> > [ 0.670331] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> > [ 0.670337] 000003e00029ff48 0000000000000000 00000000017212d8 0000000000000001
>> > [ 0.670343] 0000000005ba0100 00000000000324bb 000003e00029fe40 000003e00029fe10
>> >
>> > [ 0.670358] Krnl Code: 0000000000d8a8ca: ec180013017e cij %r1,1,8,0000000000d8a8f0
>> > [ 0.670358] 0000000000d8a8d0: ecb80005007e cij %r11,0,8,0000000000d8a8da
>> > [ 0.670358] #0000000000d8a8d6: af000000 mc 0,0
>> > [ 0.670358] >0000000000d8a8da: ebbff0a00004 lmg %r11,%r15,160(%r15)
>> > [ 0.670358] 0000000000d8a8e0: c0f4ffffff68 brcl 15,0000000000d8a7b0
>> > [ 0.670358] 0000000000d8a8e6: c0e5000038c1 brasl %r14,0000000000d91a68
>> > [ 0.670358] 0000000000d8a8ec: a7f4ffdc brc 15,0000000000d8a8a4
>> > [ 0.670358] 0000000000d8a8f0: c0e5000038bc brasl %r14,0000000000d91a68
>> > [ 0.670392] Call Trace:
>> > [ 0.670396] [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8
>> > [ 0.670401] [<0000000000157f9a>] irq_enter+0x22/0x30
>> > [ 0.670404] [<000000000010e51c>] do_IRQ+0x64/0xd0
>> > [ 0.670408] [<0000000000d9a65a>] ext_int_handler+0x18e/0x194
>> > [ 0.670412] [<0000000000d9a6a0>] psw_idle+0x40/0x48
>> > [ 0.670416] ([<0000000000104202>] enabled_wait+0x22/0xf0)
>> > [ 0.670419] [<00000000001046e2>] arch_cpu_idle+0x22/0x38
>> > [ 0.670423] [<0000000000d986cc>] default_idle_call+0x74/0xd8
>> > [ 0.670427] [<000000000019a94a>] do_idle+0xf2/0x1b0
>> > [ 0.670431] [<000000000019ac7e>] cpu_startup_entry+0x36/0x40
>> > [ 0.670435] [<0000000000118b9a>] smp_start_secondary+0x82/0x88
>>
>> But but but...
>>
>> do_idle() # IRQs on
>> local_irq_disable(); # IRQs off
>> default_idle_call() # IRQs off
> lockdep_hardirqs_on(); # IRQs off, but lockdep thinks they're on
>> arch_cpu_idle() # IRQs off
>> enabled_wait() # IRQs off
>> raw_local_save() # still off
>> psw_idle() # very much off
>> ext_int_handler # get an interrupt ?!?!
> rcu_irq_enter() # lockdep thinks IRQs are on <- FAIL
>
>
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

That test was introduced to only track changes in IRQ state because of
recursion problems in lockdep. This now seems to no longer work. We
probably could remove that as lockdep now can handle recursion much
better with all the recent changes, but given that we're already at
-rc6, I don't want to touch entry.S code because of a debugging feature.
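
To illustrate the kind of mismatch being discussed, here is a minimal
userspace sketch in C. It is not the entry.S code: every name in it is
hypothetical, and it assumes that the condition in front of
TRACE_IRQS_OFF looks at whether the interrupted context had interrupts
enabled. It only models how a tracker that is updated on "state changes
only" could keep a stale "IRQs on" value inside the handler.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct cpu_model {
	bool hw_irqs_on;      /* what the hardware is actually doing */
	bool tracked_irqs_on; /* what the debugging tracker believes */
};

/*
 * Idle path as annotated in the quoted trace: hardware IRQs are off,
 * but the tracker has been told they are on before the arch idle code.
 */
static void model_enter_idle(struct cpu_model *cpu)
{
	cpu->hw_irqs_on = false;
	cpu->tracked_irqs_on = true;
}

/*
 * Interrupt entry. saved_irqs_on stands for whatever the entry code
 * reads from the interrupted context (an assumption of this sketch).
 * With conditional == true the tracker is only updated when a state
 * change is perceived; with conditional == false it is always updated.
 */
static void model_irq_entry(struct cpu_model *cpu, bool saved_irqs_on,
			    bool conditional)
{
	cpu->hw_irqs_on = false; /* handlers run with IRQs off */
	if (!conditional || saved_irqs_on)
		cpu->tracked_irqs_on = false;
}

/* Returns true if tracker and hardware agree inside the handler. */
static bool model_in_sync(bool saved_irqs_on, bool conditional)
{
	struct cpu_model cpu = { .hw_irqs_on = true, .tracked_irqs_on = true };

	model_enter_idle(&cpu);
	model_irq_entry(&cpu, saved_irqs_on, conditional);
	return cpu.hw_irqs_on == cpu.tracked_irqs_on;
}
```

In this model, model_in_sync(false, true) reports a mismatch, while
dropping the condition (model_in_sync(false, false)) brings the tracker
back in line with the hardware state.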