2018-11-08 21:00:35

by Qian Cai

[permalink] [raw]
Subject: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

Just booting up the latest git master (b00d209) on an aarch64 server and saw
this. Not sure about the third warning (at kernel/cpu.c:315
lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
to here anyway just in case.

[    0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
its_init+0x588/0xb54
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G                T 4.20.0-
rc1+ #7
[    0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
[    0.000000] pc : its_init+0x588/0xb54
[    0.000000] lr : its_init+0x558/0xb54
[    0.000000] sp : ffff200009ca79d0
[    0.000000] x29: ffff200009ca79d0 x28: 0000001dc0fb0000 
[    0.000000] x27: ffff7fe007703ec0 x26: 3fe2e7a5444e6979 
[    0.000000] x25: ffff2000093a8000 x24: 0000000000010000 
[    0.000000] x23: ffff200009ce61a0 x22: 1fffe40001394f4e 
[    0.000000] x21: ffff200009f84000 x20: ffff20000c0c3000 
[    0.000000] x19: ffff200008fb2000 x18: 000000000000003f 
[    0.000000] x17: 0000000000000000 x16: 0000000000000000 
[    0.000000] x15: 0000000000000007 x14: ffff200009ceea80 
[    0.000000] x13: ffff200009ceea80 x12: ffff801dc0d60f80 
[    0.000000] x11: ffff801dc0d6f080 x10: ffff200009ceea80 
[    0.000000] x9 : 0000000000000000 x8 : ffff801dc0fc0000 
[    0.000000] x7 : a2a2a2a2a2a2a2a2 x6 : ffff1003b81f7fff 
[    0.000000] x5 : ffff801dc0fbffff x4 : 0000000000000000 
[    0.000000] x3 : 000000000000003f x2 : dfff200000000000 
[    0.000000] x1 : 0000000000010000 x0 : 6667f74e4ba50b25 
[    0.000000] Call trace:
[    0.000000]  its_init+0x588/0xb54
[    0.000000]  gic_init_bases+0x53c/0x5d4
[    0.000000]  gic_acpi_init+0x2cc/0x564
[    0.000000]  acpi_match_madt+0x9c/0x15c
[    0.000000]  acpi_table_parse_entries_array+0x3e0/0x5d8
[    0.000000]  acpi_table_parse_entries+0xbc/0x114
[    0.000000]  acpi_table_parse_madt+0x4c/0x80
[    0.000000]  __acpi_probe_device_table+0x134/0x1ec
[    0.000000]  irqchip_init+0x48/0x74
[    0.000000]  init_IRQ+0xe4/0x12c
[    0.000000]  start_kernel+0x4d0/0x7d4
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0):
[<0000000000000000>]           (null)
[    0.000000] hardirqs last disabled at (0):
[<0000000000000000>]           (null)
[    0.000000] softirqs last  enabled at (0):
[<0000000000000000>]           (null)
[    0.000000] softirqs last disabled at (0):
[<0000000000000000>]           (null)
[    0.000000] ---[ end trace f27c5aed63f8afd4 ]---
[    0.000000] GICv3: using LPI property table @0x0000001dc0fb0000
[    0.000000] ITS: Using DirectLPI for VPE invalidation
[    0.000000] ITS: Enabling GICv4 support
[    0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:2096
its_cpu_init_lpis+0x3b4/0x3f0
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       T 4.20.0-
rc1+ #7
[    0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
[    0.000000] pc : its_cpu_init_lpis+0x3b4/0x3f0
[    0.000000] lr : its_cpu_init_lpis+0x3b0/0x3f0
[    0.000000] sp : ffff200009ca7a00
[    0.000000] x29: ffff200009ca7a00 x28: ffff200009ce6188 
[    0.000000] x27: 0000601df1e30000 x26: ffff200009ca7a00 
[    0.000000] x25: 0000001dc0fc0000 x24: ffff20000c0c3000 
[    0.000000] x23: ffff7fe007703f00 x22: ffff2000093a8000 
[    0.000000] x21: ffff20000c180000 x20: ffff200009cd9000 
[    0.000000] x19: 1fffe40001394f4c x18: 000000000000003f 
[    0.000000] x17: 0000000000000000 x16: 0000000000000000 
[    0.000000] x15: 0000000000000007 x14: ffff2000094f050c 
[    0.000000] x13: ffff2000094f3cac x12: ffff2000094e7248 
[    0.000000] x11: ffff2000094e69e0 x10: ffff2000094e9d18 
[    0.000000] x9 : 1fffe40001394e5c x8 : ffff040001394edf 
[    0.000000] x7 : 0000000041b58ab3 x6 : 00000000f1f1f1f1 
[    0.000000] x5 : 0000000041b58ab3 x4 : 00000000f1f1f1f1 
[    0.000000] x3 : ffff040001394f4c x2 : 000000000000ffff 
[    0.000000] x1 : 0000000000010000 x0 : 00000000ffffffed 
[    0.000000] Call trace:
[    0.000000]  its_cpu_init_lpis+0x3b4/0x3f0
[    0.000000]  its_cpu_init+0xa8/0x220
[    0.000000]  gic_init_bases+0x540/0x5d4
[    0.000000]  gic_acpi_init+0x2cc/0x564
[    0.000000]  acpi_match_madt+0x9c/0x15c
[    0.000000]  acpi_table_parse_entries_array+0x3e0/0x5d8
[    0.000000]  acpi_table_parse_entries+0xbc/0x114
[    0.000000]  acpi_table_parse_madt+0x4c/0x80
[    0.000000]  __acpi_probe_device_table+0x134/0x1ec
[    0.000000]  irqchip_init+0x48/0x74
[    0.000000]  init_IRQ+0xe4/0x12c
[    0.000000]  start_kernel+0x4d0/0x7d4
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0):
[<0000000000000000>]           (null)
[    0.000000] hardirqs last disabled at (0):
[<0000000000000000>]           (null)
[    0.000000] softirqs last  enabled at (0):
[<0000000000000000>]           (null)
[    0.000000] softirqs last disabled at (0):
[<0000000000000000>]           (null)
[    0.000000] ---[ end trace f27c5aed63f8afd5 ]---
[    0.000000] GICv3: CPU0: using allocated LPI pending table
@0x0000001dc0fc0000
[    0.000000] rcu:  Offload RCU callbacks from CPUs: (none).
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/cpu.c:315
lockdep_assert_cpus_held+0x50/0x60
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       T 4.20.0-
rc1+ #7
[    0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
[    0.000000] pc : lockdep_assert_cpus_held+0x50/0x60
[    0.000000] lr : lockdep_assert_cpus_held+0x4c/0x60
[    0.000000] sp : ffff200009ca7b10
[    0.000000] x29: ffff200009ca7b10 x28: ffff200009e49000 
[    0.000000] x27: ffff200009ce1000 x26: ffff200009ce2000 
[    0.000000] x25: ffff200009135000 x24: ffff200009cd9868 
[    0.000000] x23: ffff200009c8c040 x22: 0000000000001000 
[    0.000000] x21: 0000000000000012 x20: ffff200009cd9000 
[    0.000000] x19: ffff200009ce5000 x18: 000000000000003f 
[    0.000000] x17: 0000000000000000 x16: 0000000000000000 
[    0.000000] x15: 0000000000000007 x14: ffff200009471ccc 
[    0.000000] x13: ffff2000094795a0 x12: ffff2000095249ac 
[    0.000000] x11: ffff2000094f4470 x10: ffff2000094f0a44 
[    0.000000] x9 : ffff200009526aa8 x8 : ffff0ffbffcc4004 
[    0.000000] x7 : 1fffeffbffcc4003 x6 : ffff0ffbffcc4003 
[    0.000000] x5 : ffff7fdffe62001b x4 : ffff0ffbffcc4004 
[    0.000000] x3 : ffff0ffbffcc4004 x2 : dfff200000000000 
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000000 
[    0.000000] Call trace:
[    0.000000]  lockdep_assert_cpus_held+0x50/0x60
[    0.000000]  static_key_enable_cpuslocked+0x30/0xe8
[    0.000000]  arch_timer_check_ool_workaround+0x128/0x2d0
[    0.000000]  arch_timer_acpi_init+0x274/0x6ac
[    0.000000]  acpi_table_parse+0x1ac/0x218
[    0.000000]  __acpi_probe_device_table+0x164/0x1ec
[    0.000000]  timer_probe+0x1bc/0x254
[    0.000000]  time_init+0x44/0x98
[    0.000000]  start_kernel+0x4ec/0x7d4
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0):
[<0000000000000000>]           (null)
[    0.000000] hardirqs last disabled at (0):
[<0000000000000000>]           (null)
[    0.000000] softirqs last  enabled at (0):
[<0000000000000000>]           (null)
[    0.000000] softirqs last disabled at (0):
[<0000000000000000>]           (null)
[    0.000000] ---[ end trace f27c5aed63f8afd6 ]---


2018-11-09 12:08:55

by Marc Zyngier

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

[+Ard]

On 08/11/18 20:59, Qian Cai wrote:
> Just booting up the latest git master (b00d209) on an aarch64 server and saw
> this. Not sure about the third warning (at kernel/cpu.c:315
> lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
> to here anyway just in case.
>
> [    0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
> its_init+0x588/0xb54

It looks like EFI cannot manage to reserve the memory for your GIC
redistributors. Pretty annoying. At the same time, you have reported
other issues with the EFI reservation mechanism, such as:

https://lore.kernel.org/patchwork/patch/1008413/

for which you have given a "Tested-by:". Is that related?

> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G                T 4.20.0-
> rc1+ #7
> [    0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
> [    0.000000] pc : its_init+0x588/0xb54
> [    0.000000] lr : its_init+0x558/0xb54
> [    0.000000] sp : ffff200009ca79d0
> [    0.000000] x29: ffff200009ca79d0 x28: 0000001dc0fb0000 
> [    0.000000] x27: ffff7fe007703ec0 x26: 3fe2e7a5444e6979 
> [    0.000000] x25: ffff2000093a8000 x24: 0000000000010000 
> [    0.000000] x23: ffff200009ce61a0 x22: 1fffe40001394f4e 
> [    0.000000] x21: ffff200009f84000 x20: ffff20000c0c3000 
> [    0.000000] x19: ffff200008fb2000 x18: 000000000000003f 
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000 
> [    0.000000] x15: 0000000000000007 x14: ffff200009ceea80 
> [    0.000000] x13: ffff200009ceea80 x12: ffff801dc0d60f80 
> [    0.000000] x11: ffff801dc0d6f080 x10: ffff200009ceea80 
> [    0.000000] x9 : 0000000000000000 x8 : ffff801dc0fc0000 
> [    0.000000] x7 : a2a2a2a2a2a2a2a2 x6 : ffff1003b81f7fff 
> [    0.000000] x5 : ffff801dc0fbffff x4 : 0000000000000000 
> [    0.000000] x3 : 000000000000003f x2 : dfff200000000000 
> [    0.000000] x1 : 0000000000010000 x0 : 6667f74e4ba50b25 
> [    0.000000] Call trace:
> [    0.000000]  its_init+0x588/0xb54
> [    0.000000]  gic_init_bases+0x53c/0x5d4
> [    0.000000]  gic_acpi_init+0x2cc/0x564
> [    0.000000]  acpi_match_madt+0x9c/0x15c
> [    0.000000]  acpi_table_parse_entries_array+0x3e0/0x5d8
> [    0.000000]  acpi_table_parse_entries+0xbc/0x114
> [    0.000000]  acpi_table_parse_madt+0x4c/0x80
> [    0.000000]  __acpi_probe_device_table+0x134/0x1ec
> [    0.000000]  irqchip_init+0x48/0x74
> [    0.000000]  init_IRQ+0xe4/0x12c
> [    0.000000]  start_kernel+0x4d0/0x7d4
> [    0.000000] irq event stamp: 0
> [    0.000000] hardirqs last  enabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] hardirqs last disabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] softirqs last  enabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] softirqs last disabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] ---[ end trace f27c5aed63f8afd4 ]---
> [    0.000000] GICv3: using LPI property table @0x0000001dc0fb0000
> [    0.000000] ITS: Using DirectLPI for VPE invalidation
> [    0.000000] ITS: Enabling GICv4 support
> [    0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:2096
> its_cpu_init_lpis+0x3b4/0x3f0

Same thing here.

> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       T 4.20.0-
> rc1+ #7
> [    0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO)
> [    0.000000] pc : its_cpu_init_lpis+0x3b4/0x3f0
> [    0.000000] lr : its_cpu_init_lpis+0x3b0/0x3f0
> [    0.000000] sp : ffff200009ca7a00
> [    0.000000] x29: ffff200009ca7a00 x28: ffff200009ce6188 
> [    0.000000] x27: 0000601df1e30000 x26: ffff200009ca7a00 
> [    0.000000] x25: 0000001dc0fc0000 x24: ffff20000c0c3000 
> [    0.000000] x23: ffff7fe007703f00 x22: ffff2000093a8000 
> [    0.000000] x21: ffff20000c180000 x20: ffff200009cd9000 
> [    0.000000] x19: 1fffe40001394f4c x18: 000000000000003f 
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000 
> [    0.000000] x15: 0000000000000007 x14: ffff2000094f050c 
> [    0.000000] x13: ffff2000094f3cac x12: ffff2000094e7248 
> [    0.000000] x11: ffff2000094e69e0 x10: ffff2000094e9d18 
> [    0.000000] x9 : 1fffe40001394e5c x8 : ffff040001394edf 
> [    0.000000] x7 : 0000000041b58ab3 x6 : 00000000f1f1f1f1 
> [    0.000000] x5 : 0000000041b58ab3 x4 : 00000000f1f1f1f1 
> [    0.000000] x3 : ffff040001394f4c x2 : 000000000000ffff 
> [    0.000000] x1 : 0000000000010000 x0 : 00000000ffffffed 
> [    0.000000] Call trace:
> [    0.000000]  its_cpu_init_lpis+0x3b4/0x3f0
> [    0.000000]  its_cpu_init+0xa8/0x220
> [    0.000000]  gic_init_bases+0x540/0x5d4
> [    0.000000]  gic_acpi_init+0x2cc/0x564
> [    0.000000]  acpi_match_madt+0x9c/0x15c
> [    0.000000]  acpi_table_parse_entries_array+0x3e0/0x5d8
> [    0.000000]  acpi_table_parse_entries+0xbc/0x114
> [    0.000000]  acpi_table_parse_madt+0x4c/0x80
> [    0.000000]  __acpi_probe_device_table+0x134/0x1ec
> [    0.000000]  irqchip_init+0x48/0x74
> [    0.000000]  init_IRQ+0xe4/0x12c
> [    0.000000]  start_kernel+0x4d0/0x7d4
> [    0.000000] irq event stamp: 0
> [    0.000000] hardirqs last  enabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] hardirqs last disabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] softirqs last  enabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] softirqs last disabled at (0):
> [<0000000000000000>]           (null)
> [    0.000000] ---[ end trace f27c5aed63f8afd5 ]---
> [    0.000000] GICv3: CPU0: using allocated LPI pending table
> @0x0000001dc0fc0000
> [    0.000000] rcu:  Offload RCU callbacks from CPUs: (none).
> [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/cpu.c:315
> lockdep_assert_cpus_held+0x50/0x60
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       T 4.20.0-
> rc1+ #7
> [    0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
> [    0.000000] pc : lockdep_assert_cpus_held+0x50/0x60
> [    0.000000] lr : lockdep_assert_cpus_held+0x4c/0x60
> [    0.000000] sp : ffff200009ca7b10
> [    0.000000] x29: ffff200009ca7b10 x28: ffff200009e49000 
> [    0.000000] x27: ffff200009ce1000 x26: ffff200009ce2000 
> [    0.000000] x25: ffff200009135000 x24: ffff200009cd9868 
> [    0.000000] x23: ffff200009c8c040 x22: 0000000000001000 
> [    0.000000] x21: 0000000000000012 x20: ffff200009cd9000 
> [    0.000000] x19: ffff200009ce5000 x18: 000000000000003f 
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000 
> [    0.000000] x15: 0000000000000007 x14: ffff200009471ccc 
> [    0.000000] x13: ffff2000094795a0 x12: ffff2000095249ac 
> [    0.000000] x11: ffff2000094f4470 x10: ffff2000094f0a44 
> [    0.000000] x9 : ffff200009526aa8 x8 : ffff0ffbffcc4004 
> [    0.000000] x7 : 1fffeffbffcc4003 x6 : ffff0ffbffcc4003 
> [    0.000000] x5 : ffff7fdffe62001b x4 : ffff0ffbffcc4004 
> [    0.000000] x3 : ffff0ffbffcc4004 x2 : dfff200000000000 
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000000 
> [    0.000000] Call trace:
> [    0.000000]  lockdep_assert_cpus_held+0x50/0x60
> [    0.000000]  static_key_enable_cpuslocked+0x30/0xe8
> [    0.000000]  arch_timer_check_ool_workaround+0x128/0x2d0

Looks like we're calling static_key_enable_cpuslocked in the wrong
context. That's a separate issue (I'll investigate it).

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2018-11-09 12:29:17

by Qian Cai

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c


On 11/9/18 at 7:08 AM, Marc Zyngier wrote:

> [+Ard]
>
> On 08/11/18 20:59, Qian Cai wrote:
> > Just booting up the latest git master (b00d209) on an aarch64 server and saw
> > this. Not sure about the third warning (at kernel/cpu.c:315
> > lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
> > to here anyway just in case.
> >
> > [    0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
> > its_init+0x588/0xb54
>
> It looks like EFI cannot manage to reserve the memory for your GIC
> redistributors. Pretty annoying. At the same time, you have reported
> other issues with the EFI reservation mechanism, such as:
>
> https://lore.kernel.org/patchwork/patch/1008413/
>
> for which you have given a "Tested-by:". Is that related?
No, I don’t think so. Those warnings are still there even after applied the patch above.

2018-11-09 13:54:35

by Marc Zyngier

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

On 09/11/18 12:28, Qian Cai wrote:
>
> On 11/9/18 at 7:08 AM, Marc Zyngier wrote:
>
>> [+Ard]
>>
>> On 08/11/18 20:59, Qian Cai wrote:
>>> Just booting up the latest git master (b00d209) on an aarch64 server and saw
>>> this. Not sure about the third warning (at kernel/cpu.c:315
>>> lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
>>> to here anyway just in case.
>>>
>>> [    0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
>>> its_init+0x588/0xb54
>>
>> It looks like EFI cannot manage to reserve the memory for your GIC
>> redistributors. Pretty annoying. At the same time, you have reported
>> other issues with the EFI reservation mechanism, such as:
>>
>> https://lore.kernel.org/patchwork/patch/1008413/
>>
>> for which you have given a "Tested-by:". Is that related?
> No, I don’t think so. Those warnings are still there even after applied the patch above.

Do you also have this series[1] applied? I'd otherwise need your
configuration to try and reproduce it, as I can't manage to trigger it
on my own setup.

Thanks,

M.

[1] https://www.spinics.net/lists/arm-kernel/msg685751.html
--
Jazz is not dead. It just smells funny...

2018-11-09 15:30:43

by Qian Cai

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c



> On Nov 9, 2018, at 8:50 AM, Marc Zyngier <[email protected]> wrote:
>
> On 09/11/18 12:28, Qian Cai wrote:
>>
>> On 11/9/18 at 7:08 AM, Marc Zyngier wrote:
>>
>>> [+Ard]
>>>
>>> On 08/11/18 20:59, Qian Cai wrote:
>>>> Just booting up the latest git master (b00d209) on an aarch64 server and saw
>>>> this. Not sure about the third warning (at kernel/cpu.c:315
>>>> lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
>>>> to here anyway just in case.
>>>>
>>>> [ 0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
>>>> its_init+0x588/0xb54
>>>
>>> It looks like EFI cannot manage to reserve the memory for your GIC
>>> redistributors. Pretty annoying. At the same time, you have reported
>>> other issues with the EFI reservation mechanism, such as:
>>>
>>> https://lore.kernel.org/patchwork/patch/1008413/
>>>
>>> for which you have given a "Tested-by:". Is that related?
>> No, I don’t think so. Those warnings are still there even after applied the patch above.
>
> Do you also have this series[1] applied? I'd otherwise need your
> configuration to try and reproduce it, as I can't manage to trigger it
> on my own setup.
>
> Thanks,
>
> M.
>
> [1] https://www.spinics.net/lists/arm-kernel/msg685751.html
After applied the above series on the top of the mainline (b00d209), the only
warning exist is,

[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/cpu.c:315
lockdep_assert_cpus_held+0x50/0x60
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 4.20.0-rc1+ #9
[ 0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 0.000000] pc : lockdep_assert_cpus_held+0x50/0x60
[ 0.000000] lr : lockdep_assert_cpus_held+0x4c/0x60
[ 0.000000] sp : ffff200009c97b10
[ 0.000000] x29: ffff200009c97b10 x28: ffff200009e39000
[ 0.000000] x27: ffff200009cd1000 x26: ffff200009cd2000
[ 0.000000] x25: ffff200009125000 x24: ffff200009cc9868
[ 0.000000] x23: ffff200009c7c040 x22: 0000000000001000
[ 0.000000] x21: 0000000000000012 x20: ffff200009cc9000
[ 0.000000] x19: ffff200009cd5000 x18: 000000000000003f
[ 0.000000] x17: 0000000000000000 x16: 0000000000000000
[ 0.000000] x15: 0000000000000007 x14: ffff200009461cd4
[ 0.000000] x13: ffff2000094695ac x12: ffff2000095149a4
[ 0.000000] x11: ffff2000094e4478 x10: ffff2000094e0a50
[ 0.000000] x9 : ffff200009516aa8 x8 : ffff0ffbffcc4004
[ 0.000000] x7 : 1fffeffbffcc4003 x6 : ffff0ffbffcc4003
[ 0.000000] x5 : ffff7fdffe62001b x4 : ffff0ffbffcc4004
[ 0.000000] x3 : ffff0ffbffcc4004 x2 : dfff200000000000
[ 0.000000] x1 : 0000000000000000 x0 : 0000000000000000
[ 0.000000] Call trace:
[ 0.000000] lockdep_assert_cpus_held+0x50/0x60
[ 0.000000] static_key_enable_cpuslocked+0x30/0xe8
[ 0.000000] arch_timer_check_ool_workaround+0x128/0x2d0
[ 0.000000] arch_timer_acpi_init+0x274/0x6ac
[ 0.000000] acpi_table_parse+0x1ac/0x218
[ 0.000000] __acpi_probe_device_table+0x164/0x1ec
[ 0.000000] timer_probe+0x1bc/0x254
[ 0.000000] time_init+0x44/0x98
[ 0.000000] start_kernel+0x4ec/0x7d4
[ 0.000000] irq event stamp: 0
[ 0.000000] hardirqs last enabled at (0): [<0000000000000000>] (null)
[ 0.000000] hardirqs last disabled at (0): [<0000000000000000>] (null)
[ 0.000000] softirqs last enabled at (0): [<0000000000000000>] (null)
[ 0.000000] softirqs last disabled at (0): [<0000000000000000>] (null)
[ 0.000000] ---[ end trace 1dc5085680256ac1 ]—

Here is the kernel config used,
https://c.gmx.com/@642631272677512867/tqD5eulbQAC-1h-fkVe1Iw




2018-11-09 15:43:03

by Marc Zyngier

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

On 09/11/18 15:28, Qian Cai wrote:
>
>
>> On Nov 9, 2018, at 8:50 AM, Marc Zyngier <[email protected]> wrote:
>>
>> On 09/11/18 12:28, Qian Cai wrote:
>>>
>>> On 11/9/18 at 7:08 AM, Marc Zyngier wrote:
>>>
>>>> [+Ard]
>>>>
>>>> On 08/11/18 20:59, Qian Cai wrote:
>>>>> Just booting up the latest git master (b00d209) on an aarch64 server and saw
>>>>> this. Not sure about the third warning (at kernel/cpu.c:315
>>>>> lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
>>>>> to here anyway just in case.
>>>>>
>>>>> [ 0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
>>>>> its_init+0x588/0xb54
>>>>
>>>> It looks like EFI cannot manage to reserve the memory for your GIC
>>>> redistributors. Pretty annoying. At the same time, you have reported
>>>> other issues with the EFI reservation mechanism, such as:
>>>>
>>>> https://lore.kernel.org/patchwork/patch/1008413/
>>>>
>>>> for which you have given a "Tested-by:". Is that related?
>>> No, I don’t think so. Those warnings are still there even after applied the patch above.
>>
>> Do you also have this series[1] applied? I'd otherwise need your
>> configuration to try and reproduce it, as I can't manage to trigger it
>> on my own setup.
>>
>> Thanks,
>>
>> M.
>>
>> [1] https://www.spinics.net/lists/arm-kernel/msg685751.html
> After applied the above series on the top of the mainline (b00d209), the only
> warning exist is,

OK, so the EFI/GICv3 stuff is sorted.

>
> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/cpu.c:315
> lockdep_assert_cpus_held+0x50/0x60
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 4.20.0-rc1+ #9
> [ 0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
> [ 0.000000] pc : lockdep_assert_cpus_held+0x50/0x60
> [ 0.000000] lr : lockdep_assert_cpus_held+0x4c/0x60
> [ 0.000000] sp : ffff200009c97b10
> [ 0.000000] x29: ffff200009c97b10 x28: ffff200009e39000
> [ 0.000000] x27: ffff200009cd1000 x26: ffff200009cd2000
> [ 0.000000] x25: ffff200009125000 x24: ffff200009cc9868
> [ 0.000000] x23: ffff200009c7c040 x22: 0000000000001000
> [ 0.000000] x21: 0000000000000012 x20: ffff200009cc9000
> [ 0.000000] x19: ffff200009cd5000 x18: 000000000000003f
> [ 0.000000] x17: 0000000000000000 x16: 0000000000000000
> [ 0.000000] x15: 0000000000000007 x14: ffff200009461cd4
> [ 0.000000] x13: ffff2000094695ac x12: ffff2000095149a4
> [ 0.000000] x11: ffff2000094e4478 x10: ffff2000094e0a50
> [ 0.000000] x9 : ffff200009516aa8 x8 : ffff0ffbffcc4004
> [ 0.000000] x7 : 1fffeffbffcc4003 x6 : ffff0ffbffcc4003
> [ 0.000000] x5 : ffff7fdffe62001b x4 : ffff0ffbffcc4004
> [ 0.000000] x3 : ffff0ffbffcc4004 x2 : dfff200000000000
> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000000
> [ 0.000000] Call trace:
> [ 0.000000] lockdep_assert_cpus_held+0x50/0x60
> [ 0.000000] static_key_enable_cpuslocked+0x30/0xe8
> [ 0.000000] arch_timer_check_ool_workaround+0x128/0x2d0
> [ 0.000000] arch_timer_acpi_init+0x274/0x6ac
> [ 0.000000] acpi_table_parse+0x1ac/0x218
> [ 0.000000] __acpi_probe_device_table+0x164/0x1ec
> [ 0.000000] timer_probe+0x1bc/0x254
> [ 0.000000] time_init+0x44/0x98
> [ 0.000000] start_kernel+0x4ec/0x7d4
> [ 0.000000] irq event stamp: 0
> [ 0.000000] hardirqs last enabled at (0): [<0000000000000000>] (null)
> [ 0.000000] hardirqs last disabled at (0): [<0000000000000000>] (null)
> [ 0.000000] softirqs last enabled at (0): [<0000000000000000>] (null)
> [ 0.000000] softirqs last disabled at (0): [<0000000000000000>] (null)
> [ 0.000000] ---[ end trace 1dc5085680256ac1 ]—

Now this one: what machine is this? What is the workaround that gets
applied?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2018-11-09 15:56:59

by Qian Cai

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c



> On Nov 9, 2018, at 10:38 AM, Marc Zyngier <[email protected]> wrote:
>
> On 09/11/18 15:28, Qian Cai wrote:
>>
>>
>>> On Nov 9, 2018, at 8:50 AM, Marc Zyngier <[email protected]> wrote:
>>>
>>> On 09/11/18 12:28, Qian Cai wrote:
>>>>
>>>> On 11/9/18 at 7:08 AM, Marc Zyngier wrote:
>>>>
>>>>> [+Ard]
>>>>>
>>>>> On 08/11/18 20:59, Qian Cai wrote:
>>>>>> Just booting up the latest git master (b00d209) on an aarch64 server and saw
>>>>>> this. Not sure about the third warning (at kernel/cpu.c:315
>>>>>> lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
>>>>>> to here anyway just in case.
>>>>>>
>>>>>> [ 0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
>>>>>> its_init+0x588/0xb54
>>>>>
>>>>> It looks like EFI cannot manage to reserve the memory for your GIC
>>>>> redistributors. Pretty annoying. At the same time, you have reported
>>>>> other issues with the EFI reservation mechanism, such as:
>>>>>
>>>>> https://lore.kernel.org/patchwork/patch/1008413/
>>>>>
>>>>> for which you have given a "Tested-by:". Is that related?
>>>> No, I don’t think so. Those warnings are still there even after applied the patch above.
>>>
>>> Do you also have this series[1] applied? I'd otherwise need your
>>> configuration to try and reproduce it, as I can't manage to trigger it
>>> on my own setup.
>>>
>>> Thanks,
>>>
>>> M.
>>>
>>> [1] https://www.spinics.net/lists/arm-kernel/msg685751.html
>> After applied the above series on the top of the mainline (b00d209), the only
>> warning exist is,
>
> OK, so the EFI/GICv3 stuff is sorted.
>
>>
>> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/cpu.c:315
>> lockdep_assert_cpus_held+0x50/0x60
>> [ 0.000000] Modules linked in:
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 4.20.0-rc1+ #9
>> [ 0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
>> [ 0.000000] pc : lockdep_assert_cpus_held+0x50/0x60
>> [ 0.000000] lr : lockdep_assert_cpus_held+0x4c/0x60
>> [ 0.000000] sp : ffff200009c97b10
>> [ 0.000000] x29: ffff200009c97b10 x28: ffff200009e39000
>> [ 0.000000] x27: ffff200009cd1000 x26: ffff200009cd2000
>> [ 0.000000] x25: ffff200009125000 x24: ffff200009cc9868
>> [ 0.000000] x23: ffff200009c7c040 x22: 0000000000001000
>> [ 0.000000] x21: 0000000000000012 x20: ffff200009cc9000
>> [ 0.000000] x19: ffff200009cd5000 x18: 000000000000003f
>> [ 0.000000] x17: 0000000000000000 x16: 0000000000000000
>> [ 0.000000] x15: 0000000000000007 x14: ffff200009461cd4
>> [ 0.000000] x13: ffff2000094695ac x12: ffff2000095149a4
>> [ 0.000000] x11: ffff2000094e4478 x10: ffff2000094e0a50
>> [ 0.000000] x9 : ffff200009516aa8 x8 : ffff0ffbffcc4004
>> [ 0.000000] x7 : 1fffeffbffcc4003 x6 : ffff0ffbffcc4003
>> [ 0.000000] x5 : ffff7fdffe62001b x4 : ffff0ffbffcc4004
>> [ 0.000000] x3 : ffff0ffbffcc4004 x2 : dfff200000000000
>> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000000
>> [ 0.000000] Call trace:
>> [ 0.000000] lockdep_assert_cpus_held+0x50/0x60
>> [ 0.000000] static_key_enable_cpuslocked+0x30/0xe8
>> [ 0.000000] arch_timer_check_ool_workaround+0x128/0x2d0
>> [ 0.000000] arch_timer_acpi_init+0x274/0x6ac
>> [ 0.000000] acpi_table_parse+0x1ac/0x218
>> [ 0.000000] __acpi_probe_device_table+0x164/0x1ec
>> [ 0.000000] timer_probe+0x1bc/0x254
>> [ 0.000000] time_init+0x44/0x98
>> [ 0.000000] start_kernel+0x4ec/0x7d4
>> [ 0.000000] irq event stamp: 0
>> [ 0.000000] hardirqs last enabled at (0): [<0000000000000000>] (null)
>> [ 0.000000] hardirqs last disabled at (0): [<0000000000000000>] (null)
>> [ 0.000000] softirqs last enabled at (0): [<0000000000000000>] (null)
>> [ 0.000000] softirqs last disabled at (0): [<0000000000000000>] (null)
>> [ 0.000000] ---[ end trace 1dc5085680256ac1 ]—
>
> Now this one: what machine is this? What is the workaround that gets
> applied?

# dmidecode
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x397D0000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: Huawei
Version: 1.50
Release Date: 06/01/2018
Address: 0xA4800
Runtime Size: 366 kB
ROM Size: 8192 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
8042 keyboard services are supported (int 9h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 0.0

Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Huawei
Product Name: TaiShan 2280
Version: V100R001C00
Serial Number: 2102311TBJ10H7000017
UUID: 8d9f1d7e-639a-11e7-9a0a-a08cf8625acc
Wake-up Type: Power Switch
SKU Number: To be filled by O.E.M.
Family: To be filled by O.E.M.

I am not sure if I understand your second question, but there is no workaround
to make this warning disappear that I am aware of.

2018-11-09 16:12:17

by Marc Zyngier

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

On 09/11/18 15:52, Qian Cai wrote:
>
>
>> On Nov 9, 2018, at 10:38 AM, Marc Zyngier <[email protected]> wrote:
>>
>> On 09/11/18 15:28, Qian Cai wrote:
>>>
>>>
>>>> On Nov 9, 2018, at 8:50 AM, Marc Zyngier <[email protected]> wrote:
>>>>
>>>> On 09/11/18 12:28, Qian Cai wrote:
>>>>>
>>>>> On 11/9/18 at 7:08 AM, Marc Zyngier wrote:
>>>>>
>>>>>> [+Ard]
>>>>>>
>>>>>> On 08/11/18 20:59, Qian Cai wrote:
>>>>>>> Just booting up the latest git master (b00d209) on an aarch64 server and saw
>>>>>>> this. Not sure about the third warning (at kernel/cpu.c:315
>>>>>>> lockdep_assert_cpus_held+0x50/0x60) relates to irqchip or not, but appended it
>>>>>>> to here anyway just in case.
>>>>>>>
>>>>>>> [ 0.000000] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696
>>>>>>> its_init+0x588/0xb54
>>>>>>
>>>>>> It looks like EFI cannot manage to reserve the memory for your GIC
>>>>>> redistributors. Pretty annoying. At the same time, you have reported
>>>>>> other issues with the EFI reservation mechanism, such as:
>>>>>>
>>>>>> https://lore.kernel.org/patchwork/patch/1008413/
>>>>>>
>>>>>> for which you have given a "Tested-by:". Is that related?
>>>>> No, I don’t think so. Those warnings are still there even after applied the patch above.
>>>>
>>>> Do you also have this series[1] applied? I'd otherwise need your
>>>> configuration to try and reproduce it, as I can't manage to trigger it
>>>> on my own setup.
>>>>
>>>> Thanks,
>>>>
>>>> M.
>>>>
>>>> [1] https://www.spinics.net/lists/arm-kernel/msg685751.html
>>> After applied the above series on the top of the mainline (b00d209), the only
>>> warning exist is,
>>
>> OK, so the EFI/GICv3 stuff is sorted.
>>
>>>
>>> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/cpu.c:315
>>> lockdep_assert_cpus_held+0x50/0x60
>>> [ 0.000000] Modules linked in:
>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 4.20.0-rc1+ #9
>>> [ 0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
>>> [ 0.000000] pc : lockdep_assert_cpus_held+0x50/0x60
>>> [ 0.000000] lr : lockdep_assert_cpus_held+0x4c/0x60
>>> [ 0.000000] sp : ffff200009c97b10
>>> [ 0.000000] x29: ffff200009c97b10 x28: ffff200009e39000
>>> [ 0.000000] x27: ffff200009cd1000 x26: ffff200009cd2000
>>> [ 0.000000] x25: ffff200009125000 x24: ffff200009cc9868
>>> [ 0.000000] x23: ffff200009c7c040 x22: 0000000000001000
>>> [ 0.000000] x21: 0000000000000012 x20: ffff200009cc9000
>>> [ 0.000000] x19: ffff200009cd5000 x18: 000000000000003f
>>> [ 0.000000] x17: 0000000000000000 x16: 0000000000000000
>>> [ 0.000000] x15: 0000000000000007 x14: ffff200009461cd4
>>> [ 0.000000] x13: ffff2000094695ac x12: ffff2000095149a4
>>> [ 0.000000] x11: ffff2000094e4478 x10: ffff2000094e0a50
>>> [ 0.000000] x9 : ffff200009516aa8 x8 : ffff0ffbffcc4004
>>> [ 0.000000] x7 : 1fffeffbffcc4003 x6 : ffff0ffbffcc4003
>>> [ 0.000000] x5 : ffff7fdffe62001b x4 : ffff0ffbffcc4004
>>> [ 0.000000] x3 : ffff0ffbffcc4004 x2 : dfff200000000000
>>> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000000
>>> [ 0.000000] Call trace:
>>> [ 0.000000] lockdep_assert_cpus_held+0x50/0x60
>>> [ 0.000000] static_key_enable_cpuslocked+0x30/0xe8
>>> [ 0.000000] arch_timer_check_ool_workaround+0x128/0x2d0
>>> [ 0.000000] arch_timer_acpi_init+0x274/0x6ac
>>> [ 0.000000] acpi_table_parse+0x1ac/0x218
>>> [ 0.000000] __acpi_probe_device_table+0x164/0x1ec
>>> [ 0.000000] timer_probe+0x1bc/0x254
>>> [ 0.000000] time_init+0x44/0x98
>>> [ 0.000000] start_kernel+0x4ec/0x7d4
>>> [ 0.000000] irq event stamp: 0
>>> [ 0.000000] hardirqs last enabled at (0): [<0000000000000000>] (null)
>>> [ 0.000000] hardirqs last disabled at (0): [<0000000000000000>] (null)
>>> [ 0.000000] softirqs last enabled at (0): [<0000000000000000>] (null)
>>> [ 0.000000] softirqs last disabled at (0): [<0000000000000000>] (null)
>>> [ 0.000000] ---[ end trace 1dc5085680256ac1 ]—
>>
>> Now this one: what machine is this? What is the workaround that gets
>> applied?
>
> # dmidecode
> # dmidecode 3.1
> Getting SMBIOS data from sysfs.
> SMBIOS 3.0.0 present.
> Table at 0x397D0000.
>
> Handle 0x0000, DMI type 0, 24 bytes
> BIOS Information
> Vendor: Huawei
> Version: 1.50
> Release Date: 06/01/2018
> Address: 0xA4800
> Runtime Size: 366 kB
> ROM Size: 8192 kB
> Characteristics:
> PCI is supported
> BIOS is upgradeable
> BIOS shadowing is allowed
> Boot from CD is supported
> Selectable boot is supported
> EDD is supported
> Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
> Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
> 5.25"/360 kB floppy services are supported (int 13h)
> 5.25"/1.2 MB floppy services are supported (int 13h)
> 3.5"/720 kB floppy services are supported (int 13h)
> 3.5"/2.88 MB floppy services are supported (int 13h)
> 8042 keyboard services are supported (int 9h)
> CGA/mono video services are supported (int 10h)
> ACPI is supported
> USB legacy is supported
> BIOS boot specification is supported
> Targeted content distribution is supported
> UEFI is supported
> BIOS Revision: 0.0
>
> Handle 0x0001, DMI type 1, 27 bytes
> System Information
> Manufacturer: Huawei
> Product Name: TaiShan 2280
> Version: V100R001C00
> Serial Number: 2102311TBJ10H7000017
> UUID: 8d9f1d7e-639a-11e7-9a0a-a08cf8625acc
> Wake-up Type: Power Switch
> SKU Number: To be filled by O.E.M.
> Family: To be filled by O.E.M.

Great stuff. This is what I know as a D05. I used to have one, but it
dropped of the network and I need to find someone to locate it in the
server room.

>
> I am not sure if I understand your second question, but there is no workaround
> to make this warning disappear that I am aware of.

it is not about having a workaround to disable the warning, but a
workaround that the kernel is applying because the HW is busted:

>>> [ 0.000000] arch_timer_check_ool_workaround+0x128/0x2d0

See bb42ca474010 and d003d029cea8 for details.

Now, activating this workaround leads to lockdep being really angry,
most likely because the cpus_read_lock is not taken, which is a change
in behaviour...

I'm trying to dig into this now.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2018-11-09 17:30:31

by Sudeep Holla

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <[email protected]> wrote:
>
[...]

>
> See bb42ca474010 and d003d029cea8 for details.
>
> Now, activating this workaround leads to lockdep being really angry,
> most likely because the cpus_read_lock is not taken, which is a change
> in behaviour...
>
> I'm trying to dig into this now.
>

Yes we found similar issue in kernel/sched/core.c sched_init_smp
There's a fix with detailed description in -next
(Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")

The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
Assert we hold the hotplug lock for _cpuslocked() operations")

--
Regards,
Sudeep

2018-11-09 17:42:53

by Marc Zyngier

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

On 09/11/18 17:28, Sudeep Holla wrote:
> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <[email protected]> wrote:
>>
> [...]
>
>>
>> See bb42ca474010 and d003d029cea8 for details.
>>
>> Now, activating this workaround leads to lockdep being really angry,
>> most likely because the cpus_read_lock is not taken, which is a change
>> in behaviour...
>>
>> I'm trying to dig into this now.
>>
>
> Yes we found similar issue in kernel/sched/core.c sched_init_smp
> There's a fix with detailed description in -next
> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
>
> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
> Assert we hold the hotplug lock for _cpuslocked() operations")

I indeed came to the same conclusion, but the fix is slightly less than
obvious. I have the following arm64-specific crap, but it is pretty
terrible:

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index f258636273c9..9e96e9eaca9b 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -36,6 +36,7 @@
#include <linux/clocksource.h>
#include <linux/clk-provider.h>
#include <linux/acpi.h>
+#include <linux/cpu.h>

#include <clocksource/arm_arch_timer.h>

@@ -69,7 +70,9 @@ void __init time_init(void)
u32 arch_timer_rate;

of_clk_init(NULL);
+ cpus_read_lock();
timer_probe();
+ cpus_read_unlock();

tick_setup_hrtimer_broadcast();

Qian, can you please let me know if this helps? If it does, we'll have
to think of something a bit better...

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2018-11-09 18:43:25

by Qian Cai

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c



> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <[email protected]> wrote:
>
> On 09/11/18 17:28, Sudeep Holla wrote:
>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <[email protected]> wrote:
>>>
>> [...]
>>
>>>
>>> See bb42ca474010 and d003d029cea8 for details.
>>>
>>> Now, activating this workaround leads to lockdep being really angry,
>>> most likely because the cpus_read_lock is not taken, which is a change
>>> in behaviour...
>>>
>>> I'm trying to dig into this now.
>>>
>>
>> Yes we found similar issue in kernel/sched/core.c sched_init_smp
>> There's a fix with detailed description in -next
>> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
>>
>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>> Assert we hold the hotplug lock for _cpuslocked() operations")
>
> I indeed came to the same conclusion, but the fix is slightly less than
> obvious. I have the following arm64-specific crap, but it is pretty
> terrible:
>
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index f258636273c9..9e96e9eaca9b 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -36,6 +36,7 @@
> #include <linux/clocksource.h>
> #include <linux/clk-provider.h>
> #include <linux/acpi.h>
> +#include <linux/cpu.h>
>
> #include <clocksource/arm_arch_timer.h>
>
> @@ -69,7 +70,9 @@ void __init time_init(void)
> u32 arch_timer_rate;
>
> of_clk_init(NULL);
> + cpus_read_lock();
> timer_probe();
> + cpus_read_unlock();
>
> tick_setup_hrtimer_broadcast();
>
> Qian, can you please let me know if this helps? If it does, we'll have
> to think of something a bit better…
After applied the above patch, the original warning is gone but there
Is now a new warning.

> [ 0.000000] rcu: Offload RCU callbacks from CPUs: (none).
> [ 0.000000]
> [ 0.000000] ======================================================
> [ 0.000000] WARNING: possible circular locking dependency detected
> [ 0.000000] 4.20.0-rc1+ #10 Tainted: G T
> [ 0.000000] ------------------------------------------------------
> [ 0.000000] swapper/0/0 is trying to acquire lock:
> [ 0.000000] (____ptrval____) (acpi_probe_mutex){....}, at: __acpi_probe_device_table+0xac/0x1ec
> [ 0.000000]
> [ 0.000000] but task is already holding lock:
> [ 0.000000] (____ptrval____) (cpu_hotplug_lock.rw_sem){....}, at: time_init+0x44/0xa0
> [ 0.000000]
> [ 0.000000] which lock already depends on the new lock.
> [ 0.000000]
> [ 0.000000]
> [ 0.000000] the existing dependency chain (in reverse order) is:
> [ 0.000000]
> [ 0.000000] -> #1 (cpu_hotplug_lock.rw_sem){....}:
> [ 0.000000] __lock_acquire+0x3cc/0x858
> [ 0.000000] lock_acquire+0x124/0x330
> [ 0.000000] cpus_read_lock+0x6c/0x100
> [ 0.000000] __cpuhp_setup_state+0x38/0x78
> [ 0.000000] gic_init_bases+0x3ac/0x5d8
> [ 0.000000] gic_acpi_init+0x2cc/0x564
> [ 0.000000] acpi_match_madt+0x9c/0x15c
> [ 0.000000] acpi_table_parse_entries_array+0x3e0/0x5d8
> [ 0.000000] acpi_table_parse_entries+0xbc/0x114
> [ 0.000000] acpi_table_parse_madt+0x4c/0x80
> [ 0.000000] __acpi_probe_device_table+0x134/0x1ec
> [ 0.000000] irqchip_init+0x48/0x74
> [ 0.000000] init_IRQ+0xe4/0x12c
> [ 0.000000] start_kernel+0x4d0/0x7d4
> [ 0.000000]
> [ 0.000000] -> #0 (acpi_probe_mutex){....}:
> [ 0.000000] validate_chain.isra.19+0xcd8/0x1158
> [ 0.000000] __lock_acquire+0x3cc/0x858
> [ 0.000000] lock_acquire+0x124/0x330
> [ 0.000000] __mutex_lock+0x110/0xa68
> [ 0.000000] mutex_lock_nested+0x3c/0x50
> [ 0.000000] __acpi_probe_device_table+0xac/0x1ec
> [ 0.000000] timer_probe+0x1bc/0x254
> [ 0.000000] time_init+0x48/0xa0
> [ 0.000000] start_kernel+0x4ec/0x7d4
> [ 0.000000]
> [ 0.000000] other info that might help us debug this:
> [ 0.000000]
> [ 0.000000] Possible unsafe locking scenario:
> [ 0.000000]
> [ 0.000000] CPU0 CPU1
> [ 0.000000] ---- ----
> [ 0.000000] lock(cpu_hotplug_lock.rw_sem);
> [ 0.000000] lock(acpi_probe_mutex);
> [ 0.000000] lock(cpu_hotplug_lock.rw_sem);
> [ 0.000000] lock(acpi_probe_mutex);
> [ 0.000000]
> [ 0.000000] *** DEADLOCK ***
> [ 0.000000]
> [ 0.000000] 1 lock held by swapper/0/0:
> [ 0.000000] #0: (____ptrval____) (cpu_hotplug_lock.rw_sem){....}, at: time_init+0x44/0xa0
> [ 0.000000]
> [ 0.000000] stack backtrace:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 4.20.0-rc1+ #10
> [ 0.000000] Call trace:
> [ 0.000000] dump_backtrace+0x0/0x248
> [ 0.000000] show_stack+0x24/0x30
> [ 0.000000] dump_stack+0xb8/0xf4
> [ 0.000000] print_circular_bug.isra.15+0x240/0x368
> [ 0.000000] check_prev_add.constprop.24+0x444/0xa38
> [ 0.000000] validate_chain.isra.19+0xcd8/0x1158
> [ 0.000000] __lock_acquire+0x3cc/0x858
> [ 0.000000] lock_acquire+0x124/0x330
> [ 0.000000] __mutex_lock+0x110/0xa68
> [ 0.000000] mutex_lock_nested+0x3c/0x50
> [ 0.000000] __acpi_probe_device_table+0xac/0x1ec
> [ 0.000000] timer_probe+0x1bc/0x254
> [ 0.000000] time_init+0x48/0xa0
> [ 0.000000] start_kernel+0x4ec/0x7d4
> [ 0.000000] arch_timer: Enabling global workaround for HiSilicon erratum 161010101
> [ 0.000000] arch_timer: CPU0: Trapping CNTVCT access
> [ 0.000000] arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> [ 0.000002] sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns


2018-11-12 08:41:51

by Marc Zyngier

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

On Fri, 09 Nov 2018 18:41:03 +0000,
Qian Cai <[email protected]> wrote:
>
>
>
> > On Nov 9, 2018, at 12:41 PM, Marc Zyngier <[email protected]> wrote:
> >
> > On 09/11/18 17:28, Sudeep Holla wrote:
> >> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <[email protected]> wrote:
> >>>
> >> [...]
> >>
> >>>
> >>> See bb42ca474010 and d003d029cea8 for details.
> >>>
> >>> Now, activating this workaround leads to lockdep being really angry,
> >>> most likely because the cpus_read_lock is not taken, which is a change
> >>> in behaviour...
> >>>
> >>> I'm trying to dig into this now.
> >>>
> >>
> >> Yes we found similar issue in kernel/sched/core.c sched_init_smp
> >> There's a fix with detailed description in -next
> >> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
> >>
> >> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
> >> Assert we hold the hotplug lock for _cpuslocked() operations")
> >
> > I indeed came to the same conclusion, but the fix is slightly less than
> > obvious. I have the following arm64-specific crap, but it is pretty
> > terrible:
> >
> > diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> > index f258636273c9..9e96e9eaca9b 100644
> > --- a/arch/arm64/kernel/time.c
> > +++ b/arch/arm64/kernel/time.c
> > @@ -36,6 +36,7 @@
> > #include <linux/clocksource.h>
> > #include <linux/clk-provider.h>
> > #include <linux/acpi.h>
> > +#include <linux/cpu.h>
> >
> > #include <clocksource/arm_arch_timer.h>
> >
> > @@ -69,7 +70,9 @@ void __init time_init(void)
> > u32 arch_timer_rate;
> >
> > of_clk_init(NULL);
> > + cpus_read_lock();
> > timer_probe();
> > + cpus_read_unlock();
> >
> > tick_setup_hrtimer_broadcast();
> >
> > Qian, can you please let me know if this helps? If it does, we'll have
> > to think of something a bit better…
> After applied the above patch, the original warning is gone but there
> Is now a new warning.

[...]

Which was ful;ly expected, given that I've taken the cpu lock at some
semi-random location. I'll try to talk to PeterZ this week to try and
solve this.

Thanks,

M.

--
Jazz is not dead, it just smell funny.

2018-12-02 04:18:13

by Qian Cai

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c



On 11/12/18 3:39 AM, Marc Zyngier wrote:
> On Fri, 09 Nov 2018 18:41:03 +0000,
> Qian Cai <[email protected]> wrote:
>>
>>
>>
>>> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <[email protected]> wrote:
>>>
>>> On 09/11/18 17:28, Sudeep Holla wrote:
>>>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <[email protected]> wrote:
>>>>>
>>>> [...]
>>>>
>>>>>
>>>>> See bb42ca474010 and d003d029cea8 for details.
>>>>>
>>>>> Now, activating this workaround leads to lockdep being really angry,
>>>>> most likely because the cpus_read_lock is not taken, which is a change
>>>>> in behaviour...
>>>>>
>>>>> I'm trying to dig into this now.
>>>>>
>>>>
>>>> Yes we found similar issue in kernel/sched/core.c sched_init_smp
>>>> There's a fix with detailed description in -next
>>>> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
>>>>
>>>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>>>> Assert we hold the hotplug lock for _cpuslocked() operations")
>>>
>>> I indeed came to the same conclusion, but the fix is slightly less than
>>> obvious. I have the following arm64-specific crap, but it is pretty
>>> terrible:
>>>
>>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>>> index f258636273c9..9e96e9eaca9b 100644
>>> --- a/arch/arm64/kernel/time.c
>>> +++ b/arch/arm64/kernel/time.c
>>> @@ -36,6 +36,7 @@
>>> #include <linux/clocksource.h>
>>> #include <linux/clk-provider.h>
>>> #include <linux/acpi.h>
>>> +#include <linux/cpu.h>
>>>
>>> #include <clocksource/arm_arch_timer.h>
>>>
>>> @@ -69,7 +70,9 @@ void __init time_init(void)
>>> u32 arch_timer_rate;
>>>
>>> of_clk_init(NULL);
>>> + cpus_read_lock();
>>> timer_probe();
>>> + cpus_read_unlock();
>>>
>>> tick_setup_hrtimer_broadcast();
>>>
>>> Qian, can you please let me know if this helps? If it does, we'll have
>>> to think of something a bit better…
>> After applied the above patch, the original warning is gone but there
>> Is now a new warning.
>
> [...]
>
> Which was ful;ly expected, given that I've taken the cpu lock at some
> semi-random location. I'll try to talk to PeterZ this week to try and
> solve this.
>

Marc, did you have a chance to investigate this further? I have still seen it in
the latest mainline today. This is the only warning left on this Huawei TaiShan
2280 server now after confirmed that those GICv3 warnings were gone.