LinuxLists.cc - New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

2023-05-29 01:06:52

Subject: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi,

We have an embedded product with an Infineon SLM9670 TPM. After updating
to a newer LTS kernel version we started seeing the following warning at
boot.

[    4.741025] ------------[ cut here ]------------
[    4.749894] irq 38 handler tis_int_handler+0x0/0x154 enabled interrupts
[    4.756555] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159 __handle_irq_event_percpu+0xf4/0x180
[    4.765557] Modules linked in:
[    4.768626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.113 #1
[    4.774747] Hardware name: Allied Telesis x250-18XS (DT)
[    4.780080] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    4.787072] pc : __handle_irq_event_percpu+0xf4/0x180
[    4.792146] lr : __handle_irq_event_percpu+0xf4/0x180
[    4.797220] sp : ffff800008003e40
[    4.800547] x29: ffff800008003e40 x28: ffff8000093951c0 x27: ffff80000902a9b8
[    4.807716] x26: ffff800008fe8d28 x25: ffff8000094a62bd x24: ffff000001b92400
[    4.814885] x23: 0000000000000026 x22: ffff800008003ec4 x21: 0000000000000000
[    4.822053] x20: 0000000000000001 x19: ffff000002381200 x18: ffffffffffffffff
[    4.829222] x17: ffff800076962000 x16: ffff800008000000 x15: ffff800088003b57
[    4.836390] x14: 0000000000000000 x13: ffff8000093a5078 x12: 000000000000035d
[    4.843558] x11: 000000000000011f x10: ffff8000093a5078 x9 : ffff8000093a5078
[    4.850727] x8 : 00000000ffffefff x7 : ffff8000093fd078 x6 : ffff8000093fd078
[    4.857895] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[    4.865062] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8000093951c0
[    4.872230] Call trace:
[    4.874686] __handle_irq_event_percpu+0xf4/0x180
[    4.879411] handle_irq_event+0x64/0xec
[    4.883264] handle_level_irq+0xc0/0x1b0
[    4.887202] generic_handle_irq+0x30/0x50
[    4.891229] mvebu_gpio_irq_handler+0x11c/0x2a0
[    4.895780] handle_domain_irq+0x60/0x90
[    4.899720] gic_handle_irq+0x4c/0xd0
[    4.903398] call_on_irq_stack+0x20/0x4c
[    4.907338] do_interrupt_handler+0x54/0x60
[    4.911538] el1_interrupt+0x30/0x80
[    4.915130] el1h_64_irq_handler+0x18/0x24
[    4.919244] el1h_64_irq+0x78/0x7c
[    4.922659] arch_cpu_idle+0x18/0x2c
[    4.926249] do_idle+0xc4/0x150
[    4.929404] cpu_startup_entry+0x28/0x60
[    4.933343] rest_init+0xe4/0xf4
[    4.936584] arch_call_rest_init+0x10/0x1c
[    4.940699] start_kernel+0x600/0x640
[    4.944375] __primary_switched+0xbc/0xc4
[    4.948402] ---[ end trace 940193047b35b311 ]---

Initially I dismissed this as a warning that would probably be cleaned
up when we did more work on the TPM support for our product, but we also
seem to be getting some new i2c issues and possibly a kernel stack
corruption that we've conflated with this TPM warning.
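
For context on the first line of that warning: in 5.15, __handle_irq_event_percpu() appears to check, after each hard-IRQ handler returns, that interrupts are still disabled, and to warn (via WARN_ONCE) and disable them again if they are not. The sketch below is a simplified Python model of that check only, not kernel code; Cpu, well_behaved and misbehaving are hypothetical names, and the model makes no attempt to explain why tis_int_handler trips the check.

```python
from typing import Callable, List


class Cpu:
    """Hypothetical model of one CPU's interrupt-enable flag."""

    def __init__(self) -> None:
        self.irqs_enabled = True


def handle_irq_event(cpu: Cpu, irq: int, handler: Callable[[Cpu], None],
                     name: str) -> List[str]:
    """Model of the post-handler sanity check; returns any warnings emitted."""
    warnings: List[str] = []
    cpu.irqs_enabled = False      # hard-IRQ handlers run with interrupts off
    handler(cpu)
    if cpu.irqs_enabled:          # handler returned with interrupts on
        warnings.append(f"irq {irq} handler {name} enabled interrupts")
        cpu.irqs_enabled = False  # the core disables them again
    return warnings


def well_behaved(cpu: Cpu) -> None:
    """Leaves the interrupt state alone."""


def misbehaving(cpu: Cpu) -> None:
    """Re-enables interrupts before returning."""
    cpu.irqs_enabled = True


cpu = Cpu()
print(handle_irq_event(cpu, 38, well_behaved, "well_behaved"))  # prints []
print(handle_irq_event(cpu, 38, misbehaving, "misbehaving"))
```

The model warns on every offending call; the kernel's WARN_ONCE reports only the first occurrence per boot.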

2023-05-29 02:39:56

by Bagas Sanjaya

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
> Hi,
>
> We have an embedded product with an Infineon SLM9670 TPM. After updating
> to a newer LTS kernel version we started seeing the following warning at
> boot.
>
> Initially I dismissed this as a warning that would probably be cleaned
> up when we did more work on the TPM support for our product but we also
> seem to be getting some new i2c issues and possibly a kernel stack
> corruption that we've conflated with this TPM warning.

Can you reproduce this issue on mainline? Can you also bisect to find
the culprit?

Anyway, I'm adding it to regzbot:

#regzbot ^introduced: v5.15.110..v5.15.112
#regzbot title: Possible stack corruption and i2c issues due to irq warning on Infineon SLM9670 TPM

Thanks.

--
An old man doll... just what I always wanted! - Clara

2023-05-29 02:45:57

by Chris Packham

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On 29/05/23 14:04, Bagas Sanjaya wrote:
> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
>> Hi,
>>
>> We have an embedded product with an Infineon SLM9670 TPM. After updating
>> to a newer LTS kernel version we started seeing the following warning at
>> boot.
>>
>> Initially I dismissed this as a warning that would probably be cleaned
>> up when we did more work on the TPM support for our product but we also
>> seem to be getting some new i2c issues and possibly a kernel stack
>> corruption that we've conflated with this TPM warning.
> Can you reproduce this issue on mainline? Can you also bisect to find
> the culprit?

No, the error doesn't appear on a recent mainline kernel. I do still get

tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
tpm tpm0: A TPM error (256) occurred attempting the self test

but I think I was getting that on v5.15.110

>
> Anyway, I'm adding it to regzbot:
>
> #regzbot ^introduced: v5.15.110..v5.15.112
> #regzbot title: Possible stack corruption and i2c issues due to irq warning on Inifineon SLM9670 TPM
>
> Thanks.
>

2023-06-02 04:18:40

by Bagas Sanjaya

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On 5/29/23 09:37, Chris Packham wrote:
>
> On 29/05/23 14:04, Bagas Sanjaya wrote:
>> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
>>> Hi,
>>>
>>> We have an embedded product with an Infineon SLM9670 TPM. After updating
>>> to a newer LTS kernel version we started seeing the following warning at
>>> boot.
>>>
>>> Initially I dismissed this as a warning that would probably be cleaned
>>> up when we did more work on the TPM support for our product but we also
>>> seem to be getting some new i2c issues and possibly a kernel stack
>>> corruption that we've conflated with this TPM warning.
>> Can you reproduce this issue on mainline? Can you also bisect to find
>> the culprit?
>
> No the error doesn't appear on a recent mainline kernel. I do still get
>
> tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
> tpm tpm0: A TPM error (256) occurred attempting the self test
>
> but I think I was getting that on v5.15.110
>
>>

I repeat: Can you bisect between v5.15 and v5.15.112?
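
Bisecting here means a binary search over the commits between a known-good and a known-bad release (git bisect start v5.15.112 v5.15.110 names the bad revision first, then the good one). The sketch below models that search over a linear history; the commit list and the is_bad predicate are hypothetical stand-ins for building and booting each candidate kernel, and real git bisect additionally copes with merges and skipped commits.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    the regression persists in every commit after it is introduced.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice: build and boot commits[mid], then mark it good or bad.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical history: the regression is introduced at index 3.
history = ["v5.15.110", "a", "b", "51162b05a44c", "c", "d", "v5.15.112"]
culprit = first_bad_commit(history, lambda c: history.index(c) >= 3)
print(culprit)  # prints 51162b05a44c
```

Each step halves the remaining range, so a few dozen stable backports need only a handful of test boots.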

2023-06-02 04:49:16

by Chris Packham

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On 2/06/23 16:10, Bagas Sanjaya wrote:
> On 5/29/23 09:37, Chris Packham wrote:
>> On 29/05/23 14:04, Bagas Sanjaya wrote:
>>> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
>>>> Hi,
>>>>
>>>> We have an embedded product with an Infineon SLM9670 TPM. After updating
>>>> to a newer LTS kernel version we started seeing the following warning at
>>>> boot.
>>>>
>>>> Initially I dismissed this as a warning that would probably be cleaned
>>>> up when we did more work on the TPM support for our product but we also
>>>> seem to be getting some new i2c issues and possibly a kernel stack
>>>> corruption that we've conflated with this TPM warning.
>>> Can you reproduce this issue on mainline? Can you also bisect to find
>>> the culprit?
>> No the error doesn't appear on a recent mainline kernel. I do still get
>>
>> tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
>> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
>> tpm tpm0: A TPM error (256) occurred attempting the self test
>>
>> but I think I was getting that on v5.15.110
>>
> I repeat: Can you bisect between v5.15 and v5.15.112?

It's definitely between v5.15.110 and v5.15.112.

I'll do a proper bisect next week, but I'm pretty sure it's related to
the "tpm, tpm_tis:" series. The problem can be worked around by removing
the TPM interrupt from the device tree for the board.
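
Removing the interrupt from the device tree presumably leaves the driver with no IRQ to request, so it waits for the TPM by polling its status register, the same mode it reports when it logs "TPM interrupt not working, polling instead". The sketch below models such a bounded polling wait in Python; FakeTpm and the two status-bit constants are illustrative assumptions, not the driver's actual code.

```python
import time
from typing import Callable

# Assumed status bits, modelled on the TIS status register; illustrative only.
STS_VALID = 0x80
STS_DATA_AVAIL = 0x10


def wait_for_status(read_status: Callable[[], int], mask: int,
                    timeout_s: float, poll_interval_s: float = 0.001) -> bool:
    """Re-read the status register until every bit in `mask` is set.

    Without a working interrupt there is nothing to sleep on, so the
    caller polls until the bits appear or the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if (read_status() & mask) == mask:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


class FakeTpm:
    """Hypothetical device whose status becomes ready after `ready_after` reads."""

    def __init__(self, ready_after: int) -> None:
        self.ready_after = ready_after
        self.reads = 0

    def read_status(self) -> int:
        self.reads += 1
        return (STS_VALID | STS_DATA_AVAIL) if self.reads >= self.ready_after else 0


tpm = FakeTpm(ready_after=3)
print(wait_for_status(tpm.read_status, STS_VALID | STS_DATA_AVAIL, timeout_s=1.0))  # prints True
```

Polling costs extra bus traffic and some latency compared with an interrupt-driven wait, but it avoids the interrupt path entirely, which is why dropping the interrupt works as a stopgap.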

2023-06-06 01:47:00

by Chris Packham

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On 2/06/23 16:19, Chris Packham wrote:
>
> On 2/06/23 16:10, Bagas Sanjaya wrote:
>> On 5/29/23 09:37, Chris Packham wrote:
>>> On 29/05/23 14:04, Bagas Sanjaya wrote:
>>>> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
>>>>> Hi,
>>>>>
>>>>> We have an embedded product with an Infineon SLM9670 TPM. After
>>>>> updating
>>>>> to a newer LTS kernel version we started seeing the following
>>>>> warning at
>>>>> boot.
>>>>>
>>>>> Initially I dismissed this as a warning that would probably be
>>>>> cleaned
>>>>> up when we did more work on the TPM support for our product but we
>>>>> also
>>>>> seem to be getting some new i2c issues and possibly a kernel stack
>>>>> corruption that we've conflated with this TPM warning.
>>>> Can you reproduce this issue on mainline? Can you also bisect to find
>>>> the culprit?
>>> No the error doesn't appear on a recent mainline kernel. I do still get
>>>
>>> tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
>>> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
>>> tpm tpm0: A TPM error (256) occurred attempting the self test
>>>
>>> but I think I was getting that on v5.15.110
>>>
>> I repeat: Can you bisect between v5.15 and v5.15.112?
>
> It's definitely between v5.15.110 and v5.15.112.
>
> I'll do a proper bisect next week but I'm pretty sure it's related to
> the "tpm, tpm_tis:" series. The problem can be worked around by
> removing the TPM interrupt from the device tree for the board.

Bisecting between v5.15.110 and v5.15.112 points to

51162b05a44cb5d98fb0ae2519a860910a47fd4b is the first bad commit
commit 51162b05a44cb5d98fb0ae2519a860910a47fd4b
Author: Lino Sanfilippo <[email protected]>
Date:   Thu Nov 24 14:55:29 2022 +0100

    tpm, tpm_tis: Claim locality before writing interrupt registers

    [ Upstream commit 15d7aa4e46eba87242a320f39773aa16faddadee ]

    In tpm_tis_probe_single_irq() interrupt registers TPM_INT_VECTOR,
    TPM_INT_STATUS and TPM_INT_ENABLE are modified to setup the interrupts.
    Currently these modifications are done without holding a locality thus they
    have no effect. Fix this by claiming the (default) locality before the
    registers are written.

    Since now tpm_tis_gen_interrupt() is called with the locality already
    claimed remove locality request and release from this function.

    Signed-off-by: Lino Sanfilippo <[email protected]>
    Tested-by: Jarkko Sakkinen <[email protected]>
    Reviewed-by: Jarkko Sakkinen <[email protected]>
    Signed-off-by: Jarkko Sakkinen <[email protected]>
    Stable-dep-of: 955df4f87760 ("tpm, tpm_tis: Claim locality when interrupts are reenabled on resume")
    Signed-off-by: Sasha Levin <[email protected]>

drivers/char/tpm/tpm_tis_core.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
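
The commit message says the interrupt register writes in tpm_tis_probe_single_irq() previously had no effect because no locality was held. The Python model below illustrates that ordering issue; FakeTisRegisters, INT_ENABLE_ALL and the two probe functions are hypothetical, and the default locality is assumed to be 0. One possible reading, which this thread does not confirm, is that on v5.15.110 the TPM interrupt was never actually armed, so tis_int_handler only started running once this fix was backported.

```python
from typing import Dict, Optional

INT_ENABLE_ALL = 0x1  # hypothetical "enable interrupts" value for the model
DEFAULT_LOCALITY = 0  # assumption: the driver's default locality


class FakeTisRegisters:
    """Hypothetical TIS model: interrupt registers ignore writes unless the
    writing locality is the one currently claimed."""

    def __init__(self) -> None:
        self.active_locality: Optional[int] = None
        self.regs: Dict[str, int] = {"TPM_INT_VECTOR": 0, "TPM_INT_ENABLE": 0}

    def request_locality(self, loc: int) -> None:
        self.active_locality = loc

    def release_locality(self) -> None:
        self.active_locality = None

    def write(self, loc: int, reg: str, value: int) -> None:
        if self.active_locality != loc:
            return  # silently dropped, as the commit message describes
        self.regs[reg] = value


def probe_irq_without_locality(tis: FakeTisRegisters, irq: int) -> None:
    # Pre-fix ordering: registers written with no locality held.
    tis.write(DEFAULT_LOCALITY, "TPM_INT_VECTOR", irq)
    tis.write(DEFAULT_LOCALITY, "TPM_INT_ENABLE", INT_ENABLE_ALL)


def probe_irq_with_locality(tis: FakeTisRegisters, irq: int) -> None:
    # Post-fix ordering: claim the locality first, release it afterwards.
    tis.request_locality(DEFAULT_LOCALITY)
    try:
        tis.write(DEFAULT_LOCALITY, "TPM_INT_VECTOR", irq)
        tis.write(DEFAULT_LOCALITY, "TPM_INT_ENABLE", INT_ENABLE_ALL)
    finally:
        tis.release_locality()


before, after = FakeTisRegisters(), FakeTisRegisters()
probe_irq_without_locality(before, 38)
probe_irq_with_locality(after, 38)
print(before.regs["TPM_INT_ENABLE"], after.regs["TPM_INT_ENABLE"])  # prints 0 1
```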

2023-06-06 02:41:27

by Bagas Sanjaya

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Tue, Jun 06, 2023 at 01:41:01AM +0000, Chris Packham wrote:
>
> On 2/06/23 16:19, Chris Packham wrote:
> >
> > On 2/06/23 16:10, Bagas Sanjaya wrote:
> >> On 5/29/23 09:37, Chris Packham wrote:
> >>> On 29/05/23 14:04, Bagas Sanjaya wrote:
> >>>> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We have an embedded product with an Infineon SLM9670 TPM. After
> >>>>> updating
> >>>>> to a newer LTS kernel version we started seeing the following
> >>>>> warning at
> >>>>> boot.
> >>>>>
> >>>>> Initially I dismissed this as a warning that would probably be
> >>>>> cleaned
> >>>>> up when we did more work on the TPM support for our product but we
> >>>>> also
> >>>>> seem to be getting some new i2c issues and possibly a kernel stack
> >>>>> corruption that we've conflated with this TPM warning.
> >>>> Can you reproduce this issue on mainline? Can you also bisect to find
> >>>> the culprit?
> >>> No the error doesn't appear on a recent mainline kernel. I do still get
> >>>
> >>> tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
> >>> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
> >>> tpm tpm0: A TPM error (256) occurred attempting the self test
> >>>
> >>> but I think I was getting that on v5.15.110
> >>>
> >> I repeat: Can you bisect between v5.15 and v5.15.112?
> >
> > It's definitely between v5.15.110 and v5.15.112.
> >
> > I'll do a proper bisect next week but I'm pretty sure it's related to
> > the "tpm, tpm_tis:" series. The problem can be worked around by
> > removing the TPM interrupt from the device tree for the board.
>
> Bisecting between v5.15.110 and v5.15.112 points to
>
> 51162b05a44cb5d98fb0ae2519a860910a47fd4b is the first bad commit

Thanks for the bisection.

Lino, it looks like this regression is caused by a (backported) commit of yours.
Would you like to take a look at it?

Anyway, telling regzbot:

#regzbot introduced: 51162b05a44cb5

2023-06-06 07:41:01

by Greg KH

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Tue, Jun 06, 2023 at 09:00:02AM +0700, Bagas Sanjaya wrote:
> On Tue, Jun 06, 2023 at 01:41:01AM +0000, Chris Packham wrote:
> >
> > On 2/06/23 16:19, Chris Packham wrote:
> > >
> > > On 2/06/23 16:10, Bagas Sanjaya wrote:
> > >> On 5/29/23 09:37, Chris Packham wrote:
> > >>> On 29/05/23 14:04, Bagas Sanjaya wrote:
> > >>>> On Sun, May 28, 2023 at 11:42:50PM +0000, Chris Packham wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> We have an embedded product with an Infineon SLM9670 TPM. After
> > >>>>> updating
> > >>>>> to a newer LTS kernel version we started seeing the following
> > >>>>> warning at
> > >>>>> boot.
> > >>>>>
> > >>>>> Initially I dismissed this as a warning that would probably be
> > >>>>> cleaned
> > >>>>> up when we did more work on the TPM support for our product but we
> > >>>>> also
> > >>>>> seem to be getting some new i2c issues and possibly a kernel stack
> > >>>>> corruption that we've conflated with this TPM warning.
> > >>>> Can you reproduce this issue on mainline? Can you also bisect to find
> > >>>> the culprit?
> > >>> No, the error doesn't appear on a recent mainline kernel. I do still get
> > >>>
> > >>> tpm_tis_spi spi1.1: 2.0 TPM (device-id 0x1B, rev-id 22)
> > >>> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
> > >>> tpm tpm0: A TPM error (256) occurred attempting the self test
> > >>>
> > >>> but I think I was getting that on v5.15.110
> > >>>
> > >> I repeat: Can you bisect between v5.15 and v5.15.112?
> > >
> > > It's definitely between v5.15.110 and v5.15.112.
> > >
> > > I'll do a proper bisect next week, but I'm pretty sure it's related to
> > > the "tpm, tpm_tis:" series. The problem can be worked around by
> > > removing the TPM interrupt from the device tree for the board.
> >
> > Bisecting between v5.15.110 and v5.15.112 points to
> >
> > 51162b05a44cb5d98fb0ae2519a860910a47fd4b is the first bad commit
>
> Thanks for the bisection.
>
> Lino, it looks like this regression is caused by a (backported) commit of yours.
> Would you like to take a look at it?
>
> Anyway, telling regzbot:
>
> #regzbot introduced: 51162b05a44cb5

There are some tpm backports to 5.15.y that were suspect, and I'll look
into reverting them and see if this was one of the ones on that
list. Give me a few days...

thanks,

greg k-h
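
The bisection Chris ran (v5.15.110 good, v5.15.112 bad) is a binary search over the commits in that range. The C sketch below is a minimal illustration, assuming the range is monotonic (every commit after the culprit also shows the warning) and that each candidate can be reduced to a yes/no boot test; `first_bad()` and `bad_from()` are hypothetical helpers, not part of git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback that "boots" candidate idx and reports whether the warning
 * appears. In this sketch it is just a function pointer. */
typedef bool (*is_bad_fn)(size_t idx, const void *ctx);

/* Binary search for the first bad commit in commits[0..n-1].
 * Preconditions (what "git bisect good/bad" establish): commit 0 is good,
 * commit n-1 is bad, and the range is good...good, bad...bad.
 * *tests receives the number of intermediate candidates checked. */
size_t first_bad(size_t n, is_bad_fn is_bad, const void *ctx, size_t *tests)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;
    *tests = 0;
    while (bad - good > 1) {            /* culprit lies in (good, bad] */
        size_t mid = good + (bad - good) / 2;
        (*tests)++;
        if (is_bad(mid, ctx))
            bad = mid;                  /* culprit is mid or earlier */
        else
            good = mid;                 /* culprit is after mid */
    }
    return bad;
}

/* Example predicate: every commit from index *ctx onward is bad. */
bool bad_from(size_t idx, const void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

Each test halves the remaining range, so N commits need roughly log2(N) boots; `git bisect` runs the same loop over real commits and checks out each midpoint for you.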

2023-06-06 10:12:38

by Jarkko Sakkinen

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Sun, 2023-05-28 at 23:42 +0000, Chris Packham wrote:
> Hi,
>
> We have an embedded product with an Infineon SLM9670 TPM. After updating
> to a newer LTS kernel version we started seeing the following warning at
> boot.
>
> [    4.741025] ------------[ cut here ]------------
> [    4.749894] irq 38 handler tis_int_handler+0x0/0x154 enabled interrupts
> [    4.756555] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159
> __handle_irq_event_percpu+0xf4/0x180
> [    4.765557] Modules linked in:
> [    4.768626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.113 #1
> [    4.774747] Hardware name: Allied Telesis x250-18XS (DT)
> [    4.780080] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [    4.787072] pc : __handle_irq_event_percpu+0xf4/0x180
> [    4.792146] lr : __handle_irq_event_percpu+0xf4/0x180
> [    4.797220] sp : ffff800008003e40
> [    4.800547] x29: ffff800008003e40 x28: ffff8000093951c0 x27:
> ffff80000902a9b8
> [    4.807716] x26: ffff800008fe8d28 x25: ffff8000094a62bd x24:
> ffff000001b92400
> [    4.814885] x23: 0000000000000026 x22: ffff800008003ec4 x21:
> 0000000000000000
> [    4.822053] x20: 0000000000000001 x19: ffff000002381200 x18:
> ffffffffffffffff
> [    4.829222] x17: ffff800076962000 x16: ffff800008000000 x15:
> ffff800088003b57
> [    4.836390] x14: 0000000000000000 x13: ffff8000093a5078 x12:
> 000000000000035d
> [    4.843558] x11: 000000000000011f x10: ffff8000093a5078 x9 :
> ffff8000093a5078
> [    4.850727] x8 : 00000000ffffefff x7 : ffff8000093fd078 x6 :
> ffff8000093fd078
> [    4.857895] x5 : 000000000000bff4 x4 : 0000000000000000 x3 :
> 0000000000000000
> [    4.865062] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> ffff8000093951c0
> [    4.872230] Call trace:
> [    4.874686] __handle_irq_event_percpu+0xf4/0x180
> [    4.879411] handle_irq_event+0x64/0xec
> [    4.883264] handle_level_irq+0xc0/0x1b0
> [    4.887202] generic_handle_irq+0x30/0x50
> [    4.891229] mvebu_gpio_irq_handler+0x11c/0x2a0
> [    4.895780] handle_domain_irq+0x60/0x90
> [    4.899720] gic_handle_irq+0x4c/0xd0
> [    4.903398] call_on_irq_stack+0x20/0x4c
> [    4.907338] do_interrupt_handler+0x54/0x60
> [    4.911538] el1_interrupt+0x30/0x80
> [    4.915130] el1h_64_irq_handler+0x18/0x24
> [    4.919244] el1h_64_irq+0x78/0x7c
> [    4.922659] arch_cpu_idle+0x18/0x2c
> [    4.926249] do_idle+0xc4/0x150
> [    4.929404] cpu_startup_entry+0x28/0x60
> [    4.933343] rest_init+0xe4/0xf4
> [    4.936584] arch_call_rest_init+0x10/0x1c
> [    4.940699] start_kernel+0x600/0x640
> [    4.944375] __primary_switched+0xbc/0xc4
> [    4.948402] ---[ end trace 940193047b35b311 ]---
>
> Initially I dismissed this as a warning that would probably be cleaned
> up when we did more work on the TPM support for our product but we also
> seem to be getting some new i2c issues and possibly a kernel stack
> corruption that we've conflated with this TPM warning.

Hi, sorry for the late response. I've been moving my (home) office to
a different location during the last couple of weeks, and email has been
piling up.

What does dmidecode give you?

More specifically, I'm interested in DMI type 43:

$ sudo dmidecode -t 43
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.4.0 present.

Handle 0x004D, DMI type 43, 31 bytes
TPM Device
	Vendor ID: INTC
	Specification Version: 2.0
	Firmware Revision: 600.18
	Description: INTEL
	Characteristics:
		Family configurable via platform software support
	OEM-specific Information: 0x00000000

BR, Jarkko

2023-06-06 21:24:01

by Chris Packham

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi Jarkko,

On 6/06/23 21:39, Jarkko Sakkinen wrote:
> On Sun, 2023-05-28 at 23:42 +0000, Chris Packham wrote:
>> [ original report and boot log snipped ]
> Hi, sorry for the late response. I've been moving my (home) office to
> a different location during the last couple of weeks, and email has been
> piling up.
>
> What does dmidecode give you?
>
> More specifically, I'm interested in DMI type 43:
>
> $ sudo dmidecode -t 43
> # dmidecode 3.4
> Getting SMBIOS data from sysfs.
> SMBIOS 3.4.0 present.
>
> Handle 0x004D, DMI type 43, 31 bytes
> TPM Device
> 	Vendor ID: INTC
> 	Specification Version: 2.0
> 	Firmware Revision: 600.18
> 	Description: INTEL
> 	Characteristics:
> 		Family configurable via platform software support
> 	OEM-specific Information: 0x00000000
>
> BR, Jarkko

This is an embedded ARM64 (Marvell CN9130 SoC) device, so there is no BIOS. The
relevant snippet from the device tree is

        tpm@1 {
                compatible = "infineon,slb9670";
                reg = <1>; /* Chip select 1 */
                interrupt-parent = <&cp0_gpio2>;
                interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
                spi-max-frequency = <31250000>;
        };

and I can tell you that the specific TPM chip is an Infineon
SLM9670AQ20FW1311XTMA1

2023-06-07 02:05:33

by Lino Sanfilippo

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi Bagas, hi Chris

On 06.06.23 04:00, Bagas Sanjaya wrote:

> On Tue, Jun 06, 2023 at 01:41:01AM +0000, Chris Packham wrote:
>> Bisecting between v5.15.110 and v5.15.112 points to
>>
>> 51162b05a44cb5d98fb0ae2519a860910a47fd4b is the first bad commit
>
> Thanks for the bisection.
>
> Lino, it looks like this regression is caused by a (backported) commit of yours.
> Would you like to take a look at it?
>

Before commit 51162b05a44c, interrupt activation failed because the relevant register was accessed
without holding the required locality.
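
The locality rule can be pictured with a toy register model. This is a minimal sketch under stated assumptions, not the tpm_tis_core code: every name and the bit value are hypothetical, and the only rule modelled is that a write to the interrupt-enable register is dropped unless the locality is held. That is why activation silently failed before 51162b05a44c, and why the handler only started running once the commit made activation succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified TIS-style TPM with a single locality and a
 * single interrupt-enable register. Names and the bit value are
 * illustrative, not taken from the driver. */
#define MOCK_GLOBAL_INT_ENABLE 0x80000000u

struct mock_tpm {
    bool locality_active;  /* true while the driver holds the locality */
    uint32_t int_enable;   /* interrupt-enable register contents */
};

void mock_request_locality(struct mock_tpm *t) { t->locality_active = true; }
void mock_release_locality(struct mock_tpm *t) { t->locality_active = false; }

/* In this model, a write issued without the active locality is dropped. */
void mock_write_int_enable(struct mock_tpm *t, uint32_t val)
{
    if (t->locality_active)
        t->int_enable = val;
}

/* Ordering before the fix: the write is silently ignored. */
void enable_irq_unlocked(struct mock_tpm *t)
{
    mock_write_int_enable(t, MOCK_GLOBAL_INT_ENABLE);
}

/* Ordering after "Claim locality before writing interrupt registers". */
void enable_irq_with_locality(struct mock_tpm *t)
{
    mock_request_locality(t);
    mock_write_int_enable(t, MOCK_GLOBAL_INT_ENABLE);
    assert(t->int_enable & MOCK_GLOBAL_INT_ENABLE); /* write took effect */
    mock_release_locality(t);
}
```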

Now, with the commit applied, activation succeeds and the interrupt handler is called as soon
as an interrupt fires. However, the handler runs in hard interrupt context, while the register accesses
are done via SPI, which involves the SPI bus_lock_mutex. Calling the (sleepable) SPI functions in
interrupt context results in the observed warning.

To fix this, upstream commit 0c7e66e5fd69 ("tpm, tpm_tis: Request threaded interrupt handler") is
additionally required, since it ensures that the handler runs in process context.
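
The failure mode and the fix can be modelled in userspace. This is a hypothetical single-CPU model, not kernel code: `mock_spi_read_status()` stands in for an SPI transfer that takes bus_lock_mutex and may sleep, and, as a simplification, sleeping leaves local interrupts enabled. `dispatch_hardirq()` mimics the post-handler check in kernel/irq/handle.c that produced "irq 38 handler ... enabled interrupts", while `dispatch_threaded()` runs the same handler from a thread in process context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model of the IRQ core, for illustration only. */
static bool in_hardirq;              /* running a primary (hard-IRQ) handler */
static bool irqs_disabled = true;    /* local interrupts masked on this CPU */
int sleep_in_irq_warnings;           /* sleeping call made from hard-IRQ context */
int irq_enabled_warnings;            /* "irq N handler ... enabled interrupts" */

/* Stand-in for an SPI register read that takes bus_lock_mutex and may
 * sleep. Simplification: sleeping leaves local interrupts enabled. */
unsigned mock_spi_read_status(void)
{
    if (in_hardirq)
        sleep_in_irq_warnings++;
    irqs_disabled = false;
    return 0x1u;                     /* arbitrary status value */
}

typedef void (*handler_fn)(void);

/* Hypothetical TPM handler: it must read a status register over SPI. */
void mock_tis_handler(void)
{
    (void)mock_spi_read_status();
}

/* Primary handler path: runs with interrupts disabled, and the core checks
 * afterwards that the handler did not enable them (then re-disables them). */
void dispatch_hardirq(handler_fn handler)
{
    in_hardirq = true;
    irqs_disabled = true;
    handler();
    if (!irqs_disabled) {
        irq_enabled_warnings++;
        irqs_disabled = true;
    }
    in_hardirq = false;
    assert(irqs_disabled);           /* core leaves interrupts masked */
}

/* Threaded path: the hard-IRQ part only wakes the IRQ thread; the handler
 * then runs in process context, where sleeping is allowed. */
void dispatch_threaded(handler_fn thread_fn)
{
    in_hardirq = false;              /* thread context, not hard IRQ */
    irqs_disabled = false;           /* interrupts are enabled in threads */
    thread_fn();
}
```

In the kernel, the counterpart of `dispatch_threaded()` is requesting the interrupt with `request_threaded_irq()` (or the managed `devm_request_threaded_irq()`), passing NULL as the primary handler and setting IRQF_ONESHOT so that a level-triggered line such as this board's IRQ_TYPE_LEVEL_LOW stays masked until the thread function has finished. Whether 0c7e66e5fd69 uses exactly these arguments should be checked against the commit itself.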

Note that even with this commit, interrupts will eventually be disabled, since the interrupt test
still fails (for the test to succeed, at least upstream commit e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test")
would be required).
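
The fallback Chris sees on mainline ("TPM interrupt not working, polling instead") follows a common driver pattern: wait a bounded time for the interrupt and, if it never arrives, stop using it and poll the status register. A minimal hypothetical sketch of that decision, not the tpm_tis_core logic; in this model `irq_works` stands for "the interrupt arrived before the timeout".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model for illustration only.
 * irq_works:   whether the interrupt arrives before the driver's timeout
 * ready_after: how many status polls the command needs to complete */
struct mock_dev {
    bool irq_works;
    int ready_after;
    bool use_irq;   /* driver mode: true = interrupt-driven, false = polling */
    int polls;      /* number of status reads performed so far */
};

/* One status read; returns true once the command has completed. */
static bool mock_status_ready(struct mock_dev *d)
{
    d->polls++;
    return d->polls >= d->ready_after;
}

/* Wait for a command to finish. If interrupts are enabled but none arrives
 * in time, switch the device to polling permanently and poll instead. */
void wait_for_command(struct mock_dev *d)
{
    assert(d->ready_after >= 1);
    if (d->use_irq) {
        if (d->irq_works)
            return;            /* completion was signalled by the interrupt */
        d->use_irq = false;    /* "TPM interrupt not working, polling instead" */
    }
    while (!mock_status_ready(d))
        ;                      /* real code would sleep between status reads */
}
```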

Chris, could you test again with commit 0c7e66e5fd69 additionally applied and confirm that the warning is gone?

Regards,
Lino

2023-06-07 03:55:59

by Chris Packham

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi Lino,

On 7/06/23 13:43, Lino Sanfilippo wrote:
> [ earlier text snipped ]
>
> Chris, could you test again with commit 0c7e66e5fd69 additionally applied and confirm that the warning is gone?

Yes, with 0c7e66e5fd69 cherry-picked on top, the warning goes away. Adding
e644b2f498d2 doesn't seem to change anything (it still reports that
interrupts aren't working), but that's the same as the latest mainline on
this hardware.

2023-06-07 15:21:42

by Lino Sanfilippo

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi Chris,

On 07.06.23 05:23, Chris Packham wrote:

>> Chris, could you test again with commit 0c7e66e5fd69 additionally applied and confirm that the warning is gone?
>
> Yes, with 0c7e66e5fd69 cherry-picked on top, the warning goes away. Adding
> e644b2f498d2 doesn't seem to change anything (it still reports that
> interrupts aren't working), but that's the same as the latest mainline on
> this hardware.

Thanks a lot for testing this!

If interrupts do not work even with the latest mainline kernel, something
else must be wrong.

But it is good to know that the cherry-pick fixes at least the regression.

Best regards,
Lino

2023-06-07 16:04:40

by Lino Sanfilippo

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi Greg,

On 06.06.23 08:45, Greg KH wrote:
>>
>> Lino, it looks like this regression is caused by a (backported) commit of yours.
>> Would you like to take a look at it?
>>
>> Anyway, telling regzbot:
>>
>> #regzbot introduced: 51162b05a44cb5
>
> There are some tpm backports to 5.15.y that were suspect, and I'll look
> into reverting them and see if this was one of the ones on that
> list. Give me a few days...
>

Could you please consider applying (mainline) commit 0c7e66e5fd69 ("tpm, tpm_tis: Request threaded
interrupt handler") to 5.15.y?

As Chris confirmed, it fixes the regression caused by 51162b05a44cb5 ("tpm, tpm_tis: Claim locality
before writing interrupt registers").

Commit 0c7e66e5fd69 is also needed for 5.10.y, 6.1.y and 6.3.y.

Best regards,
Lino

2023-06-07 16:50:22

by Jarkko Sakkinen

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Wed Jun 7, 2023 at 12:04 AM EEST, Chris Packham wrote:
> Hi Jarkko,
>
> On 6/06/23 21:39, Jarkko Sakkinen wrote:
> > [ earlier quoted messages snipped ]
>
> This is an embedded ARM64 (Marvell CN9130 SoC) device, so there is no BIOS. The
> relevant snippet from the device tree is
>
>         tpm@1 {
>                 compatible = "infineon,slb9670";
>                 reg = <1>; /* Chip select 1 */
>                 interrupt-parent = <&cp0_gpio2>;
>                 interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
>                 spi-max-frequency = <31250000>;
>         };
>
> and I can tell you that the specific TPM chip is an Infineon
> SLM9670AQ20FW1311XTMA1

OK, you know what, I own that chip in the form of the LetsTrustTPM
product.

I have not used it much due to lack of time, but I could try
to reproduce the bug with it and an RPi 3B, or at least see what
happens on a different hardware platform with the same TPM chip.

BR, Jarkko

2023-06-07 18:12:22

by Greg Kroah-Hartman

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Wed, Jun 07, 2023 at 05:47:57PM +0200, Lino Sanfilippo wrote:
>
> Hi Greg,
>
> On 06.06.23 08:45, Greg KH wrote:
> >>
> >> Lino, it looks like this regression is caused by a (backported) commit of yours.
> >> Would you like to take a look at it?
> >>
> >> Anyway, telling regzbot:
> >>
> >> #regzbot introduced: 51162b05a44cb5
> >
> > There are some tpm backports to 5.15.y that were suspect, and I'll look
> > into reverting them and see if this was one of the ones on that
> > list. Give me a few days...
> >
>
> Could you please consider applying (mainline) commit 0c7e66e5fd69 ("tpm, tpm_tis: Request threaded
> interrupt handler") to 5.15.y?
>
> As Chris confirmed, it fixes the regression caused by 51162b05a44cb5 ("tpm, tpm_tis: Claim locality
> before writing interrupt registers").
>
> Commit 0c7e66e5fd69 is also needed for 5.10.y, 6.1.y and 6.3.y.

Now queued up, thanks.

greg k-h

2023-06-08 20:57:59

by Chris Packham

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

Hi Jarkko,

On 9/06/23 03:17, Jarkko Sakkinen wrote:
> On Wed Jun 7, 2023 at 7:15 PM EEST, Jarkko Sakkinen wrote:
>> [ earlier quoted messages snipped ]
> I'm not a device tree expert, but with my limited knowledge, I guess we
> could add a quirk that uses of_machine_is_compatible() to disable
> IRQs, i.e. base the policy on specific boards rather than specific
> chips: [*]
>
> if (of_machine_is_compatible("marvell,cn9130")) {
> 	dev_notice(dev, "disable interrupts\n");
> 	interrupts = 0;
> }
>
> [*] I looked up arch/arm64/boot/dts/marvell/cn9130.dtsi. I hope I picked
> the correct file.

The warning itself was resolved by bringing in a further change for the
LTS branch[1]. There does still seem to be an issue with the interrupts
actually working (same behaviour on mainline) but at least now there is
no warning and no adverse downstream effects.

In terms of device tree stuff to disable the interrupt I could simply
remove the interrupt properties from the board DTS (I was doing this as
a workaround before the correct fix was identified).

[1] -
https://lore.kernel.org/linux-integrity/[email protected]/
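
Jarkko's board-based quirk relies on of_machine_is_compatible(), which in the kernel checks the root device tree node's compatible list. The userspace model below only mirrors that lookup; `struct mock_machine`, `tpm_pick_irq()` and the compatible strings in the usage are hypothetical, and whether a given board's root node actually lists "marvell,cn9130" would need checking against its DTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the root node's "compatible" list, which
 * of_machine_is_compatible() consults in the kernel. */
struct mock_machine {
    const char *const *compat;
    size_t ncompat;
};

/* True if `compat` appears anywhere in the machine's compatible list. */
bool mock_machine_is_compatible(const struct mock_machine *m, const char *compat)
{
    for (size_t i = 0; i < m->ncompat; i++)
        if (strcmp(m->compat[i], compat) == 0)
            return true;
    return false;
}

/* Board-based policy as suggested in the thread: return the IRQ to use,
 * or 0 to force polling on matching machines. */
int tpm_pick_irq(const struct mock_machine *m, int dt_irq)
{
    assert(dt_irq >= 0);
    if (mock_machine_is_compatible(m, "marvell,cn9130"))
        return 0;               /* quirk: disable interrupts, poll instead */
    return dt_irq;
}
```

As Chris points out, on a DT platform the simpler option is to drop the interrupt properties from the board DTS, which keeps the policy in the board description rather than in the driver.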

2023-06-09 06:38:20

by Jarkko Sakkinen

Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

On Thu Jun 8, 2023 at 11:39 PM EEST, Chris Packham wrote:
> Hi Jarkko,
>
> On 9/06/23 03:17, Jarkko Sakkinen wrote:
> > On Wed Jun 7, 2023 at 7:15 PM EEST, Jarkko Sakkinen wrote:
> >> On Wed Jun 7, 2023 at 12:04 AM EEST, Chris Packham wrote:
> >>> Hi Jarkko,
> >>>
> >>> On 6/06/23 21:39, Jarkko Sakkinen wrote:
> >>>> On Sun, 2023-05-28 at 23:42 +0000, Chris Packham wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We have an embedded product with an Infineon SLM9670 TPM. After updating
> >>>>> to a newer LTS kernel version we started seeing the following warning at
> >>>>> boot.
> >>>>>
> >>>>> [    4.741025] ------------[ cut here ]------------
> >>>>> [    4.749894] irq 38 handler tis_int_handler+0x0/0x154 enabled interrupts
> >>>>> [    4.756555] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:159
> >>>>> __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.765557] Modules linked in:
> >>>>> [    4.768626] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.113 #1
> >>>>> [    4.774747] Hardware name: Allied Telesis x250-18XS (DT)
> >>>>> [    4.780080] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS
> >>>>> BTYPE=--)
> >>>>> [    4.787072] pc : __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.792146] lr : __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.797220] sp : ffff800008003e40
> >>>>> [    4.800547] x29: ffff800008003e40 x28: ffff8000093951c0 x27:
> >>>>> ffff80000902a9b8
> >>>>> [    4.807716] x26: ffff800008fe8d28 x25: ffff8000094a62bd x24:
> >>>>> ffff000001b92400
> >>>>> [    4.814885] x23: 0000000000000026 x22: ffff800008003ec4 x21:
> >>>>> 0000000000000000
> >>>>> [    4.822053] x20: 0000000000000001 x19: ffff000002381200 x18:
> >>>>> ffffffffffffffff
> >>>>> [    4.829222] x17: ffff800076962000 x16: ffff800008000000 x15:
> >>>>> ffff800088003b57
> >>>>> [    4.836390] x14: 0000000000000000 x13: ffff8000093a5078 x12:
> >>>>> 000000000000035d
> >>>>> [    4.843558] x11: 000000000000011f x10: ffff8000093a5078 x9 :
> >>>>> ffff8000093a5078
> >>>>> [    4.850727] x8 : 00000000ffffefff x7 : ffff8000093fd078 x6 :
> >>>>> ffff8000093fd078
> >>>>> [    4.857895] x5 : 000000000000bff4 x4 : 0000000000000000 x3 :
> >>>>> 0000000000000000
> >>>>> [    4.865062] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> >>>>> ffff8000093951c0
> >>>>> [    4.872230] Call trace:
> >>>>> [    4.874686] __handle_irq_event_percpu+0xf4/0x180
> >>>>> [    4.879411] handle_irq_event+0x64/0xec
> >>>>> [    4.883264] handle_level_irq+0xc0/0x1b0
> >>>>> [    4.887202] generic_handle_irq+0x30/0x50
> >>>>> [    4.891229] mvebu_gpio_irq_handler+0x11c/0x2a0
> >>>>> [    4.895780] handle_domain_irq+0x60/0x90
> >>>>> [    4.899720] gic_handle_irq+0x4c/0xd0
> >>>>> [    4.903398] call_on_irq_stack+0x20/0x4c
> >>>>> [    4.907338] do_interrupt_handler+0x54/0x60
> >>>>> [    4.911538] el1_interrupt+0x30/0x80
> >>>>> [    4.915130] el1h_64_irq_handler+0x18/0x24
> >>>>> [    4.919244] el1h_64_irq+0x78/0x7c
> >>>>> [    4.922659] arch_cpu_idle+0x18/0x2c
> >>>>> [    4.926249] do_idle+0xc4/0x150
> >>>>> [    4.929404] cpu_startup_entry+0x28/0x60
> >>>>> [    4.933343] rest_init+0xe4/0xf4
> >>>>> [    4.936584] arch_call_rest_init+0x10/0x1c
> >>>>> [    4.940699] start_kernel+0x600/0x640
> >>>>> [    4.944375] __primary_switched+0xbc/0xc4
> >>>>> [    4.948402] ---[ end trace 940193047b35b311 ]---
> >>>>>
> >>>>> Initially I dismissed this as a warning that would probably be cleaned
> >>>>> up when we did more work on the TPM support for our product but we also
> >>>>> seem to be getting some new i2c issues and possibly a kernel stack
> >>>>> corruption that we've conflated with this TPM warning.
> >>>> Hi, sorry for late response. I've been moving my (home) office to
> >>>> a different location during last couple of weeks, and email has been
> >>>> piling up.
> >>>>
> >>>> What does dmidecode give you?
> >>>>
> >>>> More specific, I'm interested on DMI type 43:
> >>>>
> >>>> $ sudo dmidecode -t 43
> >>>> # dmidecode 3.4
> >>>> Getting SMBIOS data from sysfs.
> >>>> SMBIOS 3.4.0 present.
> >>>>
> >>>> Handle 0x004D, DMI type 43, 31 bytes
> >>>> TPM Device
> >>>> Vendor ID: INTC
> >>>> Specification Version: 2.0
> >>>> Firmware Revision: 600.18
> >>>> Description: INTEL
> >>>> Characteristics:
> >>>> Family configurable via platform software support
> >>>> OEM-specific Information: 0x00000000
> >>>>
> >>>> BR, Jarkko
> >>> This is an embedded ARM64 (Marvell CN9130 SoC) device so no BIOS. The
> >>> relevant snippet from the device tree is
> >>>
> >>>         tpm@1 {
> >>>                 compatible = "infineon,slb9670";
> >>>                 reg = <1>; /* Chip select 1 */
> >>>                 interrupt-parent = <&cp0_gpio2>;
> >>>                 interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
> >>>                 spi-max-frequency = <31250000>;
> >>>         };
> >>>
> >>> and I can tell you that the specific TPM chip is an Infineon
> >>> SLM9670AQ20FW1311XTMA1.
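The interrupts property in this node is a two-cell specifier, <line flags>: GPIO line 30 on cp0_gpio2, with the flags taken from the kernel's include/dt-bindings/interrupt-controller/irq.h. A level-low trigger means the TPM holds the line low until its interrupt status is cleared, which is consistent with handle_level_irq() in the call trace above. The sketch below maps the binding's macro names to their values (copied from that header, to the best of my knowledge) and classifies the trigger; irq_type_from_name(), irq_is_level() and struct irq_spec are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trigger flags, values as in include/dt-bindings/interrupt-controller/irq.h. */
#define IRQ_TYPE_NONE         0
#define IRQ_TYPE_EDGE_RISING  1
#define IRQ_TYPE_EDGE_FALLING 2
#define IRQ_TYPE_EDGE_BOTH    (IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING)
#define IRQ_TYPE_LEVEL_HIGH   4
#define IRQ_TYPE_LEVEL_LOW    8

/* Two-cell interrupt specifier, e.g. <30 IRQ_TYPE_LEVEL_LOW>. */
struct irq_spec {
	uint32_t line;   /* controller-local line (a GPIO number here) */
	uint32_t flags;  /* IRQ_TYPE_* trigger flags */
};

static const struct {
	const char *name;
	uint32_t value;
} irq_types[] = {
	{ "IRQ_TYPE_NONE",         IRQ_TYPE_NONE },
	{ "IRQ_TYPE_EDGE_RISING",  IRQ_TYPE_EDGE_RISING },
	{ "IRQ_TYPE_EDGE_FALLING", IRQ_TYPE_EDGE_FALLING },
	{ "IRQ_TYPE_EDGE_BOTH",    IRQ_TYPE_EDGE_BOTH },
	{ "IRQ_TYPE_LEVEL_HIGH",   IRQ_TYPE_LEVEL_HIGH },
	{ "IRQ_TYPE_LEVEL_LOW",    IRQ_TYPE_LEVEL_LOW },
};

/* Look up a trigger macro name as written in a .dts file.
 * Returns 0 and stores the value on success, -1 if unknown. */
static int irq_type_from_name(const char *name, uint32_t *value)
{
	assert(name != NULL && value != NULL);
	for (size_t i = 0; i < sizeof irq_types / sizeof irq_types[0]; i++) {
		if (strcmp(irq_types[i].name, name) == 0) {
			*value = irq_types[i].value;
			return 0;
		}
	}
	return -1;
}

/* A level-triggered line stays active until the device clears its
 * interrupt source, so it fires again if unmasked too early. */
static int irq_is_level(uint32_t flags)
{
	return (flags & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW)) != 0;
}
```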
> >> OK, you know what, I own that chip in the form of the LetsTrustTPM
> >> product.
> >>
> >> I have not used it much due to lack of time, but I could try
> >> to reproduce the bug with it on an RPi 3B, or at least see what
> >> happens on a different hardware platform with the same TPM chip.
> > I'm not a device tree expert, but with my limited knowledge I guess we
> > could add a quirk that uses of_machine_is_compatible() to disable
> > IRQs, i.e. base the policy on specific boards rather than specific
> > chips: [*]
> >
> > if (of_machine_is_compatible("marvell,cn9130")) {
> > 	dev_notice(dev, "disable interrupts\n");
> > 	interrupts = 0;
> > }
> >
> > [*] I looked up arch/arm64/boot/dts/marvell/cn9130.dtsi. I hope I picked
> > the correct file.
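In the kernel, of_machine_is_compatible() checks a string against the compatible property of the device tree root node, which holds NUL-separated strings (by convention the most specific, i.e. the board, first and the SoC after it). A quirk keyed on "marvell,cn9130" would therefore match every board whose root node lists that string. The sketch below reproduces that whole-entry matching in userspace on a raw property buffer, such as the contents of /proc/device-tree/compatible; compat_list_contains() and the "acme,example-board" string used when testing it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `compat` is one of the NUL-terminated strings packed
 * back to back in `list` (the format of a device tree "compatible"
 * property), else 0. Only whole entries match, so a prefix such as
 * "marvell,cn91" does not match "marvell,cn9130". */
static int compat_list_contains(const char *list, size_t len, const char *compat)
{
	assert(list != NULL && compat != NULL);
	size_t want = strlen(compat);
	size_t pos = 0;

	while (pos < len) {
		const char *entry = list + pos;
		size_t n = 0;

		/* Length of this entry, without running past the buffer. */
		while (pos + n < len && entry[n] != '\0')
			n++;
		if (n == want && memcmp(entry, compat, n) == 0)
			return 1;
		pos += n + 1;  /* skip the entry and its terminating NUL */
	}
	return 0;
}
```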
>
> The warning itself was resolved by bringing in a further change for the
> LTS branch[1]. There still seems to be an issue with the interrupts
> actually working (same behaviour on mainline), but at least now there is
> no warning and no adverse downstream effects.
>
> In terms of device tree changes to disable the interrupt, I could simply
> remove the interrupt properties from the board DTS (I was doing this as
> a workaround before the correct fix was identified).

I think that would be a good call, because if a product creator *wants*
interrupts, they will know it. Thus, it is perfectly fine IMHO to
disable them in the board DTS. This is a very different case from
PC/workstation computing.

BR, Jarkko

2023-06-20 12:54:36

by Thorsten Leemhuis


Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 07.06.23 19:49, Greg KH wrote:
> On Wed, Jun 07, 2023 at 05:47:57PM +0200, Lino Sanfilippo wrote:
>> On 06.06.23 08:45, Greg KH wrote:
>>>>
>>>> Lino, it looks like this regression is caused by a (backported) commit of yours.
>>>> Would you like to take a look at it?

>>>> Anyway, telling regzbot:
>>>>
>>>> #regzbot introduced: 51162b05a44cb5
>>>
>>> There are some TPM backports to 5.15.y that were suspect, and I'll look
>>> into reverting them and see if this was one of the ones that was on that
>>> list. Give me a few days...
>>
>> Could you please consider applying (mainline) commit 0c7e66e5fd69 ("tpm, tpm_tis: Request threaded
>> interrupt handler") to 5.15.y?
>>
>> As Chris confirmed, it fixes the regression caused by 51162b05a44cb5 ("tpm, tpm_tis: Claim locality
>> before writing interrupt registers").
>>
>> Commit 0c7e66e5fd69 is also needed for 5.10.y, 6.1.y and 6.3.y.
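The commit titles hint at why the fix works. The warning says the hard-IRQ handler tis_int_handler() ended up with interrupts enabled, presumably because the locality claim added by 51162b05a44cb5 performs SPI transfers, which may sleep and so do not belong in hard-IRQ context. A threaded handler moves that work into a kernel thread. In the kernel this is done with request_threaded_irq() or devm_request_threaded_irq(); with a NULL primary handler the IRQ core requires IRQF_ONESHOT, so a level-triggered line stays masked until the thread has cleared the interrupt source. The sketch below is a toy, single-threaded model of that masking rule, not kernel code: struct level_line and deliver() are hypothetical, and the non-oneshot case only illustrates the interrupt storm IRQF_ONESHOT prevents.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-triggered interrupt line. */
struct level_line {
	bool device_asserting; /* device holds the line active until its status is cleared */
	bool masked;           /* interrupt controller mask */
	int hardirq_runs;      /* how often the primary handler ran */
};

/* Run the delivery loop for at most `budget` steps. Hard-IRQ work has
 * priority over the thread whenever the line is active and unmasked.
 * `oneshot` models IRQF_ONESHOT. Returns true once the threaded
 * handler has run, false if the budget ran out first (a storm). */
static bool deliver(struct level_line *l, bool oneshot, int budget)
{
	bool thread_pending = false;

	while (budget-- > 0) {
		if (l->device_asserting && !l->masked) {
			/* Primary handler: no bus I/O, just wake the thread. */
			l->hardirq_runs++;
			thread_pending = true;
			if (oneshot)
				l->masked = true; /* keep masked until the thread is done */
		} else if (thread_pending) {
			/* Threaded handler: may sleep (e.g. SPI transfers),
			 * clears the device's interrupt status, then the line
			 * is unmasked again. */
			l->device_asserting = false;
			l->masked = false;
			return true;
		} else {
			break; /* nothing pending */
		}
	}
	return false;
}
```

In the model, the oneshot case runs the primary handler exactly once and then the thread; without oneshot, the still-asserted level line keeps re-entering the primary handler and the thread never gets to run.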
>
> Now queued up, thanks.

#regzbot fix: 0c7e66e5fd69
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-08-31 11:53:08

by Thorsten Leemhuis


Subject: Re: New kernel warning after updating from LTS 5.15.110 to 5.15.112 (and 5.15.113)

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 20.06.23 14:41, Linux regression tracking #update (Thorsten Leemhuis)
wrote:
> On 07.06.23 19:49, Greg KH wrote:
>> On Wed, Jun 07, 2023 at 05:47:57PM +0200, Lino Sanfilippo wrote:
>>> On 06.06.23 08:45, Greg KH wrote:
>>>>>
>>>>> Lino, it looks like this regression is caused by a (backported) commit of yours.
>>>>> Would you like to take a look at it?
>
>>>>> Anyway, telling regzbot:
>>>>>
>>>>> #regzbot introduced: 51162b05a44cb5
>>>>
>>>> There are some TPM backports to 5.15.y that were suspect, and I'll look
>>>> into reverting them and see if this was one of the ones that was on that
>>>> list. Give me a few days...
>>>
>>> Could you please consider applying (mainline) commit 0c7e66e5fd69 ("tpm, tpm_tis: Request threaded
>>> interrupt handler") to 5.15.y?
>>>
>>> As Chris confirmed, it fixes the regression caused by 51162b05a44cb5 ("tpm, tpm_tis: Claim locality
>>> before writing interrupt registers").
>>>
>>> Commit 0c7e66e5fd69 is also needed for 5.10.y, 6.1.y and 6.3.y.
>>
>> Now queued up, thanks.
>
> #regzbot fix: 0c7e66e5fd69
> #regzbot ignore-activity

Brown paper bag fix: should have used a stable commit id. Sorry for the noise.

#regzbot fix: 4c3dda6b7cfd73fe818e424fe89ea19674ddb
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.