Hello,
kernel test robot noticed "RIP:acpi_safe_halt" on:
commit: e644b2f498d297a928efcb7ff6f900c27f8b788e ("tpm, tpm_tis: Enable interrupt test")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linux-next/master 84e2893b4573da3bc0c9f24e2005442e420e3831]
in testcase: stress-ng
version: stress-ng-x86_64-0.15.04-1_20230427
with the following parameters:
nr_threads: 100%
disk: 1HDD
testtime: 60s
class: interrupt
test: signest
cpufreq_governor: performance
compiler: gcc-11
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]
kern :warn : [ 26.609994] CPU: 21 PID: 0 Comm: swapper/21 Not tainted 6.3.0-00022-ge644b2f498d2 #1
kern :warn : [ 26.609994] Hardware name: Inspur NF5180M6/NF5180M6, BIOS 06.00.04 04/12/2022
kern :warn : [ 26.609994] Call Trace:
kern :warn : [ 26.609994] <IRQ>
kern :warn : [ 26.609994] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
kern :warn : [ 26.609994] __report_bad_irq (kernel/irq/spurious.c:214)
kern :warn : [ 26.609994] note_interrupt (kernel/irq/spurious.c:423)
kern :warn : [ 26.609994] handle_irq_event (kernel/irq/handle.c:198 kernel/irq/handle.c:210)
kern :warn : [ 26.609994] handle_fasteoi_irq (kernel/irq/chip.c:661 kernel/irq/chip.c:716)
kern :warn : [ 26.609994] __common_interrupt (include/linux/irqdesc.h:158 arch/x86/kernel/irq.c:231 arch/x86/kernel/irq.c:250)
kern :warn : [ 26.609994] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
kern :warn : [ 26.609994] </IRQ>
kern :warn : [ 26.713811] <TASK>
kern :warn : [ 26.713811] asm_common_interrupt (arch/x86/include/asm/idtentry.h:636)
kern :warn : [ 26.713811] RIP: 0010:acpi_safe_halt (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 drivers/acpi/processor_idle.c:113)
kern :warn : [ 26.713811] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 65 48 8b 04 25 80 ce 02 00 48 8b 00 a8 08 75 0c 66 90 0f 00 2d f1 dc 38 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
All code
========
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 65 48 8b 04 25 80 ce mov %gs:0x2ce80,%rax
16: 02 00
18: 48 8b 00 mov (%rax),%rax
1b: a8 08 test $0x8,%al
1d: 75 0c jne 0x2b
1f: 66 90 xchg %ax,%ax
21: 0f 00 2d f1 dc 38 00 verw 0x38dcf1(%rip) # 0x38dd19
28: fb sti
29: f4 hlt
2a:* fa cli <-- trapping instruction
2b: c3 ret
2c: cc int3
2d: cc int3
2e: cc int3
2f: cc int3
30: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
37: 00 00 00 00
3b: 0f 1f 40 00 nopl 0x0(%rax)
3f: 90 nop
Code starting with the faulting instruction
===========================================
0: fa cli
1: c3 ret
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
d: 00 00 00 00
11: 0f 1f 40 00 nopl 0x0(%rax)
15: 90 nop
kern :warn : [ 26.713811] RSP: 0000:ffa00000003cfe68 EFLAGS: 00000246
kern :warn : [ 26.713811] RAX: 0000000000004000 RBX: ff11002088776400 RCX: 00000000000000a0
kern :warn : [ 26.713811] RDX: ff11003fc2d40000 RSI: ff110020896fbc00 RDI: ff110020896fbc64
kern :warn : [ 26.713811] RBP: 0000000000000001 R08: ffffffff82cc6620 R09: 0000000000000008
kern :warn : [ 26.713811] R10: 0000000000000006 R11: 0000000000000006 R12: 0000000000000001
kern :warn : [ 26.713811] R13: ffffffff82cc66a0 R14: 0000000000000001 R15: 0000000000000000
kern :warn : [ 26.713811] ? ct_kernel_exit+0x6b/0xb0
kern :warn : [ 26.713811] acpi_idle_enter (drivers/acpi/processor_idle.c:713 (discriminator 3))
kern :warn : [ 26.713811] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
kern :warn : [ 26.713811] cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
kern :warn : [ 26.713811] cpuidle_idle_call (kernel/sched/idle.c:219)
kern :warn : [ 26.713811] do_idle (kernel/sched/idle.c:284)
kern :warn : [ 26.713811] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
kern :warn : [ 26.713811] start_secondary (arch/x86/kernel/smpboot.c:198 arch/x86/kernel/smpboot.c:232)
kern :warn : [ 26.713811] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
kern :warn : [ 26.713811] </TASK>
kern :err : [ 26.713811] handlers:
kern :err : [ 26.713811] irq_default_primary_handler (kernel/irq/manage.c:1027)
kern :warn : [ 26.713811] threaded tis_int_handler (drivers/char/tpm/tpm_tis_core.c:756)
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
Hi Lino,
Have you had time to peek into this? I only just noticed the email and
wanted to ask whether any findings have been made yet.
BR, Jarkko
Hi,
On 11.05.23 00:00, Jarkko Sakkinen wrote:
>
> Hi Lino,
>
> Have you had time to peek into this? I just noticed the email, just
> asking if some findings have been already made or not.
>
> BR, Jarkko
>
Since, besides the one reported by Peter Zijlstra
(https://lore.kernel.org/linux-integrity/CSJ0AD1CFYQP.T6T68M6ZVK49@suppilovahvero/T/#t),
we have another interrupt storm here, it is probably best to handle such storms in general
and, when one is detected, to disable interrupts and fall back to polling (this is also what
Jerry suggested in the thread above).
I will try to provide a patch for this.
Regards,
Lino
Hi Lukas,
On 11.05.23 16:16, Lukas Wunner wrote:
> On Thu, May 11, 2023 at 01:22:19PM +0200, Lino Sanfilippo wrote:
>> Since beside the one reported by Peter Zijlstra
>> (https://lore.kernel.org/linux-integrity/CSJ0AD1CFYQP.T6T68M6ZVK49@suppilovahvero/T/#t)
>> we have another interrupt storm here, it is probably the best to handle those in general
>> and to disable interrupts in this case to fall back to polling (this is also what Jerry
>> suggested in the thread above).
>>
>> I will try to provide a patch for this.
>
> In tpm_tis_probe_irq_single(), after you've requested the irq,
> you could convert it to a struct irq_desc (via irq_to_desc()
> from <linux/irqnr.h>) and cache that pointer in priv.
>
> Then in tis_int_handler(), you could access the irqs_unhandled
> member of struct irq_desc (from <linux/irqdesc.h>) and check
> if it exceeds, say, 5000.
This is the solution I am currently working on, but thanks for confirming that
I am on the right track with this :)
>
> If it does, schedule a work_struct which calls disable_interrupts().
> You can't call that from the IRQ handler because devm_free_irq()
> waits for the IRQ handler to finish, so you'd deadlock. You *can*
> of course clear the TPM_GLOBAL_INT_ENABLE bit from the IRQ handler,
> though it's unclear to me if that's sufficient to quiesce the
> interrupt line.
>
Will try this, thx.
> By reusing the genirq subsystem's irqs_unhandled infrastructure,
> you avoid having to reimplement all of that.
>
Agreed.
Regards,
Lino
On Thu, May 11, 2023 at 01:22:19PM +0200, Lino Sanfilippo wrote:
> Since beside the one reported by Peter Zijlstra
> (https://lore.kernel.org/linux-integrity/CSJ0AD1CFYQP.T6T68M6ZVK49@suppilovahvero/T/#t)
> we have another interrupt storm here, it is probably the best to handle those in general
> and to disable interrupts in this case to fall back to polling (this is also what Jerry
> suggested in the thread above).
>
> I will try to provide a patch for this.
In tpm_tis_probe_irq_single(), after you've requested the irq,
you could convert it to a struct irq_desc (via irq_to_desc()
from <linux/irqnr.h>) and cache that pointer in priv.
Then in tis_int_handler(), you could access the irqs_unhandled
member of struct irq_desc (from <linux/irqdesc.h>) and check
if it exceeds, say, 5000.
If it does, schedule a work_struct which calls disable_interrupts().
You can't call that from the IRQ handler because devm_free_irq()
waits for the IRQ handler to finish, so you'd deadlock. You *can*
of course clear the TPM_GLOBAL_INT_ENABLE bit from the IRQ handler,
though it's unclear to me if that's sufficient to quiesce the
interrupt line.
By reusing the genirq subsystem's irqs_unhandled infrastructure,
you avoid having to reimplement all of that.
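The approach described above can be modelled in a few lines of plain userspace C. This is a minimal sketch under stated assumptions, not the actual tpm_tis patch: every fake_* name is hypothetical, irqs_unhandled stands in for the genirq counter of the same name (which note_interrupt() updates when the handlers return IRQ_NONE), the 5000 threshold is the "say, 5000" from above, and the scheduled work_struct is modelled as a pending flag that a separate function consumes later. The point it illustrates is the split of responsibilities: the handler only quiesces the device and defers the teardown, because freeing the IRQ from inside its own handler would deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the "say, 5000" suggestion; an assumption, not a tuned value. */
#define FAKE_STORM_THRESHOLD 5000u

/* Stand-in for struct irq_desc: only the counter we care about. */
struct fake_irq_desc {
	unsigned int irqs_unhandled;
};

/* Stand-in for the driver's private data (hypothetical fields). */
struct fake_tpm_priv {
	struct fake_irq_desc *desc;	/* cached right after requesting the IRQ */
	bool global_int_enable;		/* models the TPM_GLOBAL_INT_ENABLE bit */
	bool irq_requested;		/* true while the handler is installed */
	bool free_irq_work_pending;	/* models a scheduled work_struct */
	bool polling;			/* true once we have fallen back to polling */
};

/* Models tis_int_handler(): runs as the IRQ handler, so it must not free the IRQ. */
void fake_tis_int_handler(struct fake_tpm_priv *priv)
{
	if (priv->desc->irqs_unhandled <= FAKE_STORM_THRESHOLD ||
	    priv->free_irq_work_pending)
		return;

	/* Allowed here: stop the TPM itself from raising further interrupts. */
	priv->global_int_enable = false;
	/*
	 * Not allowed here: freeing the IRQ waits for this handler to
	 * finish, so hand that part off to process context instead.
	 */
	priv->free_irq_work_pending = true;
}

/* Models the work function that would call disable_interrupts(). */
void fake_free_irq_work(struct fake_tpm_priv *priv)
{
	if (!priv->free_irq_work_pending)
		return;

	/* The handler masked the device before scheduling this work. */
	assert(!priv->global_int_enable);

	priv->irq_requested = false;	/* safe: we are outside the handler */
	priv->polling = true;
	priv->free_irq_work_pending = false;
}
```

In the real driver the counter would be read through the irq_desc pointer cached in priv, and the deferred part would be a real work item; whether clearing the enable bit alone is enough to quiesce the line remains, as noted above, an open question.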
Thanks,
Lukas
On Thu, May 11, 2023 at 04:22:02PM +0200, Lino Sanfilippo wrote:
>
> Hi Lukas,
>
> On 11.05.23 16:16, Lukas Wunner wrote:
> > On Thu, May 11, 2023 at 01:22:19PM +0200, Lino Sanfilippo wrote:
> >> Since beside the one reported by Peter Zijlstra
> >> (https://lore.kernel.org/linux-integrity/CSJ0AD1CFYQP.T6T68M6ZVK49@suppilovahvero/T/#t)
> >> we have another interrupt storm here, it is probably the best to handle those in general
> >> and to disable interrupts in this case to fall back to polling (this is also what Jerry
> >> suggested in the thread above).
> >>
> >> I will try to provide a patch for this.
> >
> > In tpm_tis_probe_irq_single(), after you've requested the irq,
> > you could convert it to a struct irq_desc (via irq_to_desc()
> > from <linux/irqnr.h>) and cache that pointer in priv.
> >
> > Then in tis_int_handler(), you could access the irqs_unhandled
> > member of struct irq_desc (from <linux/irqdesc.h>) and check
> > if it exceeds, say, 5000.
>
> This is the solution I am currently working on, but thanks for confirming that
> I am on the right track with this :)
>
> >
> > If it does, schedule a work_struct which calls disable_interrupts().
> > You can't call that from the IRQ handler because devm_free_irq()
> > waits for the IRQ handler to finish, so you'd deadlock. You *can*
> > of course clear the TPM_GLOBAL_INT_ENABLE bit from the IRQ handler,
> > though it's unclear to me if that's sufficient to quiesce the
> > interrupt line.
> >
>
> Will try this, thx.
>
> > By reusing the genirq subsystem's irqs_unhandled infrastructure,
> > you avoid having to reimplement all of that.
> >
>
> Agreed.
>
> Regards,
> Lino
>
Thanks for finishing this off Lino.
IIRC trying to catch the irq storm didn't work in the L490 case for
some reason, so we might still need the dmi entry for that one.
The info that the T490s had a pin wired up wrong came from Lenovo, but
this one even looks to be a different vendor so I wonder how often
this happens or if there is something else going on. Is it possible to
get info about the tpm used in the Inspur system? The datasheet online
doesn't mention it.
Regards,
Jerry
Hi Jerry,
On Thu, 2023-05-11 at 07:59 -0700, Jerry Snitselaar wrote:
>
> IIRC trying to catch the irq storm didn't work in the L490 case for
> some reason, so we might still need the dmi entry for that one.
>
> The info that the T490s had a pin wired up wrong came from Lenovo, but
> this one even looks to be a different vendor so I wonder how often
> this happens or if there is something else going on. Is it possible to
> get info about the tpm used in the Inspur system? The datasheet online
> doesn't mention it.
Are you sure about the T490s? To me the wiring looks right on both the T490s and the
non-s T490: pin 18 / PIRQ# goes to PIRQA# of the PCH/SoC.
However, on the L490 pin 18 / PIRQ# is wrongly wired to SERIRQ, which is probably the
reason that catching the interrupt storm didn't work: I guess this completely
messes up LPC communication and causes far more problems. In this case only a
DMI quirk can help.
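For a board like the L490, where storm detection does not help, the decision to poll has to be made before the IRQ is ever requested, by matching firmware-reported system strings against a quirk table. The userspace C sketch below models only that idea: the fake_* names are hypothetical, the matching imitates the substring semantics of the kernel's DMI_MATCH() but does not use the kernel's dmi_system_id API, and the table entry strings are illustrative assumptions that would have to be checked against the real DMI data of the affected machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: both fields must match for the quirk to apply. */
struct fake_dmi_quirk {
	const char *ident;		/* human-readable name, e.g. for a log message */
	const char *sys_vendor;		/* substring expected in the DMI system vendor */
	const char *product_version;	/* substring expected in the DMI product version */
};

/* Illustrative entry only; real strings must come from the affected machine. */
static const struct fake_dmi_quirk fake_tpm_irq_quirks[] = {
	{ "ThinkPad L490", "LENOVO", "ThinkPad L490" },
	{ NULL, NULL, NULL }		/* table terminator */
};

/* Substring match on every field, similar in spirit to DMI_MATCH(). */
bool fake_dmi_matches(const struct fake_dmi_quirk *q,
		      const char *sys_vendor, const char *product_version)
{
	return strstr(sys_vendor, q->sys_vendor) != NULL &&
	       strstr(product_version, q->product_version) != NULL;
}

/* Returns true if TPM interrupts should stay disabled (use polling instead). */
bool fake_tpm_irq_blocked(const char *sys_vendor, const char *product_version)
{
	assert(sys_vendor != NULL && product_version != NULL);

	for (const struct fake_dmi_quirk *q = fake_tpm_irq_quirks; q->ident; q++)
		if (fake_dmi_matches(q, sys_vendor, product_version))
			return true;
	return false;
}
```

A quirk table applies to every unit of a model regardless of firmware version, which is why it fits a wiring fault like this one better than the firmware bugs suspected elsewhere in this thread.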
BR
Michael
On Mon, May 29, 2023 at 05:07:54PM +0200, Michael Niewöhner wrote:
> Hi Jerry,
>
> On Thu, 2023-05-11 at 07:59 -0700, Jerry Snitselaar wrote:
> >
> > IIRC trying to catch the irq storm didn't work in the L490 case for
> > some reason, so we might still need the dmi entry for that one.
> >
> > The info that the T490s had a pin wired up wrong came from Lenovo, but
> > this one even looks to be a different vendor so I wonder how often
> > this happens or if there is something else going on. Is it possible to
> > get info about the tpm used in the Inspur system? The datasheet online
> > doesn't mention it.
>
> Are you sure about T490s? To me the wiring looks right on both s and non-s: Pin
> 18 / PIRQ# goes to PIRQA# of the PCH/SoC.
>
> However on L490 Pin 18 / PIRQ# is wired wrongly to SERIRQ, which probably is the
> reason that catching the interrupt storm didn't work: I guess this completely
> messes up LPC communication and causes way more problems. In this case only a
> DMI quirk can help.
>
> BR
> Michael
>
I'm searching to see if I can find the old bug/email where that info
from Lenovo originated. It could be that the info was wrong, and
it was some firmware issue instead. IIRC the T490s issue could be
solved with the code looking for the irq storm, but the L490 needed
the dmi check even with the irq storm checking code.
I haven't found the info yet, but I did find some other reports from back
then.
Spurious irq reported with 5.5.7, so after the irq reverts in v5.5:

tpm_tis IFX0785:00: 2.0 TPM
Hardware name: Entroware Proteus/Proteus, BIOS 1.07.07TE0 11/15/2019

Thinkpad P53
tpm_tis STM7308:00: 2.0 TPM
Hardware name: LENOVO 20QNCTO1WW/20QNCTO1WW, BIOS N2NET34W (1.19 ) 11/28/2019

Reports from before the 5.5 reverts:

tpm_tis MSFT0101:00: 2.0 TPM
Hyperbook NH5/Clevo NH55RCQ

tpm_tis IFX0785:00: 2.0 TPM
Clevo N151CU-derived notebook
Regards,
Jerry
On Mon, 2023-05-29 at 13:58 -0700, Jerry Snitselaar wrote:
> On Mon, May 29, 2023 at 05:07:54PM +0200, Michael Niewöhner wrote:
> > Hi Jerry,
> >
> > On Thu, 2023-05-11 at 07:59 -0700, Jerry Snitselaar wrote:
> > >
> > > IIRC trying to catch the irq storm didn't work in the L490 case for
> > > some reason, so we might still need the dmi entry for that one.
> > >
> > > The info that the T490s had a pin wired up wrong came from Lenovo, but
> > > this one even looks to be a different vendor so I wonder how often
> > > this happens or if there is something else going on. Is it possible to
> > > get info about the tpm used in the Inspur system? The datasheet online
> > > doesn't mention it.
> >
> > Are you sure about T490s? To me the wiring looks right on both s and non-s: Pin
> > 18 / PIRQ# goes to PIRQA# of the PCH/SoC.
> >
> > However on L490 Pin 18 / PIRQ# is wired wrongly to SERIRQ, which probably is the
> > reason that catching the interrupt storm didn't work: I guess this completely
> > messes up LPC communication and causes way more problems. In this case only a
> > DMI quirk can help.
> >
> > BR
> > Michael
> >
>
> I'm searching to see if I can find the old bug/email where that info
> from Lenovo originated. It could be that the info was wrong, and
> it was some firmware issue instead. IIRC the T490s issue could be
> solved with the code looking for the irq storm, but the L490 needed
> the dmi check even with the irq storm checking code.
Tbh I wonder why the T490s suffers from that interrupt storm at all, but that
might be a FW bug (it's not handling the interrupt). L490 definitely needs that
DMI check, right.
>
>
> I haven't found the info yet, but I did find some other reports from back
> then.
>
> Spurious irq reported with 5.5.7, so after the irq reverts in v5.5:
>
> tpm_tis IFX0785:00: 2.0 TPM
> Hardware name: Entroware Proteus/Proteus, BIOS 1.07.07TE0 11/15/2019
That's actually a Clevo N151CU. According to schematics it's wired correctly to
PIRQA#, so probably a FW bug as well.
>
> Thinkpad P53
> tpm_tis STM7308:00: 2.0 TPM
> Hardware name: LENOVO 20QNCTO1WW/20QNCTO1WW, BIOS N2NET34W (1.19 ) 11/28/2019
>
>
>
> Reports from before the 5.5 reverts:
>
> tpm_tis MSFT0101:00: 2.0 TPM
> Hyperbook NH5/Clevo NH55RCQ
PIRQ# wired to GPP_B0 - needs correct setup in firmware -> probably a firmware
bug.
>
> tpm_tis IFX0785:00: 2.0 TPM
> Clevo N151CU-derived notebook
Same device as Entroware Proteus.
>
>
> Regards,
> Jerry
>
On Tue, May 30, 2023 at 04:40:26PM +0200, Michael Niewöhner wrote:
> On Mon, 2023-05-29 at 13:58 -0700, Jerry Snitselaar wrote:
> > On Mon, May 29, 2023 at 05:07:54PM +0200, Michael Niewöhner wrote:
> > > Hi Jerry,
> > >
> > > On Thu, 2023-05-11 at 07:59 -0700, Jerry Snitselaar wrote:
> > > >
> > > > IIRC trying to catch the irq storm didn't work in the L490 case for
> > > > some reason, so we might still need the dmi entry for that one.
> > > >
> > > > The info that the T490s had a pin wired up wrong came from Lenovo, but
> > > > this one even looks to be a different vendor so I wonder how often
> > > > this happens or if there is something else going on. Is it possible to
> > > > get info about the tpm used in the Inspur system? The datasheet online
> > > > doesn't mention it.
> > >
> > > Are you sure about T490s? To me the wiring looks right on both s and non-s: Pin
> > > 18 / PIRQ# goes to PIRQA# of the PCH/SoC.
> > >
> > > However on L490 Pin 18 / PIRQ# is wired wrongly to SERIRQ, which probably is the
> > > reason that catching the interrupt storm didn't work: I guess this completely
> > > messes up LPC communication and causes way more problems. In this case only a
> > > DMI quirk can help.
> > >
> > > BR
> > > Michael
> > >
> >
> > I'm searching to see if I can find the old bug/email where that info
> > from Lenovo originated. It could be that the info was wrong, and
> > it was some firmware issue instead. IIRC the T490s issue could be
> > solved with the code looking for the irq storm, but the L490 needed
> > the dmi check even with the irq storm checking code.
>
> Tbh I wonder why the T490s suffers from that interrupt storm at all, but that
> might be a FW bug (it's not handling the interrupt). L490 definitely needs that
> DMI check, right.
>
> >
> >
> > I haven't found the info yet, but I did find some other reports from back
> > then.
> >
> > Spurious irq reported with 5.5.7, so after the irq reverts in v5.5:
> >
> > tpm_tis IFX0785:00: 2.0 TPM
> > Hardware name: Entroware Proteus/Proteus, BIOS 1.07.07TE0 11/15/2019
>
> That's actually a Clevo N151CU. According to schematics it's wired correctly to
> PIRQA#, so probably a FW bug as well.
>
> >
> > Thinkpad P53
> > tpm_tis STM7308:00: 2.0 TPM
> > Hardware name: LENOVO 20QNCTO1WW/20QNCTO1WW, BIOS N2NET34W (1.19 ) 11/28/2019
> >
> >
> >
> > Reports from before the 5.5 reverts:
> >
> > tpm_tis MSFT0101:00: 2.0 TPM
> > Hyperbook NH5/Clevo NH55RCQ
>
> PIRQ# wired to GPP_B0 - needs correct setup in firmware -> probably a firmware
> bug.
>
> >
> > tpm_tis IFX0785:00: 2.0 TPM
> > Clevo N151CU-derived notebook
>
> Same device as Entroware Proteus.
>
Hi Michael,
Out of curiosity, where are you grabbing the schematics from?
Regards,
Jerry