2022-09-23 19:49:12

by Borislav Petkov

Subject: INFO: task systemd-udevd:94 blocked for more than 120 seconds.

Hi folks,

I'm seeing the below on linux-next-20220923 and thermal_zone is in the
stacktrace so I thought I should Cc relevant folks in case you have an
idea...

[ 155.065997] radeon 0000:00:01.0: [drm] fb0: radeondrmfb frame buffer device
[ 155.152551] [drm] Initialized radeon 2.50.0 20080528 for 0000:00:01.0 on minor 0
[ 242.946675] INFO: task systemd-udevd:94 blocked for more than 120 seconds.
[ 242.946820] Not tainted 6.0.0-rc6-next-20220923 #1
[ 242.946902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.946995] task:systemd-udevd state:D stack:13344 pid:94 ppid:88 flags:0x00004004
[ 242.947118] Call Trace:
[ 242.947166] <TASK>
[ 242.947216] __schedule+0x2af/0x830
[ 242.947297] schedule+0x5d/0xc0
[ 242.947360] schedule_preempt_disabled+0x14/0x30
[ 242.947442] __mutex_lock.constprop.0+0x2d0/0x7b0
[ 242.947524] thermal_zone_device_update.part.0+0xf8/0x2f0
[ 242.947607] ? acpi_system_write_wakeup_device+0x170/0x170
[ 242.947700] thermal_zone_device_set_mode+0x75/0xb0
[ 242.947781] acpi_thermal_add+0x3c2/0x400 [thermal]
[ 242.947881] acpi_device_probe+0x56/0x110
[ 242.947953] really_probe+0xc7/0x280
[ 242.948022] ? pm_runtime_barrier+0x61/0xb0
[ 242.948098] __driver_probe_device+0x71/0xd0
[ 242.948175] driver_probe_device+0x2d/0x100
[ 242.948245] __driver_attach+0xa6/0x1a0
[ 242.950659] ? __device_attach_driver+0x110/0x110
[ 242.953030] bus_for_each_dev+0x69/0xa0
[ 242.955433] bus_add_driver+0x1b0/0x200
[ 242.957806] ? _raw_spin_unlock+0x12/0x40
[ 242.959211] driver_register+0x89/0xe0
[ 242.960589] ? 0xffffffffa0006000
[ 242.961957] acpi_thermal_init+0x5c/0x1000 [thermal]
[ 242.963375] do_one_initcall+0x4d/0x210
[ 242.964756] ? kmalloc_trace+0x38/0xc0
[ 242.966130] do_init_module+0x4a/0x1e0
[ 242.967503] __do_sys_finit_module+0xa7/0x100
[ 242.968858] do_syscall_64+0x42/0x90
[ 242.970202] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 242.971561] RIP: 0033:0x7fd31c8189b9
[ 242.972910] RSP: 002b:00007ffd203a81b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 242.974318] RAX: ffffffffffffffda RBX: 00005571cdddfb30 RCX: 00007fd31c8189b9
[ 242.975720] RDX: 0000000000000000 RSI: 00007fd31c9a3e2d RDI: 0000000000000005
[ 242.977089] RBP: 0000000000020000 R08: 0000000000000000 R09: 00005571cddc6f50
[ 242.978449] R10: 0000000000000005 R11: 0000000000000246 R12: 00007fd31c9a3e2d
[ 242.979792] R13: 0000000000000000 R14: 00005571cddd8170 R15: 00005571cdddfb30
[ 242.981143] </TASK>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2022-09-23 19:53:22

by Borislav Petkov

Subject: Re: INFO: task systemd-udevd:94 blocked for more than 120 seconds.

On Fri, Sep 23, 2022 at 12:18:56PM -0700, Nathan Chancellor wrote:
> I have not seen a stacktrace like this on my machines (although I
> suspect that is because I don't have CONFIG_DETECT_HUNG_TASK enabled in
> my configs) but my Honeycomb LX2 hangs while booting after commit
> 78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip()
> function") according to my bisect, which certainly seems like it could
> be related to the trace you are seeing.

Don't you just love how well our community works - one reports a bug and
someone else has already bisected it and thus saves one the work?!

:-)))

Thanks Nathan, I can confirm your bisection. The commit above doesn't
revert cleanly on top of linux-next, so I tried it and the commit before
it:

78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip() function") <- BAD
62022c15f627 ("Merge branch 'pm-opp' into linux-next") <- GOOD

so it looks like that one is somehow b0rked.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-09-23 20:06:06

by Nathan Chancellor

Subject: Re: INFO: task systemd-udevd:94 blocked for more than 120 seconds.

On Fri, Sep 23, 2022 at 08:59:07PM +0200, Borislav Petkov wrote:
> Hi folks,
>
> I'm seeing the below on linux-next-20220923 and thermal_zone is in the
> stacktrace so I thought I should Cc relevant folks in case you have an
> idea...
>
> [ 155.065997] radeon 0000:00:01.0: [drm] fb0: radeondrmfb frame buffer device
> [ 155.152551] [drm] Initialized radeon 2.50.0 20080528 for 0000:00:01.0 on minor 0
> [ 242.946675] INFO: task systemd-udevd:94 blocked for more than 120 seconds.
> [ 242.946820] Not tainted 6.0.0-rc6-next-20220923 #1
> [ 242.946902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 242.946995] task:systemd-udevd state:D stack:13344 pid:94 ppid:88 flags:0x00004004
> [ 242.947118] Call Trace:
> [ 242.947166] <TASK>
> [ 242.947216] __schedule+0x2af/0x830
> [ 242.947297] schedule+0x5d/0xc0
> [ 242.947360] schedule_preempt_disabled+0x14/0x30
> [ 242.947442] __mutex_lock.constprop.0+0x2d0/0x7b0
> [ 242.947524] thermal_zone_device_update.part.0+0xf8/0x2f0
> [ 242.947607] ? acpi_system_write_wakeup_device+0x170/0x170
> [ 242.947700] thermal_zone_device_set_mode+0x75/0xb0
> [ 242.947781] acpi_thermal_add+0x3c2/0x400 [thermal]
> [ 242.947881] acpi_device_probe+0x56/0x110
> [ 242.947953] really_probe+0xc7/0x280
> [ 242.948022] ? pm_runtime_barrier+0x61/0xb0
> [ 242.948098] __driver_probe_device+0x71/0xd0
> [ 242.948175] driver_probe_device+0x2d/0x100
> [ 242.948245] __driver_attach+0xa6/0x1a0
> [ 242.950659] ? __device_attach_driver+0x110/0x110
> [ 242.953030] bus_for_each_dev+0x69/0xa0
> [ 242.955433] bus_add_driver+0x1b0/0x200
> [ 242.957806] ? _raw_spin_unlock+0x12/0x40
> [ 242.959211] driver_register+0x89/0xe0
> [ 242.960589] ? 0xffffffffa0006000
> [ 242.961957] acpi_thermal_init+0x5c/0x1000 [thermal]
> [ 242.963375] do_one_initcall+0x4d/0x210
> [ 242.964756] ? kmalloc_trace+0x38/0xc0
> [ 242.966130] do_init_module+0x4a/0x1e0
> [ 242.967503] __do_sys_finit_module+0xa7/0x100
> [ 242.968858] do_syscall_64+0x42/0x90
> [ 242.970202] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 242.971561] RIP: 0033:0x7fd31c8189b9
> [ 242.972910] RSP: 002b:00007ffd203a81b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 242.974318] RAX: ffffffffffffffda RBX: 00005571cdddfb30 RCX: 00007fd31c8189b9
> [ 242.975720] RDX: 0000000000000000 RSI: 00007fd31c9a3e2d RDI: 0000000000000005
> [ 242.977089] RBP: 0000000000020000 R08: 0000000000000000 R09: 00005571cddc6f50
> [ 242.978449] R10: 0000000000000005 R11: 0000000000000246 R12: 00007fd31c9a3e2d
> [ 242.979792] R13: 0000000000000000 R14: 00005571cddd8170 R15: 00005571cdddfb30
> [ 242.981143] </TASK>

I have not seen a stacktrace like this on my machines (although I
suspect that is because I don't have CONFIG_DETECT_HUNG_TASK enabled in
my configs) but my Honeycomb LX2 hangs while booting after commit
78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip()
function") according to my bisect, which certainly seems like it could
be related to the trace you are seeing.

Cheers,
Nathan

2022-09-24 08:05:57

by Hugh Dickins

Subject: Re: INFO: task systemd-udevd:94 blocked for more than 120 seconds.

On Fri, 23 Sep 2022, Borislav Petkov wrote:
> On Fri, Sep 23, 2022 at 12:18:56PM -0700, Nathan Chancellor wrote:
> > I have not seen a stacktrace like this on my machines (although I
> > suspect that is because I don't have CONFIG_DETECT_HUNG_TASK enabled in
> > my configs) but my Honeycomb LX2 hangs while booting after commit
> > 78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip()
> > function") according to my bisect, which certainly seems like it could
> > be related to the trace you are seeing.
>
> Don't you just love how well our community works - one reports a bug and
> someone else has already bisected it and thus saves one the work?!
>
> :-)))
>
> Thanks Nathan, I can confirm your bisection. The commit above doesn't
> revert cleanly on top of linux-next so I tried it and the patch before
> it:
>
> 78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip() function") <- BAD
> 62022c15f627 ("Merge branch 'pm-opp' into linux-next") <- GOOD
>
> so it looks like that one is somehow b0rked.

Yes, Nathan was alert, and saved me too from bisecting failure to boot
linux-next in another thread.

And I see from
https://lore.kernel.org/lkml/[email protected]/
that Marek also found it: and tried to fix it, but found it goes too wide.

I made a patch of the offending series with
git diff 78ffa3e58d93^ 2b109cffe683
and then reverted that cleanly from next-20220923: works for me.

Hugh

2022-09-24 09:14:35

by Daniel Lezcano

Subject: Re: INFO: task systemd-udevd:94 blocked for more than 120 seconds.


Hi Hugh,

On 24/09/2022 09:40, Hugh Dickins wrote:
> On Fri, 23 Sep 2022, Borislav Petkov wrote:
>> On Fri, Sep 23, 2022 at 12:18:56PM -0700, Nathan Chancellor wrote:
>>> I have not seen a stacktrace like this on my machines (although I
>>> suspect that is because I don't have CONFIG_DETECT_HUNG_TASK enabled in
>>> my configs) but my Honeycomb LX2 hangs while booting after commit
>>> 78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip()
>>> function") according to my bisect, which certainly seems like it could
>>> be related to the trace you are seeing.
>>
>> Don't you just love how well our community works - one reports a bug and
>> someone else has already bisected it and thus saves one the work?!
>>
>> :-)))
>>
>> Thanks Nathan, I can confirm your bisection. The commit above doesn't
>> revert cleanly on top of linux-next so I tried it and the patch before
>> it:
>>
>> 78ffa3e58d93 ("thermal/core: Add a generic thermal_zone_get_trip() function") <- BAD
>> 62022c15f627 ("Merge branch 'pm-opp' into linux-next") <- GOOD
>>
>> so it looks like that one is somehow b0rked.
>
> Yes, Nathan was alert, and saved me too from bisecting failure to boot
> linux-next in another thread.
>
> And I see from
> https://lore.kernel.org/lkml/[email protected]/
> that Marek also found it: and tried to fix it, but found it goes too wide.
>
> I made a patch of the offending series with
> git diff 78ffa3e58d93^ 2b109cffe683
> and then reverted that cleanly from next-20220923: works for me.

Thanks for investigating. I believe I have found where the deadlock is
coming from. I'll send a fix for that.
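
For illustration only, here is a minimal userspace sketch of one common way
a task ends up blocked in a mutex lock as in the trace above: a function
takes a non-recursive lock and then calls a helper that takes the same lock
again. This is an assumption about the general pattern, not a statement of
what the offending commit actually does. struct zone, zone_update() and the
zone_set_mode_*() functions are hypothetical names, not the kernel's thermal
API, and an error-checking pthread mutex is used so that the relock returns
EDEADLK instead of hanging forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a thermal zone; not the kernel's struct. */
struct zone {
	pthread_mutex_t lock;
	int temperature;
};

/* Use an error-checking mutex so a relock by the owner is reported. */
static int zone_init(struct zone *z)
{
	pthread_mutexattr_t attr;
	int err;

	z->temperature = 0;
	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	err = pthread_mutex_init(&z->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Helper that takes z->lock itself; the caller must not already hold it. */
static int zone_update(struct zone *z)
{
	int err = pthread_mutex_lock(&z->lock);

	if (err)
		return err;	/* EDEADLK if this thread already owns the lock */
	z->temperature++;
	pthread_mutex_unlock(&z->lock);
	return 0;
}

/* Buggy variant: calls the helper while still holding the lock. */
static int zone_set_mode_buggy(struct zone *z)
{
	int err = pthread_mutex_lock(&z->lock);

	if (err)
		return err;
	err = zone_update(z);	/* would block forever with a plain mutex */
	pthread_mutex_unlock(&z->lock);
	return err;
}

/* Fixed variant: drops the lock before calling the helper. */
static int zone_set_mode_fixed(struct zone *z)
{
	int err = pthread_mutex_lock(&z->lock);

	if (err)
		return err;
	/* ... change the mode under the lock ... */
	pthread_mutex_unlock(&z->lock);
	return zone_update(z);
}
```

With a default (non-checking) mutex the buggy variant would typically never
return, which is the userspace analogue of a task sitting in state D until
the hung-task detector reports it.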


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs