2020-02-18 14:19:16

by Nicholas Johnson

Subject: Stack trace when removing Thunderbolt devices while kernel shutting down

Hi Bjorn,

If I surprise remove Thunderbolt 3 devices just as the kernel is
shutting down, I get stack dumps, when those devices would not normally
cause stack dumps if the kernel were not shutting down.

Because the kernel is shutting down, it makes it difficult to capture
the logs without a serial console.

In your mind, is this cause for concern? There is no harm caused and the
kernel still shuts down. The main thing I am worried about is if this
means that the locking around the subsystem is not strict enough.

If you think this is worth looking into, I will try to learn about how
the native interrupts are handled and try to investigate, and I will
also try to get my serial console working again to capture the details.

Thank you for any thoughts you may give.

Kind regards,
Nicholas Johnson.


2020-02-18 14:24:20

by Bjorn Helgaas

Subject: Re: Stack trace when removing Thunderbolt devices while kernel shutting down

On Tue, Feb 18, 2020 at 02:18:40PM +0000, Nicholas Johnson wrote:
> Hi Bjorn,
>
> If I surprise remove Thunderbolt 3 devices just as the kernel is
> shutting down, I get stack dumps, when those devices would not normally
> cause stack dumps if the kernel were not shutting down.
>
> Because the kernel is shutting down, it makes it difficult to capture
> the logs without a serial console.
>
> In your mind, is this cause for concern? There is no harm caused and the
> kernel still shuts down. The main thing I am worried about is if this
> means that the locking around the subsystem is not strict enough.
>
> If you think this is worth looking into, I will try to learn about how
> the native interrupts are handled and try to investigate, and I will
> also try to get my serial console working again to capture the details.

Yes, I think this is worth looking into.

2020-02-18 15:03:25

by Lukas Wunner

Subject: Re: Stack trace when removing Thunderbolt devices while kernel shutting down

On Tue, Feb 18, 2020 at 02:18:40PM +0000, Nicholas Johnson wrote:
> If I surprise remove Thunderbolt 3 devices just as the kernel is
> shutting down, I get stack dumps, when those devices would not normally
> cause stack dumps if the kernel were not shutting down.
>
> Because the kernel is shutting down, it makes it difficult to capture
> the logs without a serial console.

Hold a camera in front of the screen and try to capture the messages
as an MP4 movie which can be uploaded to YouTube or something.

If the output moves too fast to capture it, artificially slow it down
by adding a udelay() to call_console_drivers() in kernel/printk/printk.c.
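
As a rough illustration of the idea, here is a user-space model rather than
the actual printk.c change. It hands each message to an output hook and then
busy-waits, which is what a udelay() after each console write would do in
the kernel. The names busy_delay_usec(), stdout_emit() and
slow_console_emit() are hypothetical and exist only for this sketch; in the
kernel the delay would come from udelay() or mdelay() in <linux/delay.h>,
placed inside call_console_drivers().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for a console driver's write hook. */
typedef void (*console_emit_fn)(const char *text, size_t len);

/*
 * Busy-wait for roughly `usec` microseconds of CPU time.
 * User-space substitute for the kernel's udelay(), which also spins.
 */
static void busy_delay_usec(long usec)
{
    clock_t start = clock();
    clock_t ticks = (clock_t)((double)usec * CLOCKS_PER_SEC / 1e6);

    while (clock() - start < ticks)
        ; /* spin */
}

/* Example hook: write the message to stdout and flush immediately. */
static void stdout_emit(const char *text, size_t len)
{
    fwrite(text, 1, len, stdout);
    fflush(stdout);
}

/*
 * Emit one message through the hook, then pause so the output scrolls
 * slowly enough to be filmed. Returns the number of bytes emitted.
 */
static size_t slow_console_emit(console_emit_fn emit, const char *text,
                                long delay_us)
{
    size_t len;

    assert(emit != NULL && text != NULL);
    len = strlen(text);
    emit(text, len);
    if (delay_us > 0)
        busy_delay_usec(delay_us);
    return len;
}
```

Calling slow_console_emit(stdout_emit, "some message\n", 200000) prints the
line and then spins for about 0.2 seconds. The delay value is a guess to be
tuned until the trace is readable on camera.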

Thanks,

Lukas

2020-02-18 16:58:01

by Nicholas Johnson

Subject: Re: Stack trace when removing Thunderbolt devices while kernel shutting down

On Tue, Feb 18, 2020 at 04:01:24PM +0100, Lukas Wunner wrote:
> On Tue, Feb 18, 2020 at 02:18:40PM +0000, Nicholas Johnson wrote:
> > If I surprise remove Thunderbolt 3 devices just as the kernel is
> > shutting down, I get stack dumps, when those devices would not normally
> > cause stack dumps if the kernel were not shutting down.
> >
> > Because the kernel is shutting down, it makes it difficult to capture
> > the logs without a serial console.
>
> Hold a camera in front of the screen and try to capture the messages
> as an MP4 movie which can be uploaded to YouTube or something.

https://www.youtube.com/watch?v=sDcYmbz7GME

The video is unlisted, so it will not appear in any search or confuse my
subscribers, but anybody with the link can view it.

I am still not sure I like the sound of my own voice. *cringe*

>
> If the output moves too fast to capture it, artificially slow it down
> by adding a udelay() to call_console_drivers() in kernel/printk/printk.c.

If you cannot get anything useful out of the aforementioned video, then
I will do this tomorrow evening. It is almost 1am now.

>
> Thanks,
>
> Lukas

By the way, this is not just with Linux v5.6-rcX. I have noticed this
for some time but it has been lower down my list in terms of priority
and urgency. I half expected it to be of no interest. I should have
mentioned it earlier, but back then I might not have had as much time to
help investigate. I am currently looking for new kernel development
tasks because my previous ones are done.

Thanks,
Nicholas