LinuxLists.cc - Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

2016-11-16 13:56:02

Subject: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi,

I'm running an ARM FVP (virtual platform - simluated hardware), which
is failing to reach a login prompt due to extremely slow progress
during boot. systemd gives up waiting for the ttyAMA0 device to
appear, and never starts the getty.

I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
its job".

Without this commit, the system boots to a login prompt in 2 minutes.
With this commit, the system eventually manages to bring up sshd after
22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
out and so I don't get a prompt on my console.

I only hit the issue when my rootfs is mounted over NFS, and with only
a single core enabled. The (simulated) network device is an SMC91C111.
With multiple cores enabled or a non-NFS filesystem, everything seems
to work OK.

I don't have an identical real hardware platform to try, but I
could not reproduce it on a real ARM Juno board, which is similar.

It looks from the logs that udev's workers are unable to make
progress, so the device nodes don't get created. Don't pay too much
attention to the timestamps in the logs below, they are "inside" the
virtual platform, and don't reflect wall-clock time.
Log before 4cd13c21b207:
https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
Log after 4cd13c21b207:
https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
Kernel config: arch/arm64/configs/defconfig

I'm not sure how to debug this further, so if you have any suggestions
I'd be glad to hear them.

Many thanks,
Brian

2016-11-16 15:52:46

by Eric Dumazet

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

On Wed, Nov 16, 2016 at 5:55 AM, Brian Starkey <[email protected]> wrote:
> Hi,
>
> I'm running an ARM FVP (virtual platform - simluated hardware), which
> is failing to reach a login prompt due to extremely slow progress
> during boot. systemd gives up waiting for the ttyAMA0 device to
> appear, and never starts the getty.
>
> I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
> its job".
>
> Without this commit, the system boots to a login prompt in 2 minutes.
> With this commit, the system eventually manages to bring up sshd after
> 22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
> out and so I don't get a prompt on my console.
>
> I only hit the issue when my rootfs is mounted over NFS, and with only
> a single core enabled. The (simulated) network device is an SMC91C111.
> With multiple cores enabled or a non-NFS filesystem, everything seems
> to work OK.
>
> I don't have an identical real hardware platform to try, but I
> could not reproduce it on a real ARM Juno board, which is similar.
>
> It looks from the logs that udev's workers are unable to make
> progress, so the device nodes don't get created. Don't pay too much
> attention to the timestamps in the logs below, they are "inside" the
> virtual platform, and don't reflect wall-clock time.
> Log before 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
> Log after 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
> Kernel config: arch/arm64/configs/defconfig
>
> I'm not sure how to debug this further, so if you have any suggestions
> I'd be glad to hear them.
>
> Many thanks,
> Brian
>

Hi Brian.

Thanks a lot for this report.

If issue triggers when/if using one core, it is possible one driver
has a dependency on
softirqs being serviced during an initialization loop.

If the thread is not yielding cpu (holding something like a spinlock
thus disabling preemption),
then ksoftirqd might not be able to run on the (same) cpu.

I sent a patch for busy polling yesterday, but I am almost certain
this would not fix your issue
(assuming you have CONFIG_PREEMPT)

https://patchwork.ozlabs.org/patch/695185/

2016-11-16 18:02:06

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi Eric,

On Wed, Nov 16, 2016 at 07:52:42AM -0800, Eric Dumazet wrote:
>On Wed, Nov 16, 2016 at 5:55 AM, Brian Starkey <[email protected]> wrote:
>> Hi,
>>
>> I'm running an ARM FVP (virtual platform - simluated hardware), which
>> is failing to reach a login prompt due to extremely slow progress
>> during boot. systemd gives up waiting for the ttyAMA0 device to
>> appear, and never starts the getty.
>>
>> I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
>> its job".
>>
>> Without this commit, the system boots to a login prompt in 2 minutes.
>> With this commit, the system eventually manages to bring up sshd after
>> 22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
>> out and so I don't get a prompt on my console.
>>
>> I only hit the issue when my rootfs is mounted over NFS, and with only
>> a single core enabled. The (simulated) network device is an SMC91C111.
>> With multiple cores enabled or a non-NFS filesystem, everything seems
>> to work OK.
>>
>> I don't have an identical real hardware platform to try, but I
>> could not reproduce it on a real ARM Juno board, which is similar.
>>
>> It looks from the logs that udev's workers are unable to make
>> progress, so the device nodes don't get created. Don't pay too much
>> attention to the timestamps in the logs below, they are "inside" the
>> virtual platform, and don't reflect wall-clock time.
>> Log before 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
>> Log after 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
>> Kernel config: arch/arm64/configs/defconfig
>>
>> I'm not sure how to debug this further, so if you have any suggestions
>> I'd be glad to hear them.
>>
>> Many thanks,
>> Brian
>>
>
>Hi Brian.
>
>Thanks a lot for this report.
>
>If issue triggers when/if using one core, it is possible one driver
>has a dependency on
>softirqs being serviced during an initialization loop.
>
>If the thread is not yielding cpu (holding something like a spinlock
>thus disabling preemption),
>then ksoftirqd might not be able to run on the (same) cpu.
>

The smc91x driver does seem to have some trickiness around softirqs.
I'm not familiar with net drivers, but I'll see if I can figure
anything out there.

>I sent a patch for busy polling yesterday, but I am almost certain
>this would not fix your issue
>(assuming you have CONFIG_PREEMPT)
>
>https://patchwork.ozlabs.org/patch/695185/

You're right in saying that this didn't help.

Thanks,
Brian

2016-11-16 18:49:33

by Eric Dumazet

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <[email protected]> wrote:

>
> The smc91x driver does seem to have some trickiness around softirqs.
> I'm not familiar with net drivers, but I'll see if I can figure
> anything out there.

Oh this code looks ugly :(

Do you have CONFIG_SMP=y or not ?

2016-11-16 21:01:49

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:
>On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <[email protected]> wrote:
>
>>
>> The smc91x driver does seem to have some trickiness around softirqs.
>> I'm not familiar with net drivers, but I'll see if I can figure
>> anything out there.
>
>Oh this code looks ugly :(
>
>Do you have CONFIG_SMP=y or not ?

Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).

I did try forcing it into the no-op locking (as though config SMP
wasn't set), it didn't help (and it doesn't look like that would be
safe with CONFIG_PREEMPT=y either).

The bit in smc_hardware_send_pkt looks like skipping softirq
invocation when there's already one running wouldn't give the same
behaviour as before:

if (!smc_special_trylock(&lp->lock, flags)) {
netif_stop_queue(dev);
tasklet_schedule(&lp->tx_task);
return;
}

... that said, I've no idea if that matters.

Of course I also don't know if the network driver is even to blame :-(

Cheers,
-Brian

2016-11-17 17:07:25

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi Eric,

On Thu, Nov 17, 2016 at 07:29:14AM -0800, Eric Dumazet wrote:
>On Wed, Nov 16, 2016 at 1:01 PM, Brian Starkey <[email protected]> wrote:
>> On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:
>>>
>>> On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <[email protected]>
>>> wrote:
>>>
>>>>
>>>> The smc91x driver does seem to have some trickiness around softirqs.
>>>> I'm not familiar with net drivers, but I'll see if I can figure
>>>> anything out there.
>>>
>>>
>>> Oh this code looks ugly :(
>>>
>>> Do you have CONFIG_SMP=y or not ?
>>
>>
>> Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).
>>
>> I did try forcing it into the no-op locking (as though config SMP
>> wasn't set), it didn't help (and it doesn't look like that would be
>> safe with CONFIG_PREEMPT=y either).
>>
>> The bit in smc_hardware_send_pkt looks like skipping softirq
>> invocation when there's already one running wouldn't give the same
>> behaviour as before:
>>
>> if (!smc_special_trylock(&lp->lock, flags)) {
>> netif_stop_queue(dev);
>> tasklet_schedule(&lp->tx_task);
>> return;
>> }
>>
>> ... that said, I've no idea if that matters.
>>
>> Of course I also don't know if the network driver is even to blame :-(
>>
>
>I believe the problem is in SMC_WAIT_MMU_BUSY()
>
>Could you try this patch ? (inlined and attached)
>

No joy with this patch :-(

I had to add an ioaddr argument because apparently that macro depends
on local context (yuck...), but it doesn't help my issue.

FWIW I don't see any timeouts, either with or without the patch.
(I don't know for sure, but I would guess that the model of the
network card doesn't model whatever stall that loop is checking for.
It probably just completes all MMU operations immediately)

-Brian

2016-11-17 17:08:55

by Eric Dumazet

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

On Wed, Nov 16, 2016 at 1:01 PM, Brian Starkey <[email protected]> wrote:
> On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:
>>
>> On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <[email protected]>
>> wrote:
>>
>>>
>>> The smc91x driver does seem to have some trickiness around softirqs.
>>> I'm not familiar with net drivers, but I'll see if I can figure
>>> anything out there.
>>
>>
>> Oh this code looks ugly :(
>>
>> Do you have CONFIG_SMP=y or not ?
>
>
> Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).
>
> I did try forcing it into the no-op locking (as though config SMP
> wasn't set), it didn't help (and it doesn't look like that would be
> safe with CONFIG_PREEMPT=y either).
>
> The bit in smc_hardware_send_pkt looks like skipping softirq
> invocation when there's already one running wouldn't give the same
> behaviour as before:
>
> if (!smc_special_trylock(&lp->lock, flags)) {
> netif_stop_queue(dev);
> tasklet_schedule(&lp->tx_task);
> return;
> }
>
> ... that said, I've no idea if that matters.
>
> Of course I also don't know if the network driver is even to blame :-(
>

I believe the problem is in SMC_WAIT_MMU_BUSY()

Could you try this patch ? (inlined and attached)

diff --git a/drivers/net/ethernet/smsc/smc91x.c
b/drivers/net/ethernet/smsc/smc91x.c
index 65077c77082a2f042117a0889c2b15099c58eae5..4ef653ae8564c6d2a5120cdaef12db1e1b218b39
100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -229,19 +229,22 @@ static inline void PRINT_PKT(u_char *buf, int length) { }
* if at all, but let's avoid deadlocking the system if the hardware
* decides to go south.
*/
-#define SMC_WAIT_MMU_BUSY(lp) do { \
- if (unlikely(SMC_GET_MMU_CMD(lp) & MC_BUSY)) { \
- unsigned long timeout = jiffies + 2; \
- while (SMC_GET_MMU_CMD(lp) & MC_BUSY) { \
- if (time_after(jiffies, timeout)) { \
- netdev_dbg(dev, "timeout %s line %d\n", \
- __FILE__, __LINE__); \
- break; \
- } \
- cpu_relax(); \
- } \
- } \
-} while (0)
+static void SMC_WAIT_MMU_BUSY(struct smc_local *lp)
+{
+ unsigned long timeout = jiffies + 2;
+ unsigned int count = 10000;
+
+ while (SMC_GET_MMU_CMD(lp) & MC_BUSY) {
+ count--;
+ if (!count || time_after(jiffies, timeout)) {
+ netdev_dbg(lp->dev, "timeout %s line %d\n",
+ __FILE__, __LINE__);
+ break;
+ }
+ /* TODO : investigate using cond_resched() from allowed contexts */
+ cpu_relax();
+ }
+}

/*

Attachments:

patch2250.txt (1.24 kB)

2016-11-18 00:43:26

by Thomas Gleixner

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Brian,

On Thu, 17 Nov 2016, Brian Starkey wrote:
> No joy with this patch :-(
>
> I had to add an ioaddr argument because apparently that macro depends
> on local context (yuck...), but it doesn't help my issue.
>
> FWIW I don't see any timeouts, either with or without the patch.
> (I don't know for sure, but I would guess that the model of the
> network card doesn't model whatever stall that loop is checking for.
> It probably just completes all MMU operations immediately)

Is there a chance that you enable trace points at the kernel command line?

trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit

should be enough for a start. All we need aside of that is a trigger to
stop the trace so we can actually see the events around the time where
things go stale.

I assume that the whole issue is visible throughout the slow progress of
init towards a working system, so for a start it would be sufficient to add
something like this into the startup sequence at some point:

mount -t debugfs debugfs /sys/kernel/debug
echo 0 >/sys/kernel/debug/tracing/tracing_on

The only interesting challange is to get the trace data out of the
system. The trace is accessible via:

cat /sys/kernel/tracing/trace

So if your ssh works at some point, that might be an option or you just try
to store it over NFS (which will be slow, but better than nothing). Maybe
you have a better idea :)

Thanks,

tglx

2016-11-18 20:23:50

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi Thomas,

On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>Brian,
>
>On Thu, 17 Nov 2016, Brian Starkey wrote:
>> No joy with this patch :-(
>>
>> I had to add an ioaddr argument because apparently that macro depends
>> on local context (yuck...), but it doesn't help my issue.
>>
>> FWIW I don't see any timeouts, either with or without the patch.
>> (I don't know for sure, but I would guess that the model of the
>> network card doesn't model whatever stall that loop is checking for.
>> It probably just completes all MMU operations immediately)
>
>Is there a chance that you enable trace points at the kernel command line?
>
> trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>
>should be enough for a start. All we need aside of that is a trigger to
>stop the trace so we can actually see the events around the time where
>things go stale.
>
>I assume that the whole issue is visible throughout the slow progress of
>init towards a working system, so for a start it would be sufficient to add
>something like this into the startup sequence at some point:
>
> mount -t debugfs debugfs /sys/kernel/debug
> echo 0 >/sys/kernel/debug/tracing/tracing_on
>
>The only interesting challange is to get the trace data out of the
>system. The trace is accessible via:
>
> cat /sys/kernel/tracing/trace
>

Thanks for the pointers on tracing. I haven't used it before so that
was very helpful.

>So if your ssh works at some point, that might be an option or you just try
>to store it over NFS (which will be slow, but better than nothing). Maybe
>you have a better idea :)

I've tried a whole bunch of different ways to reproduce the problem
and get the logs out, so far they've all been unsuccessful
(reproducing is easy, getting data out is not).

I have a few more ideas to try, but it's pretty slow work - it's
taking at least 30 minutes per attempt. I'll let you know if I manage
something.

Thanks!
-Brian

>
>Thanks,
>
> tglx
>

2016-11-22 10:34:06

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi,

On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>Brian,
>
>On Thu, 17 Nov 2016, Brian Starkey wrote:
>> No joy with this patch :-(
>>
>> I had to add an ioaddr argument because apparently that macro depends
>> on local context (yuck...), but it doesn't help my issue.
>>
>> FWIW I don't see any timeouts, either with or without the patch.
>> (I don't know for sure, but I would guess that the model of the
>> network card doesn't model whatever stall that loop is checking for.
>> It probably just completes all MMU operations immediately)
>
>Is there a chance that you enable trace points at the kernel command line?
>
> trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>
>should be enough for a start. All we need aside of that is a trigger to
>stop the trace so we can actually see the events around the time where
>things go stale.
>
>I assume that the whole issue is visible throughout the slow progress of
>init towards a working system, so for a start it would be sufficient to add
>something like this into the startup sequence at some point:
>
> mount -t debugfs debugfs /sys/kernel/debug
> echo 0 >/sys/kernel/debug/tracing/tracing_on
>
>The only interesting challange is to get the trace data out of the
>system. The trace is accessible via:
>
> cat /sys/kernel/tracing/trace
>
>So if your ssh works at some point, that might be an option or you just try
>to store it over NFS (which will be slow, but better than nothing). Maybe
>you have a better idea :)

I finally managed to pry some traces out this morning. It seems like
the system struggles to even invoke echo when it's doing badly.

Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
(btw, if there's a preferred way to send the logs let me know. I
wasn't sure large or non-text attachments would be well received)

I'm not sure how much help the trace is, but it does look like the
system is spending far too much time in the ethernet device's IRQ
handler to be healthy.

Thanks,
Brian
>
>Thanks,
>
> tglx
>

2016-11-22 14:29:36

by Eric Dumazet

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

On Tue, Nov 22, 2016 at 2:33 AM, Brian Starkey <[email protected]> wrote:
>
> Hi,
>
> On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>>
>> Brian,
>>
>> On Thu, 17 Nov 2016, Brian Starkey wrote:
>>>
>>> No joy with this patch :-(
>>>
>>> I had to add an ioaddr argument because apparently that macro depends
>>> on local context (yuck...), but it doesn't help my issue.
>>>
>>> FWIW I don't see any timeouts, either with or without the patch.
>>> (I don't know for sure, but I would guess that the model of the
>>> network card doesn't model whatever stall that loop is checking for.
>>> It probably just completes all MMU operations immediately)
>>
>>
>> Is there a chance that you enable trace points at the kernel command line?
>>
>> trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>>
>> should be enough for a start. All we need aside of that is a trigger to
>> stop the trace so we can actually see the events around the time where
>> things go stale.
>>
>> I assume that the whole issue is visible throughout the slow progress of
>> init towards a working system, so for a start it would be sufficient to add
>> something like this into the startup sequence at some point:
>>
>> mount -t debugfs debugfs /sys/kernel/debug
>> echo 0 >/sys/kernel/debug/tracing/tracing_on
>>
>> The only interesting challange is to get the trace data out of the
>> system. The trace is accessible via:
>>
>> cat /sys/kernel/tracing/trace
>>
>> So if your ssh works at some point, that might be an option or you just try
>> to store it over NFS (which will be slow, but better than nothing). Maybe
>> you have a better idea :)
>
>
> I finally managed to pry some traces out this morning. It seems like
> the system struggles to even invoke echo when it's doing badly.
>
> Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
> Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
> (btw, if there's a preferred way to send the logs let me know. I
> wasn't sure large or non-text attachments would be well received)
>
> I'm not sure how much help the trace is, but it does look like the
> system is spending far too much time in the ethernet device's IRQ
> handler to be healthy.
>

Thanks a lot Brian

Can you confirm interrupt handler is smc911x_interrupt() ?

(ie : is SMC_USE_PXA_DMA / SMC_USE_DMA defined or not ?)

>
> Thanks,
> Brian
>>
>>
>> Thanks,
>>
>> tglx
>>

2016-11-22 15:27:44

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi Eric,

On Tue, Nov 22, 2016 at 06:29:33AM -0800, Eric Dumazet wrote:
>On Tue, Nov 22, 2016 at 2:33 AM, Brian Starkey <[email protected]> wrote:
>>
>> Hi,
>>
>> On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>>>
>>> Brian,
>>>
>>> On Thu, 17 Nov 2016, Brian Starkey wrote:
>>>>
>>>> No joy with this patch :-(
>>>>
>>>> I had to add an ioaddr argument because apparently that macro depends
>>>> on local context (yuck...), but it doesn't help my issue.
>>>>
>>>> FWIW I don't see any timeouts, either with or without the patch.
>>>> (I don't know for sure, but I would guess that the model of the
>>>> network card doesn't model whatever stall that loop is checking for.
>>>> It probably just completes all MMU operations immediately)
>>>
>>>
>>> Is there a chance that you enable trace points at the kernel command line?
>>>
>>> trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>>>
>>> should be enough for a start. All we need aside of that is a trigger to
>>> stop the trace so we can actually see the events around the time where
>>> things go stale.
>>>
>>> I assume that the whole issue is visible throughout the slow progress of
>>> init towards a working system, so for a start it would be sufficient to add
>>> something like this into the startup sequence at some point:
>>>
>>> mount -t debugfs debugfs /sys/kernel/debug
>>> echo 0 >/sys/kernel/debug/tracing/tracing_on
>>>
>>> The only interesting challange is to get the trace data out of the
>>> system. The trace is accessible via:
>>>
>>> cat /sys/kernel/tracing/trace
>>>
>>> So if your ssh works at some point, that might be an option or you just try
>>> to store it over NFS (which will be slow, but better than nothing). Maybe
>>> you have a better idea :)
>>
>>
>> I finally managed to pry some traces out this morning. It seems like
>> the system struggles to even invoke echo when it's doing badly.
>>
>> Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
>> Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
>> (btw, if there's a preferred way to send the logs let me know. I
>> wasn't sure large or non-text attachments would be well received)
>>
>> I'm not sure how much help the trace is, but it does look like the
>> system is spending far too much time in the ethernet device's IRQ
>> handler to be healthy.
>>
>
>Thanks a lot Brian
>
>Can you confirm interrupt handler is smc911x_interrupt() ?
>
>(ie : is SMC_USE_PXA_DMA / SMC_USE_DMA defined or not ?)
>

Looks like there's a few similarly named devices and drivers. Mine is
an SMSC LAN91C111 using the smc91x driver in
drivers/net/ethernet/smsc/smc91x.c, rather than smc911x.c. So the
interrupt handler is smc_interrupt()

CONFIG_ARCH_PXA is not set, nor is SMC_USE_PXA_DMA or SMC_USE_DMA.

-Brian

>
>>
>> Thanks,
>> Brian
>>>
>>>
>>> Thanks,
>>>
>>> tglx
>>>

2016-11-22 16:10:03

by Eric Dumazet

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

>
> Looks like there's a few similarly named devices and drivers. Mine is
> an SMSC LAN91C111 using the smc91x driver in
> drivers/net/ethernet/smsc/smc91x.c, rather than smc911x.c. So the
> interrupt handler is smc_interrupt()
>
> CONFIG_ARCH_PXA is not set, nor is SMC_USE_PXA_DMA or SMC_USE_DMA.

Oh right.

static irqreturn_t smc_interrupt(int irq, void *dev_id)
{
...
mask = SMC_GET_INT_MASK(lp);
SMC_SET_INT_MASK(lp, 0);

loop()

...
} else if (status & IM_ALLOC_INT) {
DBG(3, dev, "Allocation irq\n");
tasklet_hi_schedule(&lp->tx_task);
mask &= ~IM_ALLOC_INT;
}
...

SMC_SET_INT_MASK(lp, mask);
spin_unlock(&lp->lock);
/*
* We return IRQ_HANDLED unconditionally here even if there was
* nothing to do. There is a possibility that a packet might
* get enqueued into the chip right after TX_EMPTY_INT is raised
* but just before the CPU acknowledges the IRQ.
* Better take an unneeded IRQ in some occasions than complexifying
* the code for all cases.
*/
return IRQ_HANDLED;
}

Could you trace mask value, it looks we loop and never acknowledge
some interrupt status.

Maybe driver depends on tasklet_hi_schedule() triggers
smc_hardware_send_pkt() really really soon.

Oh, but smc_hardware_send_pkt() uses a spin_trylock() at its
beginning, then later a spin_lock() hidden in SMC_ENABLE_INT(),
this looks racy.

diff --git a/drivers/net/ethernet/smsc/smc91x.c
b/drivers/net/ethernet/smsc/smc91x.c
index 73212590d04a..32abb0a084ec 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -617,14 +617,13 @@ static void smc_hardware_send_pkt(unsigned long data)

/* queue the packet for TX */
SMC_SET_MMU_CMD(lp, MC_ENQUEUE);
+ SMC_SET_INT_MASK(lp, SMC_GET_INT_MASK(lp) | IM_TX_INT |
IM_TX_EMPTY_INT);
smc_special_unlock(&lp->lock, flags);

netif_trans_update(dev);
dev->stats.tx_packets++;
dev->stats.tx_bytes += len;

- SMC_ENABLE_INT(lp, IM_TX_INT | IM_TX_EMPTY_INT);
-
done: if (!THROTTLE_TX_PKTS)
netif_wake_queue(dev);

2016-11-23 18:21:41

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi Eric,

On Tue, Nov 22, 2016 at 08:09:58AM -0800, Eric Dumazet wrote:
>>
>> Looks like there's a few similarly named devices and drivers. Mine is
>> an SMSC LAN91C111 using the smc91x driver in
>> drivers/net/ethernet/smsc/smc91x.c, rather than smc911x.c. So the
>> interrupt handler is smc_interrupt()
>>
>> CONFIG_ARCH_PXA is not set, nor is SMC_USE_PXA_DMA or SMC_USE_DMA.
>
>Oh right.
>
>static irqreturn_t smc_interrupt(int irq, void *dev_id)
>{
>...
>mask = SMC_GET_INT_MASK(lp);
>SMC_SET_INT_MASK(lp, 0);
>
>loop()
>
>...
>} else if (status & IM_ALLOC_INT) {
> DBG(3, dev, "Allocation irq\n");
> tasklet_hi_schedule(&lp->tx_task);
> mask &= ~IM_ALLOC_INT;
>}
>...
>
>SMC_SET_INT_MASK(lp, mask);
>spin_unlock(&lp->lock);
> /*
> * We return IRQ_HANDLED unconditionally here even if there was
> * nothing to do. There is a possibility that a packet might
> * get enqueued into the chip right after TX_EMPTY_INT is raised
> * but just before the CPU acknowledges the IRQ.
> * Better take an unneeded IRQ in some occasions than complexifying
> * the code for all cases.
> */
> return IRQ_HANDLED;
>}
>
>Could you trace mask value, it looks we loop and never acknowledge
>some interrupt status.
>
>Maybe driver depends on tasklet_hi_schedule() triggers
>smc_hardware_send_pkt() really really soon.
>
>Oh, but smc_hardware_send_pkt() uses a spin_trylock() at its
>beginning, then later a spin_lock() hidden in SMC_ENABLE_INT(),
>this looks racy.

This patch didn't help.

I did get some new traces though - I've attached the diff for the
trace_printks I added.

Before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
After 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8

It looks like the difference is that after 4cd13c21b207 the RX softirq
isn't running, and RX interrupts don't call softirq_raise anymore -
presumably because there's one pending, but I didn't have time to
track that down to a code-path.

Cheers,
-Brian

--->8---

diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 503a3b6..6598319 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -418,7 +418,7 @@ static inline void smc_rcv(struct net_device *dev)

packet_number = SMC_GET_RXFIFO(lp);
if (unlikely(packet_number & RXFIFO_REMPTY)) {
- PRINTK(dev, "smc_rcv with nothing on FIFO.\n");
+ trace_printk("smc_rcv with nothing on FIFO.\n");
return;
}

@@ -475,6 +475,7 @@ static inline void smc_rcv(struct net_device *dev)
SMC_WAIT_MMU_BUSY(lp);
SMC_SET_MMU_CMD(lp, MC_RELEASE);
dev->stats.rx_dropped++;
+ trace_printk("rx_dropped.\n");
return;
}

@@ -1216,7 +1217,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
int status, mask, timeout, card_stats;
int saved_pointer;

- DBG(3, dev, "%s\n", __func__);
+ trace_printk("%s\n", __func__);

spin_lock(&lp->lock);

@@ -1235,7 +1236,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
do {
status = SMC_GET_INT(lp);

- DBG(2, dev, "INT 0x%02x MASK 0x%02x MEM 0x%04x FIFO 0x%04x\n",
+ trace_printk("INT 0x%02x MASK 0x%02x MEM 0x%04x FIFO 0x%04x\n",
status, mask,
({ int meminfo; SMC_SELECT_BANK(lp, 0);
meminfo = SMC_GET_MIR(lp);
@@ -1254,7 +1255,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
if (THROTTLE_TX_PKTS)
netif_wake_queue(dev);
} else if (status & IM_RCV_INT) {
- DBG(3, dev, "RX irq\n");
+ trace_printk("RX irq\n");
smc_rcv(dev);
} else if (status & IM_ALLOC_INT) {
DBG(3, dev, "Allocation irq\n");
@@ -1304,7 +1305,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
PRINTK(dev, "spurious interrupt (mask = 0x%02x)\n",
mask);
#endif
- DBG(3, dev, "Interrupt done (%d loops)\n",
+ trace_printk("Interrupt done (%d loops)\n",
MAX_IRQ_LOOPS - timeout);

/*

2016-11-23 20:03:33

by Eric Dumazet

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <[email protected]> wrote:

> This patch didn't help.
>
> I did get some new traces though - I've attached the diff for the
> trace_printks I added.
>
> Before 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
> After 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
>
> It looks like the difference is that after 4cd13c21b207 the RX softirq
> isn't running, and RX interrupts don't call softirq_raise anymore -
> presumably because there's one pending, but I didn't have time to
> track that down to a code-path.
>
> Cheers,
> -Brian
>

Hi Brian

Looks like netif_rx() drops the incoming packets then ?

Maybe netif_running() is not happy :(

Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)

Thanks !

2016-11-25 13:14:21

by Brian Starkey

[permalink] [raw]

Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

Hi,

On Wed, Nov 23, 2016 at 12:03:28PM -0800, Eric Dumazet wrote:
>On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <[email protected]> wrote:
>
>> This patch didn't help.
>>
>> I did get some new traces though - I've attached the diff for the
>> trace_printks I added.
>>
>> Before 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
>> After 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
>>
>> It looks like the difference is that after 4cd13c21b207 the RX softirq
>> isn't running, and RX interrupts don't call softirq_raise anymore -
>> presumably because there's one pending, but I didn't have time to
>> track that down to a code-path.
>>
>> Cheers,
>> -Brian
>>
>
>Hi Brian
>
>Looks like netif_rx() drops the incoming packets then ?
>
>Maybe netif_running() is not happy :(
>
>Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)

Some packets are dropped, but not very many:

$ grep NET_RX_SUCCESS trace_netif_rx.txt | wc -l
14399
$ grep NET_RX_DROP trace_netif_rx.txt | wc -l
22

Without the ksoftirqd change there were zero NET_RX_DROPs.

-Brian

>
>Thanks !