In-Reply-To: <20161122103351.GA25080@e106950-lin.cambridge.arm.com>
References: <20161116135527.GA5833@e106950-lin.cambridge.arm.com> <20161116180156.GA21156@e106950-lin.cambridge.arm.com> <20161116210139.GB21156@e106950-lin.cambridge.arm.com> <20161117164200.GA24653@e106950-lin.cambridge.arm.com> <20161122103351.GA25080@e106950-lin.cambridge.arm.com>
From: Eric Dumazet
Date: Tue, 22 Nov 2016 06:29:33 -0800
Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
To: Brian Starkey
Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton, Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 22, 2016 at 2:33 AM, Brian Starkey wrote:
>
> Hi,
>
> On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>>
>> Brian,
>>
>> On Thu, 17 Nov 2016, Brian Starkey wrote:
>>>
>>> No joy with this patch :-(
>>>
>>> I had to add an ioaddr argument because apparently that macro depends
>>> on local context (yuck...), but it doesn't help my issue.
>>>
>>> FWIW I don't see any timeouts, either with or without the patch.
>>> (I don't know for sure, but I would guess that the model of the
>>> network card doesn't model whatever stall that loop is checking for.
>>> It probably just completes all MMU operations immediately)
>>
>>
>> Is there a chance that you enable trace points at the kernel command line?
>>
>>   trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>>
>> should be enough for a start. All we need aside of that is a trigger to
>> stop the trace so we can actually see the events around the time where
>> things go stale.
>>
>> I assume that the whole issue is visible throughout the slow progress of
>> init towards a working system, so for a start it would be sufficient to add
>> something like this into the startup sequence at some point:
>>
>>   mount -t debugfs debugfs /sys/kernel/debug
>>   echo 0 >/sys/kernel/debug/tracing/tracing_on
>>
>> The only interesting challenge is to get the trace data out of the
>> system. The trace is accessible via:
>>
>>   cat /sys/kernel/debug/tracing/trace
>>
>> So if your ssh works at some point, that might be an option or you just try
>> to store it over NFS (which will be slow, but better than nothing). Maybe
>> you have a better idea :)
>
>
> I finally managed to pry some traces out this morning. It seems like
> the system struggles to even invoke echo when it's doing badly.
>
> Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
> Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
> (btw, if there's a preferred way to send the logs let me know. I
> wasn't sure large or non-text attachments would be well received)
>
> I'm not sure how much help the trace is, but it does look like the
> system is spending far too much time in the ethernet device's IRQ
> handler to be healthy.
>

Thanks a lot, Brian.

Can you confirm the interrupt handler is smc911x_interrupt()?

(i.e., is SMC_USE_PXA_DMA / SMC_USE_DMA defined or not?)

>
> Thanks,
> Brian
>>
>>
>> Thanks,
>>
>> tglx
>>
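
For anyone repeating the capture steps quoted above, they could be wrapped roughly as follows. This is only a sketch: the function name snapshot_trace, its two arguments, and the default output path /tmp/trace.txt are made up for illustration and do not come from the thread. It assumes debugfs is already mounted (mount -t debugfs debugfs /sys/kernel/debug) and that the events were enabled with trace_event= on the kernel command line.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): stop tracing, then copy the
# trace buffer to a file so it can be pulled off the box via ssh or NFS.
#
# $1: tracing directory; the default assumes debugfs is mounted at
#     /sys/kernel/debug as in the quoted instructions.
# $2: output file; the default /tmp/trace.txt is an arbitrary choice.
snapshot_trace() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    out="${2:-/tmp/trace.txt}"

    # Bail out early if the tracing directory is not there
    # (for example because debugfs has not been mounted yet).
    if [ ! -d "$tracing_dir" ]; then
        echo "snapshot_trace: no tracing directory at $tracing_dir" >&2
        return 1
    fi

    # Stop the tracer first so the buffer does not keep changing
    # (and overwriting older events) while it is being read.
    echo 0 > "$tracing_dir/tracing_on" || return 1

    # Dump the buffer as text.
    cat "$tracing_dir/trace" > "$out"
}
```

Calling snapshot_trace with no arguments uses the defaults; passing an NFS-backed path as the second argument would write the trace straight to the share. Since even invoking echo was reported to be slow on the affected system, it is probably best called once, from the startup sequence, rather than interactively.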