Date: Fri, 18 Nov 2016 20:23:38 +0000
From: Brian Starkey
To: Thomas Gleixner
Cc: Eric Dumazet, LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton, Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior
Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
Message-ID: <20161118183633.GA25157@e106950-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Thomas,

On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>Brian,
>
>On Thu, 17 Nov 2016, Brian Starkey wrote:
>> No joy with this patch :-(
>>
>> I had to add an ioaddr argument because apparently that macro depends
>> on local context (yuck...), but it doesn't help my issue.
>>
>> FWIW I don't see any timeouts, either with or without the patch.
>> (I don't know for sure, but I would guess that the model of the
>> network card doesn't model whatever stall that loop is checking for.
>> It probably just completes all MMU operations immediately)
>
>Is there a chance that you enable trace points at the kernel command line?
>
>  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>
>should be enough for a start. All we need aside from that is a trigger to
>stop the trace so we can actually see the events around the time where
>things go stale.
>
>I assume that the whole issue is visible throughout the slow progress of
>init towards a working system, so for a start it would be sufficient to add
>something like this into the startup sequence at some point:
>
>  mount -t debugfs debugfs /sys/kernel/debug
>  echo 0 >/sys/kernel/debug/tracing/tracing_on
>
>The only interesting challenge is to get the trace data out of the
>system. The trace is accessible via:
>
>  cat /sys/kernel/debug/tracing/trace
>

Thanks for the pointers on tracing. I haven't used it before, so that
was very helpful.

>So if your ssh works at some point, that might be an option or you just try
>to store it over NFS (which will be slow, but better than nothing). Maybe
>you have a better idea :)

I've tried a whole bunch of different ways to reproduce the problem and
get the logs out, but so far none of them has worked (reproducing is
easy; getting the data out is not).

I have a few more ideas to try, but it's pretty slow work: each attempt
takes at least 30 minutes. I'll let you know if I manage to get
something out.

Thanks!
-Brian

>
>Thanks,
>
>	tglx
>
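The "stop the trace, then get the data out" step discussed above could be wrapped in a small POSIX shell function called late in the startup sequence. This is a minimal sketch under stated assumptions: debugfs is already mounted at /sys/kernel/debug as in the quoted commands, and the output path lives somewhere that survives the stall (for example an NFS mount). The function name `stop_and_dump_trace` and its two arguments are hypothetical and not part of any kernel tooling.

```shell
#!/bin/sh
# Hypothetical helper (not kernel tooling): freeze the ftrace ring buffer and
# copy its contents to a file that can be read after the system goes stale.
#   $1: tracing directory (default: tracefs as exposed under debugfs)
#   $2: output file (default: /tmp/trace.txt; an NFS path is an assumption)
stop_and_dump_trace() {
    trace_dir=${1:-/sys/kernel/debug/tracing}
    out=${2:-/tmp/trace.txt}

    # Writing 0 to tracing_on stops recording without clearing the buffer,
    # so the events leading up to this point are preserved.
    echo 0 > "$trace_dir/tracing_on" || return 1

    # 'trace' is the human-readable, non-consuming view of the buffer.
    cat "$trace_dir/trace" > "$out" || return 1

    # Flush buffered writes, which matters if $out is on a slow or remote mount.
    sync
}
```

For example, after `mount -t debugfs debugfs /sys/kernel/debug`, a call such as `stop_and_dump_trace /sys/kernel/debug/tracing /mnt/nfs/trace.txt` would leave a copy of the trace on the share; the `/mnt/nfs` mount point is illustrative only. Reading `trace` rather than `trace_pipe` is deliberate, since `trace_pipe` consumes events as it reads them.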