Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1752591AbcKPPwq (ORCPT ); Wed, 16 Nov 2016 10:52:46 -0500
Received: from mail-it0-f43.google.com ([209.85.214.43]:36256 "EHLO
    mail-it0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1751464AbcKPPwo (ORCPT ); Wed, 16 Nov 2016 10:52:44 -0500
MIME-Version: 1.0
In-Reply-To: <20161116135527.GA5833@e106950-lin.cambridge.arm.com>
References: <20161116135527.GA5833@e106950-lin.cambridge.arm.com>
From: Eric Dumazet
Date: Wed, 16 Nov 2016 07:52:42 -0800
Message-ID:
Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
To: Brian Starkey
Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton, Alexander Potapenko,
    Steven Rostedt, Sebastian Andrzej Siewior, Thomas Gleixner
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2206
Lines: 58

On Wed, Nov 16, 2016 at 5:55 AM, Brian Starkey wrote:
> Hi,
>
> I'm running an ARM FVP (virtual platform - simulated hardware), which
> is failing to reach a login prompt due to extremely slow progress
> during boot. systemd gives up waiting for the ttyAMA0 device to
> appear, and never starts the getty.
>
> I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
> its job".
>
> Without this commit, the system boots to a login prompt in 2 minutes.
> With this commit, the system eventually manages to bring up sshd after
> 22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
> out and so I don't get a prompt on my console.
>
> I only hit the issue when my rootfs is mounted over NFS, and with only
> a single core enabled. The (simulated) network device is an SMC91C111.
> With multiple cores enabled or a non-NFS filesystem, everything seems
> to work OK.
>
> I don't have an identical real hardware platform to try, but I
> could not reproduce it on a real ARM Juno board, which is similar.
>
> It looks from the logs that udev's workers are unable to make
> progress, so the device nodes don't get created. Don't pay too much
> attention to the timestamps in the logs below, they are "inside" the
> virtual platform, and don't reflect wall-clock time.
> Log before 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
> Log after 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
> Kernel config: arch/arm64/configs/defconfig
>
> I'm not sure how to debug this further, so if you have any suggestions
> I'd be glad to hear them.
>
> Many thanks,
> Brian
>

Hi Brian.

Thanks a lot for this report.

If the issue triggers only when using one core, it is possible that a
driver depends on softirqs being serviced during an initialization loop.

If the thread running that loop does not yield the CPU (for example
because it holds a spinlock, which disables preemption), then ksoftirqd
might not be able to run on that same CPU.

I sent a patch for busy polling yesterday, but I am almost certain it
would not fix your issue (assuming you have CONFIG_PREEMPT):

https://patchwork.ozlabs.org/patch/695185/
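
To make the suspected dependency concrete, here is a toy user-space model of it. It is only a sketch of the hypothesis above, not kernel code and not a diagnosis: ToyCpu, tick, wait_for_work and the two flags are hypothetical names invented for illustration, and the scheduler is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """One simulated CPU. Every name here is hypothetical, not a kernel API."""
    defer_to_ksoftirqd: bool   # True: pending softirqs are left to ksoftirqd
    preemptible: bool          # False: the waiting loop never yields the CPU
    softirq_pending: bool = False
    work_done: bool = False

    def run_softirq(self) -> None:
        """Handle the pending softirq, completing the work the loop waits for."""
        if self.softirq_pending:
            self.softirq_pending = False
            self.work_done = True

    def tick(self) -> None:
        """One pass of the busy-wait loop, during which one interrupt arrives."""
        self.softirq_pending = True          # the interrupt raises a softirq
        if not self.defer_to_ksoftirqd:
            self.run_softirq()               # serviced inline at interrupt exit
        elif self.preemptible:
            self.run_softirq()               # ksoftirqd gets scheduled and runs it
        # Otherwise ksoftirqd is runnable but never gets this CPU.


def wait_for_work(cpu: ToyCpu, max_ticks: int = 1000):
    """Spin until the softirq-driven work completes.

    Returns the number of ticks used, or None if it never completes.
    """
    for n in range(1, max_ticks + 1):
        cpu.tick()
        if cpu.work_done:
            return n
    return None


if __name__ == "__main__":
    print(wait_for_work(ToyCpu(defer_to_ksoftirqd=False, preemptible=False)))
    print(wait_for_work(ToyCpu(defer_to_ksoftirqd=True, preemptible=True)))
    print(wait_for_work(ToyCpu(defer_to_ksoftirqd=True, preemptible=False)))
```

In this model the loop only stalls when both conditions hold at once: softirq handling is deferred to ksoftirqd, and the waiting thread cannot be preempted on the only CPU. That would be consistent with the problem showing up only with a single core enabled, but whether any real driver behaves like this is exactly what remains to be checked.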