Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1978817imu; Wed, 28 Nov 2018 19:27:12 -0800 (PST) X-Google-Smtp-Source: AFSGD/UFPmVm5238eUT+kT74rx9nYpe7moSH1OxjY4Musrmdy61Lbv1Ckn6+faZDnZNIO5qpRt8K X-Received: by 2002:a17:902:8bc6:: with SMTP id r6mr39398683plo.67.1543462032739; Wed, 28 Nov 2018 19:27:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543462032; cv=none; d=google.com; s=arc-20160816; b=htmETM18Em0wFKIepuSxgbmv54hMLx+N7I+PEaOSF0miJl6UZDhAfhhtsfsMW6bZm+ qV7ps+pZ4VlZg5Af7RsUuLcX1tmyqryp2RajQ17LkiUBWghkf3xGKmFPp2XpItWGiCjj PRpQfVm4amA4eUbG6wfCebgUJTD7TxQyVcMrj8YVgdiW6WXmesRN/MX/NgqdgDjBhJlz psAIhEcTVgvk3cUVFP2q7hc2A04xWDwPPLjopiZ3GdM6fyPazmv4pBXlPgboOkDIsc6B 7Ye7/8AGPDY3+rFGOI1gPamdDQmP93jDh+fOqhLoMIVlcbiuwo2nJC+vC2qHoVkwOjAd mfsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=bPKv4ct6d/Rbz0/B5vKIli1waGCbOgpp135R50fWu6A=; b=koVogptsMGM9cvWngynpcBMl0Z70q3wQK1UixeoqVO6VON1eU+QMUmuwLbrJfHWuQX kjFSA7NZBUJM1t/lowXLcFYxjEGDmsOG1PhGODufc8gZO4ykyJD9H/caNnPTEUZqot3N X9Tcfnnvvi+65oDYx3DeG2pC+r2WkUPCB2EH7wm/TwJZkWqKDeWkIWSUnxOtQZdM3FUs xHjqX8Tai0+cvv76CNzNNmnyiEKk7NwjfAkquVZKXjrtJR9oZcgSNvdUzOX2YzRdk+GR 10ChhmHdYuWATx01e4lpsFDL3oPapJJfRKATkk5ZNINTRMDoXGFeQ+uiDTsyzPDx8vS7 L7oA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l12si777644plc.0.2018.11.28.19.26.57; Wed, 28 Nov 2018 19:27:12 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727514AbeK2O2c (ORCPT + 99 others); Thu, 29 Nov 2018 09:28:32 -0500 Received: from mail.kernel.org ([198.145.29.99]:37610 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726954AbeK2O2c (ORCPT ); Thu, 29 Nov 2018 09:28:32 -0500 Received: from vmware.local.home (cpe-66-24-56-78.stny.res.rr.com [66.24.56.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 82A472086B; Thu, 29 Nov 2018 03:24:38 +0000 (UTC) Date: Wed, 28 Nov 2018 22:24:36 -0500 From: Steven Rostedt To: Arnd Bergmann Cc: Anders Roxell , Ingo Molnar , Andrew Morton , Dmitry Vyukov , Linux Kernel Mailing List Subject: Re: [PATCH 2/2] kernel/trace: fix watchdog soft lockup Message-ID: <20181128222436.2d29ca9a@vmware.local.home> In-Reply-To: References: <20181128081334.18408-1-anders.roxell@linaro.org> <20181128090903.7a27ae58@gandalf.local.home> X-Mailer: Claws Mail 3.15.1 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 28 Nov 2018 23:24:26 +0100 Arnd Bergmann wrote: > On Wed, Nov 28, 2018 at 3:09 PM Steven Rostedt wrote: > > > > On Wed, 28 Nov 2018 09:13:34 +0100 > > Anders Roxell wrote: > > > > > When building a allmodconfig kernel for arm64 and boot that in qemu, > > > CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the > > > watchdog expires and prints out a message like this: > > > 'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]' > > > Each time the function ftrace_replace_code gets called it stays in that > > > functions loop for 41424 times. > > > Rework so that function cond_resched() gets called in the > > > ftrace_replace_code loop. > > > > > > Co-developed-by: Arnd Bergmann > > > Signed-off-by: Arnd Bergmann > > > Signed-off-by: Anders Roxell > > > --- > > > kernel/trace/ftrace.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > > > index 5b4f73e4fd56..3f456921dedf 100644 > > > --- a/kernel/trace/ftrace.c > > > +++ b/kernel/trace/ftrace.c > > > @@ -2426,6 +2426,10 @@ void __weak ftrace_replace_code(int enable) > > > > > > do_for_each_ftrace_rec(pg, rec) { > > > > > > + /* This loop can take minutes when sanitizers are enabled, so > > > + * lets make sure we allow RCU processing. > > > + */ > > > + cond_resched(); > > > if (rec->flags & FTRACE_FL_DISABLED) > > > continue; > > > > > > > NACK. On some architectures this code is run from stop machine. We > > can't call cond_resched() because it may be called with interrupts > > disabled. > > > > This is a weak function. If arm64 has special needs, just copy it in > > the arm64 code. > > I think it's currently broken on all architectures that don't already > override it, the problem being that the function is simply too > expensive when all debug options are enabled. Would it be possible to add something like touch_nmi_watchdog()? > > In an ARM64 allmodconfig kernel, there are 41424 records > that we iterate through several times. In an earlier version of the > test, the cond_resched() was only in the loop in > init_trace_selftests(), and I think that is safe and should /mostly/ > solve the problem, so maybe Anders can submit that version again. > > However, at least trace_selftest_ops() still takes half a minute > to complete in qemu, and that triggers the softlockup watchdog. > trace_selftest_ops() calls ftrace_replace_code() four or five times. And I don't have a problem with adding cond_resched() there. I'm concerned about adding it where it can be called with interrupts and/or preemption disabled. > > Here is the excerpt with printk times from one of Anders' tests: > > [ 8.350607] Running postponed tracer tests: > [ 8.356045] Testing tracer function: > [ 18.932077] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 27.454205] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 27.462594] PASSED > [ 27.462954] Testing dynamic ftrace: > [ 28.510903] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 28.746934] PASSED > [ 28.747469] Testing dynamic ftrace ops #1: > [ 32.488427] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 32.501864] (1 0 1 0 0) > [ 32.502041] (1 1 2 0 0) > [ 50.213914] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 50.219736] (2 1 3 0 1066085) > [ 50.220077] (2 2 4 0 1066100) > [ 60.580678] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 60.758019] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 60.910501] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 60.918354] PASSED > [ 60.919672] Testing dynamic ftrace ops #2: > [ 64.680222] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 64.843430] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 81.247068] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 81.250895] (1 0 1 1033119 0) > [ 81.251186] (1 1 2 1033134 0) > [ 81.343168] (2 1 3 1 3732) > [ 81.344492] (2 2 4 118 3849) > [ 89.837665] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 89.844371] PASSED > [ 89.844719] Testing ftrace recursion: > [ 90.890373] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 91.042146] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 91.048475] PASSED > [ 91.048806] Testing ftrace recursion safe: > [ 92.091174] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 92.242403] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 92.249119] PASSED > [ 92.249470] Testing ftrace regs(no arch support): > [ 93.293605] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 93.444942] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 93.451738] PASSED > [ 93.452300] Testing tracer nop: PASSED > [ 93.453288] Testing tracer irqsoff: > [ 104.486368] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 112.918828] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 112.925809] PASSED > [ 112.926435] Testing tracer function_graph: > [ 123.303248] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 132.599763] ../kernel/trace/ftrace.c:2441, loop_counter: 41424 > [ 132.607614] PASSED > > In particular, the test_probe3 in trace_selftest_ops() takes > around 20 seconds, or 482 microseconds per loop iteration > in ftrace_replace_code(). > Do you think there is another bug that makes it slower than > expected, or is that a reasonable time that it could take? Well, if you are doing it under qemu and with all the debug options set, I could expect it to take that long. -- Steve