Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp7672901ybl; Thu, 16 Jan 2020 03:45:27 -0800 (PST) X-Google-Smtp-Source: APXvYqwqL66M+wySjeCSrVVTAWZH8ykmg/Yki9TLflOJWLpI3O6LmbJdMVDBa6CB1/Vs8QgDnoqW X-Received: by 2002:a9d:198b:: with SMTP id k11mr1632484otk.295.1579175127630; Thu, 16 Jan 2020 03:45:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1579175127; cv=none; d=google.com; s=arc-20160816; b=Pj1QETzG80NbnpTEEjJfvHJ76pN80GW3KVeLRqs6RvqOVxKmQGaxvqTAAztcCjGPaO CZrt4WyxCqOlZsS1zeHie3CX20YxX2TJxfEY3SBTjj/KWHLiET+aiUC/Fh2auzJ18alW nMwQxtKCyaR09fjZkiTE+AlrpnA4jkbufhqi7yhrfqnUPzwDpteYNIiyXi3YpIBNbLi3 kZwCmUvIgTXMMMzTxN/CPTCOh4iBLMSw8UAwoAXCamLFRlkyL+RiQ//qqGSR6eRVbbEE QASNmZiAJ+uNrT3H+x2qqwJMhP61sQwzNf3XW4vhts7TMPZu0Laq66LSHXYAR81mFCx2 FE0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from; bh=G8AuUg92UJm9Tnwvgig7/a5W/1nVbeesQOi2tMYj/CU=; b=m6LOyPnbUNNNVB7Z/FfSFqrIQQGwl7W5nLxhxqEG5EtI9ZgmTF1Zj8I/eWl5GmzgfU bORvVTpHqyCxHo9A9XHXJvmAh18HXLL98M6a80IaJ1wqHmJTlUCd2HI6PDKR7+qlSr6Y 5WRgUHLtsKzizMjpf83Wa/jt03fH0dDib9LbVsjJMH5MqPra6C6UMG7i2MjiGHNxSNiE 9pgfA2e23Y5x5huKFBnlXpzo1SIWEtZijlRn1XoEBHTyphlZU66CDBWqxAU6Lo7INGUT S0g+uMlZQzB4zRqZkBV5EzJN3ow0/Y09aa8rNd3NpA4SMInlX4aNRx4/+39RXu7CR7HH temg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g22si11136406otp.55.2020.01.16.03.45.14; Thu, 16 Jan 2020 03:45:27 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726566AbgAPLoS (ORCPT + 99 others); Thu, 16 Jan 2020 06:44:18 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:51383 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725999AbgAPLoS (ORCPT ); Thu, 16 Jan 2020 06:44:18 -0500 Received: from [5.158.153.52] (helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1is3ZI-0005bg-09; Thu, 16 Jan 2020 12:44:08 +0100 Received: by nanos.tec.linutronix.de (Postfix, from userid 1000) id A4864101B66; Thu, 16 Jan 2020 12:44:07 +0100 (CET) From: Thomas Gleixner To: Waiman Long , Andrew Morton , Ingo Molnar Cc: linux-kernel@vger.kernel.org, Mike Rapoport , Kees Cook , Waiman Long , Catalin Marinas , Will Deacon , Robert Richter , Peter Zijlstra Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup In-Reply-To: <87sgkgw3xq.fsf@nanos.tec.linutronix.de> References: <20200103151032.19590-1-longman@redhat.com> <87sgkgw3xq.fsf@nanos.tec.linutronix.de> Date: Thu, 16 Jan 2020 12:44:07 +0100 Message-ID: <87blr3wrqw.fsf@nanos.tec.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Thomas Gleixner writes: Added ARM64 and ThunderX folks > Waiman Long writes: >> By adding some instrumentation code, it was found that for cpu 14, >> watchdog_enable() was called early with a timestamp of 1. That activates >> the watchdog time checking logic. It was also found that the monotonic >> time measured during the smp_init() phase runs much slower than the >> real elapsed time as shown by the below debug printf output: >> >> [ 1.138522] run_queues, watchdog_timer_fn: now = 170000000 >> [ 25.519391] run_queues, watchdog_timer_fn: now = 4170000000 >> >> In this particular case, it took about 24.4s of elapsed time for the >> clock to advance 4s which is the soft expiration time that is required >> to trigger the calling of watchdog_timer_fn(). That clock slowdown >> stopped once the smp_init() call was done and the clock time ran at >> the same rate as the elapsed time afterward. And looking at this with a more awake brain, the root cause is pretty obvious. sched_clock() advances by 24 seconds, but clock MONOTONIC on which the watchdog timer is based does not. As the timestamps you printed have 7 trailing zeros, it's pretty clear that timekeeping is still jiffies based at this point and HZ is set to 100. So while bringing up the non-boot CPUs the boot CPU loses ~2000 timer interrupts. That needs to be fixed and not papered over. Thanks, tglx