From: Pintu Kumar
Date: Mon, 9 Apr 2018 12:26:42 +0530
Subject: Re: [HELP] CPU Hard LOCKUP during boot up with HPET clock source
To: open list, linux-pm@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

As a simple query: is there a way to skip the currently available clock
source (hpet) and let the kernel pick the next one?
I think this would solve our problem.

Thanks,
Pintu

On Fri, Apr 6, 2018 at 8:37 PM, Pintu Kumar wrote:
> Hi,
>
> First, a few details:
> Kernel: 4.9.20
> Machine: x86_64 (AMD)
> Model: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
> Cores: 8
> Available clock source:
> # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> tsc hpet acpi_pm
>
> Problem:
> [ 28.027409] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> Modules linked in:
> [ 28.136317] RIP: 0010:[] [] read_hpet+0xb3/0x120
> [...]
>
> ------------------
> This lockup happens during boot, when the CPU is stuck for about 28 seconds.
> It is caused by our internal code changes.
> During our init function we run some calibration loops
> 10,000,000 (10MHz) times, twice.
> The LOCKUP comes from this loop.
>
> However, we observed that the main issue is the clock source that is
> available at that time.
> At the time this loop is executed, the available clock source is HPET (not TSC).
> With HPET the loop runs slower: it takes almost 28 seconds to complete,
> so the boot time also increases by 28 seconds.
> With TSC, the loop completes in less than 4 seconds, so we don't get
> the LOCKUP.
>
> Thus, the lockup happens only because the loop executes with the HPET
> clock source.
>
> To fix the problem, I tried the following approaches:
> 1) Use late_initcall for our driver init to delay the call until the TSC
> clock source is ready.
> => With this there is no LOCKUP trace and no impact on boot time.
> This is because the loop executes with TSC.
>
> 2) We have 2 loops, so I split the local_irq_save/restore part for
> each loop separately.
> => With this, too, no backtrace is seen.
> => But boot time is increased.
>
> 3) I used delayed_workqueue to delay the execution of the loop by 5
> seconds, until TSC is ready.
> => With this there is no backtrace and boot time is normal.
> => But if we disable TSC, we still get the backtrace.
>
> 4) Disabled HPET from the kernel command line using: hpet=disable
> => This also works, as the loop executes with the next available
> clock source: acpi_pm
> => But changing boot args is not recommended in our case.
>
> 5) Disable HPET-related configs in the kernel
> => CONFIG_HPET=n
> => CONFIG_HPET_TIMER=n
> => This method does not work, as we were not able to disable
> HPET_TIMER on x86_64.
>
> 6) Use hpet_disable() from our code.
> => This method also does not work. It does not actually disable the
> HPET clock source.
>
>
> -----------------------------
> We would therefore like your opinion on the right way to fix
> this lockup during boot.
>
> Is there a way to deliberately fall back to the next available clock source
> (acpi_pm) instead of hpet, from the source code, before executing our
> loop?
>
>
> Please let me know if there are alternative options.
>
>
>
> Thanks,
> Pintu
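
For reference, the "skip the current clock source and pick the next one" idea can be sketched from userspace, assuming the sysfs directory shown above also exposes a root-writable current_clocksource file next to available_clocksource. The helper names next_clocksource and switch_to_next_clocksource are hypothetical, made up for this illustration. Note that this only works once userspace is running, so it is not a fix for a loop that runs from an initcall during boot.

```shell
#!/bin/sh
# Minimal sketch, not a tested fix. Both function names are hypothetical.

# Print the clock source that follows "$2" in the space-separated list "$1".
# Returns 1 (and prints nothing) if "$2" is last or not in the list.
next_clocksource() {
    found=0
    for cs in $1; do
        if [ "$found" -eq 1 ]; then
            echo "$cs"
            return 0
        fi
        if [ "$cs" = "$2" ]; then
            found=1
        fi
    done
    return 1
}

# Read the available and current clock sources from directory "$1"
# (default: the sysfs path from the report) and select the next one.
switch_to_next_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    available=$(cat "$dir/available_clocksource") || return 1
    current=$(cat "$dir/current_clocksource") || return 1
    next=$(next_clocksource "$available" "$current") || return 1
    echo "$next" > "$dir/current_clocksource"
}
```

With the list from the report (tsc hpet acpi_pm) and hpet as the current source, running switch_to_next_clocksource as root would write acpi_pm to current_clocksource. The directory argument exists only so the logic can be tried against ordinary files first.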