From: Pintu Kumar
Date: Fri, 6 Apr 2018 20:37:11 +0530
Subject: [HELP] CPU Hard LOCKUP during boot up with HPET clock source
To: open list, linux-pm@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

First, a few details:
Kernel: 4.9.20
Machine: x86_64 (AMD)
Model: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Cores: 8

Available clock sources:
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

Problem:
[   28.027409] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
Modules linked in:
[   28.136317] RIP: 0010:[] [] read_hpet+0xb3/0x120
[...]
------------------

This lockup happens during boot, when the CPU is stuck for about 28 seconds. It is caused by our internal code changes: in our init function we run a calibration loop of 10,000,000 (10MHz) iterations, twice, and the LOCKUP comes from this loop.

However, we observed that the main issue is the clock source that is available at that time. When the loop executes, the available clock source is HPET (not TSC), and with HPET the loop runs slower. It takes almost 28 seconds to complete with the HPET clock source, so the boot time also increases by 28 seconds. With TSC, the loop completes in less than 4 seconds, so with TSC we don't get the LOCKUP.
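As a rough sanity check on these figures, here is a back-of-the-envelope calculation. It assumes that the two passes of 10,000,000 iterations account for all of the elapsed time; the helper name ns_per_iteration is hypothetical and is not part of our driver. Under that assumption, each iteration costs about 1400 ns with HPET and under 200 ns with TSC, roughly a 7x difference.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the report: two passes of 10,000,000 iterations each. */
static const uint64_t TOTAL_ITERATIONS = 2 * UINT64_C(10000000);

/*
 * Average cost of one loop iteration, in nanoseconds.
 * Assumption: the whole elapsed time is spent inside the loop.
 */
static uint64_t ns_per_iteration(uint64_t elapsed_sec, uint64_t iterations)
{
    assert(iterations > 0);
    return elapsed_sec * UINT64_C(1000000000) / iterations;
}
```

Calling ns_per_iteration(28, TOTAL_ITERATIONS) gives 1400 and ns_per_iteration(4, TOTAL_ITERATIONS) gives 200. Since the TSC run finishes in less than 4 seconds, 200 ns is only an upper bound for that case.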
Thus, the lockup happens only because the loop executes with the HPET clock source.

To fix the problem, I tried the following approaches:

1) Use late_initcall for our driver init, to delay the call until the TSC clock source is ready.
=> With this there is no LOCKUP trace and no impact on boot time, because the loop executes with TSC.

2) We have 2 loops, so I split the local_irq_save/restore section so that each loop has its own.
=> With this also there is no backtrace.
=> But boot time is increased.

3) I used delayed_workqueue to delay the execution of the loop by 5 seconds, until TSC is ready.
=> With this there is no backtrace and boot time is normal.
=> But if we disable TSC, we still get the backtrace.

4) Disabled HPET from the kernel command line using: hpet=disable
=> This also works, as the loop executes with the next available clock source: acpi_pm
=> But changing boot args is not recommended in our case.

5) Disable HPET related configs in the kernel
=> CONFIG_HPET=n
=> CONFIG_HPET_TIMER=n
=> This method does not work, as we were not able to disable HPET_TIMER on x86_64.

6) Use hpet_disable() from our code.
=> This method also does not work. It does not actually disable the HPET clock source.
-----------------------------

We would like your opinion on which is the right solution to fix this lockup during boot.
Is there a way to deliberately fall back to the next available clock source (acpi_pm) instead of hpet, from the source code, before executing our loop?
Please let me know if there are alternative options.

Thanks,
Pintu
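For reference on the fallback question: the clock source can be overridden at run time from user space by writing a name to /sys/devices/system/clocksource/clocksource0/current_clocksource, next to the available_clocksource file shown earlier. The sketch below only illustrates that interface. It runs in user space, needs root, and takes effect after boot, so it does not by itself cover a loop that runs during driver init. The helper names pick_clocksource and set_clocksource are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy into out the first name from a whitespace-separated clock source
 * list (as read from available_clocksource) that differs from avoid.
 * Returns 0 on success, -1 if no other name exists or it does not fit.
 */
static int pick_clocksource(const char *available, const char *avoid,
                            char *out, size_t out_len)
{
    assert(available && avoid && out && out_len > 0);

    size_t avoid_len = strlen(avoid);
    const char *p = available;

    while (*p) {
        p += strspn(p, " \t\n");           /* skip separators */
        size_t len = strcspn(p, " \t\n");  /* length of this name */
        if (len == 0)
            break;
        int is_avoided = (len == avoid_len) && strncmp(p, avoid, len) == 0;
        if (!is_avoided) {
            if (len >= out_len)
                return -1;
            memcpy(out, p, len);
            out[len] = '\0';
            return 0;
        }
        p += len;
    }
    return -1;
}

/*
 * Write name to the given file. On a real system the path would be
 * .../clocksource0/current_clocksource. Returns 0 on success, -1 on error.
 */
static int set_clocksource(const char *path, const char *name)
{
    assert(path && name);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", name) > 0;
    if (fclose(f) != 0)   /* sysfs write errors can surface on flush */
        ok = 0;
    return ok ? 0 : -1;
}
```

With the list "hpet acpi_pm" and avoid set to "hpet", pick_clocksource selects acpi_pm, which could then be passed to set_clocksource together with the sysfs path. The path is a parameter so that the helper can be exercised against an ordinary file.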