Date: Sat, 10 Feb 2024 21:32:52 +0800
Subject: Re: [PATCHv6 1/2] watchdog/softlockup: low-overhead detection of interrupt
To: Bitao Hu, dianders@chromium.org, akpm@linux-foundation.org, pmladek@suse.com, kernelfans@gmail.com
Cc: linux-kernel@vger.kernel.org
References: <20240208125426.70511-1-yaoma@linux.alibaba.com> <20240208125426.70511-2-yaoma@linux.alibaba.com>
From: Liu Song
In-Reply-To: <20240208125426.70511-2-yaoma@linux.alibaba.com>

Looks good!

Reviewed-by: Liu Song

On 2024/2/8 20:54, Bitao Hu wrote:
> The following softlockup is caused by an interrupt storm, but it cannot be
> identified from the call tree, because the call tree is just a snapshot
> and doesn't fully capture the behavior of the CPU during the soft lockup.
>   watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
>   ...
>   Call trace:
>     __do_softirq+0xa0/0x37c
>     __irq_exit_rcu+0x108/0x140
>     irq_exit+0x14/0x20
>     __handle_domain_irq+0x84/0xe0
>     gic_handle_irq+0x80/0x108
>     el0_irq_naked+0x50/0x58
>
> Therefore, I think it is necessary to report CPU utilization during the
> softlockup_thresh period (report once every sample_period, for a total
> of 5 reportings), like this:
>   watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
>   CPU#28 Utilization every 4s during lockup:
>     #1: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #2: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #3: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #4: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #5: 0% system, 0% softirq, 100% hardirq, 0% idle
>   ...
>
> This would be helpful in determining whether an interrupt storm has
> occurred or in identifying the cause of the softlockup. The criteria for
> determination are as follows:
>   a. If the hardirq utilization is high, then an interrupt storm should be
>      considered and the root cause cannot be determined from the call tree.
>   b. If the softirq utilization is high, then we could analyze the call
>      tree, but it may not reflect the root cause.
>   c. If the system utilization is high, then we could analyze the root
>      cause from the call tree.
>
> The mechanism requires a considerable amount of global storage space
> when configured for the maximum number of CPUs. Therefore, add a
> SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
> if the max number of CPUs is <= 128.
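
As a reading aid for criteria a-c above, here is a minimal userspace sketch of how one line of the proposed report might be interpreted. It is not part of the patch: struct util_sample, classify_sample(), the hint names and the 50% cut-off for "high" are all assumptions made for illustration. The patch only prints the percentages and leaves the judgement to whoever reads the log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one "#N: ..." line of the report. */
struct util_sample {
	unsigned int system;	/* percent */
	unsigned int softirq;	/* percent */
	unsigned int hardirq;	/* percent */
	unsigned int idle;	/* percent */
};

/* Hypothetical hints, one per criterion in the commit message. */
enum lockup_hint {
	HINT_HARDIRQ_STORM,	/* a. call tree will not show the root cause */
	HINT_SOFTIRQ_HEAVY,	/* b. call tree may not show the root cause */
	HINT_SYSTEM_HEAVY,	/* c. call tree should show the root cause */
	HINT_INCONCLUSIVE,	/* nothing stands out */
};

/* Assumption: the commit message does not define "high"; 50% is arbitrary. */
#define HIGH_UTIL_PCT 50

static enum lockup_hint classify_sample(const struct util_sample *s)
{
	assert(s != NULL);

	/* Check in the same order as criteria a, b, c. */
	if (s->hardirq >= HIGH_UTIL_PCT)
		return HINT_HARDIRQ_STORM;
	if (s->softirq >= HIGH_UTIL_PCT)
		return HINT_SOFTIRQ_HEAVY;
	if (s->system >= HIGH_UTIL_PCT)
		return HINT_SYSTEM_HEAVY;
	return HINT_INCONCLUSIVE;
}
```

With the sample report quoted above (100% hardirq in all five samples), every sample would map to the hardirq-storm hint under these assumptions.
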
>
> Signed-off-by: Bitao Hu
> ---
>  kernel/watchdog.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug | 13 +++++++
>  2 files changed, 104 insertions(+)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 81a8862295d6..380b60074f1d 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -16,6 +16,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include
>  #include
>  #include
> @@ -333,6 +335,92 @@ __setup("watchdog_thresh=", watchdog_thresh_setup);
>
>  static void __lockup_detector_cleanup(void);
>
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR_INTR_STORM
> +#define NUM_STATS_GROUPS	5
> +enum stats_per_group {
> +	STATS_SYSTEM,
> +	STATS_SOFTIRQ,
> +	STATS_HARDIRQ,
> +	STATS_IDLE,
> +	NUM_STATS_PER_GROUP,
> +};
> +
> +static const enum cpu_usage_stat tracked_stats[NUM_STATS_PER_GROUP] = {
> +	CPUTIME_SYSTEM,
> +	CPUTIME_SOFTIRQ,
> +	CPUTIME_IRQ,
> +	CPUTIME_IDLE,
> +};
> +
> +static DEFINE_PER_CPU(u16, cpustat_old[NUM_STATS_PER_GROUP]);
> +static DEFINE_PER_CPU(u8, cpustat_util[NUM_STATS_GROUPS][NUM_STATS_PER_GROUP]);
> +static DEFINE_PER_CPU(u8, cpustat_tail);
> +
> +/*
> + * We don't need nanosecond resolution. A granularity of 16ms is
> + * sufficient for our precision, allowing us to use u16 to store
> + * cpustats, which will roll over roughly every ~1000 seconds.
> + * 2^24 ~= 16 * 10^6
> + */
> +static u16 get_16bit_precision(u64 data_ns)
> +{
> +	return data_ns >> 24LL; /* 2^24ns ~= 16.8ms */
> +}
> +
> +static void update_cpustat(void)
> +{
> +	int i;
> +	u8 util;
> +	u16 old_stat, new_stat;
> +	struct kernel_cpustat kcpustat;
> +	u64 *cpustat = kcpustat.cpustat;
> +	u8 tail = __this_cpu_read(cpustat_tail);
> +	u16 sample_period_16 = get_16bit_precision(sample_period);
> +
> +	kcpustat_cpu_fetch(&kcpustat, smp_processor_id());
> +	for (i = 0; i < NUM_STATS_PER_GROUP; i++) {
> +		old_stat = __this_cpu_read(cpustat_old[i]);
> +		new_stat = get_16bit_precision(cpustat[tracked_stats[i]]);
> +		util = DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16);
> +		__this_cpu_write(cpustat_util[tail][i], util);
> +		__this_cpu_write(cpustat_old[i], new_stat);
> +	}
> +	__this_cpu_write(cpustat_tail, (tail + 1) % NUM_STATS_GROUPS);
> +}
> +
> +static void print_cpustat(void)
> +{
> +	int i, group;
> +	u8 tail = __this_cpu_read(cpustat_tail);
> +	u64 sample_period_second = sample_period;
> +
> +	do_div(sample_period_second, NSEC_PER_SEC);
> +	/*
> +	 * We do not want the "watchdog: " prefix on every line,
> +	 * hence we use "printk" instead of "pr_crit".
> +	 */
> +	printk(KERN_CRIT "CPU#%d Utilization every %llus during lockup:\n",
> +	       smp_processor_id(), sample_period_second);
> +	for (i = 0; i < NUM_STATS_GROUPS; i++) {
> +		group = (tail + i) % NUM_STATS_GROUPS;
> +		printk(KERN_CRIT "\t#%d: %3u%% system,\t%3u%% softirq,\t"
> +			"%3u%% hardirq,\t%3u%% idle\n", i + 1,
> +			__this_cpu_read(cpustat_util[group][STATS_SYSTEM]),
> +			__this_cpu_read(cpustat_util[group][STATS_SOFTIRQ]),
> +			__this_cpu_read(cpustat_util[group][STATS_HARDIRQ]),
> +			__this_cpu_read(cpustat_util[group][STATS_IDLE]));
> +	}
> +}
> +
> +static void report_cpu_status(void)
> +{
> +	print_cpustat();
> +}
> +#else
> +static inline void update_cpustat(void) { }
> +static inline void report_cpu_status(void) { }
> +#endif
> +
>  /*
>   * Hard-lockup warnings should be triggered after just a few seconds. Soft-
>   * lockups can have false positives under extreme conditions. So we generally
> @@ -504,6 +592,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  	 */
>  	period_ts = READ_ONCE(*this_cpu_ptr(&watchdog_report_ts));
>
> +	update_cpustat();
> +
>  	/* Reset the interval when touched by known problematic code. */
>  	if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
>  		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
> @@ -539,6 +629,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>  			smp_processor_id(), duration,
>  			current->comm, task_pid_nr(current));
> +		report_cpu_status();
>  		print_modules();
>  		print_irqtrace_events(current);
>  		if (regs)
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 975a07f9f1cc..49f652674bd8 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1029,6 +1029,19 @@ config SOFTLOCKUP_DETECTOR
>  	  chance to run. The current stack trace is displayed upon
>  	  detection and the system will stay locked up.
>
> +config SOFTLOCKUP_DETECTOR_INTR_STORM
> +	bool "Detect Interrupt Storm in Soft Lockups"
> +	depends on SOFTLOCKUP_DETECTOR && IRQ_TIME_ACCOUNTING
> +	default y if NR_CPUS <= 128
> +	help
> +	  Say Y here to enable the kernel to detect interrupt storm
> +	  during "soft lockups".
> +
> +	  "soft lockups" can be caused by a variety of reasons. If one is
> +	  caused by an interrupt storm, then the storming interrupts will not
> +	  be on the callstack. To detect this case, it is necessary to report
> +	  the CPU stats and the interrupt counts during the "soft lockups".
> +
>  config BOOTPARAM_SOFTLOCKUP_PANIC
>  	bool "Panic (Reboot) On Soft Lockups"
>  	depends on SOFTLOCKUP_DETECTOR
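
For anyone following the arithmetic in update_cpustat(), the 16-bit sampling can be tried out in userspace with the sketch below. It is only an illustration under stated assumptions: to_16bit() and util_percent() are hypothetical names, uint64_t/uint16_t/uint8_t stand in for the kernel's u64/u16/u8, and the round-up division is written out by hand instead of using DIV_ROUND_UP. The explicit cast of the delta to uint16_t and the clamp to 100 are choices of this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low 24 bits: one unit is 2^24 ns, roughly 16.8 ms. */
static uint16_t to_16bit(uint64_t ns)
{
	return (uint16_t)(ns >> 24);
}

/*
 * Percentage of the sample period spent in one cputime bucket.
 * old_stat and new_stat are two consecutive 16-bit readings of the
 * same counter, period16 is the sample period in the same units.
 */
static uint8_t util_percent(uint16_t old_stat, uint16_t new_stat,
			    uint16_t period16)
{
	/* Sketch's choice: truncate to 16 bits so a wrapped counter still works. */
	uint16_t delta = (uint16_t)(new_stat - old_stat);
	uint32_t pct;

	assert(period16 != 0);

	/* Round-up division, the same idea as DIV_ROUND_UP(100 * delta, period16). */
	pct = (100u * delta + period16 - 1u) / period16;

	/* Sketch's choice: clamp, since truncation can push the result past 100. */
	return pct > 100u ? 100u : (uint8_t)pct;
}
```

With a 4 s sample period, to_16bit() gives 238 units per period, so a counter that advanced by 119 units between two samples comes out as 50%.
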