Received: by 2002:a05:7412:3b8b:b0:fc:a2b0:25d7 with SMTP id nd11csp1517243rdb; Sat, 10 Feb 2024 05:33:53 -0800 (PST) X-Google-Smtp-Source: AGHT+IEE8Mloe7p/RK1SAeHYM9HFSrSfO7kYnO3lrGgLmj6SFFRgoo+tTmta7N+6AEtP2HwUbqpD X-Received: by 2002:a17:906:6c89:b0:a3c:5766:58f0 with SMTP id s9-20020a1709066c8900b00a3c576658f0mr77318ejr.4.1707572033055; Sat, 10 Feb 2024 05:33:53 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707572033; cv=pass; d=google.com; s=arc-20160816; b=fkhsS0RbQe30bWpx7SIchekeSJeh+l/k7ROkh+ekYKg2TzKEM5RRrmDCKNa8fiPB0U tZe2kCJWUbhmyw8Sp0DbtvRjdRWrtbfFu3DM4qqqm5jT5ryEl7fJ9J3XfZJQk/7ed3MN xLcQSSMMUlz+Wl5Y/9Nfu/Kdz1gwUKFdqfTVYPKx0R7zmKlTTIV1P5ufjXcEYl8VhJka RPpFVvuMg78pucEaifaJWhl6R705sff6y94OL9Xqr1fId8hiIenUR/CrbI06Nzl9P2Pr iuC3hnbbzbSw94y3ZYnqvbFfLzfa/cKHtjcZIYJf4rKkRE+JVPWcJ1XF1Zi31CnsxsQn qiAw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:list-unsubscribe:list-subscribe:list-id :precedence:date:message-id:dkim-signature; bh=Ftf8lo58gnF+caNhtpPCaf5NWmRTU4dk/LuOdGoR/C8=; fh=P/Z4eyEciuDl6Ebg294dr+8GeK5nLeoHJObLWb2DXFk=; b=baTYGIpzKUqhIM0/4DpngYrCq/6IvKFC17J/TMBALJoPpmlrUAc9NoNZeDC8jLH/SM C2s6OQ7DyRGt9r+mgVUiy5UrUmjd0WcNe2B3zAjYdiUIpjCtja6EmPmqZiBrNvIH4M9G 5Ea/jFKmr1DRIqk8COMnDkWDRzlPJbNk0AfDmiH9sDMhlu/8quTgSUcCD1bkrXaxzHif sFeK+WQpZH7MLcqHa+3zxe7OdAF9NUi4LcGSEKsalEFyKI1z2M3x8u7KH/qRmFwDydUi ZiHXH+MaXI4TyftxKQHb51HcH0UuRjY/onSKkTHxZGAR9s4d0/bKxGIimJHw0ngnlfBz TtzA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@linux.alibaba.com header.s=default header.b=frXbHtSq; arc=pass (i=1 spf=pass spfdomain=linux.alibaba.com dkim=pass dkdomain=linux.alibaba.com dmarc=pass fromdomain=linux.alibaba.com); spf=pass (google.com: domain of linux-kernel+bounces-60411-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-60411-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.alibaba.com X-Forwarded-Encrypted: i=2; AJvYcCU+iXRT3lbyN71/YpMnQKL1V+0FgdSgm/ddH/UttyEQQ0q7gAaSMMvvu470erPR8rlWCm5ai1ceFdx3zyVUWfezp869SJxM9jCDQJ5pyw== Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id v20-20020a170906339400b00a37845ce4c2si1968917eja.371.2024.02.10.05.33.53 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 10 Feb 2024 05:33:53 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-60411-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.alibaba.com header.s=default header.b=frXbHtSq; arc=pass (i=1 spf=pass spfdomain=linux.alibaba.com dkim=pass dkdomain=linux.alibaba.com dmarc=pass fromdomain=linux.alibaba.com); spf=pass (google.com: domain of linux-kernel+bounces-60411-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-60411-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.alibaba.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id A44151F22121 for ; Sat, 10 Feb 2024 13:33:52 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6B93853810; Sat, 10 Feb 2024 13:33:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="frXbHtSq" Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FE4C2114 for ; Sat, 10 Feb 2024 13:33:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.131 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707572023; cv=none; b=NBauBrEAOvF/shI8wp0exud0oZDYy+QL/kyhIJF7D7HBqK+XLO4Gws8Ar3JGDhAdCIE84HTR9QqJNmr1pVcuqtqS43z4HiHfE1lvxhgfoCzQyuQzVWLm/SLg5csXuspJAwDBwa2lb8igwfdyeiEZdUROHHYU9lFjbPbv5zDpLGw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707572023; c=relaxed/simple; bh=S3qpEyi6iEpk0KiykEAHNzfsDGwEH3Ja+8u230iWX7k=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=i8j5ePyV6D9+Ue+Ao9VBKlVXlCXR80gy5tElHzaRpkjWa4s8j3lEIZyFXUirEDATpVDnVvOIusnI8P2y0Q9WjIiecxrdY+VA0h0BUs+xqd8EQ+GbiLmdi5F46s8o4AOSxk3qPoyZGO8FjGx9xNOc6CE08vZlYbH5LyfFVOzAn4c= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=frXbHtSq; arc=none smtp.client-ip=115.124.30.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1707572011; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=Ftf8lo58gnF+caNhtpPCaf5NWmRTU4dk/LuOdGoR/C8=; b=frXbHtSq0QaECRcPQ1R8kjP44PPl9Wz7AszziS1o6bOaQ0vyuIOqnz7NGosMS5dGNCFLsOlYhqboP+09FUWiLBE0hNN5imGaLqk+HfsbbQC9qDwjVYOMBQLohVNT5OjCXcab2MAGWF0MTin1YimXIHWhaxj7qOTUY67KyQLyiCw= X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R141e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046051;MF=liusong@linux.alibaba.com;NM=1;PH=DS;RN=6;SR=0;TI=SMTPD_---0W0O8QKA_1707571975; Received: from 30.120.175.242(mailfrom:liusong@linux.alibaba.com fp:SMTPD_---0W0O8QKA_1707571975) by smtp.aliyun-inc.com; Sat, 10 Feb 2024 21:33:31 +0800 Message-ID: <470e9795-6269-407b-801a-5cffb95988f0@linux.alibaba.com> Date: Sat, 10 Feb 2024 21:33:30 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCHv6 2/2] watchdog/softlockup: report the most frequent interrupts To: Bitao Hu , dianders@chromium.org, akpm@linux-foundation.org, pmladek@suse.com, kernelfans@gmail.com Cc: linux-kernel@vger.kernel.org References: <20240208125426.70511-1-yaoma@linux.alibaba.com> <20240208125426.70511-3-yaoma@linux.alibaba.com> From: Liu Song In-Reply-To: <20240208125426.70511-3-yaoma@linux.alibaba.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Looks good! Reviewed-by: Liu Song 在 2024/2/8 20:54, Bitao Hu 写道: > When the watchdog determines that the current soft lockup is due > to an interrupt storm based on CPU utilization, reporting the > most frequent interrupts could be good enough for further > troubleshooting. > > Below is an example of interrupt storm. The call tree does not > provide useful information, but we can analyze which interrupt > caused the soft lockup by comparing the counts of interrupts. > > [ 2987.488075] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [kworker/9:1:214] > [ 2987.488607] CPU#9 Utilization every 4s during lockup: > [ 2987.488941] #1: 0% system, 0% softirq, 100% hardirq, 0% idle > [ 2987.489357] #2: 0% system, 0% softirq, 100% hardirq, 0% idle > [ 2987.489771] #3: 0% system, 0% softirq, 100% hardirq, 0% idle > [ 2987.490186] #4: 0% system, 0% softirq, 100% hardirq, 0% idle > [ 2987.490601] #5: 0% system, 0% softirq, 100% hardirq, 0% idle > [ 2987.491034] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs: > [ 2987.491493] #1: 330985 irq#7(IPI) > [ 2987.491743] #2: 5000 irq#10(arch_timer) > [ 2987.492039] #3: 9 irq#91(nvme0q2) > [ 2987.492318] #4: 3 irq#118(virtio1-output.12) > ... > [ 2987.492728] Call trace: > [ 2987.492729] __do_softirq+0xa8/0x364 > > Signed-off-by: Bitao Hu > --- > kernel/watchdog.c | 153 ++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 149 insertions(+), 4 deletions(-) > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 380b60074f1d..e9e98ce5ff40 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -12,22 +12,25 @@ > > #define pr_fmt(fmt) "watchdog: " fmt > > -#include > #include > -#include > #include > +#include > +#include > #include > +#include > #include > +#include > #include > +#include > +#include > #include > #include > + > #include > #include > #include > -#include > > #include > -#include > > static DEFINE_MUTEX(watchdog_mutex); > > @@ -412,13 +415,142 @@ static void print_cpustat(void) > } > } > > +#define HARDIRQ_PERCENT_THRESH 50 > +#define NUM_HARDIRQ_REPORT 5 > +static DEFINE_PER_CPU(u32 *, hardirq_counts); > +static DEFINE_PER_CPU(int, actual_nr_irqs); > +struct irq_counts { > + int irq; > + u32 counts; > +}; > + > +/* Tabulate the most frequent interrupts. */ > +static void tabulate_irq_count(struct irq_counts *irq_counts, int irq, u32 counts, int rank) > +{ > + int i; > + struct irq_counts new_count = {irq, counts}; > + > + for (i = 0; i < rank; i++) { > + if (counts > irq_counts[i].counts) > + swap(new_count, irq_counts[i]); > + } > +} > + > +/* > + * If the hardirq time exceeds HARDIRQ_PERCENT_THRESH% of the sample_period, > + * then the cause of softlockup might be interrupt storm. In this case, it > + * would be useful to start interrupt counting. > + */ > +static bool need_counting_irqs(void) > +{ > + u8 util; > + int tail = __this_cpu_read(cpustat_tail); > + > + tail = (tail + NUM_HARDIRQ_REPORT - 1) % NUM_HARDIRQ_REPORT; > + util = __this_cpu_read(cpustat_util[tail][STATS_HARDIRQ]); > + return util > HARDIRQ_PERCENT_THRESH; > +} > + > +static void start_counting_irqs(void) > +{ > + int i; > + int local_nr_irqs; > + struct irq_desc *desc; > + u32 *counts = __this_cpu_read(hardirq_counts); > + > + if (!counts) { > + /* > + * nr_irqs has the potential to grow at runtime. We should read > + * it and store locally to avoid array out-of-bounds access. > + */ > + local_nr_irqs = READ_ONCE(nr_irqs); > + counts = kcalloc(local_nr_irqs, sizeof(u32), GFP_ATOMIC); > + if (!counts) > + return; > + for (i = 0; i < local_nr_irqs; i++) { > + desc = irq_to_desc(i); > + if (!desc) > + continue; > + counts[i] = desc->kstat_irqs ? > + *this_cpu_ptr(desc->kstat_irqs) : 0; > + } > + __this_cpu_write(actual_nr_irqs, local_nr_irqs); > + __this_cpu_write(hardirq_counts, counts); > + } > +} > + > +static void stop_counting_irqs(void) > +{ > + kfree(__this_cpu_read(hardirq_counts)); > + __this_cpu_write(hardirq_counts, NULL); > +} > + > +static void print_irq_counts(void) > +{ > + int i; > + struct irq_desc *desc; > + u32 counts_diff; > + int local_nr_irqs = __this_cpu_read(actual_nr_irqs); > + u32 *counts = __this_cpu_read(hardirq_counts); > + struct irq_counts irq_counts_sorted[NUM_HARDIRQ_REPORT] = { > + {-1, 0}, {-1, 0}, {-1, 0}, {-1, 0}, > + }; > + > + if (counts) { > + for_each_irq_desc(i, desc) { > + /* > + * We need to bounds-check in case someone on a different CPU > + * expanded nr_irqs. > + */ > + if (desc->kstat_irqs) { > + counts_diff = *this_cpu_ptr(desc->kstat_irqs); > + if (i < local_nr_irqs) > + counts_diff -= counts[i]; > + } else { > + counts_diff = 0; > + } > + tabulate_irq_count(irq_counts_sorted, i, counts_diff, > + NUM_HARDIRQ_REPORT); > + } > + /* > + * We do not want the "watchdog: " prefix on every line, > + * hence we use "printk" instead of "pr_crit". > + */ > + printk(KERN_CRIT "CPU#%d Detect HardIRQ Time exceeds %d%%. Most frequent HardIRQs:\n", > + smp_processor_id(), HARDIRQ_PERCENT_THRESH); > + for (i = 0; i < NUM_HARDIRQ_REPORT; i++) { > + if (irq_counts_sorted[i].irq == -1) > + break; > + desc = irq_to_desc(irq_counts_sorted[i].irq); > + if (desc && desc->action) > + printk(KERN_CRIT "\t#%u: %-10u\tirq#%d(%s)\n", > + i + 1, irq_counts_sorted[i].counts, > + irq_counts_sorted[i].irq, desc->action->name); > + else > + printk(KERN_CRIT "\t#%u: %-10u\tirq#%d\n", > + i + 1, irq_counts_sorted[i].counts, > + irq_counts_sorted[i].irq); > + } > + /* > + * If the hardirq time is less than HARDIRQ_PERCENT_THRESH% in the last > + * sample_period, then we suspect the interrupt storm might be subsiding. > + */ > + if (!need_counting_irqs()) > + stop_counting_irqs(); > + } > +} > + > static void report_cpu_status(void) > { > print_cpustat(); > + print_irq_counts(); > } > #else > static inline void update_cpustat(void) { } > static inline void report_cpu_status(void) { } > +static inline bool need_counting_irqs(void) { return false; } > +static inline void start_counting_irqs(void) { } > +static inline void stop_counting_irqs(void) { } > #endif > > /* > @@ -522,6 +654,18 @@ static int is_softlockup(unsigned long touch_ts, > unsigned long now) > { > if ((watchdog_enabled & WATCHDOG_SOFTOCKUP_ENABLED) && watchdog_thresh) { > + /* > + * If period_ts has not been updated during a sample_period, then > + * in the subsequent few sample_periods, period_ts might also not > + * be updated, which could indicate a potential softlockup. In > + * this case, if we suspect the cause of the potential softlockup > + * might be interrupt storm, then we need to count the interrupts > + * to find which interrupt is storming. > + */ > + if (time_after_eq(now, period_ts + get_softlockup_thresh() / 5) && > + need_counting_irqs()) > + start_counting_irqs(); > + > /* Warn about unreasonable delays. */ > if (time_after(now, period_ts + get_softlockup_thresh())) > return now - touch_ts; > @@ -544,6 +688,7 @@ static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work); > static int softlockup_fn(void *data) > { > update_touch_ts(); > + stop_counting_irqs(); > complete(this_cpu_ptr(&softlockup_completion)); > > return 0;