Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp6548286rwd; Mon, 5 Jun 2023 21:10:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4VEkDRmUIgEYnaXE/Z1PEnivjsccbuIqRBrlADyyfjfgXon70AJjopcvIt2c9Mw+5GjYBq X-Received: by 2002:ac8:5e08:0:b0:3f6:a3d3:a504 with SMTP id h8-20020ac85e08000000b003f6a3d3a504mr746437qtx.47.1686024657434; Mon, 05 Jun 2023 21:10:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686024657; cv=none; d=google.com; s=arc-20160816; b=nxiS5bN7HM3XJZS9ciFzUJwdZRShXavfFcPGlUJ/m1P13I9/XAZuqCGfSUOUYwTQz+ ZIP/BIYYd9KbeFRhUDxoEQ2HTnX64q7cyk/meEmJlGIklnczpBgPfjj0ZWRHLv7lm5xv wmAbpKJ3kJ9s4eWOGgTNAwgPuzxtwfEUypCRu1P9mcXpt6D1N4OqcM5COJE98vhE776P dde4CCOXvsfkqEzvghUjWdWFBPWpYmlMyDvEsJtPX0Tznrwp0l+nmuOqZoUF3eWkK6w2 HTwLA52LbRgNgYyEvA9ExuRMmsRtFwCU15780b8eoZHQ/IYj0h56qU2k6sGmF6QNbel4 0K2Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Zl0xSYFVXihxlFwU2GI1A7DmXGMBSodJ+vcgxYeTY0Q=; b=PIdj0iDNaZlokkL10zT6puqE7Z7NrIR3x7zq5j/IDXmkPLzZi4/nEthkgZAXdqSG1t 1W5JS5Wew1YnFQcqc95osyTPepA5dJ8dV43WoTqRiNA7WU+rI2JL3ZnljAvxzA53Ied9 DS55Eaw6TtmYb9caI8yzQIR0bIsz1umClTbeacUuM35WHWWaPVoPIfx4yIxBDf91D9nA PPqire0nbF14LR6My+en4yHgEY67cMLsoqiL+Ck42J8NzMeHgLenpRgcdCUQyXTPiuj3 kKQM1ZC/KUzIaz5WCFaJrG8e6y5MD7wf0Dg9dlNuC2euY3D80JWJ9CRA9P1I0IqkOE73 HowQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=c6TZBnZm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o5-20020ac87c45000000b003f53ff4be0bsi5779664qtv.46.2023.06.05.21.10.42; Mon, 05 Jun 2023 21:10:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=c6TZBnZm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233516AbjFFDyz (ORCPT + 99 others); Mon, 5 Jun 2023 23:54:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232326AbjFFDyw (ORCPT ); Mon, 5 Jun 2023 23:54:52 -0400 Received: from mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D45B7187 for ; Mon, 5 Jun 2023 20:54:50 -0700 (PDT) Received: by mail-pg1-x533.google.com with SMTP id 41be03b00d2f7-53fbb3a013dso5204862a12.1 for ; Mon, 05 Jun 2023 20:54:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1686023690; x=1688615690; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Zl0xSYFVXihxlFwU2GI1A7DmXGMBSodJ+vcgxYeTY0Q=; b=c6TZBnZm4tjQlimIKuxQRIpQb8QaOyqNIC3Ybs3R1I6QmAUHBdQ6VrOpRmH5sJzlHl yNmYpXjlJeKbgoxlirqPmoAAXwNKYE025hH6CtoReVgEIf39AnRgPxBYn8s75U2Dz/3p LMG/pkrnhM1SXB7U+oq9nukDzv5jxXZm9PchQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686023690; x=1688615690; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Zl0xSYFVXihxlFwU2GI1A7DmXGMBSodJ+vcgxYeTY0Q=; b=MvCy50558dFDtPNZ7T4NgwaLmkr4TBASExYtXTkE5PPy2NVOuxo/I61B21LKOhdtyN BRhIO3YtppEiOmH9EY02wmhH7fzpl9EjSHznSk073yfetrde0Ul8v7sc0bMyM7tNWM5D xnj/i9/+qzcSJ4rhq8+KgalrYdpisYdAJDs+3e+IwF+5D9qzj1Ive/tLPBJCc2SjhbP1 Q7JniqWgI58o+UufTxvgTTdakkJnqSIXLaOZuZW/UXYFdNjdnzyCrSs3m/KyvbQ6PwHC 0GgbCJTzYeRkh45DkDE4BKZ5KF5OuVingYGatFejCFNYpZT4NFC61bx0wlmcPjDMJnox SoPQ== X-Gm-Message-State: AC+VfDxP7uZmixQSbaVGoWtxV6P/7cn8YvA3QO0j4QwP68YaU9nTqmwa HV9x8uhbv/OzlXbeXfaOHH6MGw== X-Received: by 2002:a05:6a20:d806:b0:101:73a9:1683 with SMTP id iv6-20020a056a20d80600b0010173a91683mr1218950pzb.33.1686023690390; Mon, 05 Jun 2023 20:54:50 -0700 (PDT) Received: from grundler-glapstation.lan ([70.134.62.80]) by smtp.gmail.com with ESMTPSA id e11-20020a17090301cb00b001ac5b0a959bsm7346636plh.24.2023.06.05.20.54.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Jun 2023 20:54:50 -0700 (PDT) From: Grant Grundler To: Mahesh J Salgaonkar , "Oliver O \ 'Halloran" , Bjorn Helgaas Cc: Rajat Khandelwal , linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Rajat Jain , Grant Grundler Subject: [PATCHv3 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors Date: Mon, 5 Jun 2023 20:54:42 -0700 Message-ID: <20230606035442.2886343-2-grundler@chromium.org> X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog In-Reply-To: <20230606035442.2886343-1-grundler@chromium.org> References: <20230606035442.2886343-1-grundler@chromium.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Rajat Khandelwal There are many instances where correctable errors tend to inundate the message buffer. We observe such instances during thunderbolt PCIe tunneling. It's true that they are mitigated by the hardware and are non-fatal but we shouldn't be spamming the logs with such correctable errors as it confuses other kernel developers less familiar with PCI errors, support staff, and users who happen to look at the logs, hence rate limit them. A typical example log inside an HP TBT4 dock: [54912.661142] pcieport 0000:00:07.0: AER: Multiple Corrected error received: 0000:2b:00.0 [54912.661194] igc 0000:2b:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [54912.661203] igc 0000:2b:00.0: device [8086:5502] error status/mask=00001100/00002000 [54912.661211] igc 0000:2b:00.0: [ 8] Rollover [54912.661219] igc 0000:2b:00.0: [12] Timeout [54982.838760] pcieport 0000:00:07.0: AER: Corrected error received: 0000:2b:00.0 [54982.838798] igc 0000:2b:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [54982.838808] igc 0000:2b:00.0: device [8086:5502] error status/mask=00001000/00002000 [54982.838817] igc 0000:2b:00.0: [12] Timeout This gets repeated continuously, thus inundating the buffer. Signed-off-by: Rajat Khandelwal Signed-off-by: Grant Grundler --- drivers/pci/pcie/aer.c | 80 +++++++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 29 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index d7bfc6070ddb..830f5a1261c9 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -686,26 +686,36 @@ static void __aer_print_error(struct pci_dev *dev, struct aer_err_info *info) { const char **strings; + char aer_msg[512]; unsigned long status = info->status & ~info->mask; - const char *level, *errmsg; int i; - if (info->severity == AER_CORRECTABLE) { - strings = aer_correctable_error_string; - level = KERN_INFO; - } else { - strings = aer_uncorrectable_error_string; - level = KERN_ERR; - } + memset(aer_msg, 0, sizeof(*aer_msg)); + snprintf(aer_msg, sizeof(*aer_msg), "aer_status: 0x%08x, aer_mask: 0x%08x\n", + info->status, info->mask); + + strings = (info->severity == AER_CORRECTABLE) ? + aer_correctable_error_string : aer_uncorrectable_error_string; for_each_set_bit(i, &status, 32) { - errmsg = strings[i]; + const char *errmsg = strings[i]; + char bitmsg[64]; + memset(bitmsg, 0, sizeof(*bitmsg)); + if (!errmsg) errmsg = "Unknown Error Bit"; - pci_printk(level, dev, " [%2d] %-22s%s\n", i, errmsg, - info->first_error == i ? " (First)" : ""); + snprintf(bitmsg, sizeof(*bitmsg), " [%2d] %-22s%s\n", i, errmsg, + info->first_error == i ? " (First)" : ""); + + strlcat(aer_msg, bitmsg, sizeof(*aer_msg)); } + + if (info->severity == AER_CORRECTABLE) + pci_info_ratelimited(dev, "%s", aer_msg); + else + pci_err(dev, "%s", aer_msg): + pci_dev_aer_stats_incr(dev, info); } @@ -713,7 +723,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) { int layer, agent; int id = ((dev->bus->number << 8) | dev->devfn); - const char *level; if (!info->status) { pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", @@ -724,14 +733,19 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) layer = AER_GET_LAYER_ERROR(info->severity, info->status); agent = AER_GET_AGENT(info->severity, info->status); - level = (info->severity == AER_CORRECTABLE) ? KERN_INFO : KERN_ERR; - - pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n", - aer_error_severity_string[info->severity], - aer_error_layer[layer], aer_agent_string[agent]); - - pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", - dev->vendor, dev->device, info->status, info->mask); + if (info->severity == AER_CORRECTABLE) { + pci_info_ratelimited(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n" + " device [%04x:%04x] error status/mask=%08x/%08x\n", + aer_error_severity_string[info->severity], + aer_error_layer[layer], aer_agent_string[agent], + dev->vendor, dev->device, info->status, info->mask); + } else { + pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n", + " device [%04x:%04x] error status/mask=%08x/%08x\n", + aer_error_severity_string[info->severity], + aer_error_layer[layer], aer_agent_string[agent], + dev->vendor, dev->device, info->status, info->mask); + } __aer_print_error(dev, info); @@ -751,11 +765,19 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) u8 bus = info->id >> 8; u8 devfn = info->id & 0xff; - pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n", - info->multi_error_valid ? "Multiple " : "", - aer_error_severity_string[info->severity], - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), - PCI_FUNC(devfn)); + if (info->severity == AER_CORRECTABLE) + pci_info_ratelimited(dev, "%s%s error received: %04x:%02x:%02x.%d\n", + info->multi_error_valid ? "Multiple " : "", + aer_error_severity_string[info->severity], + pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), + PCI_FUNC(devfn)); + else + pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n", + info->multi_error_valid ? "Multiple " : "", + aer_error_severity_string[info->severity], + pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), + PCI_FUNC(devfn)); + } #ifdef CONFIG_ACPI_APEI_PCIEAER @@ -798,7 +820,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity, info.first_error = PCI_ERR_CAP_FEP(aer->cap_control); if (aer_severity == AER_CORRECTABLE) - pci_info(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); + pci_info_ratelimited(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); else pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask); @@ -808,9 +830,9 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity, pci_info(dev, "aer_layer=%s, aer_agent=%s\n", aer_error_layer[layer], aer_agent_string[agent]); } else { - pci_err(dev, "aer_layer=%s, aer_agent=%s\n", - aer_error_layer[layer], aer_agent_string[agent]); - pci_err(dev, "aer_uncor_severity: 0x%08x\n", + pci_err(dev, "aer_layer=%s, aer_agent=%s," + " aer_uncor_severity=0x%08x\n", + aer_error_layer[layer], aer_agent_string[agent], aer->uncor_severity); } -- 2.41.0.rc0.172.g3f132b7071-goog