Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp1272348rwl; Fri, 7 Apr 2023 12:51:08 -0700 (PDT) X-Google-Smtp-Source: AKy350biU8CaQVvUUFKbF3Y3UT/0zwscTHWMeMRwiSiS64btPyVEC9NcEyaYmKL8S5IV9BqupXAB X-Received: by 2002:a17:906:ae0f:b0:935:1565:d661 with SMTP id le15-20020a170906ae0f00b009351565d661mr591446ejb.66.1680897067857; Fri, 07 Apr 2023 12:51:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680897067; cv=none; d=google.com; s=arc-20160816; b=WEkg2QVIvQXuA516CNLDF46eHALF9PmQps5Wn1dLBHnenvdgo6DPFXLm9SAaKgQbWU iR4citZq/2+0IyvJbR2tTo7/jly+qvlFFhIMvzMqWmb5/fWAomNYYerNbGCusU4lXyeb MYbGc2p9RtM76JzObr2HBPYa3DTHYNay9+2QX7b0IVjpRn2tl6mv94ZlRHJwjiCWLGKa 4KttS2K45+4YUcASBJshSJZdy2/5uF+8CUwRfBnHGUKB0SumbGgBmTHEvY2oZYvqhv+h 1ZionIQOA7LvKSYBq0sNxuYY3gpIw4r66g8Kioo6JG8tsak5B2qfgXQSN24ersTnSyju 8crQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:message-id:subject:cc:to:from:date :dkim-signature; bh=zoI32R10F+Pe5zau+WizBUK6yKf9FQcM9SAYVMSg79Y=; b=L+RFO+R8rhoP31BysOUv7PUiclqyGOg3XXl/J9ayAjv2T5Cz3BxPow+683LvGiV4OR WDlt1BaEe0A7t09F6BfbZkpO4CyVl2311cevYCAEc63tKFibUoCCuyR3zi89x8+Ye1TN j0I2A/EqrzYjISK1vaDoUBuEd8tb3PNAnm4YElQTLEGLIJfIZfeOjluOg5l5FrY1n3Xc Mbk2S8PSnI4GlaND3IK7QoTWBMTf/n5lGhl5SM44tZxLOXrbllqoAYaf5n1DTaexxpEB F/TDb3Wsm1pmks/uhOLv7czk5cTyNR57f9j4EmIdeMLENz5NuBO8H89NyAopWjnLcjIY QE/A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=TO469duA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id t12-20020a170906948c00b009331d7ae4e8si976048ejx.77.2023.04.07.12.50.42; Fri, 07 Apr 2023 12:51:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=TO469duA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229932AbjDGTqv (ORCPT + 99 others); Fri, 7 Apr 2023 15:46:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229469AbjDGTqu (ORCPT ); Fri, 7 Apr 2023 15:46:50 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 535D97EFD; Fri, 7 Apr 2023 12:46:48 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D56A065199; Fri, 7 Apr 2023 19:46:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 10AE7C433EF; Fri, 7 Apr 2023 19:46:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1680896807; bh=jJZrgaS9x+x1S2R2yg1K1+c5nWNoK5ss3XwLkYAtE8Y=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=TO469duAJkhpj1So22MlFHRFn5xX24ea33Qa5N8RAp5UJIIKGZZOBX6Hs2rMJ52Ig RpwfFH4aHa7VjgDuSDJpTTjnqOfjYnsa/tGBV93QjZABBO8NYJgZnZ+X8EpcEN0nIk E2oP1kjOx7Zu2S0/t9nYBdInDLCE+uBvUVdVndcyp72HOaATPyxov0cl2HDrAvHx7d t+kv49Qt/c6skSolQc+cw42feY7R8ExgnIaeO5/eeEMS8QDG0EbTOJhO2aCPC00vWu F9IZPC/a1LZehjXz2KicoaZxZlOVth24NTYZ6QPxJFFjousdIZDGbwBFHwtqkLMJKP s3lHDSpINxvew== Date: Fri, 7 Apr 2023 14:46:45 -0500 From: Bjorn Helgaas To: Grant Grundler Cc: Mahesh J Salgaonkar , Oliver O 'Halloran , Bjorn Helgaas , Rajat Jain , Rajat Khandelwal , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the correctable errors Message-ID: <20230407194645.GA3814486@bhelgaas> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-5.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote: > On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas wrote: > > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote: > > > From: Rajat Khandelwal > > > > > > There are many instances where correctable errors tend to inundate > > > the message buffer. We observe such instances during thunderbolt PCIe > > > tunneling. > ... > > > if (info->severity == AER_CORRECTABLE) > > > - pci_info(dev, " [%2d] %-22s%s\n", i, errmsg, > > > - info->first_error == i ? " (First)" : ""); > > > + pci_info_ratelimited(dev, " [%2d] %-22s%s\n", i, errmsg, > > > + info->first_error == i ? " (First)" : ""); > > > > I don't think this is going to reliably work the way we want. We have > > a bunch of pci_info_ratelimited() calls, and each caller has its own > > ratelimit_state data. Unless we call pci_info_ratelimited() exactly > > the same number of times for each error, the ratelimit counters will > > get out of sync and we'll end up printing fragments from error A mixed > > with fragments from error B. > > Ok - what I'm reading between the lines here is the output should be > emitted in one step, not multiple pci_info_ratelimited() calls. if the > code built an output string (using sprintnf()), and then called > pci_info_ratelimited() exactly once at the bottom, would that be > sufficient? > > > I think we need to explicitly manage the ratelimiting ourselves, > > similar to print_hmi_event_info() or print_extlog_rcd(). Then we can > > have a *single* ratelimit_state, and we can check it once to determine > > whether to log this correctable error. > > Is the rate limiting per call location or per device? From above, I > understood rate limiting is "per call location". If the code only > has one call location, it should achieve the same goal, right? Rate-limiting is per call location, so yes, if we only have one call location, that would solve it. It would also have the nice property that all the output would be atomic so it wouldn't get mixed with other stuff, and it might encourage us to be a little less wordy in the output. But I don't think we need output in a single step; we just need a single instance of ratelimit_state (or one for CPER path and another for native AER path), and that can control all the output for a single error. E.g., print_hmi_event_info() looks like this: static void print_hmi_event_info(...) { static DEFINE_RATELIMIT_STATE(rs, ...); if (__ratelimit(&rs)) { printk("%s%s Hypervisor Maintenance interrupt ..."); printk("%s Error detail: %s\n", ...); printk("%s HMER: %016llx\n", ...); } } I think it's nice that the struct ratelimit_state is explicit and there's no danger of breaking it when adding another printk later. It *could* be per pci_dev, too, but I suspect it's not worth spending 40ish bytes per device for the ratelimit data. Bjorn