Date: Thu, 6 May 2021 09:16:30 +0200
From: Mauro Carvalho Chehab
To: Tyler Hicks
Cc: Borislav Petkov, wangglei, "Lei Wang (DPLAT)", "tony.luck@intel.com",
    "james.morse@arm.com", "rric@kernel.org", "linux-edac@vger.kernel.org",
    "linux-kernel@vger.kernel.org", Hang Li, Brandon Waller
Subject: Re: [EXTERNAL] Re: [PATCH] EDAC: update edac printk wrappers to use printk_ratelimited.
Message-ID: <20210506091630.168c7887@coco.lan>
In-Reply-To: <20210505230152.GH4967@sequoia>

On Wed, 5 May 2021 18:01:52 -0500, Tyler Hicks wrote:

> On 2021-05-06 00:55:00, Borislav Petkov wrote:
> > On Wed, May 05, 2021 at 05:43:57PM -0500, Tyler Hicks wrote:
> > > This is x86-specific
> >
> > That's because it is used by x86 currently. It shouldn't be hard to use
> > it on another arch though as the machinery is pretty generic.
> >
> > > and not applicable in our situation.
> >
> > What is your situation? ARM?
>
> Yes, though I'm not sure if those additional features are
> important/useful enough for us to generalize that driver. The main
> motivation here was just to prevent storage/network from being flooded
> by obviously-bad nodes that haven't been offlined yet. :)

Well, if a machine starts to produce 500+ errors per second, then it
should be offlined as soon as possible, as otherwise bad results will
be produced ;-)

See, the CE (corrected error) reporting mechanism is meant to be used
together with some error correction code algorithm, like the ones used
on ECC memories. Such algorithms are designed to detect a single
errored bit with a chance usually on the order of ~10^-4 to 10^-7 (the
precision depends on how many bits are used and what algorithm is
used), but if there are two wrong bits in the same word, the chance of
detecting them is a lot lower.

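To make the single-bit versus double-bit point concrete, here is a
minimal sketch using a toy Hamming(7,4) code. This is an assumption
chosen for illustration only: real ECC memories use wider and stronger
codes (typically with an extra parity bit or more), and the function
names below are hypothetical. It only shows the general effect that one
flipped bit is repaired, while two flipped bits in the same word make
the decoder "correct" the wrong bit and hand back bad data.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Bit positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def flip(codeword, *positions):
    """Return a copy of codeword with the given 1-based positions inverted."""
    out = list(codeword)
    for pos in positions:
        out[pos - 1] ^= 1
    return out


def decode(codeword):
    """Return (data_bits, syndrome).

    A non-zero syndrome is the 1-based position the decoder believes is
    wrong; that bit is inverted before the data bits are extracted.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)
    if syndrome:
        c[syndrome - 1] ^= 1  # "correct" the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = encode(data)

    # One flipped bit: the syndrome names it and the data is recovered.
    print(decode(flip(word, 5)))

    # Two flipped bits: the syndrome points at a third, innocent bit,
    # so the decoder reports a corrected error but returns wrong data.
    print(decode(flip(word, 1, 2)))
```

In this toy code the double-bit case is reported exactly like a
corrected single-bit error, which is why a node that is already
producing a flood of CEs should not be trusted for long.
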
So, keeping the server enabled up to the point that it would consume
enough resources at the storage/network to bother someone sounds like a
terrible idea, as sooner or later it will miss an error or produce an
uncorrected event ;-)

Besides that, if you're running rasdaemon to capture the hardware
errors, the storage will also be flooded by the same events, even if
you stop them from going to syslog via
/sys/module/edac_core/parameters/edac_mc_log_ce.

Now, the question is: are those 500+ errors per second a real hardware
problem, or are they due to some broken error reporting mechanism? In
the latter case, the driver or the hardware that is producing invalid
errors should be fixed.

> Lei and others on cc will need to evaluate porting cec.c and what it
> will gain them. Thanks again.

Regards,
Mauro
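
As a footnote to the 500+ errors per second figure discussed in this
thread, the check could be sketched roughly as below. Everything here
is an assumption for illustration: the helper names are hypothetical,
the threshold is simply the number quoted above, and the counter is
assumed to come from a plain-text file holding one integer (on a
system with an EDAC driver loaded that could be something like
/sys/devices/system/edac/mc/mc0/ce_count, but the sketch does not
depend on it).

```python
from pathlib import Path

# Errors per second; this is only the figure quoted in the thread,
# not a recommended policy.
OFFLINE_THRESHOLD = 500.0


def read_count(path):
    """Read a single integer counter from a text file."""
    return int(Path(path).read_text().strip())


def ce_rate(prev_count, prev_time, count, time):
    """Return corrected errors per second between two counter samples."""
    elapsed = time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (count - prev_count) / elapsed


def should_offline(rate, threshold=OFFLINE_THRESHOLD):
    """True when the observed CE rate reaches the threshold."""
    return rate >= threshold


if __name__ == "__main__":
    # Two hypothetical samples taken 10 seconds apart.
    rate = ce_rate(prev_count=1000, prev_time=0.0, count=7000, time=10.0)
    print(rate, should_offline(rate))
```

A check like this only flags the node. Whether the errors come from
real hardware faults or from a broken reporting path still has to be
worked out separately.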