On 2021-05-06 00:02:44, Borislav Petkov wrote:
> On Wed, May 05, 2021 at 04:48:46PM -0500, Tyler Hicks wrote:
> > The thought was that the full stream of log messages isn't necessary to
> > notice that there's a problem when they are being emitted at such a high
> > rate (500 per second). They're just filling up disk space and/or wasting
> > networking bandwidth at that point.
>
> I already asked about this but lemme point it out again: have you guys
> looked at drivers/ras/cec.c ?

We'll have a closer look. Thanks for the pointer!

Tyler

>
> With that there won't be *any* error reports in dmesg and it will even
> poison and offline pages which generate excessive errors so that ...
>
> > Of course, the best course of action here is to service the machine
> > but there's still a period of time between the CE errors popping up
> > and the machine being serviced.
>
> ... you'll have ample time to service the machine.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
On 2021-05-05 17:16:11, Tyler Hicks wrote:
> On 2021-05-06 00:02:44, Borislav Petkov wrote:
> > On Wed, May 05, 2021 at 04:48:46PM -0500, Tyler Hicks wrote:
> > > The thought was that the full stream of log messages isn't necessary to
> > > notice that there's a problem when they are being emitted at such a high
> > > rate (500 per second). They're just filling up disk space and/or wasting
> > > networking bandwidth at that point.
> >
> > I already asked about this but lemme point it out again: have you guys
> > looked at drivers/ras/cec.c ?
>
> We'll have a closer look. Thanks for the pointer!

This is x86-specific and not applicable in our situation.

Tyler

>
> Tyler
>
> >
> > With that there won't be *any* error reports in dmesg and it will even
> > poison and offline pages which generate excessive errors so that ...
> >
> > > Of course, the best course of action here is to service the machine
> > > but there's still a period of time between the CE errors popping up
> > > and the machine being serviced.
> >
> > ... you'll have ample time to service the machine.
On Wed, May 05, 2021 at 05:43:57PM -0500, Tyler Hicks wrote:
> This is x86-specific

That's because it is used by x86 currently. It shouldn't be hard to use
it on another arch though as the machinery is pretty generic.

> and not applicable in our situation.

What is your situation? ARM?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette