Date: Sat, 25 Sep 2021 13:20:57 +0200
From: Borislav Petkov
To: Yazen Ghannam
Cc: "Joshi, Mukul", "linux-edac@vger.kernel.org", "x86@kernel.org",
    "linux-kernel@vger.kernel.org", "mingo@redhat.com",
    "mchehab@kernel.org", "amd-gfx@lists.freedesktop.org"
Subject: Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS
References: <20210913021311.12896-2-mukul.joshi@amd.com>
    <20210922193620.15925-1-mukul.joshi@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 24, 2021 at 07:46:10PM +0000, Yazen Ghannam wrote:
> I agree with you in general. But this device isn't really a GPU. And
> users of this device seem to want to count *every* error, at least for
> now.

Aha, so something accelerator-y where they do general purpose
computation.

So what's the big picture here: they count all the errors and when they
reach a certain amount, they decide to replace the GPUs just in case? Or
wait until they become uncorrectable?

But then it doesn't matter because we will handle it properly by
excluding the VRAM range from further use.

Or do they wanna see *when* they had the correctable errors so that they
can restart the computation, just in case.

Dunno, it would be a lot helpful if we had some RAS strategy for those
things...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette