Date: Fri, 24 Aug 2018 14:01:02 +0200
From: Borislav Petkov
To: James Morse
Cc: Tyler Baicar, Tyler Baicar, wufan@codeaurora.org, Linux Kernel Mailing List, harba@qti.qualcomm.com, mchehab@kernel.org, arm-mail-list, linux-edac@vger.kernel.org
Subject: Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM
Message-ID: <20180824120102.GB29751@nazgul.tnic>

On Fri, Aug 24, 2018 at 10:48:24AM +0100, James Morse wrote:
> Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what
> EDAC_MC_LAYER_SLOT is for?

Yap.

> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't
> read what it does with this information yet).

See edac_inc_ce_error(), for example - it uses the layers which are not
negative (-1) to increment the error counts of the respective layer.

It all depends on what granularity of the hardware part you're reporting
the error for: is it a DIMM rank, a whole DIMM or for a channel which
can span multiple DIMM ranks. And so on...

Look at some of the drivers and how they're doing that layering. It all
depends on whether you can get the precise info from the hw.

> ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if its
> set, it uses the handle to find the bank/device strings and prints them out.

Yap, and the error counts are lumped together into

/sys/devices/system/edac/mc/mc*/ce_noinfo_count

> Naively I thought we could generate some index during ghes_edac_count_dimms(),
> and use this as e->${whichever}_layer. I hoped there would be something we could
> already use as the index, but I can't spot it, so this will be more than the
> one-liner I was hoping for!

If you can get that info from the hardware and injecting an error into
a DIMM gives you the correct DIMM number so that we can increment the
proper counter, then you're golden.

I don't think that works reliably on x86, though, therefore the lumping
together.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
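
For anyone following the thread, the counting idea described above (only layers with a non-negative position get their counters bumped, everything else is lumped together) can be modeled roughly like this. This is a userspace sketch, not the kernel code: struct layer_counts, count_ce(), MAX_LAYERS and MAX_COUNTERS are hypothetical names made up for illustration, and sending an error with no known top-level position to the "noinfo" counter is a simplification assumed here.

```c
#include <assert.h>

#define MAX_LAYERS   3   /* hypothetical cap, chosen for this sketch */
#define MAX_COUNTERS 64  /* hypothetical per-layer counter array size */

/* Hypothetical stand-in for the per-controller error counters. */
struct layer_counts {
    int n_layers;
    unsigned int size[MAX_LAYERS];  /* number of entries in each layer */
    unsigned int ce_per_layer[MAX_LAYERS][MAX_COUNTERS];
    unsigned int ce_noinfo;         /* errors with no usable location */
    unsigned int ce_total;
};

/*
 * Account for 'count' corrected errors at position 'pos'.
 * pos[i] < 0 means "layer i is unknown"; counting stops at the
 * first unknown layer.
 */
static void count_ce(struct layer_counts *c, const int pos[MAX_LAYERS],
                     unsigned int count)
{
    unsigned int index = 0;
    int i;

    assert(c->n_layers >= 1 && c->n_layers <= MAX_LAYERS);
    c->ce_total += count;

    /* Nothing known about the location: lump into the noinfo counter. */
    if (pos[0] < 0) {
        c->ce_noinfo += count;
        return;
    }

    for (i = 0; i < c->n_layers; i++) {
        if (pos[i] < 0)
            break;
        index += (unsigned int)pos[i];
        assert(index < MAX_COUNTERS);
        c->ce_per_layer[i][index] += count;

        /* Flatten the index for the next, finer-grained layer. */
        if (i < c->n_layers - 1)
            index *= c->size[i + 1];
    }
}
```

With two layers of sizes {2, 4} (say, channel and slot), a position of {1, 2, -1} bumps the channel counter at index 1 and the slot counter at flattened index 1 * 4 + 2 = 6, while {-1, -1, -1} only bumps the noinfo counter.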