Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp3283762imm; Fri, 24 Aug 2018 14:06:42 -0700 (PDT) X-Google-Smtp-Source: ANB0VdYMIW5ieu1Tz9gvfeSkJocLVfK5Z6wlqCb1In9BkNfosqSz2kJTzhvwxOKrEBHJJHFiD8yO X-Received: by 2002:a62:586:: with SMTP id 128-v6mr3569672pff.80.1535144802742; Fri, 24 Aug 2018 14:06:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1535144802; cv=none; d=google.com; s=arc-20160816; b=r8zanRJwf/7u02rijWtjTyDQP1EDlm79FhFfonPqfJsP39vcRCh3ylQXNTIoeBzZvx pY1fCh2PDgdhrUE3g4oF41Vi7JhOq9gUIubEziJL7hTx+QlnA0eyDYowM/tVj0BS2u5E PCCsYk8PdTc7ODLsKEMPygOLBkTD3583vYU+OKvXr48YmXsDQoD2c1zzQHjpZWEUbIkU HNJqudz42XoN0yU9DOCdPgiUNRshaB2h27erlCeBC20zX95h+Bk03S3o/3l/qxpAOC0k o1pb9t+T0vQYiFcgtedZCGl5WE5FCqeA/CeI9yLytJMsy9k1aReBagRsipgNSz7kM4/c bIlg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:message-id:references :in-reply-to:subject:cc:to:from:date:content-transfer-encoding :mime-version:dkim-signature:dkim-signature :arc-authentication-results; bh=SyHEb7ydcpfzdduVo+0rcXpV/OxzbqrHBpYfTQIyZtQ=; b=TUpAYjJi6AKmE1AwVQFbpTg0eIki5imlMPnej/qTqm61YCXhW2H+pZnRgtbNwB92T4 N+JbgocrN2Q6XkeqeeHAwmlivmVa1tsg4WiGkIUNeiCofgvtd6ax7xP7Lyy/Op+69Rpg wU2CjyTIhCmaXgf4qBr8d/U2KtptzEdDfDF0mlE/j1kaWRWogX86HUXFTbnpjFNc0GQb M0q8frHvT82LS46KLo/UBAQgcEBbMukGDniVxFcsXSaUA4c28z3uUAuI+WTFgX4a7H9l xFwIQlBBqc7/xd1MkoOVSLtVaWyNWK7f1DXwZQUGuxXu9hvy8Jdhy9BvdxU3tefXTywX wtdw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=S2udgNXY; dkim=pass header.i=@codeaurora.org header.s=default header.b=Ncl06neW; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n7-v6si7450629pgp.411.2018.08.24.14.06.26; Fri, 24 Aug 2018 14:06:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=S2udgNXY; dkim=pass header.i=@codeaurora.org header.s=default header.b=Ncl06neW; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727535AbeHYAlj (ORCPT + 99 others); Fri, 24 Aug 2018 20:41:39 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:39618 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726510AbeHYAlj (ORCPT ); Fri, 24 Aug 2018 20:41:39 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 329366037C; Fri, 24 Aug 2018 21:05:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1535144721; bh=AF2gU/eTnRapLtyz2Yqmn+e+2Wb5zs6uBHyBalZQhvc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=S2udgNXYQ/feeL3L+KnTfR36opnTgjZ2upHY+gomMOAeaPvkRcb6l495CrZ0fk1C2 Th+M8SSl9NcurhUvCOsn1eXO3gmDtPcHTfmGb4ZTCvNgQcvVVtWUZ9Pdn9AJYvKxjI vQ0hb6jcKDj50I15Jzk3Km2uJVlP3Wo1zM3pJwsQ= X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on pdx-caf-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=2.0 tests=ALL_TRUSTED,BAYES_00, DKIM_SIGNED,T_DKIM_INVALID autolearn=no autolearn_force=no version=3.4.0 Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.codeaurora.org (Postfix) with ESMTP id 791A36037C; Fri, 24 Aug 2018 21:05:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1535144719; bh=AF2gU/eTnRapLtyz2Yqmn+e+2Wb5zs6uBHyBalZQhvc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=Ncl06neW/DvLWq7t0DtGljH4eKaMWzVgHxDoVoAKMuoS9V2l6iz27Phx2jMzavzQi UM6XJwZpieVWvnCqFbqQJIBTc/ORKHVEruAkiD19K8FzHm0SOLmN2MqxbBtsjYnO9Z LV6lrn1rKa809MPASOS6KlYPz+IXY3vrThfvsURI= MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Fri, 24 Aug 2018 14:05:19 -0700 From: vnkgutta@codeaurora.org To: Evan Green Cc: robh@kernel.org, mchehab@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Andy Gross , David Brown , linux-arm-msm@vger.kernel.org, linux-soc@vger.kernel.org, robh+dt@kernel.org, mark.rutland@arm.com, devicetree@vger.kernel.org, tsoni@codeaurora.org, ckadabi@codeaurora.org, rishabhb@codeaurora.org, bp@alien8.de Subject: Re: [PATCH v2 3/4] drivers: edac: Add EDAC driver support for QCOM SoCs In-Reply-To: References: <1534550915-18230-1-git-send-email-vnkgutta@codeaurora.org> <1534550915-18230-4-git-send-email-vnkgutta@codeaurora.org> Message-ID: <93f11f3728ef658f903509ae0bc00d03@codeaurora.org> X-Sender: vnkgutta@codeaurora.org User-Agent: Roundcube Webmail/1.2.5 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2018-08-24 13:18, Evan Green wrote: > On Fri, Aug 24, 2018 at 11:32 AM wrote: >> >> On 2018-08-23 16:04, Evan Green wrote: >> > On Fri, Aug 17, 2018 at 5:08 PM Venkata Narendra Kumar Gutta >> > wrote: >> >> >> >> From: Channagoud Kadabi >> >> >> >> Add error reporting driver for Single Bit Errors (SBEs) and Double Bit >> >> Errors (DBEs). As of now, this driver supports erp for Last Level >> >> Cache >> >> Controller (LLCC). This driver takes care of dumping registers and >> >> adding >> >> config options to enable and disable panic when the errors happen in >> >> cache. >> >> >> >> Signed-off-by: Channagoud Kadabi >> >> Signed-off-by: Venkata Narendra Kumar Gutta >> >> Co-developed-by: Venkata Narendra Kumar Gutta >> >> >> >> --- >> >> MAINTAINERS | 8 + >> >> drivers/edac/Kconfig | 28 +++ >> >> drivers/edac/Makefile | 1 + >> >> drivers/edac/qcom_edac.c | 446 >> >> +++++++++++++++++++++++++++++++++++++ >> >> include/linux/soc/qcom/llcc-qcom.h | 25 +++ >> >> 5 files changed, 508 insertions(+) >> >> create mode 100644 drivers/edac/qcom_edac.c >> >> >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> >> index 0a23427..0bff713 100644 >> >> --- a/MAINTAINERS >> >> +++ b/MAINTAINERS >> >> @@ -5227,6 +5227,14 @@ L: linux-edac@vger.kernel.org >> >> S: Maintained >> >> F: drivers/edac/ti_edac.c >> >> >> >> +EDAC-QUALCOMM >> >> +M: Channagoud Kadabi >> >> +M: Venkata Narendra Kumar Gutta >> >> +L: linux-arm-msm@vger.kernel.org >> >> +L: linux-edac@vger.kernel.org >> >> +S: Maintained >> >> +F: drivers/edac/qcom_edac.c >> >> + >> >> EDIROL UA-101/UA-1000 DRIVER >> >> M: Clemens Ladisch >> >> L: alsa-devel@alsa-project.org (moderated for non-subscribers) >> >> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig >> >> index 57304b2..da8f150 100644 >> >> --- a/drivers/edac/Kconfig >> >> +++ b/drivers/edac/Kconfig >> >> @@ -460,4 +460,32 @@ config EDAC_TI >> >> Support for error detection and correction on the >> >> TI SoCs. >> >> >> >> +config EDAC_QCOM >> >> + tristate "QCOM EDAC Controller" >> >> + depends on EDAC >> >> + help >> >> + Support for error detection and correction on the >> >> + QCOM SoCs. >> >> + >> >> +config EDAC_QCOM_LLCC >> >> + tristate "QCOM EDAC Controller for LLCC Cache" >> >> + depends on EDAC_QCOM && QCOM_LLCC >> >> + help >> >> + Support for error detection and correction on the >> >> + QCOM LLCC cache. Report errors caught by LLCC ECC >> >> + mechanism. >> >> + >> >> + For debugging issues having to do with stability and overall >> >> system >> >> + health, you should probably say 'Y' here. >> >> + >> >> +config EDAC_QCOM_LLCC_PANIC_ON_UE >> >> + bool "Panic on uncorrectable errors - qcom llcc" >> >> + depends on EDAC_QCOM_LLCC >> >> + help >> >> + Forcibly cause a kernel panic if an uncorrectable error (UE) >> >> is >> >> + detected. This can reduce debugging times on hardware which >> >> may be >> >> + operating at voltages or frequencies outside normal >> >> specification. >> >> + >> >> + For production builds, you should probably say 'N' here. >> >> + >> >> endif # EDAC >> >> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile >> >> index 02b43a7..716096d 100644 >> >> --- a/drivers/edac/Makefile >> >> +++ b/drivers/edac/Makefile >> >> @@ -77,3 +77,4 @@ obj-$(CONFIG_EDAC_ALTERA) += >> >> altera_edac.o >> >> obj-$(CONFIG_EDAC_SYNOPSYS) += synopsys_edac.o >> >> obj-$(CONFIG_EDAC_XGENE) += xgene_edac.o >> >> obj-$(CONFIG_EDAC_TI) += ti_edac.o >> >> +obj-$(CONFIG_EDAC_QCOM) += qcom_edac.o >> >> diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c >> >> new file mode 100644 >> >> index 0000000..9a8c670 >> >> --- /dev/null >> >> +++ b/drivers/edac/qcom_edac.c >> >> @@ -0,0 +1,446 @@ >> >> +// SPDX-License-Identifier: GPL-2.0 >> >> +/* >> >> + * Copyright (c) 2018, The Linux Foundation. All rights reserved. >> >> + */ >> >> + >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> +#include >> >> + >> >> +#include "edac_mc.h" >> >> +#include "edac_device.h" >> >> + >> >> +#ifdef CONFIG_EDAC_QCOM_LLCC_PANIC_ON_UE >> >> +#define LLCC_ERP_PANIC_ON_UE 1 >> >> +#else >> >> +#define LLCC_ERP_PANIC_ON_UE 0 >> >> +#endif >> >> + >> >> +#define EDAC_LLCC "qcom_llcc" >> >> + >> >> +#define TRP_SYN_REG_CNT 6 >> >> + >> >> +#define DRP_SYN_REG_CNT 8 >> >> + >> >> +#define LLCC_COMMON_STATUS0 0x0003000C >> >> +#define LLCC_LB_CNT_MASK GENMASK(31, 28) >> >> +#define LLCC_LB_CNT_SHIFT 28 >> >> + >> >> +/* single & Double Bit syndrome register offsets */ >> > >> > Strange capitalization going on here. >> I'll fix this. >> >> > >> >> +#define TRP_ECC_SB_ERR_SYN0 0x0002304C >> >> +#define TRP_ECC_DB_ERR_SYN0 0x00020370 >> >> +#define DRP_ECC_SB_ERR_SYN0 0x0004204C >> >> +#define DRP_ECC_DB_ERR_SYN0 0x00042070 >> > >> > I think the convention is to use lowercase hex everywhere. >> >> I didn't get you. Do you mean, the Macros should be in lower case or >> the >> comments? > > I mean 0x0002304C should be 0x0002304c. Oh, I see. I'll update it, Thanks. > >> >> > >> >> + >> >> +/* Error register offsets */ >> >> +#define TRP_ECC_ERROR_STATUS1 0x00020348 >> >> +#define TRP_ECC_ERROR_STATUS0 0x00020344 >> >> +#define DRP_ECC_ERROR_STATUS1 0x00042048 >> >> +#define DRP_ECC_ERROR_STATUS0 0x00042044 >> >> + >> >> +/* TRP, DRP interrupt register offsets */ >> >> +#define DRP_INTERRUPT_STATUS 0x00041000 >> >> +#define TRP_INTERRUPT_0_STATUS 0x00020480 >> >> +#define DRP_INTERRUPT_CLEAR 0x00041008 >> >> +#define DRP_ECC_ERROR_CNTR_CLEAR 0x00040004 >> >> +#define TRP_INTERRUPT_0_CLEAR 0x00020484 >> >> +#define TRP_ECC_ERROR_CNTR_CLEAR 0x00020440 >> >> + >> >> +/* Mask and shift macros */ >> >> +#define ECC_DB_ERR_COUNT_MASK GENMASK(4, 0) >> >> +#define ECC_DB_ERR_WAYS_MASK GENMASK(31, 16) >> >> +#define ECC_DB_ERR_WAYS_SHIFT BIT(4) >> >> + >> >> +#define ECC_SB_ERR_COUNT_MASK GENMASK(23, 16) >> >> +#define ECC_SB_ERR_COUNT_SHIFT BIT(4) >> >> +#define ECC_SB_ERR_WAYS_MASK GENMASK(15, 0) >> >> + >> >> +#define SB_ECC_ERROR BIT(0) >> >> +#define DB_ECC_ERROR BIT(1) >> >> + >> >> +#define DRP_TRP_INT_CLEAR GENMASK(1, 0) >> >> +#define DRP_TRP_CNT_CLEAR GENMASK(1, 0) >> >> + >> >> +/* Config registers offsets*/ >> >> +#define DRP_ECC_ERROR_CFG 0x00040000 >> >> + >> >> +/* TRP, DRP interrupt register offsets */ >> >> +#define CMN_INTERRUPT_0_ENABLE 0x0003001C >> >> +#define CMN_INTERRUPT_2_ENABLE 0x0003003C >> >> +#define TRP_INTERRUPT_0_ENABLE 0x00020488 >> >> +#define DRP_INTERRUPT_ENABLE 0x0004100C >> >> + >> >> +#define SB_ERROR_THRESHOLD 0x1 >> >> +#define SB_ERROR_THRESHOLD_SHIFT 24 >> >> +#define SB_DB_TRP_INTERRUPT_ENABLE 0x3 >> >> +#define TRP0_INTERRUPT_ENABLE 0x1 >> >> +#define DRP0_INTERRUPT_ENABLE BIT(6) >> >> +#define SB_DB_DRP_INTERRUPT_ENABLE 0x3 >> >> + >> >> +enum { >> >> + LLCC_DRAM_CE = 0, >> >> + LLCC_DRAM_UE, >> >> + LLCC_TRAM_CE, >> >> + LLCC_TRAM_UE, >> >> + LLCC_ERR_TYPE_MAX = LLCC_TRAM_UE + 1, >> > >> > This is a nit, or perhaps personal preference, but I prefer to not >> > have initializers for sentinel values, since it's one more thing >> > someone could forget to update when adding new values. >> >> I'll get rid of this, I was using this one to allocate the memory for >> llcc_driv_data->edac_reg, (struct llcc_edac_reg_data). >> But the suggestion was to initialize that one statically. > > Sounds good. > >> >> >> > >> >> +}; >> >> + >> >> +static int qcom_llcc_core_setup(struct regmap *llcc_bcast_regmap) >> >> +{ >> >> + u32 sb_err_threshold; >> >> + int ret; >> >> + >> >> + /* Enable TRP in instance 2 of common interrupt enable >> >> register */ >> > >> > Can we get a comment explaining what's so special about instance 2? >> > Instances 1 and 3 get no love? >> >> I'll try to elaborate on this. >> >> > >> >> + ret = regmap_update_bits(llcc_bcast_regmap, >> >> CMN_INTERRUPT_2_ENABLE, >> >> + TRP0_INTERRUPT_ENABLE, >> >> + TRP0_INTERRUPT_ENABLE); >> >> + if (ret) >> >> + return ret; >> >> + >> >> + /* Enable ECC interrupts on Tag Ram */ >> >> + ret = regmap_update_bits(llcc_bcast_regmap, >> >> TRP_INTERRUPT_0_ENABLE, >> >> + SB_DB_TRP_INTERRUPT_ENABLE, >> >> + SB_DB_TRP_INTERRUPT_ENABLE); >> >> + if (ret) >> >> + return ret; >> >> + >> >> + /* Enable SB error for Data RAM */ >> >> + sb_err_threshold = (SB_ERROR_THRESHOLD << >> >> SB_ERROR_THRESHOLD_SHIFT); >> >> + ret = regmap_write(llcc_bcast_regmap, DRP_ECC_ERROR_CFG, >> >> + sb_err_threshold); >> >> + if (ret) >> >> + return ret; >> >> + >> >> + /* Enable DRP in instance 2 of common interrupt enable >> >> register */ >> >> + ret = regmap_update_bits(llcc_bcast_regmap, >> >> CMN_INTERRUPT_2_ENABLE, >> >> + DRP0_INTERRUPT_ENABLE, >> >> + DRP0_INTERRUPT_ENABLE); >> >> + if (ret) >> >> + return ret; >> >> + >> >> + /* Enable ECC interrupts on Data Ram */ >> >> + ret = regmap_write(llcc_bcast_regmap, DRP_INTERRUPT_ENABLE, >> >> + SB_DB_DRP_INTERRUPT_ENABLE); >> >> + return ret; >> >> +} >> >> + >> >> +/* Clear the error interrupt and counter registers */ >> >> +static int >> >> +qcom_llcc_clear_errors_status(int err_type, struct llcc_drv_data >> >> *drv) >> > >> > Another nit: errors_status is kind of weird. Maybe >> > qcom_llcc_clear_errors or qcom_llcc_clear_error_status? >> >> I'll update the name. >> >> > >> >> +{ >> >> + int ret = 0; >> >> + >> >> + switch (err_type) { >> >> + case LLCC_DRAM_CE: >> >> + case LLCC_DRAM_UE: >> >> + /* Clear the interrupt */ >> >> + ret = regmap_write(drv->bcast_regmap, >> >> DRP_INTERRUPT_CLEAR, >> >> + DRP_TRP_INT_CLEAR); >> >> + if (ret) >> >> + return ret; >> >> + >> >> + /* Clear the counters */ >> >> + ret = regmap_write(drv->bcast_regmap, >> >> DRP_ECC_ERROR_CNTR_CLEAR, >> >> + DRP_TRP_CNT_CLEAR); >> >> + if (ret) >> >> + return ret; >> >> + break; >> >> + case LLCC_TRAM_CE: >> >> + case LLCC_TRAM_UE: >> >> + ret = regmap_write(drv->bcast_regmap, >> >> TRP_INTERRUPT_0_CLEAR, >> >> + DRP_TRP_INT_CLEAR); >> >> + if (ret) >> >> + return ret; >> >> + >> >> + ret = regmap_write(drv->bcast_regmap, >> >> TRP_ECC_ERROR_CNTR_CLEAR, >> >> + DRP_TRP_CNT_CLEAR); >> >> + if (ret) >> >> + return ret; >> >> + break; >> > >> > A default case that errors or complains or both would be nice. >> >> Ok, I had this thought too, but we never run into that scenario, >> that's we don't ever call this function with any other types, >> Had it been an API it makes sense to have a default case. >> The internal functions require them too! I don't know!! >> What do you think? >> As we say, if it's good to have it, I'll add it. > > I personally like the default case, I find it to be defensive against > someone adding code later who passes the wrong value or type down. > Others might disagree. It's up to you. It makes sense to have a defensive code against someone adding the code later, I'll add it.