Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp299094ybi; Fri, 7 Jun 2019 08:14:48 -0700 (PDT) X-Google-Smtp-Source: APXvYqwYuMdoncMQTbcK5nbGrp9IVs9OGQPV2VB1CmKRkQap4bdx5SqNfyqcLh1i5HHp4S+6Vpc+ X-Received: by 2002:a17:902:4643:: with SMTP id o61mr3965747pld.101.1559920488575; Fri, 07 Jun 2019 08:14:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559920488; cv=none; d=google.com; s=arc-20160816; b=r4QOxmCDNabNKDU9/e28JR+MMFGy/fNjSqbic0sMUMZ/hnjmlnvUeNjSieKeY+37j/ PvWt17aGXTXP3T97CUDdzrj0VY668O7pq8zIt0ZEZMpDBY2ugKvr1fQtLVj31BMvzUPq cHbHyVKUB8mdfjTmdIwkxFDxL6pkiJZtFupBwMWiZeVwGwc7ERs7a6ATpmt/y29kaw40 q76C+S9oDlHBmvHe11t5v+wcdrCPliJxj3Tg63s2EWkQYAyk1q31FtSqSBokjAGqDxS7 scFpbkRP5aXeJKsCPDl39Lif9xlRzMWAvC2MSbVec4mqoQ+ju8CamTxpMSfOLWktiWM2 R1/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=c6Vlg5R2lZRwZsK17OLywxRNWSMAcEBH303UxgbOYXc=; b=J3YWeDJI5Y0n9dtW89MhGNi/2/46G41sGDh+AgzrfKDh8BudrptzUDKfPeD36pp2Ep tOiE7EHcxTdPhMMSbqJI+EySYYW6fNQtZwNcIWX3dHb5ecV5zpnMT1bF89MbiKqD/qfI 9Lhnavb1AfFZMUz6BMQ6FBZj3YAFVjp5RCcB0/v27WaBUTRLUVUYUck3jnNzvvkZssA6 jLdiwVdnqnUtIdeWmmNBRT/Gladch9zcv7DhQ2N+h1WhFE40cnsQEQ0To7RWHiY/nhCA 2ZWyRrU+xl7Y/SBaiXp7qC8XngPVC9y7eLHWJUF5aLiMtBexglfZdBzVs1UbrfxUPmIq XvOg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x3si1958622plb.427.2019.06.07.08.14.31; Fri, 07 Jun 2019 08:14:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729819AbfFGPLt (ORCPT + 99 others); Fri, 7 Jun 2019 11:11:49 -0400 Received: from foss.arm.com ([217.140.110.172]:42296 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729173AbfFGPLr (ORCPT ); Fri, 7 Jun 2019 11:11:47 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 84117C0A; Fri, 7 Jun 2019 08:11:46 -0700 (PDT) Received: from [10.1.196.105] (eglon.cambridge.arm.com [10.1.196.105]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3F8053F718; Fri, 7 Jun 2019 08:11:44 -0700 (PDT) Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC To: "Shenhar, Talel" Cc: "Hawa, Hanna" , Borislav Petkov , "Herrenschmidt, Benjamin" , "robh+dt@kernel.org" , "Woodhouse, David" , "paulmck@linux.ibm.com" , "mchehab@kernel.org" , "mark.rutland@arm.com" , "gregkh@linuxfoundation.org" , "davem@davemloft.net" , "nicolas.ferre@microchip.com" , "devicetree@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "Chocron, Jonathan" , "Krupnik, Ronen" , "linux-edac@vger.kernel.org" , "Hanoch, Uri" References: <1559211329-13098-1-git-send-email-hhhawa@amazon.com> <1559211329-13098-3-git-send-email-hhhawa@amazon.com> <20190531051400.GA2275@cz.tnic> <32431fa2-2285-6c41-ce32-09630205bb54@arm.com> <71da083e-1a74-cf86-455d-260a34ee01fd@amazon.com> From: James Morse Message-ID: <25efb27c-b725-137d-5735-b3ab88323846@arm.com> Date: Fri, 7 Jun 2019 16:11:42 +0100 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: <71da083e-1a74-cf86-455d-260a34ee01fd@amazon.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi guys, On 06/06/2019 12:37, Shenhar, Talel wrote: >>> Disagree. The various drivers don't depend on each other. >>> I think we should keep the drivers separated as they are distinct and independent IP >>> blocks. >> But they don't exist in isolation, they both depend on the integration-choices/firmware >> that makes up your platform. >> >> Other platforms may have exactly the same IP blocks, configured differently, or with >> different features enabled in firmware. This means we can't just probe the driver based on >> the presence of the IP block, we need to know the integration choices and firmware >> settings match what the driver requires. >> >> (Case in point, that A57 ECC support is optional, another A57 may not have it) >> >> Descriptions of what firmware did don't really belong in the DT. Its not a hardware >> property. >> >> This is why its better to probe this stuff based on the machine-compatible/platform-name, >> not the presence of the IP block in the DT. >> >> >> Will either of your separate drivers ever run alone? If they're probed from the same >> machine-compatible this won't happen. >> >> >> How does your memory controller report errors? Does it send back some data with an invalid >> checksum, or a specific poison/invalid flag? Will the cache report this as a cache error >> too, if its an extra signal, does the cache know what it is? >> >> All these are integration choices between the two IP blocks, done as separate drivers we >> don't have anywhere to store that information. Even if you don't care about this, making >> them separate drivers should only be done to make them usable on other platforms, where >> these choices may have been different. > From our perspective, l1/l2 has nothing to do with the ddr memory controller. I understand you're coming from the position that these things have counters, you want something to read and export them. I'm coming at this from somewhere else. This stuff has to be considered all the way through the system. Just because each component supports error detection, doesn't mean you aren't going to get silent corruption. Likewise if another platform picks up two piecemeal edac drivers for hardware it happens to have in common with yours, it doesn't mean we're counting all the errors. This stuff has to be viewed for the whole platform. > Its right that they both use same edac subsystem but they are using totally different APIs > of it. > > We also even want to have separate control for enabling/disabling l1/l2 edac vs memory > controller edac. Curious, what for? Surely you either care about counting errors, or you don't. > Even from technical point-of-view L1/L2 UE collection method is totally different from > collecting memory-controller UE. (CPU exception vs actual interrupts). > > So there is less reason why to combine them vs giving each one its own file, e.g. > al_mc_edac, al_l1_l2_edac (I even don't see why Hanna combined l1 and l2...) > As we don't have any technical relation between the two we would rather avoid this > combination. > > Also, Lets assume we have different setups with different memory controllers, having a dt > binding to control the difference is super easy and flexible. If the hardware is different you should describe this in the DT. I'm not suggesting you don't describe it. The discussion here is whether we should probe the driver based on a dummy-node compatible, (which this 'edac_l1_l2' is) or based on the machine compatible. At the extreme end: you should paint the CPU and cache nodes with a compatible describing your integration. (I've mangled Juno's DT here:) | A57_0: cpu@0 { | compatible = "amazon-al,cortex-a57", "arm,cortex-a57"; | reg = <0x0 0x0>; | device_type = "cpu"; | next-level-cache = <&A57_L2>; | }; | [...] | | A57_L2: l2-cache0 { | compatible = "amazon-al,cache", "cache"; | cpu_map = | }; This is the most accurate way to describe what you have here. The driver can use this to know that this integration of CPU and Cache support the edac registers. (This doesn't tell us anything about whether firmware enabled this stuff, or made/left it all secure-only) But this doesn't give you a device you can bind a driver to, to kick this stuff off. This (I assume) is why you added a dummy 'edac_l1_l2' node, that just probes the driver. The hardware is to do with the CPU and caches, 'edac_l1'_l2' doesn't correspond to any distinct part of the soc. The request is to use the machine compatible, not a dummy node. This wraps up the firmware properties too, and any other platform property we don't know about today. Once you have this, you don't really need the cpu/cache integration annotations, and your future memory-controller support can be picked up as part of the platform driver. If you have otherwise identical platforms with different memory controllers, OF gives you the API to match the node in the DT. > Would having a dedicated folder for amazon ease the move to separate files? I don't think anyone cares about the number of files. Code duplication and extra boiler-plate, maybe. Thanks, James