Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1837998ybt; Mon, 15 Jun 2020 10:39:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyXQ+SPGM2sYkXUtlA3u/fc3XxD2W8aquWZpPhFhqsQCqiZ0rJKuk1U9rNZwIB6UWLGuS1Z X-Received: by 2002:a17:906:97cb:: with SMTP id ef11mr3941582ejb.69.1592242761717; Mon, 15 Jun 2020 10:39:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592242761; cv=none; d=google.com; s=arc-20160816; b=lYpB9edaZ+dydbMFl3zwpcvl+uB/gUYUMJIuVrjZeLdEFZZSZldEHdXi5ing9m5MiP Ybm8hP5MPTXPygLj8b83BLIeLmjgHxirBr2sbdI/Jy7qQpPB7SYLi6TVvmKFna0fgDxd jk/y39v+pc6Iy49UEJ0M14WMFkZ1EC5dg4u1G1KQEfJSZqTCvt4G7YCfUWpX1UwYJcXi eIeyZM8f6Q7iF9QASpOoj8w48BIlv49rCCtQhMmA5Oakoin07bMooHz8ekewH707TFG+ 5qVEW0Rps0G+Kil8sUMTSZqiUf7zXBFOe2qQPYojYp7UXvZe3VEPaHo/wBe1gI4CVw3d lhDQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date; bh=OJ6omvsRNXZAU43uBk7ylIh6UflKcje17BoYJjceEpA=; b=lkjisIt/2yCOdrSC1I+FaUUksvnhbdBT9DbsLR4Zv76fJ2p3wzH5RuC/PXq7APN2VJ 2L2dmj2mx0wvIJb/7XqbqjI2mfOpc3tmOantKU3GD4kS2RrzI2JNmZcK4YEMMaVgn38g /a1+FAeSYC0AzdHJxzL+CSODJK68dj1T12ck17izgT6N+M6x76qi/TwprCoyhu8DlD+z s0n44a8eNP/pQ6Q2MjqZeAcFNNbvqDLpoz68LC+JVGPhH9Z+FIUWAR/Z1uAOK/r1ZmfK oPE9AXgiabzg88ETbMym89bZKBcZC3YJG28yl/WLfeFz6RfwHD4zbYF0BOdwjTEYcfWX NdxA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o25si9390771eja.361.2020.06.15.10.38.59; Mon, 15 Jun 2020 10:39:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729966AbgFORhS convert rfc822-to-8bit (ORCPT + 99 others); Mon, 15 Jun 2020 13:37:18 -0400 Received: from relay11.mail.gandi.net ([217.70.178.231]:52021 "EHLO relay11.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729124AbgFORhS (ORCPT ); Mon, 15 Jun 2020 13:37:18 -0400 Received: from xps13 (unknown [91.224.148.103]) (Authenticated sender: miquel.raynal@bootlin.com) by relay11.mail.gandi.net (Postfix) with ESMTPSA id 30E6B100005; Mon, 15 Jun 2020 17:37:14 +0000 (UTC) Date: Mon, 15 Jun 2020 19:37:12 +0200 From: Miquel Raynal To: Kamal Dasu Cc: Brian Norris , Richard Weinberger , Vignesh Raghavendra , MTD Maling List , bcm-kernel-feedback-list , Linux Kernel Mailing List Subject: Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers Message-ID: <20200615193712.487d1f85@xps13> In-Reply-To: References: <20200611054454.2547-1-kdasu.kdev@gmail.com> <20200611054454.2547-2-kdasu.kdev@gmail.com> <20200611092707.75da8c6a@xps13> <20200612090728.043b6baf@xps13> <20200615091923.0c3c7aa7@xps13> Organization: Bootlin X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Kamal, Kamal Dasu wrote on Mon, 15 Jun 2020 11:11:00 -0400: > On Mon, Jun 15, 2020 at 3:19 AM Miquel Raynal wrote: > > > > Hi Kamal, > > > > Kamal Dasu wrote on Fri, 12 Jun 2020 12:34:22 > > -0400: > > > > > On Fri, Jun 12, 2020 at 3:07 AM Miquel Raynal wrote: > > > > > > > > Hi Kamal, > > > > > > > > Kamal Dasu wrote on Thu, 11 Jun 2020 12:04:29 > > > > -0400: > > > > > > > > > On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal wrote: > > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > Kamal Dasu wrote on Thu, 11 Jun 2020 01:44:54 > > > > > > -0400: > > > > > > > > > > > > > Implemented ECC correctable and uncorrectable error handling for EDU > > > > > > > > > > > > Implement? > > > > > > > > > > > > > reads. If ECC correctable bitflips are encountered on EDU transfer, > > > > > > > > > > > > extra space ^ > > > > > > > > > > > > > read page again using pio, This is needed due to a controller lmitation > > > > > > > > > > > > s/pio/PIO/ > > > > > > > > > > > > > where read and corrected data is not transferred to the DMA buffer on ECC > > > > > > > errors. This holds true for ECC correctable errors beyond set threshold. > > > > > > > > > > > > error. > > > > > > > > > > > > Not sure what the last sentence means? > > > > > > > > > > > > > > > > NAND controller allows for setting a correctable ECC threshold number > > > > > of bits beyond which it will actually report the error to the driver. > > > > > e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will > > > > > generate correctable ECC interrupt however 1-bit and 2-bit errors will > > > > > be corrected silently. > > > > > From the above example EDU hardware will not transfer corrected data > > > > > to the DMA buffer for 3-bit and 4-bit errors that get reported. So > > > > > once we detect > > > > > the error duing EDU we read the page again using pio. > > > > > > > > Ok I see what you mean, can't you fake the threshold instead? The NAND > > > > controller in Linux is not supposed to handle this threshold, the NAND > > > > core is in charge. So what the controller driver should do is just: > > > > increase the number of bitflips + return the maximum number or bitflip > > > > or increase the failure counter. Is this already the case? > > > > > > > /* threshold = ceil(BCH-level * 0.75) */ > > > brcmnand_wr_corr_thresh(host, DIV_ROUND_UP(chip->ecc.strength * 3, 4)); > > > This how the threshold is set, all it means is that for high BCH > > > levels don't interrupt on low number (less than threshold) of > > > bit_flips. Yes the controller driver only increments correctable ECC > > > count. But due the EDU design an EDU operation is disrupted when the > > > controller interrupts on correctable ECC errors during subpage ECC > > > calculations. Hence the driver needs to read the page again with PIO > > > to transfer corrected data. > > > > IIUC, you are doing the job twice: you should just return a number of > > bitflips or an error to the NAND core. So that's why I'm telling that > > you should get rid of this threshold. It would avoid the need for the > > PIO transfer too. > > I think you are reading some statements in isolation that probably are > causing some confusion. > EDU design has a flaw in case of reported ECC error interrupt in that > corrected data is not transferred to the DMA buffer. The PIO is needed > to read corrected data into the NAND data buffer and only for the > reported errors. So there is no need to change the threshold > calculation logic, if we get rid of the threshold then we will have to > do the PIO read on any correctable bit error if it occurs during EDU > reads. > > > > > You also say that the controller "only increments correctable ECC > > count", what do you mean exactly? > > Maybe that statement was a bit misleading. To be clear when an ECC > error is reported the controller gives the bit_flips count as well as > updates the ECC error address Register and ecc error status registers. > This logic works as expected in the hardware. > > >The controller does not report errors > > when the number of bitflips happens to be above the BCH threshold? This > > would be the only case where what is currently done would be actually > > needed though. > > It's the other way. The controller only reports bit errors beyond > >=threshold value, will not report otherwise and silently correct the > data. There is no problem in cases where erros are corrected > silently. Now ECC (un)correctable on EDU reads are detected by simply > reading back the ECC Error address register. And in case of reported > uncorrectable ECC errors are treated as usual. And for reported > correctable ECC errors we need to read the page again using PIO so > that the corrected data is properly transferred. All this applies to > EDU transfer only. > Thank you very much for the explanation, I understand better how this controller works now. Cheers, Miquèl