Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp454964ybv; Fri, 7 Feb 2020 02:31:51 -0800 (PST) X-Google-Smtp-Source: APXvYqyoGmcdOoFOVrezSp8nruPmAoWcG7wNppX7nNOqPevMm3cEARFJnduZ6CPDSmCGXp2Gl5yS X-Received: by 2002:a05:6830:95:: with SMTP id a21mr2133764oto.171.1581071511709; Fri, 07 Feb 2020 02:31:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581071511; cv=none; d=google.com; s=arc-20160816; b=0bA2qQSq3+6Jsd37RWz6K4+w8U+P5ONdLEKsHFjqhjxtVOFslmsZzPrSLdDnPmaUT9 1xAIho/0f4f4mIsscBDkwyks6j10fJ42LpklA3/jozbWvQwSZCNXKLxXG+DZaJV6TT4m egLPGyxluXEU6MJbkUb7eLFWVtuBqAZOhpD64/4oeYVmPEnf4Xkhbfn+PcOAxNM5SOUS 70+KpSCvjnRyxvydDq7N+MArHWy/VCqnvIlFNlNSChybUp6BVwvt1ogIgD9HFZXvMRua SfJW3KHlaSffaI+/KZcwgUPqI6Otc4ORYSvwZ/Bigx9Y+wOND9ASOXRKdvYYfq923aNR qYBQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=9wLC/8S5wMaJ/9sB3eOaxQTJB5yGMNR40nF0BMpkAhM=; b=BlQ0ra2bY7MqnEyjdPyy6FG5e/ePuBuJG1YbBxu4yTi2cYQlNMrfvOgUJDl2rv5VzK lXH/t5QubuGxCQj+efcQU3g7HgusslupQqE2bS3DxtlfIEJRA8D6RP8QyoV5SkCvnE9F 41C0Gg/AcbCZTYNHoLihFqGtxv+5Y7xwpqlkm+03KzwcSveeKjfUKnV+q4XDiZ4Zelv5 3mKPRKwk7SH6rc4Ar7PgVqHXG2tWirjqzM5qpKMoTl1ZJitk6kABC7RKoHze2qOG/Fjx wlRwZM77jCiUxipak2HQe2XdgyfdsMy1U3vIXi27Qe2luxXkU78vKelRuy2+4tsYMXXv Y7cw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d19si1394055ote.3.2020.02.07.02.31.39; Fri, 07 Feb 2020 02:31:51 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726915AbgBGKaq (ORCPT + 99 others); Fri, 7 Feb 2020 05:30:46 -0500 Received: from smtp2207-205.mail.aliyun.com ([121.197.207.205]:41971 "EHLO smtp2207-205.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726587AbgBGKaq (ORCPT ); Fri, 7 Feb 2020 05:30:46 -0500 X-Alimail-AntiSpam: AC=CONTINUE;BC=0.07436287|-1;CH=green;DM=CONTINUE|CONTINUE|true|0.126263-0.0024786-0.871258;DS=CONTINUE|ham_regular_dialog|0.0595875-0.00190005-0.938512;FP=0|0|0|0|0|-1|-1|-1;HT=e02c03279;MF=liaoweixiong@allwinnertech.com;NM=1;PH=DS;RN=16;RT=16;SR=0;TI=SMTPD_---.GlYZUym_1581071437; Received: from 192.168.31.126(mailfrom:liaoweixiong@allwinnertech.com fp:SMTPD_---.GlYZUym_1581071437) by smtp.aliyun-inc.com(10.147.44.118); Fri, 07 Feb 2020 18:30:38 +0800 Subject: Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk To: Miquel Raynal Cc: Kees Cook , Anton Vorontsov , Colin Cross , Tony Luck , Jonathan Corbet , Richard Weinberger , Vignesh Raghavendra , Mauro Carvalho Chehab , "David S. Miller" , Rob Herring , Greg Kroah-Hartman , Jonathan Cameron , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org References: <1579482233-2672-1-git-send-email-liaoweixiong@allwinnertech.com> <1579482233-2672-12-git-send-email-liaoweixiong@allwinnertech.com> <20200120110306.32e53fd8@xps13> <27226590-379c-8784-f461-f5d701015611@allwinnertech.com> <20200121094802.61f8cb4d@xps13> <2c6000b1-ae25-564b-911a-2879e9c244b2@allwinnertech.com> <20200122184114.125b42c8@xps13> <20200206164559.59c5eb6a@xps13> <6a1b50f4-320f-43d1-50e3-b0a2c3c7fb96@allwinnertech.com> <20200207094121.0ad702d0@xps13> From: liaoweixiong Message-ID: <05c21cb6-988b-63a7-60ff-c4e975f25817@allwinnertech.com> Date: Fri, 7 Feb 2020 18:30:37 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 In-Reply-To: <20200207094121.0ad702d0@xps13> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org hi Miquel Raynal, On 2020/2/7 下午4:41, Miquel Raynal wrote: > Hi Liao, > > liaoweixiong wrote on Fri, 7 Feb 2020 > 12:13:08 +0800: > >> hi Miquel Raynal, >> >> On 2020/2/6 PM 11:45, Miquel Raynal wrote: >>> Hi liao, >>> >>> liaoweixiong wrote on Thu, 6 Feb 2020 >>> 21:10:47 +0800: >>> >>>> hi Miquel Raynal, >>>> >>>> On 2020/1/23 AM 1:41, Miquel Raynal wrote: >>>>> Hello, >>>>> >>>>> >>>>>>>> +/* >>>>>>>>>> + * All zones will be read as pstore/blk will read zone one by one when do >>>>>>>>>> + * recover. >>>>>>>>>> + */ >>>>>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off) >>>>>>>>>> +{ >>>>>>>>>> + struct mtdpstore_context *cxt = &oops_cxt; >>>>>>>>>> + size_t retlen; >>>>>>>>>> + int ret; >>>>>>>>>> + >>>>>>>>>> + if (mtdpstore_block_isbad(cxt, off)) >>>>>>>>>> + return -ENEXT; >>>>>>>>>> + >>>>>>>>>> + pr_debug("try to read off 0x%llx size %zu\n", off, size); >>>>>>>>>> + ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf); >>>>>>>>>> + if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen) { >>>>>>>>> >>>>>>>>> IIRC size != retlen does not mean it failed, but that you should >>>>>>>>> continue reading after retlen bytes, no? >>>>>>>>> >> >>>>>>>> Yes, you are right. I will fix it. Thanks. >>>>>>>> >>>>> Also, mtd_is_bitflip() does not mean that you are reading a false >>>>>>>>> buffer, but that the data has been corrected as it contained bitflips. >>>>>>>>> mtd_is_eccerr() however, would be meaningful. >>>>>>>>> >> >>>>>>>> Sure I know mtd_is_bitflip() does not mean failure, but I do not think >>>>>>>> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT >>>>>>>> mtd_is_bitflip(). >>>>>>> >>>>>>> Yes, just drop this check, only keep ret < 0. >>>>>>> >> >>>>>> If I don't get it wrong, it should not be dropped here. Like your words, >>>>>> "mtd_is_bitflip() does not mean that you are reading a false buffer, >>>>>> but that the data has been corrected as it contained bitflips.", the >>>>>> data I get are valid even if mtd_is_bitflip() return true. It's correct >>>>>> data and it's no need to go to handle error. To me, the codes >>>>>> should be: >>>>>> if (ret < 0 && !mit_is_bitflip()) >>>>>> [error handling] >>>>> >>>>> Please check the implementation of mtd_is_bitflip(). You'll probably >>>>> figure out what I am saying. >>>>> >>>>> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585 >>>>> >> >>>> How about the codes as follows: >>>> >>>> for (done = 0, retlen = 0; done < size; done += retlen) { >>>> ret = mtd_read(..., &retlen, ...); >>>> if (!ret) >>>> continue; >>>> /* >>>> * do nothing if bitflip and ecc error occurs because whether >>>> * it's bitflip or ECC error, just a small number of bits flip >>>> * and the impact on log data is so small. The mtdpstore just >>>> * hands over what it gets and user can judge whether the data >>>> * is valid or not. >>>> */ >>>> if (mtd_is_bitflip(ret)) { >>>> dev_warn("bitflip at...."); >>>> continue; >> >>> I don't understand why do you check for bitflips. Bitflips have been >>> corrected at this stage, you just get the information that there >>> has been bitflips, but the data integrity is fine. >>> >> >> Both of bitflip and eccerror are not real wrong in this >> case. So we must check them. >> >>> I am not against ignoring ECC errors in this case though. I would >>> propose: >>> >>> for (...) { >>> if (ret < 0) { >>> complain; >>> return; >>> } >>> >> >> -117 (-EUCLEAN) means bitflip but be corrected. >> -74 (-EBADMSG) means ecc error that uncorrectable >> All of them are negative number that smaller than 0. If it just keeps >> "ret < 0", it can never make a difference between bitflip/eccerror >> and others. > > I forgot about these, your're right, so what about: > > static bool mtdpstore_is_io_error(int ret) > { > return ret < 0 && !mtd_is_eccerr(ret)> && !mtd_is_bitflip(ret); > } > > for (...) { > if (mtdpstore_is_io_error(ret)) > return ret; > > /* > * Comment explaining why we still return valid data > * despite ECC errors. > */ > if (mtd_is_eccerr(ret)) > just-complain(); > } > > This snippet makes a difference between any "controller/bus > timeout/error : data could not be retrieved" and "ECC error, maybe we > can still read it and try to understand (risky, must be warned)". > That seems good to me. I will fix it later. Thanks for your review. >> >>> if (mtd_is_eccerr()) >>> complain; >>> } >>> >>>> } else if (mtd_is_eccerr(ret)) { >>>> dev_warn("eccerr at...."); >>>> retlen = retlen == 0 ? size : retlen; >>>> continue; >>>> } else { >>>> dev_err("read failure at..."); >>>> /* this zone is broken, try next one */ >>>> return -ENEXT; >>>> } >>>> } >>>> >>> >>> >>> Thanks, >>> Miquèl >>> > > Thanks, > Miquèl >