From: Ahmad Fatoum
Date: Mon, 7 Feb 2022 15:28:10 +0100
Subject: Re: [BUG] mtd: cfi_cmdset_0002: write regression since v4.17-rc1
To: Tokunori Ikegami, Thorsten Leemhuis, linux-mtd@lists.infradead.org, Joakim.Tjernlund@infinera.com, miquel.raynal@bootlin.com, vigneshr@ti.com, richard@nod.at, "regressions@lists.linux.dev"
Cc: Chris Packham, Brian Norris, David Woodhouse, marek.vasut@gmail.com, cyrille.pitchen@wedev4u.fr, "linux-kernel@vger.kernel.org", Pengutronix Kernel Team, linuxppc-dev@lists.ozlabs.org

Hello Tokunori-san,

On 29.01.22 19:01, Tokunori Ikegami wrote:
> Hi Ahmad-san,
>
> Thanks for your investigation.
>
>> The issue is still there with #define FORCE_WORD_WRITE 1:
>>
>>    jffs2: Write clean marker to block at 0x000a0000 failed: -5
>>    MTD do_write_oneword_once(): software timeout
> Which kernel version has been tested about this?

I last tested with v5.10.30, but I had briefly tried v5.16-rc as well
when first debugging this issue. I have rebased onto v5.17-rc2 now and
will use that for further tests. The same issue with forced word writes
is reproducible there as well.

> Since the buffered writes disabled by 7e4404113686 for S29GL256N and tested on kernel 5.10.16.
> So I would like to confirm if the issue depended on the CPU or kernel version, etc.
> Note: The chips S29GL064N and S29GL256N seem different the flash Mb size basically.

I see. To be extra sure, I have replaced 0x2201 with 0x0c01 to hit the
same code paths, but that brought no improvement.

>> Doesn't seem to be a buffered write issue here though as the writes
>> did work fine before dfeae1073583. Any other ideas?
> At first I thought the issue is possible to be resolved by using the word write instead of the buffered writes.
> Now I am thinking to disable the changes dfeae1073583 partially with any condition if possible.

What seems to work for me is treating the write as successful if either
chip_good() holds, or chip_ready() holds and the map_word read back equals
0xFF. I can't justify why this is OK, though. (In the worst case, the bus
is floating at this point in time and Hi-Z is read as 0xFF on the CPU data
lines...)

> By the way could you please let me know the chip information for more detail? (For example model number, cycle and device ID, etc.)

I can't read it off the chip, but the vendor uses S29GL064N90FFI02 or
S29GL964N11FFI02. The kernel reports it as:

  ff800000.flash: Found 1 x16 devices at 0x0 in 8-bit bank. Manufacturer ID 0x000001 Chip ID 0x000c01

I am not sure what you mean by cycle. If you tell me what command to run,
I can paste the output.

Thanks,
Ahmad

>
> Regards,
> Ikegami
>
>
> On 2021/12/14 16:23, Thorsten Leemhuis wrote:
>
>>>> [TLDR: adding this regression to regzbot; most of this mail is compiled
>>>> from a few templates paragraphs some of you might have seen already.]
>>>>
>>>> Hi, this is your Linux kernel regression tracker speaking.
>>>>
>>>> Top-posting for once, to make this easy accessible to everyone.
>>>>
>>>> Thanks for the report.
>>>>
>>>> Adding the regression mailing list to the list of recipients, as it
>>>> should be in the loop for all regressions, as explained here:
>>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>>>
>>>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>>
>>>> #regzbot ^introduced dfeae1073583
>>>> #regzbot title mtd: cfi_cmdset_0002: flash write accesses on the
>>>> hardware fail on a PowerPC MPC8313 to a 8-bit-parallel S29GL064N flash
>>>> #regzbot ignore-activity
>>>>
>>>> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
>>>> to the report (the parent of this mail), then regzbot will automatically
>>>> mark the regression as resolved once the fix lands in the appropriate
>>>> tree. For more details about regzbot see footer.
>>>>
>>>> Sending this to everyone that got the initial report, to make all aware
>>>> of the tracking. I also hope that messages like this motivate people to
>>>> directly get at least the regression mailing list and ideally even
>>>> regzbot involved when dealing with regressions, as messages like this
>>>> wouldn't be needed then.
>>>>
>>>> Don't worry, I'll send further messages wrt to this regression just to
>>>> the lists (with a tag in the subject so people can filter them away), as
>>>> long as they are intended just for regzbot. With a bit of luck no such
>>>> messages will be needed anyway.
>>>>
>>>> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
>>>>
>>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>>> on my table. I can only look briefly into most of them. Unfortunately
>>>> therefore I sometimes will get things wrong or miss something important.
>>>> I hope that's not the case here; if you think it is, don't hesitate to
>>>> tell me about it in a public reply. That's in everyone's interest, as
>>>> what I wrote above might be misleading to everyone reading this; any
>>>> suggestion I gave thus might sent someone reading this down the wrong
>>>> rabbit hole, which none of us wants.
>>>>
>>>> BTW, I have no personal interest in this issue, which is tracked using
>>>> regzbot, my Linux kernel regression tracking bot
>>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>>> this mail to get things rolling again and hence don't need to be CC on
>>>> all further activities wrt to this regression.
>>>>
>>>> On 13.12.21 14:24, Ahmad Fatoum wrote:
>>>>> Hi,
>>>>>
>>>>> I've been investigating a breakage on a PowerPC MPC8313: The SoC is connected
>>>>> via the "Enhanced Local Bus Controller" to a 8-bit-parallel S29GL064N flash,
>>>>> which is represented as a memory-mapped cfi-flash.
>>>>>
>>>>> The regression began in v4.17-rc1 with
>>>>>
>>>>>     dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
>>>>>
>>>>> and causes all flash write accesses on the hardware to fail. Example output
>>>>> after v5.1-rc2[1]:
>>>>>
>>>>>     root@host:~# mount -t jffs2 /dev/mtdblock0 /mnt
>>>>>     MTD do_write_buffer_wait(): software timeout, address:0x000c000b.
>>>>>     jffs2: Write clean marker to block at 0x000c0000 failed: -5
>>>>>
>>>>> This issue still persists with v5.16-rc. Reverting aforementioned patch fixes
>>>>> it, but I am still looking for a change that keeps both Tokunori's and my
>>>>> hardware happy.
>>>>>
>>>>> What Tokunori's patch did is that it strengthened the success condition
>>>>> for flash writes:
>>>>>
>>>>>    - Prior to the patch, DQ polling was done until bits
>>>>>      stopped toggling. This was taken as an indicator that the write succeeded
>>>>>      and was reported up the stack. i.e. success condition is chip_ready()
>>>>>
>>>>>    - After the patch, polling continues until the just written data is
>>>>>      actually read back, i.e. success condition is chip_good()
>>>>>
>>>>> This new condition never holds for me, when DQ stabilizes, it reads 0xFF,
>>>>> never the just written data. The data is still written and can be read back
>>>>> on subsequent reads, just not at that point of time in the poll loop.
>>>>>
>>>>> We haven't had write issues for the years predating that patch. As the
>>>>> regression has been mainline for a while, I am wondering what about my setup
>>>>> that makes it pop up here, but not elsewhere?
>>>>>
>>>>> I consulted the data sheet[2] and found Figure 27, which describes DQ polling
>>>>> during embedded algorithms. DQ switches from status output to "True" (I assume
>>>>> True == all bits set == 0xFF) until CS# is reasserted.
>>>>>
>>>>> I compared with another chip's datasheet, and it (Figure 8.4) doesn't describe
>>>>> such an intermittent "True" state. In any case, the driver polls a few hundred
>>>>> times, however, before giving up, so there should be enough CS# toggles.
>>>>>
>>>>>
>>>>> Locally, I'll revert this patch for now. I think accepting 0xFF as a success
>>>>> condition may be appropriate, but I don't yet have the rationale to back it up.
>>>>>
>>>>> I am investigating this some more, probably with a logic trace, but I wanted
>>>>> to report this in case someone has pointers and in case other people run into
>>>>> the same issue.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Ahmad
>>>>>
>>>>> [1] Prior to d9b8a67b3b95 ("mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer")
>>>>>       first included with v5.1-rc2, failing writes just hung indefinitely in kernel space.
>>>>>       That's fixed, but the writes still fail.
>>>>>
>>>>> [2]: 001-98525 Rev. *B, https://www.infineon.com/dgdl/Infineon-S29GL064N_S29GL032N_64_Mbit_32_Mbit_3_V_Page_Mode_MirrorBit_Flash-DataSheet-v03_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ed556fd548b
>>>>>
>>>>> [3]: https://www.mouser.com/datasheet/2/268/SST39VF1601C-SST39VF1602C-16-Mbit-x16-Multi-Purpos-709008.pdf
>>>>>        Note that "true data" means valid data here, not all bits one.
>>>>>
>>
>
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
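The three write-success conditions discussed in this thread (chip_ready(), chip_good(), and the relaxed 0xFF check proposed above) can be compared in a small user-space simulation. This is a hedged sketch, not driver code: struct sim_flash and every sim_* function are hypothetical, sim_read() merely stands in for map_read() on an 8-bit bank, and modeling chip_ready() and chip_good() as comparisons of two back-to-back reads is an assumption of the sketch rather than a quote of cfi_cmdset_0002.c. The relaxed rule is the unproven workaround from this mail, not a merged fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one flash address on an 8-bit bank. 'reads' scripts
 * the successive values a poll loop would see on the bus. This stands in for
 * map_read() and is NOT the kernel API. len must be >= 1.
 */
struct sim_flash {
	const uint8_t *reads;
	size_t len;
	size_t pos;
};

/* Return the next scripted value; repeat the last one once exhausted. */
static uint8_t sim_read(struct sim_flash *f)
{
	size_t i = f->pos < f->len ? f->pos : f->len - 1;

	if (f->pos < f->len)
		f->pos++;
	return f->reads[i];
}

/* Modeled on chip_ready(): two back-to-back reads agree (toggling stopped). */
static bool sim_chip_ready(struct sim_flash *f)
{
	uint8_t d = sim_read(f);
	uint8_t t = sim_read(f);

	return d == t;
}

/* Modeled on chip_good(): reads agree AND match the data just written. */
static bool sim_chip_good(struct sim_flash *f, uint8_t expected)
{
	uint8_t oldd = sim_read(f);
	uint8_t curd = sim_read(f);

	return oldd == curd && curd == expected;
}

/*
 * Relaxed check proposed in this thread (unproven): accept a stable read of
 * either the expected data or 0xFF. One read pair covers both branches.
 * Caveat: a floating (Hi-Z) bus may also read back as 0xFF.
 */
static bool sim_relaxed_ok(struct sim_flash *f, uint8_t expected)
{
	uint8_t oldd = sim_read(f);
	uint8_t curd = sim_read(f);

	if (oldd != curd)
		return false; /* status bits still toggling */
	return curd == expected || curd == 0xFF;
}

enum sim_cond { SIM_READY, SIM_GOOD, SIM_RELAXED };

/* Poll up to max_polls times; false models the "software timeout" path. */
static bool sim_poll(struct sim_flash *f, enum sim_cond cond,
		     uint8_t expected, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		bool ok;

		if (cond == SIM_READY)
			ok = sim_chip_ready(f);
		else if (cond == SIM_GOOD)
			ok = sim_chip_good(f, expected);
		else
			ok = sim_relaxed_ok(f, expected);
		if (ok)
			return true;
	}
	return false;
}
```

Fed the reported pattern (status bits toggling, e.g. {0x40, 0x00, 0x40, 0x00}, then a steady 0xFF instead of the written value, here an arbitrary 0x5A), the chip_ready()-style poll succeeds, the chip_good()-style poll exhausts its iterations like the reported software timeout, and the relaxed check succeeds. The relaxed check would equally accept a floating bus that reads as 0xFF, which is why it still lacks a justification.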