Subject: Re: [PATCH] mmc: dw_mmc: Wait for data transfer after response errors
From: Shawn Lin
Date: Thu, 31 Mar 2016 09:56:18 +0800
To: Russell King - ARM Linux , Enric Balletbo Serra
Cc: shawn.lin@rock-chips.com, Doug Anderson , "linux-mmc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Alim Akhtar , Jaehoon Chung , Ulf Hansson , Alim Akhtar , Sonny Rao , Andrew Bresticker , Heiko Stuebner , Addy Ke , Alexandru Stan , Chris Zhong , Caesar Wang , Javier Martinez Canillas
Message-ID: <56FC83C2.3000703@rock-chips.com>
In-Reply-To: <20160330172604.GI19428@n2100.arm.linux.org.uk>
References: <20160324153056.GT19428@n2100.arm.linux.org.uk> <20160324162220.GV19428@n2100.arm.linux.org.uk> <20160330172604.GI19428@n2100.arm.linux.org.uk>

On 2016/3/31 1:26, Russell King - ARM Linux wrote:
> On Wed, Mar 30, 2016 at 07:16:18PM +0200, Enric Balletbo Serra wrote:
>> 2016-03-24 17:22 GMT+01:00 Russell King - ARM Linux :
>>> On Thu, Mar 24, 2016 at 09:06:45AM -0700, Doug Anderson wrote:
>>>> Russell,
>>> ...
>>>> Presumably this is similar to what you saw: the host saw the CRC error
>>>> but the card knew nothing about it. Sending the stop command during
>>>> this time confused the card. Presumably the card was in transfer
>>>> state during this time?
>>>
>>> If the card was in transfer state for a command which expects a stop
>>> command, and that stop command was issued after the card entered
>>> the transfer state, then I'd expect the card to handle it... though
>>> there's always the firmware bug issue.
>>>
>>> If the card hadn't entered transfer state at the time the stop command
>>> was issued, I think that's more likely to hit card firmware issues.
>>>
>>> With the tuning commands, there's another case you can hit though:
>>> the data transfer may have completed before you get around to sending
>>> the stop command.
>>>
>>> That's why, for sdhci, I came to the conclusion that waiting for the
>>> data transfer to complete or time out was the best solution for SDHCI.
>>>
>>
>> In fact I only saw the problem with dw_mmc-exynos; on dw_mmc-rockchip
>> it doesn't happen because it enables the DW_MCI_QUIRK_BROKEN_DTO
>> behaviour. What this does is use a kernel timer to signal when the DTO
>> interrupt does NOT come. Note that if I disable this quirk I can also
>> see the problem on rockchip.
>>
>>> Maybe, if sending a STOP command does cause card firmware issues, then:
>>>
>>> 1) it provides evidence that trying to send a stop command on response
>>> CRC error is the wrong thing to do (it was talked about making SDHCI
>>> do this.)
>>>
>>
>> It seems the same here, so I guess it is the wrong thing to do.
>>
>>> 2) it suggests that the solution I came up with for SDHCI is the better
>>> solution, rather than trying to immediately recover the situation by
>>> sending a STOP command.
>>>
>>
>> I'm wondering if just enabling this quirk on exynos too is the proper
>> solution. Unfortunately I don't have enough documentation to check the
>> differences between those controllers.
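A minimal user-space sketch of the policy Russell describes for SDHCI: after a response CRC error, do not send STOP straight away, but wait for the data phase to end (DTO), for the hardware data read timeout (DRTO), or for a software timer standing in for a DTO that never arrives, which is roughly what DW_MCI_QUIRK_BROKEN_DTO is described as doing above. Every type, state and function name here is hypothetical; this is an illustrative model, not the dw_mmc or sdhci implementation, and it assumes the response is always reported before the data phase ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events seen by a host driver for one data command. */
enum xfer_event {
	EV_RESP_OK,      /* command response received without error */
	EV_RESP_CRC_ERR, /* response CRC error seen by the host */
	EV_DTO,          /* data transfer over interrupt */
	EV_DRTO,         /* hardware data read timeout interrupt */
	EV_SW_TIMER,     /* software fallback timer expired (missing DTO) */
};

/* Hypothetical request states in this model. */
enum xfer_state {
	ST_WAIT_RESP,    /* command sent, waiting for its response */
	ST_WAIT_DATA,    /* waiting for the data phase to end */
	ST_DONE,         /* data phase settled; request may be completed */
};

struct xfer {
	enum xfer_state state;
	bool resp_error;  /* response CRC error, reported at completion */
	bool data_error;  /* data phase ended by a timeout, not by DTO */
	bool timer_armed; /* software backstop for a DTO that never comes */
};

static void xfer_init(struct xfer *x)
{
	x->state = ST_WAIT_RESP;
	x->resp_error = false;
	x->data_error = false;
	x->timer_armed = false;
}

static void xfer_handle(struct xfer *x, enum xfer_event ev)
{
	assert(x != NULL);

	switch (x->state) {
	case ST_WAIT_RESP:
		if (ev == EV_RESP_OK || ev == EV_RESP_CRC_ERR) {
			/* On a CRC error only record it: the card may not
			 * have seen any error and may still be sending data,
			 * so no STOP is issued here. */
			x->resp_error = (ev == EV_RESP_CRC_ERR);
			x->timer_armed = true;
			x->state = ST_WAIT_DATA;
		}
		break;
	case ST_WAIT_DATA:
		if (ev == EV_DTO || ev == EV_DRTO || ev == EV_SW_TIMER) {
			/* Anything other than DTO means the data phase
			 * ended by timing out. */
			x->data_error = (ev != EV_DTO);
			x->timer_armed = false;
			x->state = ST_DONE;
		}
		break;
	case ST_DONE:
		/* Late events after completion are ignored. */
		break;
	}
}

/* In this model, recovery (for example a STOP command) is only allowed
 * once the data phase has settled, never while data may be in flight. */
static bool xfer_may_send_stop(const struct xfer *x)
{
	return x->state == ST_DONE;
}
```

In the error path, xfer_handle(&x, EV_RESP_CRC_ERR) leaves the request in ST_WAIT_DATA with the timer armed, and xfer_may_send_stop() stays false until EV_DTO, EV_DRTO or EV_SW_TIMER arrives. Keeping the backstop timer as part of the normal flow, rather than behind a per-host flag, matches the argument in this thread that the behaviour is not specific to one controller.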
>> It would also really help to have access to some hardware that uses
>> dw_mmc-pltfm, to check whether the same issue is triggered there as on
>> exynos. Is there anyone with the hardware who can do some tests?
>
> I'd really suggest that the dw-mmc folk place a moratorium on quirk
> flags, and instead deal with situations like this without resorting
> to this kind of thing.
>

Some quirks and callbacks have already been cleaned up in Jaehoon's repo,
and more are going to be removed. Ultimately, we plan to turn the dw_mmc
core into a pure library.

> sdhci is a good example of why the quirk flag approach is totally wrong,
> and shows that it leads to an unmaintainable mess. If dw-mmc people
> don't want the driver to descend into the same state that sdhci is in,
> then things like this should not be quirks. sdhci already has a
> long-term moratorium on quirk flags until the resulting mess has been
> cleaned up.
>
> The danger that quirk flags cause is also highlighted in your mail:
> it's very likely that this _isn't_ a host controller issue at all,

The dw_mmc-rockchip side has found two issues:

(1) The IDMA needs to be reset when switching between FIFO transfers and
IDMA transfers. When biu:ciu > 1:6, the IDMA's internal FSM is exposed to
a race condition while maintaining its FIFO lookup pointer. This is very
easy to reproduce by setting biu:ciu > 1:6, and it is a common bug for
dw_mmc! Unfortunately, these details were missing from the commit message.

(2) Missing DTO/DRTO. I missed the thread on this topic, so I need to
reproduce it with a simple C model. I can't say more until we find a way
to reproduce it easily. But I guess it's NOT a host issue, since I took a
brief look at the TMOUT register in the dw_mmc databook and found a
software timer requirement:

  31:8  data_timeout  0xffffff
        Value for card Data Read Timeout; same value also used for Data
        Starvation by Host timeout. The timeout counter is started only
        after the card clock is stopped. Value is in number of card output
        clocks – cclk_out of selected card.
        Note: The software timer should be used if the timeout value is in
        the order of 100 ms. In this case, read data timeout interrupt needs
        to be disabled.

> but an MMC protocol issue or a card issue - and the behaviour required
> here is not specific to any particular host controller. The problem
> with having a quirk flag for it is that you end up with some hosts
> enabling it, and other hosts having it disabled only because they
> haven't yet tripped over the issue.
>

-- 
Best Regards
Shawn Lin