Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932167AbdC1JHs (ORCPT ); Tue, 28 Mar 2017 05:07:48 -0400 Received: from mga14.intel.com ([192.55.52.115]:28422 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932077AbdC1JHq (ORCPT ); Tue, 28 Mar 2017 05:07:46 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.36,236,1486454400"; d="scan'208";a="65890790" Subject: Re: [PATCH] mmc: core: Do not hold re-tuning during CMD6 commands To: Ulf Hansson , Chaotian Jing References: <1490336341-22292-1-git-send-email-chaotian.jing@mediatek.com> <1490336341-22292-2-git-send-email-chaotian.jing@mediatek.com> <13a83728-0031-5683-c371-4b517df32299@intel.com> <1490344369.22814.10.camel@mhfsdcap03> <03d54000-9ced-1b31-df80-d254f02433db@intel.com> <1490348427.22814.19.camel@mhfsdcap03> <1490578500.22814.31.camel@mhfsdcap03> Cc: Matthias Brugger , Jaehoon Chung , Shawn Lin , Masahiro Yamada , "linux-mmc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , linux-mediatek@lists.infradead.org, srv_heupstream From: Adrian Hunter Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki, Business Identity Code: 0357606 - 4, Domiciled in Helsinki Message-ID: Date: Tue, 28 Mar 2017 12:01:33 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5094 Lines: 139 On 28/03/17 11:30, Ulf Hansson wrote: > [...] > >>> >>> If there is a problem in __mmc_switch(), let's try to fix it there first. >>> >> Anyway, it is a bug of retry 3 times at max but without check current >> card status and ensure it's in transfer state before next retry. > > Correct. Do you want to send a patch that fixes this? Otherwise I can do it... > >>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg. >>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC >>>>>> error) >>>>> >>>>> No, re-tuning is to compensate for drift caused primarily by temperature change. >>>>> >>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC >>>> device, but, as you mentioned, maybe I have a hardware problem of host, >>>> but needs Software to cover it. so that we are doing our best to do >>>> re-tune if got CRC error. if could recover it, then it's better than >>>> system hung. >>> >>> Exactly in what cases do you get CRC errors for CMD6. We need a full >>> cmd log to understand and to help. >>> >>>>>> error) , but in such cases, too many cases are disable re-tune function >>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error >>>>>> got then we never have chance to recover it. then cause system cannot >>>>>> access emmc or suspend/resume fail. >>>>> >>>>> Maybe you have a hardware problem. >>> >>> There is no way I am going to accept patches touching this part of the >>> mmc core, without providing real evidence for how it solves a problem. >>> To me, it seems like you are applying a workaround for another issue. >>> >>> Again, try to provide us with some more data and logs, then perhaps we >>> can help narrow down the issues. >>> >>> Kind regards >>> Uffe >> >> Below is the fail log of suspend fail. >> the normal command tune result should be 0xffffff9ff, but some time, we >> get the tune result of 0xffffffff, then we choose the 10 as the best >> tune parameter, which is not stable. >> I know that we should focus on why we get the result of 0xffffffff, this >> may be result of device/host timing shifting while tuning. but what I >> want to do is that when get a response CRC error, we can do re-tune to >> recovery it, but not only return the -84 and cause suspend fail >> eventually. if all hardware are perfect, then we don't need the re-tune >> mechanism. > > Thanks for elaborating! > > Can you please also tell exactly which of the CMD6 commands in the > suspend sequence that is triggering this problem? Cache flush? Power > off notification? > >> >> as Adrian's comment, if temperature change at here caused CMD6 response >> CRC error, then how to recovery it ? > > So in your case, allowing re-tuning a little longer in __mmc_switch() > solves your problem. Clearly there are cases when we need to prevent > re-tuning when sending CMD6, however maybe not in all cases as we do > today. > > For example it seems reasonable to not hold retuning before sending > CMD6 for cache flush, but instead it should be sufficient to hold it > before polling for busy in __mmc_switch(). > > Adrian, what's your thoughts on this? mmc_retune_hold() and mmc_retune_release() are designed to go around a group of commands, but re-tuning can still be done before the first command. i.e. mmc_retune_hold cmd A cmd B cmd C mmc_retune_release That is the same in the retry case: mmc_retune_hold cmd A retry cmd A cmd B cmd C mmc_retune_release The retry mechanism provided by mmc_wait_for_cmd() and friends really only makes sense for simple commands. In other cases, like this, we need to consider what state the card is in. For __mmc_switch we need to consider whether the card is busy or whether a timing change been made. > >> >> [ 129.106622] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is >> stable >> [ 129.109404] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:ffffe03f] [maxlen:19] [final:22] >> --------------------> below is next resume and re-init card: >> [ 129.778454] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set >> error -22: 3300000 - 3300000 >> [ 130.016987] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10 >> is not stable. > > As you suspect the tuning didn't work out correctly, then why don't > you retry one more time? Or restore the previously known good result? > >> [ 130.019556] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:ffffc03f] [maxlen:18] [final:23] >> [ 130.124279] (1)[1248:system_server]mmc0: cache flush error -84 >> [ 130.125058] (1)[1248:system_server]dpm_run_callback(): >> mmc_bus_suspend+0x0/0x4c returns -84 >> [ 130.126104] (1)[1248:system_server]PM: Device mmc0:0001 failed to >> suspend: error -84 >> >> >> > > Kind regards > Uffe >