Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752598AbbBMI0q (ORCPT ); Fri, 13 Feb 2015 03:26:46 -0500 Received: from va-smtp01.263.net ([54.88.144.211]:40203 "EHLO va-smtp01.263.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751910AbbBMI0o (ORCPT ); Fri, 13 Feb 2015 03:26:44 -0500 X-Greylist: delayed 684 seconds by postgrey-1.27 at vger.kernel.org; Fri, 13 Feb 2015 03:26:43 EST X-RL-SENDER: addy.ke@rock-chips.com X-FST-TO: kever.yang@rock-chips.com X-SENDER-IP: 127.0.0.1 X-LOGIN-NAME: addy.ke@rock-chips.com X-UNIQUE-TAG: <2d24a06bbb5aedf983ed2a88f01dcd5a> X-ATTACHMENT-NUM: 0 X-DNS-TYPE: 1 Message-ID: <54DDB285.20900@rock-chips.com> Date: Fri, 13 Feb 2015 16:15:01 +0800 From: addy ke User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: alim.akhtar@gmail.com, a.hajda@samsung.com CC: robh+dt@kernel.org, pawel.moll@arm.com, mark.rutland@arm.com, ijc+devicetree@hellion.org.uk, galak@codeaurora.org, rdunlap@infradead.org, tgih.jun@samsung.com, jh80.chung@samsung.com, chris@printf.net, ulf.hansson@linaro.org, dinguyen@altera.com, heiko@sntech.de, olof@lixom.net, dianders@chromium.org, sonnyrao@chromium.org, amstan@chromium.org, djkurtz@chromium.org, huangtao@rock-chips.com, devicetree@vger.kernel.org, hl@rock-chips.com, linux-doc@vger.kernel.org, yzq@rock-chips.com, zyw@rock-chips.com, zhangqing@rock-chips.com, linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org, kever.yang@rock-chips.com Subject: Re: [PATCH v2 1/2] mmc: dw_mmc: fix bug that cause 'Timeout sending command' References: <1423134801-23219-1-git-send-email-addy.ke@rock-chips.com> <1423466726-20833-1-git-send-email-addy.ke@rock-chips.com> <1423466726-20833-2-git-send-email-addy.ke@rock-chips.com> <54DAC534.4020708@rock-chips.com> <54DB43E2.70203@samsung.com> <54DC0FBB.7010308@rock-chips.com> <54DC8A1B.7070402@samsung.com> In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 12609 Lines: 301 On 2015/2/12 21:59, Alim Akhtar wrote: > On Thu, Feb 12, 2015 at 4:40 PM, Andrzej Hajda wrote: >> On 02/12/2015 03:28 AM, addy ke wrote: >>> Hi Andrzej and Alim >>> >>> On 2015/2/12 07:20, Alim Akhtar wrote: >>>> Hi Andrzej, >>>> >>>> On Wed, Feb 11, 2015 at 5:28 PM, Andrzej Hajda wrote: >>>>> Hi Alim, >>>>> >>>>> On 02/11/2015 03:57 AM, Addy wrote: >>>>>> On 2015/02/10 23:22, Alim Akhtar wrote: >>>>>>> Hi Addy, >>>>>>> >>>>>>> On Mon, Feb 9, 2015 at 12:55 PM, Addy Ke wrote: >>>>>>>> Because of some uncertain factors, such as worse card or worse hardware, >>>>>>>> DAT[3:0](the data lines) may be pulled down by card, and mmc controller >>>>>>>> will be in busy state. This should not happend when mmc controller >>>>>>>> send command to update card clocks. If this happends, mci_send_cmd will >>>>>>>> be failed and we will get 'Timeout sending command', and then system will >>>>>>>> be blocked. To avoid this, we need reset mmc controller. >>>>>>>> >>>>>>>> Signed-off-by: Addy Ke >>>>>>>> --- >>>>>>>> drivers/mmc/host/dw_mmc.c | 28 ++++++++++++++++++++++++++++ >>>>>>>> 1 file changed, 28 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c >>>>>>>> index 4d2e3c2..b0b57e3 100644 >>>>>>>> --- a/drivers/mmc/host/dw_mmc.c >>>>>>>> +++ b/drivers/mmc/host/dw_mmc.c >>>>>>>> @@ -100,6 +100,7 @@ struct idmac_desc { >>>>>>>> }; >>>>>>>> #endif /* CONFIG_MMC_DW_IDMAC */ >>>>>>>> >>>>>>>> +static int dw_mci_card_busy(struct mmc_host *mmc); >>>>>>>> static bool dw_mci_reset(struct dw_mci *host); >>>>>>>> static bool dw_mci_ctrl_reset(struct dw_mci *host, u32 reset); >>>>>>>> >>>>>>>> @@ -888,6 +889,31 @@ static void mci_send_cmd(struct dw_mci_slot *slot, u32 cmd, u32 arg) >>>>>>>> cmd, arg, cmd_status); >>>>>>>> } >>>>>>>> >>>>>>>> +static void dw_mci_wait_busy(struct dw_mci_slot *slot) >>>>>>>> +{ >>>>>>>> + struct dw_mci *host = slot->host; >>>>>>>> + unsigned long timeout = jiffies + msecs_to_jiffies(500); >>>>>>>> + >>>>>>> Why 500 msec? >>>>>> This timeout value is the same as mci_send_cmd: >>>>>> static void mci_send_cmd(struct dw_mci_slot *slot, u32 cmd, u32 arg) >>>>>> { >>>>>> struct dw_mci *host = slot->host; >>>>>> unsigned long timeout = jiffies + msecs_to_jiffies(500); >>>>>> .... >>>>>> } >>>>>> >>>>>> I have not clear that which is suitable. >>>>>> Do you have any suggestion on it? >>>>>>>> + do { >>>>>>>> + if (!dw_mci_card_busy(slot->mmc)) >>>>>>>> + return; >>>>>>>> + cpu_relax(); >>>>>>>> + } while (time_before(jiffies, timeout)); >>>>>>>> + >>>>>>>> + dev_err(host->dev, "Data busy (status %#x)\n", >>>>>>>> + mci_readl(slot->host, STATUS)); >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * Data busy, this should not happend when mmc controller send command >>>>>>>> + * to update card clocks in non-volt-switch state. If it happends, we >>>>>>>> + * should reset controller to avoid getting "Timeout sending command". >>>>>>>> + */ >>>>>>>> + dw_mci_ctrl_reset(host, SDMMC_CTRL_ALL_RESET_FLAGS); >>>>>>>> + >>>>>>> Why you need to reset all blocks? may be CTRL_RESET is good enough here. >>>>>> I have tested on rk3288, if only reset ctroller, data busy bit will not >>>>>> be cleaned,and we will still get >>>>>> >>>>>> "Timeout sending command". >>>>>> >>>>>>>> + /* Fail to reset controller or still data busy, WARN_ON! */ >>>>>>>> + WARN_ON(dw_mci_card_busy(slot->mmc)); >>>>>>>> +} >>>>>>>> + >>>>>>>> static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit) >>>>>>>> { >>>>>>>> struct dw_mci *host = slot->host; >>>>>>>> @@ -899,6 +925,8 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit) >>>>>>>> /* We must continue to set bit 28 in CMD until the change is complete */ >>>>>>>> if (host->state == STATE_WAITING_CMD11_DONE) >>>>>>>> sdmmc_cmd_bits |= SDMMC_CMD_VOLT_SWITCH; >>>>>>>> + else >>>>>>>> + dw_mci_wait_busy(slot); >>>>>>>> >>>>>>> hmm...I would suggest you to call dw_mci_wait_busy() from inside >>>>>>> mci_send_cmd(), seems like dw_mmc hangs while sending update clock cmd >>>>>>> in multiple cases.see [1] >>>>>>> >>>>>>> [1]: http://permalink.gmane.org/gmane.linux.kernel.mmc/31140 >>>>>> I think this patch is more reasonable. >>>>>> So I will resend patches based on this patch. >>>>>> thank you! >>>>> I have tested your patches instead [1] above and they do not solve my issue: >>>>> Board: odroid-xu3/exynos5422/dw_mmc_250a. >>>>> MMC card: absent, broken-cd quirk >>>>> SD card: present >>>>> >>>> I doubt $SUBJECT patch in current form can resolve you issue. I have >>>> already given comments on $subject patch. >>>> >>>> Can you try out below patch (I have not tested yet) on top of $SUBJECT patch? >>>> >>>> ======= >>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c >>>> index b0b57e3..ea87844 100644 >>>> --- a/drivers/mmc/host/dw_mmc.c >>>> +++ b/drivers/mmc/host/dw_mmc.c >>>> @@ -101,6 +101,7 @@ struct idmac_desc { >>>> #endif /* CONFIG_MMC_DW_IDMAC */ >>>> >>>> static int dw_mci_card_busy(struct mmc_host *mmc); >>>> +static void dw_mci_wait_busy(struct dw_mci_slot *slot); >>>> static bool dw_mci_reset(struct dw_mci *host); >>>> static bool dw_mci_ctrl_reset(struct dw_mci *host, u32 reset); >>>> >>>> @@ -874,16 +875,22 @@ static void mci_send_cmd(struct dw_mci_slot >>>> *slot, u32 cmd, u32 arg) >>>> struct dw_mci *host = slot->host; >>>> unsigned long timeout = jiffies + msecs_to_jiffies(500); >>>> unsigned int cmd_status = 0; >>>> + int re_try = 3; /* just random for now, 1 re-try should be ok */ >>>> >>>> - mci_writel(host, CMDARG, arg); >>>> - wmb(); >>>> - mci_writel(host, CMD, SDMMC_CMD_START | cmd); >>>> + while(re_try--) { >>>> + mci_writel(host, CMDARG, arg); >>>> + wmb(); >>>> + mci_writel(host, CMD, SDMMC_CMD_START | cmd); >>>> >>>> - while (time_before(jiffies, timeout)) { >>>> - cmd_status = mci_readl(host, CMD); >>>> - if (!(cmd_status & SDMMC_CMD_START)) >>>> - return; >>>> + while (time_before(jiffies, timeout)) { >>>> + cmd_status = mci_readl(host, CMD); >>>> + if (!(cmd_status & SDMMC_CMD_START)) >>>> + return; >>>> + } >>>> + >>>> + dw_mci_wait_busy(slot); >>>> } >>>> + >>>> dev_err(&slot->mmc->class_dev, >>>> "Timeout sending command (cmd %#x arg %#x status %#x)\n", >>>> cmd, arg, cmd_status); >>>> @@ -925,8 +932,6 @@ static void dw_mci_setup_bus(struct dw_mci_slot >>>> *slot, bool force_clkinit) >>>> /* We must continue to set bit 28 in CMD until the change is complete */ >>>> if (host->state == STATE_WAITING_CMD11_DONE) >>>> sdmmc_cmd_bits |= SDMMC_CMD_VOLT_SWITCH; >>>> - else >>>> - dw_mci_wait_busy(slot); >>>> >>>> if (!clock) { >>>> mci_writel(host, CLKENA, 0); >>>> >>>> ===== end ====== >>> The reason why we are fail to send command is that we got data busy in >>> none-switch-volt state(host->state != STATE_WAITING_CMD11_DONE). >>> So: >>> if(host->state != STATE_WAITING_CMD11_DONE), we must wait until data not busy, >>> And if (host->state == STATE_WAITING_CMD11_DONE) we should not wait. >>> >>>>> System hangs during boot after few minutes kernel spits: >>>>> [ 242.188098] INFO: task kworker/u16:1:50 blocked for more than 120 >>>>> seconds. >>>>> [ 242.193524] Not tainted >>>>> 3.19.0-next-20150210-00002-gf96831b-dirty #3834 >>>>> [ 242.200622] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [ 242.208422] kworker/u16:1 D c04766ac 0 50 2 0x00000000 >>>>> [ 242.214756] Workqueue: kmmcd mmc_rescan >>>>> [ 242.218553] [] (__schedule) from [] >>>>> (schedule+0x34/0x98) >>>>> [ 242.225591] [] (schedule) from [] >>>>> (schedule_timeout+0x110/0x164) >>>>> [ 242.233302] [] (schedule_timeout) from [] >>>>> (wait_for_common+0xb8/0x14c) >>>>> [ 242.241539] [] (wait_for_common) from [] >>>>> (mmc_wait_for_req+0x68/0x17c) >>>>> [ 242.249861] [] (mmc_wait_for_req) from [] >>>>> (mmc_wait_for_cmd+0x80/0xa0) >>>>> [ 242.258002] [] (mmc_wait_for_cmd) from [] >>>>> (mmc_go_idle+0x78/0xf8) >>>>> [ 242.265796] [] (mmc_go_idle) from [] >>>>> (mmc_rescan+0x280/0x314) >>>>> [ 242.273253] [] (mmc_rescan) from [] >>>>> (process_one_work+0x120/0x324) >>>>> [ 242.281135] [] (process_one_work) from [] >>>>> (worker_thread+0x30/0x42c) >>>>> [ 242.289194] [] (worker_thread) from [] >>>>> (kthread+0xd8/0xf4) >>>>> [ 242.296389] [] (kthread) from [] >>>>> (ret_from_fork+0x14/0x34) >>>>> >>>>> Just for record, Exynos4412/dw_mmc_240a with the same configuration >>>>> (no MMC card, broken-cd) works OK without patches. >>> This is because mmc start command,but mmc_request_done() is't called. >>> I have ever found this issue. >>> I found that host does't get DTO interrupt when mmc send command to read data. >>> I have sent a patch for it, see: >>> https://patchwork.kernel.org/patch/5426531/ >>> >>> Would you please merge it and test again? >> >> I have merged it and added quirk to exynos, but it does not help. There >> is still timeout: >> > I don't think this DTO issue. I think we need a way to reproduce this, > at least on Exyons5422/5800. > what type of sd card is being used? Are you trying UHS-I mode by any chance? > Emmc, sd card and sdio, all have this issue on rk3288, I have NOT found data busy before send command, but still not get DTO interrupt. And we have ever found little probability of no command done interrupt. >> [ 242.188178] INFO: task kworker/u16:1:50 blocked for more than 120 >> seconds. >> [ 242.193605] Not tainted >> 3.19.0-next-20150212-00003-g7850750-dirty #3841 >> [ 242.200703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [ 242.208592] kworker/u16:1 D c04755f4 0 50 2 0x00000000 >> [ 242.214840] Workqueue: kmmcd mmc_rescan >> [ 242.218635] [] (__schedule) from [] >> (schedule+0x34/0x98) >> [ 242.225671] [] (schedule) from [] >> (schedule_timeout+0x110/0x164) >> [ 242.233383] [] (schedule_timeout) from [] >> (wait_for_common+0xb8/0x14c) >> [ 242.241619] [] (wait_for_common) from [] >> (mmc_wait_for_req+0xb0/0x13c) >> [ 242.249848] [] (mmc_wait_for_req) from [] >> (mmc_wait_for_cmd+0x80/0xa0) >> [ 242.258086] [] (mmc_wait_for_cmd) from [] >> (mmc_go_idle+0x78/0xf8) >> [ 242.265876] [] (mmc_go_idle) from [] >> (mmc_rescan+0x25c/0x2e4) >> [ 242.273333] [] (mmc_rescan) from [] >> (process_one_work+0x120/0x324) >> [ 242.281216] [] (process_one_work) from [] >> (worker_thread+0x30/0x42c) >> [ 242.289275] [] (worker_thread) from [] >> (kthread+0xd8/0xf4) >> [ 242.296469] [] (kthread) from [] >> (ret_from_fork+0x14/0x34) >> >> >> Regards >> Andrzej >> >>>>> >>>>> Regards >>>>> Andrzej >>>>> >>>>>>>> if (!clock) { >>>>>>>> mci_writel(host, CLKENA, 0); >>>>>>>> -- >>>>>>>> 1.8.3.2 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> linux-arm-kernel mailing list >>>>>>>> linux-arm-kernel@lists.infradead.org >>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >>>>>>> >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-doc" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>> >>>> >>> >> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/