Message-id: <5653C47E.9030801@samsung.com>
Date: Tue, 24 Nov 2015 10:59:26 +0900
From: Jaehoon Chung
To: Jorge Ramirez-Ortiz, Doug Anderson
Cc: Ulf Hansson, Alim Akhtar, Sonny Rao, Andrew Bresticker, Heiko Stübner, linux-kernel@vger.kernel.org, linux-mmc@vger.kernel.org, Guodong Xu
Subject: Re: dw_mmc: HLE errors
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/24/2015 10:55 AM, Jorge Ramirez-Ortiz wrote:
> On 11/23/2015 07:11 PM, Jaehoon Chung wrote:
>> Dear Jorge,
>>
>> On 11/24/2015 02:29 AM, Jorge Ramirez-Ortiz wrote:
>>> On 11/23/2015 11:57 AM, Doug Anderson wrote:
>>>> Jorge,
>>>>
>>>> On Mon, Nov 23, 2015 at 6:10 AM, Jorge Ramirez-Ortiz
>>>> wrote:
>>>>> Doug/Jaehoon,
>>>>>
>>>>> Were there any follow-ups to this thread [1] from March 30, 2015?
>>>>> We are seeing HLE errors on 3.18 and are trying to determine whether a
>>>>> solution was ever delivered.
>>>>> On inspection, I can't find anything in recent kernels that specifically
>>>>> addresses this issue (was the actual root cause identified?).
>>>>>
>>>>> I put together a possible workaround that prevents the HLE storm from
>>>>> occurring on this specific SoC [2].
>>>>> However, we'd rather not merge this (or any other similar fix) if there is
>>>>> already a generic solution that we can pick up from mainline.
>>>> Nothing landed that I'm aware of. Are you on SDIO, SD or eMMC?
>>>> Trying to do UHS?
>>> SD, even without UHS (yet; that is coming now).
>> If you want to use a mode higher than UHS-DDR50 for SD cards, you need to apply the patch below.
>
> ACK
>
>>
>> https://patchwork.kernel.org/patch/7456121/
>>
>> Actually, this is not relevant to the HLE error.
>>
>> When an SD card is inserted/removed quickly, the dwmmc controller sometimes raises the HLE error.
>> (Now, I can't see the HLE error.)
>> So I applied some reset processing in my official repository. (It's not a generic solution.)
>
> Thanks, I'll have a look now.
>
> I believe this to be your official repo:
> https://github.com/jh80chung/dw-mmc
>
> Please let me know if it is not.

Sorry, it's not the official (Samsung) repository, so I can't share the URL. :(
It's just my personal git repository. I will work on that repository. :)

Best Regards,
Jaehoon Chung

>
>
>>
>>>> I know that this patch mattered for me for UHS:
>>>>
>>>> 7c5209c315ea mmc: core: Increase delay for voltage to stabilize from
>>>> 3.3V to 1.8V
>>>>
>>>>
>>>> Also important for UHS (for at least some folks) were patches like:
>>>>
>>>> 9c85f37a2984 mmc: core: Add mmc_regulator_set_vqmmc()
>>>>
>>>> ...that attempted to get the voltages more correct...
>>> ack
>>>
>>>>
>>>> In the ChromeOS tree we just landed a change that treats HLE errors as data
>>>> and cmd errors. It's not wonderful, but it's better than letting an
>>>> interrupt go off forever...
>>> Yes, I did try this patch on 3.18, but it didn't seem to be enough for us.
>>> Even though it prevented the interrupt storm from flooding the kernel, once
>>> the event triggered and the interrupt was handled, no more card
>>> insertions/ejections were detected.
>> If the HLE error can be reproduced with the generic sequence, I think we can find a generic solution.
>> So could you explain it in more detail? If I can reproduce it with v3.18, I will try to test it.
>> Your case will be helpful to me for solving the HLE error.
>
>
> Yes, the issue is relatively easy to reproduce.
>
> On this platform:
> https://www.96boards.org/products/ce/hikey/
>
> using either the Debian [1] or Android [2] release and the latest UEFI [3]:
> [1] https://builds.96boards.org/snapshots/hikey/linaro/debian/379/
> [2] https://builds.96boards.org/snapshots/hikey/linaro/aosp/197/
> [3] https://builds.96boards.org/snapshots/hikey/linaro/uefi/89/
>
> The Android and Debian releases share the same kernel tree [4].
> We are using the "hikey" branch (v3.18).
> [4] https://github.com/96boards/linux
>
> For my tests, and to be able to handle the interrupt storm and monitor the
> registers while it happens, I patched the kernel with a Xenomai [5] co-kernel.
> This is my kernel tree [6]:
> [5] http://xenomai.org/
> [6] http://git.xenomai.org/ipipe-jro.git/log/?h=hikey
>
> To reproduce the problem, all that was required was to insert/remove the SD
> card rapidly until it triggers this condition:
> [ 229.974525] dwmmc_k3 f723e000.dwmmc1: Busy; trying anyway
>
> When it triggered, and after patching the interrupt handler with some debug
> output showing the time between interrupts and the content of the MINTSTS
> register, I could see the following:
> mci_isr: 0x1000, 3333 ns
> mci_isr: 0x1000, 3334 ns
> mci_isr: 0x1000, 3333 ns
> mci_isr: 0x1000, 3334 ns
> mci_isr: 0x1000, 3333 ns
> mci_isr: 0x1000, 2500 ns
> mci_isr: 0x1000, 3334 ns
> mci_isr: 0x1000, 2500 ns
> mci_isr: 0x1000, 3333 ns
> mci_isr: 0x1000, 3334 ns
> mci_isr: 0x1000, 3334 ns
> mci_isr: 0x1000, 3333 ns
> mci_isr: 0x1000, 3334 ns
> mci_isr: 0x1000, 2500 ns
> mci_isr: 0x1000, 3334 ns
> [...]
>
> Notice that since the Xenomai co-kernel runs at a higher priority than the
> Linux kernel, I was able to output this information to the console.
>
> I put together a fix based on this commit from Doug:
> mmc: dw_mmc: Don't start commands while busy
> https://lkml.org/lkml/2015/2/20/508
>
> With Doug's commit, we delay sending a command until SDMMC_STATUS_BUSY
> clears.
> However, if it never clears, we go ahead and submit the command anyway.
>
> I believe this is what causes the HLE to be raised.
> To prevent that from happening, I think we should abort the operation
> completely.
> My "extension" for the Hikey platform looks like this:
> https://github.com/96boards/linux/commit/fe8d7f714d420121cec460e69f6529044a2cb6d
>
> It could of course be made generic, or the fix could take some other form.
> I was only targeting the Hikey platform when I wrote this, hoping that the
> issue would already have been fixed upstream.
>
> Having said all of this, I am not sure what would cause the host status to
> remain busy for so long (which is Ulf's biggest concern).
> I also tried increasing some of the timers that wait for the voltages to ramp
> up after power-on, but it didn't make any difference.
>
> I captured most of the information above in this bug for reference:
> https://bugs.96boards.org/show_bug.cgi?id=175