Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934639AbbLQLDQ (ORCPT ); Thu, 17 Dec 2015 06:03:16 -0500 Received: from metis.ext.4.pengutronix.de ([92.198.50.35]:58803 "EHLO metis.ext.4.pengutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932323AbbLQLDP (ORCPT ); Thu, 17 Dec 2015 06:03:15 -0500 Message-ID: <1450350190.3163.93.camel@pengutronix.de> Subject: Re: SDHCI long sleep with interrupts off From: Lucas Stach To: David Jander Cc: linux-mmc@vger.kernel.org, Pierre Ossman , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Date: Thu, 17 Dec 2015 12:03:10 +0100 In-Reply-To: <20151217112814.6250a3b9@archvile> References: <20151217112814.6250a3b9@archvile> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.12.9-1+b1 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-SA-Exim-Connect-IP: 2001:67c:670:100:fa0f:41ff:fe58:4010 X-SA-Exim-Mail-From: l.stach@pengutronix.de X-SA-Exim-Scanned: No (on metis.ext.pengutronix.de); SAEximRunCond expanded to false X-PTX-Original-Recipient: linux-kernel@vger.kernel.org Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3592 Lines: 101 Am Donnerstag, den 17.12.2015, 11:28 +0100 schrieb David Jander: > Hi all, > > I was investigating the source of abnormal irq-latency spikes on an i.MX6 > (ARM) board, and discovered this: > > # tracer: preemptirqsoff > # > # preemptirqsoff latency trace v1.1.5 on 4.4.0-rc4+ > # -------------------------------------------------------------------- > # latency: 2068 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:1) > # ----------------- > # | task: mmcqd/0-92 (uid:0 nice:0 policy:0 rt_prio:0) > # ----------------- > # => started at: _raw_spin_lock_irqsave > # => ended at: _raw_spin_unlock_irqrestore > # > # > # _------=> CPU# > # / _-----=> irqs-off > # | / _----=> need-resched > # || / _---=> hardirq/softirq > # ||| / _--=> preempt-depth > # |||| / delay > # cmd pid ||||| time | caller > # \ / ||||| \ | / > mmcqd/0-92 0d... 1us#: _raw_spin_lock_irqsave > mmcqd/0-92 0.n.1 2066us : _raw_spin_unlock_irqrestore > mmcqd/0-92 0.n.1 2070us+: trace_preempt_on <-_raw_spin_unlock_irqrestore > mmcqd/0-92 0.n.1 2107us : > => sdhci_runtime_resume_host > => __rpm_callback > => rpm_callback > => rpm_resume > => __pm_runtime_resume > => __mmc_claim_host > => mmc_blk_issue_rq > => mmc_queue_thread > => kthread > => ret_from_fork > > 2 ms with interrupts disabled!!! To much dismay, I later discovered that this > isn't even the worst case scenario. I also discovered that this has been in > the kernel for a long time without a fix (I have tested from 3.17 to 4.4-rc4). > There has been an attempt by someone to address this back in 2010, but > apparently it never hit mainline. > Going through the code in sdhci.c, I found this troublesome code-path: > > sdhci_do_set_ios() { > spin_lock_irqsave(&host->lock, flags); > ... > sdhci_reinit() --> sdhci_init() --> sdhci_do_reset() --> > host->ops->reset() --> sdhci_reset() > ... > spin_unlock_irqrestore(&host->lock, flags); > } > > And in sdhci_reset(), which may be called with held spinlock: > > ... > /* Wait max 100 ms */ > timeout = 100; > > /* hw clears the bit when it's done */ > while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) { > if (timeout == 0) { > pr_err("%s: Reset 0x%x never completed.\n", > mmc_hostname(host->mmc), (int)mask); > sdhci_dumpregs(host); > return; > } > timeout--; > mdelay(1); > } > > I am wondering: There either must be a reason this hasn't been fixed in such a > long time, or I am not understanding this correctly, so please enlighten me. > Before trying a cowboy attempt at "fixing" this, I'd really like to know why am > I seeing this? > I mean... how can such a problem get unnoticed and unfixed for so long? > Will an attempt at fixing this issue even be accepted? > I would like to see the sdhci spinlock killed and replaced by a mutex for exactly this reason. That said, your problem is card polling, when no card is present in the slot. This is most probably caused by CD gpios having the wrong polarity. Regards, Lucas -- Pengutronix e.K. | Lucas Stach | Industrial Linux Solutions | http://www.pengutronix.de/ | -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/