Date: Thu, 17 Dec 2015 11:28:14 +0100
From: David Jander
To: linux-mmc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Pierre Ossman
Subject: SDHCI long sleep with interrupts off
Message-ID: <20151217112814.6250a3b9@archvile>

Hi all,

I was investigating the source of abnormal IRQ-latency spikes on an i.MX6
(ARM) board, and discovered this:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 4.4.0-rc4+
# --------------------------------------------------------------------
# latency: 2068 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:1)
#    -----------------
#    | task: mmcqd/0-92 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: _raw_spin_lock_irqsave
#  => ended at:   _raw_spin_unlock_irqrestore
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  mmcqd/0-92     0d...    1us#: _raw_spin_lock_irqsave
  mmcqd/0-92     0.n.1 2066us : _raw_spin_unlock_irqrestore
  mmcqd/0-92     0.n.1 2070us+: trace_preempt_on <-_raw_spin_unlock_irqrestore
  mmcqd/0-92     0.n.1 2107us : <stack trace>
 => sdhci_runtime_resume_host
 => __rpm_callback
 => rpm_callback
 => rpm_resume
 => __pm_runtime_resume
 => __mmc_claim_host
 => mmc_blk_issue_rq
 => mmc_queue_thread
 => kthread
 => ret_from_fork

2 ms with interrupts disabled!!!

To my dismay, I later discovered that this isn't even the worst-case
scenario. I also discovered that this has been in the kernel for a long time
without a fix (I have tested from 3.17 to 4.4-rc4). There was an attempt to
address this back in 2010, but apparently it never hit mainline.

Going through the code in sdhci.c, I found this troublesome code path:

sdhci_do_set_ios()
{
	spin_lock_irqsave(&host->lock, flags);
	...
	sdhci_reinit()
	  --> sdhci_init()
	    --> sdhci_do_reset()
	      --> host->ops->reset() --> sdhci_reset()
	...
	spin_unlock_irqrestore(&host->lock, flags);
}

And in sdhci_reset(), which may thus be called with the spinlock held:

	...
	/* Wait max 100 ms */
	timeout = 100;

	/* hw clears the bit when it's done */
	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
		if (timeout == 0) {
			pr_err("%s: Reset 0x%x never completed.\n",
				mmc_hostname(host->mmc), (int)mask);
			sdhci_dumpregs(host);
			return;
		}
		timeout--;
		mdelay(1);
	}

I am wondering: either there must be a reason this hasn't been fixed in such
a long time, or I am not understanding this correctly, so please enlighten
me. Before making a cowboy attempt at "fixing" this (a rough sketch of the
direction I had in mind is at the end of this mail), I'd really like to know
why I am seeing this. I mean... how can such a problem go unnoticed and
unfixed for so long? Will an attempt at fixing this issue even be accepted?

Best regards,

-- 
David Jander
Protonic Holland.
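P.S.: To give an idea of the direction I had in mind, here is a rough,
untested sketch (sdhci_wait_reset_done() is a hypothetical helper, not
existing sdhci.c code). It assumes host->lock can actually be dropped across
the reset wait, so the polling can sleep instead of spinning with interrupts
off:

	/*
	 * Rough, untested sketch: sdhci_wait_reset_done() is a hypothetical
	 * helper, not existing sdhci.c code.  It assumes the caller can
	 * safely drop host->lock while the controller finishes the reset,
	 * which may not be true for every call path.
	 */
	static void sdhci_wait_reset_done(struct sdhci_host *host, u8 mask,
					  unsigned long *flags)
	{
		/* Wait max 100 ms, same limit as the current code */
		unsigned long timeout = jiffies + msecs_to_jiffies(100);

		/* Drop the lock so we can sleep with interrupts enabled */
		spin_unlock_irqrestore(&host->lock, *flags);

		/* hw clears the bit when it's done */
		while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
			if (time_after(jiffies, timeout)) {
				pr_err("%s: Reset 0x%x never completed.\n",
				       mmc_hostname(host->mmc), (int)mask);
				sdhci_dumpregs(host);
				break;
			}
			/* Sleep instead of mdelay() so other tasks can run */
			usleep_range(100, 200);
		}

		spin_lock_irqsave(&host->lock, *flags);
	}

Whether dropping the lock there is safe for all call paths is exactly the
kind of thing I would like feedback on.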