2009-01-14 19:42:18

by Anton Vorontsov

[permalink] [raw]
Subject: [PATCH] sdhci: Fix potential spinlock recursion

I noticed this bug while working on a driver that is derived
from the SDHCI driver:

BUG: spinlock recursion on CPU#0, swapper/0
lock: df890668, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
Call Trace:
[c04b5c50] [c0008b2c] show_stack+0x4c/0x16c (unreliable)
[c04b5c90] [c0193aa8] spin_bug+0x7c/0xc4
[c04b5cb0] [c0193dcc] _raw_spin_lock+0xb4/0xb8
[c04b5cc0] [c035baf4] _spin_lock+0x10/0x20
[c04b5cd0] [c02a8d60] esdhc_irq+0x20/0x210
[c04b5cf0] [c00665c0] handle_IRQ_event+0x5c/0xb0
[c04b5d10] [c0068708] handle_level_irq+0xa8/0x144
[c04b5d30] [c000684c] do_IRQ+0xac/0xec
[c04b5d50] [c0014f24] ret_from_except+0x0/0x14
--- Exception: 501 at __delay+0x34/0x74
LR = esdhc_reset+0xa4/0x114
[c04b5e10] [c0480000] empty_zero_page+0x0/0x1000 (unreliable)
[c04b5e30] [c02a9150] esdhc_tasklet_card+0x104/0x148
[c04b5e50] [c003cabc] tasklet_action+0x78/0xfc
[c04b5e60] [c003cbc4] __do_softirq+0x84/0x124
[c04b5e90] [c00065dc] do_softirq+0x58/0x5c
[c04b5ea0] [c003ca40] irq_exit+0x94/0x98
[c04b5eb0] [c0006850] do_IRQ+0xb0/0xec
[c04b5ed0] [c0014f24] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0xa0/0xec
LR = cpu_idle+0xa0/0xec
[c04b5f90] [c0009d14] cpu_idle+0x50/0xec (unreliable)
[c04b5fb0] [c035c000] __got2_end+0x58/0x68
[c04b5fc0] [c04468fc] start_kernel+0x1a4/0x228
[c04b5ff0] [00003438] 0x3438

This happens because plain spin_lock() won't protect us from
softirqs (tasklets). So in the sdhci interrupt handler we must
grab the _irq version of the lock.

Note that the _irqsave version isn't mandatory here because
a) we aren't trying to protect ourselves from another hardirq, and
b) the sdhci driver requests irqs with IRQF_SHARED flag, and that
guarantees that we'll get a huge warning if anyone will try to
request the same interrupt with IRQF_DISABLED flag, since
IRQF_SHARED | IRQF_DISABLED is guaranteed to be broken anyway.

Briefly looking into other mmc hosts I don't see any problems
with them, so the fix goes for the sdhci driver only.

Signed-off-by: Anton Vorontsov <[email protected]>
---
drivers/mmc/host/sdhci.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 4d010a9..5248041 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1363,7 +1363,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
u32 intmask;
int cardint = 0;

- spin_lock(&host->lock);
+ spin_lock_irq(&host->lock);

intmask = readl(host->ioaddr + SDHCI_INT_STATUS);

@@ -1424,7 +1424,7 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)

mmiowb();
out:
- spin_unlock(&host->lock);
+ spin_unlock_irq(&host->lock);

/*
* We have to delay this as it calls back into the driver.
--
1.5.6.5


2009-01-15 06:05:39

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] sdhci: Fix potential spinlock recursion

On Wed, 14 Jan 2009 22:41:59 +0300
Anton Vorontsov <[email protected]> wrote:

>
> This happens because plain spin_lock() won't protect us from
> softirqs (tasklets). So in the sdhci interrupt handler we must
> grab the _irq version of the lock.
>

?! The docs I've read state that softirq:s are not executed until all
the hardirq:s have finished processing. And looking at your code, that
seems to still hold true. A softirq running esdhc_tasklet_card() gets
preempted by a hard irq and we have the lockup.

If you're running the code you sent a few minutes later, then something
is broken with your platform as esdhc_tasklet_card() clearly tries to
disable interrupts when it grabs the lock.

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.


Attachments:
signature.asc (197.00 B)

2009-01-15 14:33:16

by Anton Vorontsov

[permalink] [raw]
Subject: Re: [PATCH] sdhci: Fix potential spinlock recursion

On Thu, Jan 15, 2009 at 07:05:21AM +0100, Pierre Ossman wrote:
> On Wed, 14 Jan 2009 22:41:59 +0300
> Anton Vorontsov <[email protected]> wrote:
>
> >
> > This happens because plain spin_lock() won't protect us from
> > softirqs (tasklets). So in the sdhci interrupt handler we must
> > grab the _irq version of the lock.
> >
>
> ?! The docs I've read state that softirq:s are not executed until all
> the hardirq:s have finished processing. And looking at your code, that
> seems to still hold true. A softirq running esdhc_tasklet_card() gets
> preempted by a hard irq and we have the lockup.

Right you are. That was a total brain fart on my part, I apologize.

> If you're running the code you sent a few minutes later, then something
> is broken with your platform as esdhc_tasklet_card() clearly tries to
> disable interrupts when it grabs the lock.

Luckily the platform isn't broken. It's just that sdhci driver
doesn't have this bug, I was a bit confused by two these drivers
instead.

Here is the original patch that has this bug:

http://www.bitshrine.org/gpp/kernel-2.6.25-MPC837xE-RDB-add-esdhc-support.patch

Notice that esdhc_tasklet_card() grabs spin_lock(), not
spin_lock_irqsave() as sdhci driver. I fixed this in the esdhc patch
that I sent, but then I decided to "fix" irq handler too, which
wasn't necessary, of course. And that's it.

Thanks,

--
Anton Vorontsov
email: [email protected]
irc://irc.freenode.net/bd2