2010-04-01 16:19:49

by Linus Torvalds

Subject: Re: start_kernel(): bug: interrupts were enabled early



On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>
> The obvious way to fix this would be to use
> spin_lock_irqsave..spin_unlock_irqrestore in __down_read as well as in the
> other locations; I don't have a good feel for what the cost of doing so
> would be, though. On x86 it's fairly expensive simply because the only
> way to save the state is to push it on the stack, which the compiler
> doesn't deal well with, but this code isn't used on x86.

I think that's what we should just do, with a good comment both in the
code and the changelog. I'm not entirely happy with it, because obviously
it's conceptually kind of dubious to take a lock with interrupts disabled
in the first place, but this is not a new issue per se.

The whole bootup code is special, and we already make similar guarantees
about memory allocators and friends - just because it's too dang painful
to have some special code that does GFP_ATOMIC for early bootup when the
same code is often shared and used at run-time too.

So we've accepted that people can do GFP_KERNEL allocations and we won't
care about them if we're in the boot phase (and suspend/resume), and we
have that whole 'gfp_allowed_mask' thing for that.

I think this probably falls under exactly the same heading of "not pretty,
but let's not blow up".

So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
sounds like the right thing. It won't be a performance issue: it _is_ the
slow-path, and we're already doing the expensive part (the spinlock itself
and the irq thing).

So ACK on the idea. Who wants to write the trivial patch and test it?
Preferably somebody who sees the problem in the first place - x86 should
not be impacted, since the irq-disabling slow-path should never be hit
without contention anyway (and contention cannot/must not happen for this
case).

Linus


2010-04-01 17:30:25

by Andrew Morton

Subject: Re: start_kernel(): bug: interrupts were enabled early

On Thu, 1 Apr 2010 09:13:31 -0700 (PDT) Linus Torvalds <[email protected]> wrote:

>
>
> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
> >
> > The obvious way to fix this would be to use
> > spin_lock_irqsave..spin_unlock_irqrestore in __down_read as well as in the
> > other locations; I don't have a good feel for what the cost of doing so
> > would be, though. On x86 it's fairly expensive simply because the only
> > way to save the state is to push it on the stack, which the compiler
> > doesn't deal well with, but this code isn't used on x86.
>
> I think that's what we should just do, with a good comment both in the
> code and the changelog. I'm not entirely happy with it, because obviously
> it's conceptually kind of dubious to take a lock with interrupts disabled
> in the first place, but this is not a new issue per se.
>
> The whole bootup code is special, and we already make similar guarantees
> about memory allocators and friends - just because it's too dang painful
> to have some special code that does GFP_ATOMIC for early bootup when the
> same code is often shared and used at run-time too.
>
> So we've accepted that people can do GFP_KERNEL allocations and we won't
> care about them if we're in the boot phase (and suspend/resume), and we
> have that whole 'gfp_allowed_mask' thing for that.
>
> I think this probably falls under exactly the same heading of "not pretty,
> but let's not blow up".
>
> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
> sounds like the right thing. It won't be a performance issue: it _is_ the
> slow-path, and we're already doing the expensive part (the spinlock itself
> and the irq thing).

It's actually on the fastpath for lib/rwsem-spinlock.c.

> So ACK on the idea. Who wants to write the trivial patch and test it?
> Preferably somebody who sees the problem in the first place - x86 should
> not be impacted, since the irq-disabling slow-path should never be hit
> without contention anyway (and contention cannot/must not happen for this
> case).
>
> Linus

2010-04-01 20:17:54

by Linus Torvalds

Subject: Re: start_kernel(): bug: interrupts were enabled early



On Thu, 1 Apr 2010, Andrew Morton wrote:
>
> > So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
> > sounds like the right thing. It won't be a performance issue: it _is_ the
> > slow-path, and we're already doing the expensive part (the spinlock itself
> > and the irq thing).
>
> It's actually on the fastpath for lib/rwsem-spinlock.c.

Ahh, yes. In this case, that doesn't likely change anything. The
save/restore versions of the irq-safe locks shouldn't be appreciably more
expensive than the non-saving ones. And architectures that really care
should have done their own per-arch optimized version anyway.

Maybe we should even document that - so that nobody else makes the mistake
x86-64 did of thinking that the "generic spinlock" version of the rwsem's
is anything but a hacky and bad fallback case.

Linus

2010-04-02 14:47:27

by David Howells

Subject: Re: start_kernel(): bug: interrupts were enabled early

Linus Torvalds <[email protected]> wrote:

> Ahh, yes. In this case, that doesn't likely change anything. The
> save/restore versions of the irq-safe locks shouldn't be appreciably more
> expensive than the non-saving ones. And architectures that really care
> should have done their own per-arch optimized version anyway.

That depends on the CPU. Some CPUs have quite expensive interrupt disablement
instructions. FRV does for instance; but fortunately, on the FRV, I can use
some of the excessive quantities of conditional registers to pretend that I
disable interrupts, and only actually do so if an interrupt actually happens.

> Maybe we should even document that - so that nobody else makes the mistake
> x86-64 did of thinking that the "generic spinlock" version of the rwsem's
> is anything but a hacky and bad fallback case.

In some cases, it's actually the best way. On a UP machine, for instance,
where they reduce to nothing or where your only atomic instruction is an XCHG
equivalent.

David

2010-04-02 15:00:43

by Linus Torvalds

Subject: Re: start_kernel(): bug: interrupts were enabled early



On Fri, 2 Apr 2010, David Howells wrote:

> Linus Torvalds <[email protected]> wrote:
>
> > Ahh, yes. In this case, that doesn't likely change anything. The
> > save/restore versions of the irq-safe locks shouldn't be appreciably more
> > expensive than the non-saving ones. And architectures that really care
> > should have done their own per-arch optimized version anyway.
>
> That depends on the CPU. Some CPUs have quite expensive interrupt disablement
> instructions. FRV does for instance; but fortunately, on the FRV, I can use
> some of the excessive quantities of conditional registers to pretend that I
> disable interrupts, and only actually do so if an interrupt actually happens.

I think you're missing the part where we're not _adding_ any irq disables:
we're just changing the unconditional irq disable to a save-and-disable
(and the unconditional irq enable to a restore).

So even if irq's are expensive to disable, the change from

spin_lock_irq()

to

spin_lock_irqsave()

won't make that code any more expensive.

> > Maybe we should even document that - so that nobody else makes the mistake
> > x86-64 did of thinking that the "generic spinlock" version of the rwsem's
> > is anything but a hacky and bad fallback case.
>
> In some cases, it's actually the best way. On a UP machine, for instance,
> where they reduce to nothing or where your only atomic instruction is an XCHG
> equivalent.

Again, you seem to think that we used to have just a plain spin_lock. Not
so. We currently have a spin_lock_irq(), and it is NOT a no-op even on UP.
It does that irq disable.

Anyway, I suspect that even with just an atomic xchg, you can do a better
job at doing down_read() than using the generic spin-lock version (likely
by busy-looping on a special "we're busy" value). But if you do want to
use the generic spin-lock version, I doubt any architecture makes that
irqsave version noticeably slower than the unconditional irq version.

Linus

2010-04-07 19:09:25

by Kevin Hilman

Subject: Re: start_kernel(): bug: interrupts were enabled early

Linus Torvalds <[email protected]> writes:

> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>>
>> The obvious way to fix this would be to use
>> spin_lock_irqsave..spin_unlock_irqrestore in __down_read as well as in the
>> other locations; I don't have a good feel for what the cost of doing so
>> would be, though. On x86 it's fairly expensive simply because the only
>> way to save the state is to push it on the stack, which the compiler
>> doesn't deal well with, but this code isn't used on x86.
>

[...]

> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
> sounds like the right thing. It won't be a performance issue: it _is_ the
> slow-path, and we're already doing the expensive part (the spinlock itself
> and the irq thing).
>
> So ACK on the idea. Who wants to write the trivial patch and test it?

OK, I'll bite since I was seeing boot-time hangs on ARM (TI OMAP3) due
to this. Patch below.

Kevin


>From 7baff4008353bbfd2a2e2a4da22b87bc4efa4194 Mon Sep 17 00:00:00 2001
From: Kevin Hilman <[email protected]>
Date: Wed, 7 Apr 2010 11:52:46 -0700
Subject: [PATCH] rwsem generic spinlock: use IRQ save/restore spinlocks

rwsems can be used with IRQs disabled, particularly in early boot
before IRQs are enabled. Currently the spin_unlock_irq() usage in the
slow-path will unconditionally enable interrupts and cause problems
since interrupts are not yet initialized or enabled.

This patch uses the save/restore versions of IRQ spinlocks in the slowpath
to ensure interrupts are not unintentionally enabled.

Signed-off-by: Kevin Hilman <[email protected]>
---
lib/rwsem-spinlock.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index ccf95bf..ffc9fc7 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -143,13 +143,14 @@ void __sched __down_read(struct rw_semaphore *sem)
{
struct rwsem_waiter waiter;
struct task_struct *tsk;
+ unsigned long flags;

- spin_lock_irq(&sem->wait_lock);
+ spin_lock_irqsave(&sem->wait_lock, flags);

if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
/* granted */
sem->activity++;
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);
goto out;
}

@@ -164,7 +165,7 @@ void __sched __down_read(struct rw_semaphore *sem)
list_add_tail(&waiter.list, &sem->wait_list);

/* we don't need to touch the semaphore struct anymore */
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);

/* wait to be given the lock */
for (;;) {
@@ -209,13 +210,14 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
{
struct rwsem_waiter waiter;
struct task_struct *tsk;
+ unsigned long flags;

- spin_lock_irq(&sem->wait_lock);
+ spin_lock_irqsave(&sem->wait_lock, flags);

if (sem->activity == 0 && list_empty(&sem->wait_list)) {
/* granted */
sem->activity = -1;
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);
goto out;
}

@@ -230,7 +232,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
list_add_tail(&waiter.list, &sem->wait_list);

/* we don't need to touch the semaphore struct anymore */
- spin_unlock_irq(&sem->wait_lock);
+ spin_unlock_irqrestore(&sem->wait_lock, flags);

/* wait to be given the lock */
for (;;) {
--
1.7.0.2

2010-04-08 15:52:43

by Cong Wang

Subject: Re: start_kernel(): bug: interrupts were enabled early

On Wed, Apr 07, 2010 at 12:09:17PM -0700, Kevin Hilman wrote:
>Linus Torvalds <[email protected]> writes:
>
>> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>>>
>>> The obvious way to fix this would be to use
>>> spin_lock_irqsave..spin_unlock_irqrestore in __down_read as well as in the
>>> other locations; I don't have a good feel for what the cost of doing so
>>> would be, though. On x86 it's fairly expensive simply because the only
>>> way to save the state is to push it on the stack, which the compiler
>>> doesn't deal well with, but this code isn't used on x86.
>>
>
>[...]
>
>> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions
>> sounds like the right thing. It won't be a performance issue: it _is_ the
>> slow-path, and we're already doing the expensive part (the spinlock itself
>> and the irq thing).
>>
>> So ACK on the idea. Who wants to write the trivial patch and test it?
>
>OK, I'll bite since I was seeing boot-time hangs on ARM (TI OMAP3) due
>to this. Patch below.
>
>Kevin
>
>
>From 7baff4008353bbfd2a2e2a4da22b87bc4efa4194 Mon Sep 17 00:00:00 2001
>From: Kevin Hilman <[email protected]>
>Date: Wed, 7 Apr 2010 11:52:46 -0700
>Subject: [PATCH] rwsem generic spinlock: use IRQ save/restore spinlocks
>
>rwsems can be used with IRQs disabled, particularly in early boot
>before IRQs are enabled. Currently the spin_unlock_irq() usage in the
>slow-path will unconditionally enable interrupts and cause problems
>since interrupts are not yet initialized or enabled.
>
>This patch uses the save/restore versions of IRQ spinlocks in the slowpath
>to ensure interrupts are not unintentionally enabled.
>
>Signed-off-by: Kevin Hilman <[email protected]>

This looks reasonable and fine to me.

Reviewed-by: WANG Cong <[email protected]>

Thanks.


>---
> lib/rwsem-spinlock.c | 14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
>diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
>index ccf95bf..ffc9fc7 100644
>--- a/lib/rwsem-spinlock.c
>+++ b/lib/rwsem-spinlock.c
>@@ -143,13 +143,14 @@ void __sched __down_read(struct rw_semaphore *sem)
> {
> struct rwsem_waiter waiter;
> struct task_struct *tsk;
>+ unsigned long flags;
>
>- spin_lock_irq(&sem->wait_lock);
>+ spin_lock_irqsave(&sem->wait_lock, flags);
>
> if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
> /* granted */
> sem->activity++;
>- spin_unlock_irq(&sem->wait_lock);
>+ spin_unlock_irqrestore(&sem->wait_lock, flags);
> goto out;
> }
>
>@@ -164,7 +165,7 @@ void __sched __down_read(struct rw_semaphore *sem)
> list_add_tail(&waiter.list, &sem->wait_list);
>
> /* we don't need to touch the semaphore struct anymore */
>- spin_unlock_irq(&sem->wait_lock);
>+ spin_unlock_irqrestore(&sem->wait_lock, flags);
>
> /* wait to be given the lock */
> for (;;) {
>@@ -209,13 +210,14 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
> {
> struct rwsem_waiter waiter;
> struct task_struct *tsk;
>+ unsigned long flags;
>
>- spin_lock_irq(&sem->wait_lock);
>+ spin_lock_irqsave(&sem->wait_lock, flags);
>
> if (sem->activity == 0 && list_empty(&sem->wait_list)) {
> /* granted */
> sem->activity = -1;
>- spin_unlock_irq(&sem->wait_lock);
>+ spin_unlock_irqrestore(&sem->wait_lock, flags);
> goto out;
> }
>
>@@ -230,7 +232,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
> list_add_tail(&waiter.list, &sem->wait_list);
>
> /* we don't need to touch the semaphore struct anymore */
>- spin_unlock_irq(&sem->wait_lock);
>+ spin_unlock_irqrestore(&sem->wait_lock, flags);
>
> /* wait to be given the lock */
> for (;;) {
>--
>1.7.0.2
>

--
Live like a child, think like the god.