2002-07-19 20:53:01

by Ingo Molnar

[permalink] [raw]
Subject: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.


the following patch, against 2.5.26:

http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.26-A2

is a work-in-progress massive cleanup of the IRQ subsystem. It's losely
based on Linus' original idea and DaveM's original implementation, to fold
our various irq, softirq and bh counters into the preemption counter.

with this approach it was possible:

- to remove the 'big IRQ lock' on SMP - on which sti() and cli() relied.

- to streamline/simplify arch/i386/kernel/irq.c significantly.

- to simplify the softirq code.

- to remove the preemption count increase/decrease code from the lowlevel
IRQ assembly code.

- to speed up schedule() a bit.

sti() and cli() is gone forever, there is no more globally synchronizing
irq-disabling capability. All code that relied on sti() and cli() and
restore_flags() must use other locking mechanisms from now on (spinlocks
and __cli()/__sti()).

obviously this patch breaks massive amounts of code, so only limited
.configs are working at the moment, such as:

http://redhat.com/~mingo/remove-irqlock-patches/config

otherwise the patch was developed and tested on SMP systems, and while the
code is still a bit rough in places, the base IRQ code appears to be
pretty robust and clean.

while it boots already so the worst is over, there is lots of work left:
eg. to fix the serial layer to not use cli()/sti() and bhs ...

RMK, is there any chance to get your new serial layer into 2.5 sometime
soon? ['soon' as in 'tomorrow' :-) ] That is perhaps one of the biggest
kernel subsystems that make use of cli()/sti() currently. The rest is
drivers mostly, which is still not unsignificant, but perhaps a bit easier
to manage.

Ingo


2002-07-19 20:55:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.


> the following patch, against 2.5.26:
>
> http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.26-A2

forgot to mention that the patch applies ontop the scheduler-improvements
patch:

http://redhat.com/~mingo/O(1)-scheduler/sched-nobatch-2.5.26-C2

Ingo

2002-07-19 21:47:22

by George Anzinger

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.

Ingo Molnar wrote:
>
> the following patch, against 2.5.26:
>
> http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.26-A2

I haven't applied the patch, but just looking at it, it
appears that you have removed the conditional call to
softirq on local_bh_enable(). Isn't this needed?

-g
>
> is a work-in-progress massive cleanup of the IRQ subsystem. It's losely
> based on Linus' original idea and DaveM's original implementation, to fold
> our various irq, softirq and bh counters into the preemption counter.
>
> with this approach it was possible:
>
> - to remove the 'big IRQ lock' on SMP - on which sti() and cli() relied.
>
> - to streamline/simplify arch/i386/kernel/irq.c significantly.
>
> - to simplify the softirq code.
>
> - to remove the preemption count increase/decrease code from the lowlevel
> IRQ assembly code.
>
> - to speed up schedule() a bit.
>
> sti() and cli() is gone forever, there is no more globally synchronizing
> irq-disabling capability. All code that relied on sti() and cli() and
> restore_flags() must use other locking mechanisms from now on (spinlocks
> and __cli()/__sti()).
>
> obviously this patch breaks massive amounts of code, so only limited
> .configs are working at the moment, such as:
>
> http://redhat.com/~mingo/remove-irqlock-patches/config
>
> otherwise the patch was developed and tested on SMP systems, and while the
> code is still a bit rough in places, the base IRQ code appears to be
> pretty robust and clean.
>
> while it boots already so the worst is over, there is lots of work left:
> eg. to fix the serial layer to not use cli()/sti() and bhs ...
>
> RMK, is there any chance to get your new serial layer into 2.5 sometime
> soon? ['soon' as in 'tomorrow' :-) ] That is perhaps one of the biggest
> kernel subsystems that make use of cli()/sti() currently. The rest is
> drivers mostly, which is still not unsignificant, but perhaps a bit easier
> to manage.
>
> Ingo
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-07-20 14:27:50

by Marcin Dalecki

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.


> RMK, is there any chance to get your new serial layer into 2.5 sometime
> soon? ['soon' as in 'tomorrow' :-) ] That is perhaps one of the biggest

Well I would like to allo wmysefl to allow a tad bit of advocacy for
this step.

I looked at it already quite a time ago and *second* it!

2002-07-20 17:51:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.

Hello.

> - to remove the preemption count increase/decrease code from the lowlevel
> IRQ assembly code.

So do_IRQ() can start with preempt_count == 0.
Suppose another cpu sets TIF_NEED_RESCHED flag
at the same time.

spin_lock(&desc->lock) sets preempt_count == 1.
Before calling handle_IRQ_event() (which adds IRQ_OFFSET
to preempt_count), do_IRQ() does spin_unlock(&desc->lock)
and falls into schedule().

Am I missed something?

It seems to me that call to irq_enter() must be shifted
from handle_IRQ_event() to do_IRQ().

Oleg.

2002-07-21 21:15:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.


Martin,

On Sat, 20 Jul 2002, Marcin Dalecki wrote:

> Well I would like to allow myself to allow a tad bit of advocacy for
> this step.

the attached patch was a quick hack to get rid of the global cli() from
drivers/ide/main.c - but it's obviously broken and i'd like to fix it.

how can this be done cleanly - should we do a:

synchronize_irq(drive->channel->irq);

before unregistering the driver?

this might not even be necessery i believe, since the implicit (?)
free_irq() synchronizes with all pending instances of that interrupt
source.

is there something else i'm missing, something else we need to synchronize
with - perhaps the timers?

Ingo

--- linux/drivers/ide/main.c.orig Sun Jul 21 20:37:12 2002
+++ linux/drivers/ide/main.c Sun Jul 21 21:06:28 2002
@@ -1091,18 +1091,18 @@
{
unsigned long flags;

- save_flags(flags); /* all CPUs */
- cli(); /* all CPUs */
+ __save_flags(flags); // FIXME: is this safe?
+ __cli();

#if 0
if (__MOD_IN_USE(ata_ops(drive)->owner)) {
- restore_flags(flags);
+ __restore_flags(flags); // FIXME: is this safe?
return 1;
}
#endif

if (drive->usage || drive->busy || !ata_ops(drive)) {
- restore_flags(flags); /* all CPUs */
+ __restore_flags(flags); // FIXME: is this safe?
return 1;
}

@@ -1111,7 +1111,7 @@
#endif
drive->driver = NULL;

- restore_flags(flags); /* all CPUs */
+ __restore_flags(flags); // FIXME: is this safe?

return 0;
}


2002-07-21 23:41:08

by George Anzinger

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.

Oleg Nesterov wrote:
>
> Hello.
>
> > - to remove the preemption count increase/decrease code from the lowlevel
> > IRQ assembly code.

IMHO this is unwise as the system NEEDS to exit to user (or
system) space when preempt_count is zero with a garuntee
that TIF_NEED_RESCHED is 0. To do this from irq.c means
that it must exit with interrupts off and the the low level
code needs to keep them off till the irtn. But this is
where we test TIF_NEED_RESCHED, and if it is set, schedule()
is called and should be called with preempt_count != 0, to
avoid stack overflow interrupt loops. And, schedule()
returns with the interrupt system on (even if we, unwisely,
call it with it off).
>
> So do_IRQ() can start with preempt_count == 0.
> Suppose another cpu sets TIF_NEED_RESCHED flag
> at the same time.
>
> spin_lock(&desc->lock) sets preempt_count == 1.
> Before calling handle_IRQ_event() (which adds IRQ_OFFSET
> to preempt_count), do_IRQ() does spin_unlock(&desc->lock)
> and falls into schedule().
>
> Am I missed something?

Nope, you got it right.
>
> It seems to me that call to irq_enter() must be shifted
> from handle_IRQ_event() to do_IRQ().

Even then...
>
> Oleg.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-07-21 23:48:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.


On Sun, 21 Jul 2002, george anzinger wrote:

> > > - to remove the preemption count increase/decrease code from the lowlevel
> > > IRQ assembly code.
>
> IMHO this is unwise as the system NEEDS to exit to user (or
> system) space when preempt_count is zero with a garuntee
> that TIF_NEED_RESCHED is 0. [...]

my comment was too compact - the bumping of the preemption count did not
get removed, it's just the dualness that got removed. Both the lowlevel
entry code and the irq_enter() code used to do the same thing - now only
irq_enter() does it.

in the latest patch irq_enter() is done early in do_IRQ(), and irq_exit()
is done before returning from do_IRQ().

> [...] To do this from irq.c means that it must exit with interrupts off
> and the the low level code needs to keep them off till the irtn. [...]

yes, we are very careful to keep irqs disabled in do_IRQ(), both before
and after calling the handler.

> [...] But this is where we test TIF_NEED_RESCHED, and if it is set,
> schedule() is called and should be called with preempt_count != 0, to
> avoid stack overflow interrupt loops. And, schedule() returns with the
> interrupt system on (even if we, unwisely, call it with it off).

yes, in this case schedule() is called with a nonzero preempt count
(PREEMPT_ACTIVE), to keep it from recursing into itself.

> > It seems to me that call to irq_enter() must be shifted from
> > handle_IRQ_event() to do_IRQ().
>
> Even then...

could you check out the latest patch, can you still see any hole in it?
All the above scenarios should be handled correctly.

Ingo

2002-07-22 14:39:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.

Hello.

> > [...] To do this from irq.c means that it must exit with interrupts off
> > and the the low level code needs to keep them off till the irtn. [...]

> yes, we are very careful to keep irqs disabled in do_IRQ(), both before
> and after calling the handler.

Note that smp_xxx_interrupt() functions must be carefull
with preemt_{disable,enable} brackets.

For example, smp_invalidate_interrupt() may be preempted
after put_cpu(). Probably not big deal (it is return path),
but it is better to use preempt_enable_no_resched() here -
let ret_from_intr: do its job.

smp_{error,spurious,thermal}_interrupt() - all of them
use printk() without bumping preemt_count and have problem
after spin_unlock_irqrestore(&logbuf_lock, flags).

If these problems worth fixing, then preempt_stop (cli)
can be killed in entry.S:ret_from_intr(), yes? If i understand
correctly none of the irq handlers should return to low level
code with irq enabled.

P.S.

May I suggest somebody with good english fix
Documentation/preempt-locking.txt?
It states, that disabled interrupts prevents preemption.
Yes, but only in a sense, that the delivery of reschedule
interrupt is suppressed.

Process with irqs disabled and current->preempt_count == 0 can
be preempted (with interrupts enabled) after spin_lock/unlock etc.
Even in UP case preemption can happen while calling wake_up_...().

Oleg.

2002-07-22 14:56:54

by Ingo Molnar

[permalink] [raw]
Subject: [patch] big IRQ lock removal, 2.5.27-D9


On Mon, 22 Jul 2002, Oleg Nesterov wrote:

> Note that smp_xxx_interrupt() functions must be carefull
> with preemt_{disable,enable} brackets.
>
> For example, smp_invalidate_interrupt() may be preempted
> after put_cpu(). Probably not big deal (it is return path),
> but it is better to use preempt_enable_no_resched() here -
> let ret_from_intr: do its job.

i solved it slightly differently: added a new put_cpu_no_resched() macro.

> smp_{error,spurious,thermal}_interrupt() - all of them
> use printk() without bumping preemt_count and have problem
> after spin_unlock_irqrestore(&logbuf_lock, flags).

fixed this - all these IRQ vector paths must use irq_enter()/irq_exit()
pairs.

> If these problems worth fixing, then preempt_stop (cli) can be killed in
> entry.S:ret_from_intr(), yes? If i understand correctly none of the irq
> handlers should return to low level code with irq enabled.

yes, it can be removed, and i did this.

> May I suggest somebody with good english fix
> Documentation/preempt-locking.txt?
> It states, that disabled interrupts prevents preemption.
> Yes, but only in a sense, that the delivery of reschedule
> interrupt is suppressed.
>
> Process with irqs disabled and current->preempt_count == 0 can
> be preempted (with interrupts enabled) after spin_lock/unlock etc.
> Even in UP case preemption can happen while calling wake_up_...().

added such a section to preempt-locking.txt.

this and the other changes can be found at:

http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.27-D9

is anything else missing?

Ingo

2002-07-22 17:19:23

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] big IRQ lock removal, 2.5.27-D9

Hello.

> > then preempt_stop (cli) can be killed in entry.S:ret_from_intr()
>
> yes, it can be removed, and i did this.

Sorry, forgot to say that GET_THREAD_INFO(%ebx) can be shifted
from common_interrupt: and BUILD_INTERRUPT() to ret_from_intr:
in entry.s (sorry again, can't make diff).

Oleg.

2002-07-22 18:02:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.


On Mon, 22 Jul 2002, george anzinger wrote:

> But schedule and signal code does return with interrupts enabled, so a
> cli is still needed here. Also at least some of the trap code returns
> with interrupts enabled.

the change only affects the ret_from_intr path, which is IRQ-only. The
signal and schedule path is still disabling interrupts.

Ingo

2002-07-22 18:00:40

by George Anzinger

[permalink] [raw]
Subject: Re: [announce, patch, RFC] "big IRQ lock" removal, IRQ cleanups.

Oleg Nesterov wrote:
>
> Hello.
>
> > > [...] To do this from irq.c means that it must exit with interrupts off
> > > and the the low level code needs to keep them off till the irtn. [...]
>
> > yes, we are very careful to keep irqs disabled in do_IRQ(), both before
> > and after calling the handler.
>
> Note that smp_xxx_interrupt() functions must be carefull
> with preemt_{disable,enable} brackets.
>
> For example, smp_invalidate_interrupt() may be preempted
> after put_cpu(). Probably not big deal (it is return path),
> but it is better to use preempt_enable_no_resched() here -
> let ret_from_intr: do its job.
>
> smp_{error,spurious,thermal}_interrupt() - all of them
> use printk() without bumping preemt_count and have problem
> after spin_unlock_irqrestore(&logbuf_lock, flags).
>
> If these problems worth fixing, then preempt_stop (cli)
> can be killed in entry.S:ret_from_intr(), yes?

I REALLY have a problem with this abstraction for cli. I
think it just makes the code hard to read...
> If i understand
> correctly none of the irq handlers should return to low level
> code with irq enabled.

But schedule and signal code does return with interrupts
enabled, so a cli is still needed here. Also at least some
of the trap code returns with interrupts enabled.
>
> P.S.
>
> May I suggest somebody with good english fix
> Documentation/preempt-locking.txt?
> It states, that disabled interrupts prevents preemption.
> Yes, but only in a sense, that the delivery of reschedule
> interrupt is suppressed.
>
> Process with irqs disabled and current->preempt_count == 0 can
> be preempted (with interrupts enabled) after spin_lock/unlock etc.
> Even in UP case preemption can happen while calling wake_up_...().

This is really a bug and a fix is on the way. Turning
interrupts off MUST disable preemption, but trying to
preempt from this state is so rare that the test will be in
preempt_schedule() rather than inline or an attempt to put
disable/enable code along with each cli/sti.

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-07-22 21:15:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] big IRQ lock removal, 2.5.27-D9

Hello.

George Anzinger wrote:
>
> > then preempt_stop (cli) can be killed in entry.S:ret_from_intr()
>
> Also at least some of the trap code returns with interrupts enabled.

Then linux already have a bug - ret_from_exception: bypasses
ret_from_intr:. But as far as i can see, it does not - all users of
ret_from_exception: do preempt_stop. And so it can be shifted to
ret_from_exception.

I beleive none of the traps need current_thread_info() in regs->ebx,
so GET_THREAD_INFO(%ebx) can be killed in error_code:,
common_interrupt:,
and BUILD_INTERRUPT().

Not sure it is right time to such minor cleanups, but...
Patch on top of remove-irqlock-2.5.27-E0:

--- arch/i386/kernel/entry.S.orig Mon Jul 22 23:44:41 2002
+++ arch/i386/kernel/entry.S Tue Jul 23 00:04:02 2002
@@ -185,8 +185,10 @@

# userspace resumption stub bypassing syscall exit tracing
ALIGN
-ret_from_intr:
ret_from_exception:
+ preempt_stop
+ret_from_intr:
+ GET_THREAD_INFO(%ebx)
movl EFLAGS(%esp), %eax # mix EFLAGS and CS
movb CS(%esp), %al
testl $(VM_MASK | 3), %eax
@@ -333,14 +335,12 @@
common_interrupt:
SAVE_ALL
call do_IRQ
- GET_THREAD_INFO(%ebx)
jmp ret_from_intr

#define BUILD_INTERRUPT(name, nr) \
ENTRY(name) \
pushl $nr-256; \
SAVE_ALL \
- GET_THREAD_INFO(%ebx); \
call smp_/**/name; \
jmp ret_from_intr;

@@ -400,10 +400,8 @@
movl $(__KERNEL_DS), %edx
movl %edx, %ds
movl %edx, %es
- GET_THREAD_INFO(%ebx)
call *%edi
addl $8, %esp
- preempt_stop
jmp ret_from_exception

ENTRY(coprocessor_error)
@@ -430,7 +428,6 @@
pushl $0 # temporary storage for ORIG_EIP
call math_emulate
addl $4, %esp
- preempt_stop
jmp ret_from_exception

ENTRY(debug)

Oleg.

2002-07-23 09:16:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] big IRQ lock removal, 2.5.27-D9


On Tue, 23 Jul 2002, Oleg Nesterov wrote:

> Not sure it is right time to such minor cleanups, but...

i've applied your patch to my tree - all of the irqlock patches are
cleanups to begin with.

Ingo