2001-04-04 20:16:42

by Pavel Machek

[permalink] [raw]
Subject: softirq buggy [Re: Serial port latency]

Hi!

> > Seems floppy and console is buggy, then.
> >
>
> No. The softirq implementation is buggy.
> I can trigger the problem with the TASKLET_HI (floppy), and both net rx
> and tx (ping -l)
>
> > > What about creating a special cpu_is_idle() function that the idle
> > > functions must call before sleeping?
> >
> > I'd say just fix all the bugs.
> >
>
> Ok, there are 2 bugs that are (afaics) impossible to fix without
> checking for pending softirq's in cpu_idle():
>
> a)
> queue_task(my_task1, tq_immediate);
> mark_bh();
> schedule();
> ;within schedule: do_softirq()
> ;within my_task1:
> mark_bh();
> ; bh returns, but do_softirq won't loop
> ; do_softirq returns.
> ; schedule() clears current->need_resched
> ; idle thread scheduled.
> --> idle can run although softirq's are pending

Or anything else can run altrough softirqs are pending. If it is
computation job, softinterrupts are delayed quiet a bit, right?

So right fix seems to be "loop in do_softirq".

Pavel

> I assume I trigger this race with the floppy driver.
>
> b)
> hw interrupt
> do_softirq
> within the net_rx handler: another hw interrupt, additional packets are
> queued
> do_softirq won't loop.
> returns to idle thread. --> packets delayed unnecessary.
>
> What about the attached patch? Obviously the other idle cpu must be
> converted to use the function as well.

--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]


2001-04-04 21:19:08

by Manfred Spraul

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

From: "Pavel Machek" <[email protected]>
>
> > Ok, there are 2 bugs that are (afaics) impossible to fix without
> > checking for pending softirq's in cpu_idle():
> >
> > a)
> > queue_task(my_task1, tq_immediate);
> > mark_bh();
> > schedule();
> > ;within schedule: do_softirq()
> > ;within my_task1:
> > mark_bh();
> > ; bh returns, but do_softirq won't loop
> > ; do_softirq returns.
> > ; schedule() clears current->need_resched
> > ; idle thread scheduled.
> > --> idle can run although softirq's are pending
>
> Or anything else can run altrough softirqs are pending. If it is
> computation job, softinterrupts are delayed quiet a bit, right?
>
> So right fix seems to be "loop in do_softirq".
>
No, it's the wrong fix.
A network server under high load would loop forever within the softirq,
never returning to process level.

do_softirq cannot loop, the right fix is "check often for pending
softirq's".
It's checked before a process returns to user space, it's checked when a
process schedules. What's missing is that the idle functions must check
for pending softirqs, too.

--
Manfred

2001-04-06 12:02:11

by Pavel Machek

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

Hi!

> > > Ok, there are 2 bugs that are (afaics) impossible to fix without
> > > checking for pending softirq's in cpu_idle():
> > >
> > > a)
> > > queue_task(my_task1, tq_immediate);
> > > mark_bh();
> > > schedule();
> > > ;within schedule: do_softirq()
> > > ;within my_task1:
> > > mark_bh();
> > > ; bh returns, but do_softirq won't loop
> > > ; do_softirq returns.
> > > ; schedule() clears current->need_resched
> > > ; idle thread scheduled.
> > > --> idle can run although softirq's are pending
> >
> > Or anything else can run altrough softirqs are pending. If it is
> > computation job, softinterrupts are delayed quiet a bit, right?
> >
> > So right fix seems to be "loop in do_softirq".
> >
> No, it's the wrong fix.
> A network server under high load would loop forever within the softirq,
> never returning to process level.
>
> do_softirq cannot loop, the right fix is "check often for pending
> softirq's".
> It's checked before a process returns to user space, it's checked when a
> process schedules. What's missing is that the idle functions must check
> for pending softirqs, too.

Ok. I was missing the fact it is checked on going userspace.

--
The best software in life is free (not shareware)! Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

2001-04-08 08:26:41

by Manfred Spraul

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 4
// SUBLEVEL = 3
// EXTRAVERSION = -ac3
--- 2.4/include/linux/interrupt.h Thu Feb 22 22:29:53 2001
+++ build-2.4/include/linux/interrupt.h Sat Apr 7 23:51:31 2001
@@ -269,4 +269,25 @@
extern int probe_irq_off(unsigned long); /* returns 0 or negative on failure */
extern unsigned int probe_irq_mask(unsigned long); /* returns mask of ISA interrupts */

+/**
+ * cpu_is_idle - helper function for idle functions
+ *
+ * pm_idle functions must call this function to verify that
+ * the cpu is really idle. It must be called with disabled
+ * local interrupts.
+ * Return values:
+ * 0: cpu was not idle.
+ * 1: go into power saving mode.
+ */
+static inline int cpu_is_idle(void)
+{
+ if (current->need_resched) {
+ return 0;
+ }
+ if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
+ do_softirq();
+ return 0;
+ }
+ return 1;
+}
#endif
--- 2.4/arch/i386/kernel/process.c Thu Feb 22 22:28:52 2001
+++ build-2.4/arch/i386/kernel/process.c Sat Apr 7 23:51:46 2001
@@ -73,6 +73,7 @@
hlt_counter--;
}

+
/*
* We use this if we don't have any better
* idle routine..
@@ -81,7 +82,7 @@
{
if (current_cpu_data.hlt_works_ok && !hlt_counter) {
__cli();
- if (!current->need_resched)
+ if (cpu_is_idle())
safe_halt();
else
__sti();
@@ -92,26 +93,21 @@
* On SMP it's slightly faster (but much more power-consuming!)
* to poll the ->need_resched flag instead of waiting for the
* cross-CPU IPI to arrive. Use this option with caution.
+ *
+ * Isn't this identical to default_idle with the 'no-hlt' boot
+ * option? <[email protected]>
*/
static void poll_idle (void)
{
int oldval;

+ for(;;) {
+ if (!cpu_is_idle())
+ break;
+ __sti();
+ rep_nop();
+ }
__sti();
-
- /*
- * Deal with another CPU just having chosen a thread to
- * run here:
- */
- oldval = xchg(&current->need_resched, -1);
-
- if (!oldval)
- asm volatile(
- "2:"
- "cmpl $-1, %0;"
- "rep; nop;"
- "je 2b;"
- : :"m" (current->need_resched));
}

/*
--- 2.4/drivers/acpi/cpu.c Sat Apr 7 22:02:01 2001
+++ build-2.4/drivers/acpi/cpu.c Sat Apr 7 23:55:17 2001
@@ -148,7 +148,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;
if (acpi_bm_activity())
goto sleep2;
@@ -171,7 +171,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;
if (acpi_bm_activity())
goto sleep2;
@@ -205,7 +205,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;

time = acpi_read_pm_timer();
@@ -235,7 +235,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;
time = acpi_read_pm_timer();
acpi_c1_count++;


Attachments:
patch-cpu-idle (2.79 kB)

2001-04-08 16:59:10

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

Hello!

> + if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
> + do_softirq();
> + return 0;

BTW you may delete do_softirq()... schedule() will call this.


> + *
> + * Isn't this identical to default_idle with the 'no-hlt' boot
> + * option? <[email protected]>

Seeems, it is not. need_resched=-1 avoids useless IPIs.

Alexey

2001-04-08 17:21:49

by Manfred Spraul

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

From: <[email protected]>
> Hello!
>
> > + if (softirq_active(smp_processor_id()) &
softirq_mask(smp_processor_id())) {
> > + do_softirq();
> > + return 0;
>
> BTW you may delete do_softirq()... schedule() will call this.
>

But with a huge overhead. I'd prefer to call it directly from within the
idle functions, the overhead of schedule is IMHO too high.

>
> > + *
> > + * Isn't this identical to default_idle with the 'no-hlt' boot
> > + * option? <[email protected]>
>
> Seeems, it is not. need_resched=-1 avoids useless IPIs.
>
I already wondered why need_resched is set to -1 ;-)

I'll remove that comment and repost the patch.

--
Manfred

2001-04-08 17:59:13

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

Hello!

> But with a huge overhead. I'd prefer to call it directly from within the
> idle functions, the overhead of schedule is IMHO too high.


+ if (current->need_resched) {
+ return 0;
^^^^^^^^
+ }
+ if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
+ do_softirq();
+ return 0;
^^^^^^^^^
You return one value in both casesand I decided it means "schedule". 8)
Apparently you meaned return 1 in the first case. 8)

But in this case it becomes wrong. do_softirq() can raise need_reshed
and moreover irqs arrive during it. Order of check should be different.


BTW what's about overhead... I suspect it is _lower_ in the case
of schedule(). In the case of networking at least, when softirq
most likely wakes some socket.

Alexey

2001-04-08 18:16:38

by Manfred Spraul

[permalink] [raw]
Subject: Re: softirq buggy [Re: Serial port latency]

From: <[email protected]>
To: "Manfred Spraul" <[email protected]>
Cc: <[email protected]>
Sent: Sunday, April 08, 2001 7:58 PM
Subject: Re: softirq buggy [Re: Serial port latency]


> Hello!
>
> > But with a huge overhead. I'd prefer to call it directly from within
the
> > idle functions, the overhead of schedule is IMHO too high.
>
>
> + if (current->need_resched) {
> + return 0;
> ^^^^^^^^
> + }
> + if (softirq_active(smp_processor_id()) &
softirq_mask(smp_processor_id())) {
> + do_softirq();
> + return 0;
> ^^^^^^^^^
> You return one value in both casesand I decided it means "schedule".
8)
> Apparently you meaned return 1 in the first case. 8)
>
No, the code is correct. 0 means "don't stop the cpu".
The pm_idle function pointer will return to cpu_idle()
(arch/i386/kernel/process.c), and that function contains another

while(!current->need_resched)
idle();

loop ;-)

> But in this case it becomes wrong. do_softirq() can raise need_reshed
> and moreover irqs arrive during it. Order of check should be
different.
>
Yes, I'll correct that.

>
> BTW what's about overhead... I suspect it is _lower_ in the case
> of schedule(). In the case of networking at least, when softirq
> most likely wakes some socket.
>
I'm not sure - what if the computer is just a router?
But OTHO: the cpu is idle, so it doesn't matter at all if the idle cpu
spends it's time within schedule() or within safe_hlt(), I'll change my
patch.

I have another question:
I added cpu_is_idle() into <linux/interrupt.h>. Is that acceptable, or
is there a better header file for such a function?

--
Manfred

2001-04-08 21:36:45

by Manfred Spraul

[permalink] [raw]
Subject: [PATCH] Re: softirq buggy

// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 4
// SUBLEVEL = 3
// EXTRAVERSION = -ac3
--- 2.4/include/linux/pm.h Thu Jan 4 23:50:47 2001
+++ build-2.4/include/linux/pm.h Sun Apr 8 21:02:02 2001
@@ -186,6 +186,8 @@
extern void (*pm_idle)(void);
extern void (*pm_power_off)(void);

+int cpu_is_idle(void);
+
#endif /* __KERNEL__ */

#endif /* _LINUX_PM_H */
--- 2.4/kernel/sched.c Sat Apr 7 22:02:27 2001
+++ build-2.4/kernel/sched.c Sun Apr 8 21:02:37 2001
@@ -1242,6 +1242,28 @@
sched_data->last_schedule = get_cycles();
}

+/**
+ * cpu_is_idle - helper function for idle functions
+ *
+ * pm_idle functions must call this function to verify that
+ * the cpu is really idle.
+ * The return value is valid until local interrupts are
+ * reenabled.
+ * Return values:
+ * 1: go into power saving mode.
+ * 0: cpu is not idle, return to schedule()
+ */
+int cpu_is_idle(void)
+{
+ if (current->need_resched)
+ return 0;
+
+ if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id()))
+ return 0;
+
+ return 1;
+}
+
extern void init_timervecs (void);

void __init sched_init(void)
--- 2.4/kernel/ksyms.c Sat Apr 7 22:02:27 2001
+++ build-2.4/kernel/ksyms.c Sun Apr 8 21:42:48 2001
@@ -440,6 +440,7 @@
#endif
EXPORT_SYMBOL(kstat);
EXPORT_SYMBOL(nr_running);
+EXPORT_SYMBOL(cpu_is_idle);

/* misc */
EXPORT_SYMBOL(panic);
--- 2.4/arch/i386/kernel/process.c Thu Feb 22 22:28:52 2001
+++ build-2.4/arch/i386/kernel/process.c Sun Apr 8 21:25:23 2001
@@ -81,7 +81,7 @@
{
if (current_cpu_data.hlt_works_ok && !hlt_counter) {
__cli();
- if (!current->need_resched)
+ if (cpu_is_idle())
safe_halt();
else
__sti();
@@ -105,13 +105,10 @@
*/
oldval = xchg(&current->need_resched, -1);

- if (!oldval)
- asm volatile(
- "2:"
- "cmpl $-1, %0;"
- "rep; nop;"
- "je 2b;"
- : :"m" (current->need_resched));
+ if (!oldval) {
+ while(cpu_is_idle())
+ rep_nop();
+ }
}

/*
@@ -131,7 +128,7 @@
void (*idle)(void) = pm_idle;
if (!idle)
idle = default_idle;
- while (!current->need_resched)
+ while (cpu_is_idle())
idle();
schedule();
check_pgt_cache();
--- 2.4/drivers/acpi/cpu.c Sat Apr 7 22:02:01 2001
+++ build-2.4/drivers/acpi/cpu.c Sat Apr 7 23:55:17 2001
@@ -148,7 +148,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;
if (acpi_bm_activity())
goto sleep2;
@@ -171,7 +171,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;
if (acpi_bm_activity())
goto sleep2;
@@ -205,7 +205,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;

time = acpi_read_pm_timer();
@@ -235,7 +235,7 @@
unsigned long diff;

__cli();
- if (current->need_resched)
+ if (!cpu_is_idle())
goto out;
time = acpi_read_pm_timer();
acpi_c1_count++;


Attachments:
patch-cpu-idle (2.87 kB)

2001-04-09 08:42:34

by Albert D. Cahalan

[permalink] [raw]
Subject: Re: [PATCH] Re: softirq buggy

> I'd prefer to inline cpu_is_idle(), but optimizing the idle
> code path is probably not that important ;-)

Sure it is, in one way: how fast can you get back to work?
(not OK to take a millisecond getting out of the idle loop)

Subject: Re: [PATCH] Re: softirq buggy

> > I'd prefer to inline cpu_is_idle(), but optimizing the idle
> >code path is probably not that important ;-)
>
> Sure it is, in one way: how fast can you get back to work?
> (not OK to take a millisecond getting out of the idle loop)

2 short function calls instead of 2 "if(current->need_resched)" on the
way out.

I didn't try very hard to fix the inline dependencies, could you try to
move cpu_is_idle() from kernel/sched.c into <linux/pm.h>?

I'm sure it won't be more difficult than the last "Athlon+SMP doesn't
compile" problem ;-)

--
Manfred

2001-04-09 13:50:56

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] Re: softirq buggy

On Sun, Apr 08, 2001 at 11:35:36PM +0200, Manfred Spraul wrote:
> I've attached a new patch:
>
> * cpu_is_idle() moved to <linux/pm.h>
> * function uninlined due to header dependencies
> * cpu_is_idle() doesn't call do_softirq directly, instead the caller
> returns to schedule()
> * cpu_is_idle() exported for modules.
> * docu updated.
>
> I'd prefer to inline cpu_is_idle(), but optimizing the idle code path is
> probably not that important ;-)

your cpu_is_idle will return 0 in the need_resched != 0 check even if the cpu
is idle (because of the -1 trick for avoiding the SMP-IPI to notify the cpu).

The issue you are addressing is quite londstanding and it is not only related
to the loop with an idle cpu.

This is the way I prefer to fix it:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre1/ksoftirqd-1

Andrea

2001-04-09 15:26:34

by Manfred Spraul

[permalink] [raw]
Subject: Re: [PATCH] Re: softirq buggy

Andrea Arcangeli wrote:
>
> your cpu_is_idle will return 0 in the need_resched != 0 check even if the cpu
> is idle (because of the -1 trick for avoiding the SMP-IPI to notify the cpu).
>
Fixed.

> The issue you are addressing is quite londstanding and it is not only related
> to the loop with an idle cpu.
>
> This is the way I prefer to fix it:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre1/ksoftirqd-1
>
The return path to user space checks for pending softirqs. A delay of
1/HZ is only possible if the cpu loops in kernel space without returning
to user space - and the functions that can loop check
'current->need_resched'. That means that either cpu_is_idle() must be
renamed to schedule_required() and all 'need_resched' users should use
that function, or something like your patch.

Is a full thread really necessary? Just setting 'need_resched' should be
enough, schedule() checks for pending softirqs.
And do you have a rough idea how often that new thread is scheduled
under load?

Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
from the 'if(in_interrupt())' check.
I assume that this is the most common case of delayed softirq
processing:

; in process context
spin_lock_bh();
; hw interrupt arrives
; do_softirq returns immediately
spin_unlock_bh();


--
Manfred

2001-04-09 17:30:16

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] Re: softirq buggy

On Mon, Apr 09, 2001 at 05:26:27PM +0200, Manfred Spraul wrote:
> The return path to user space checks for pending softirqs. A delay of

And it breaks the loop too if new softirq events become pending again in
background.

> 1/HZ is only possible if the cpu loops in kernel space without returning
> to user space - and the functions that can loop check

It is also possible when new events are posted and I think it makes
sense to scale the softirq load with the scheduler when it is flooding.

Theoretically one could move the _whole_ softirq load into the ksoftirqd, but
that would increase the latency too much and I think it is better to use it
only as a fallback when we have to giveup but we still would like to keep
processing the softirq load so we let the scheduler to choose if we still can
do that or if we should giveup on the softirq.

> Is a full thread really necessary? Just setting 'need_resched' should be
> enough, schedule() checks for pending softirqs.

If you abuse need_resched then you can starve userspace again, if you are ok
to starve userspace indefinitely then it is more efficient to keep looping
forever into do_softirq as far as new events are posted in background instead
of exiting do_softirq and waiting the scheduler to kickin again.

> And do you have a rough idea how often that new thread is scheduled
> under load?

The scheduling is not as heavy as with tasks, it's a kernel thread
so the tlb isn't touched. However yes it will generate some overhead
with schedule() compared to just waiting the 1/HZ but letting the scheduler to
understand when the softirq should keep running instead of another task is
supposed to be a feature. I run a netpipe run with an alpha SMP as receiver
with the ksoftirqd patch and then without and the numbers didn't changed at all
even if the ksofitrqd was often running (1/2% of the load of the machine).

> Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
> from the 'if(in_interrupt())' check.

That's not necessary and it's intentional, such check will be passed in the
last do_softirq executed before returning to userspace or kernel normal
context, the reason of such check is only to avoid recursing too much on the
stack during nested irqs.

> I assume that this is the most common case of delayed softirq
> processing:
>
> ; in process context
> spin_lock_bh();
> ; hw interrupt arrives
> ; do_softirq returns immediately
> spin_unlock_bh();

This is yet another case and it's handled before returning to userspace so the
latency should still be pretty small (and there would be no singificant
advantage and almost certainly only a performance drop in waking up ksoftirqd
from the `do_softirq returns immediatly' line).

Andrea


Attachments:
(No filename) (2.67 kB)
ksofitrqd.png (3.85 kB)
Download all attachments

2001-04-09 17:50:00

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] Re: softirq buggy

Hello!

> Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
> from the 'if(in_interrupt())' check.

ksoftirqd will not be switched to before the first schedule
or ret form syscall, when softirqs will be processed in any case.
So, wake up in this case would be mistake.


> I assume that this is the most common case of delayed softirq
> processing:

softirqs have the same latency warranty as rt threads, so that
this is not a problem at all.

The _real_ problem is softirqs generated from another softirqs:
additonal thread is made _not_ to speed up softirqs, but to _tame_
them (if I understood Andres's explanations correctly).

Alexey

2001-04-09 18:25:16

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] Re: softirq buggy

On Mon, Apr 09, 2001 at 09:48:02PM +0400, [email protected] wrote:
> Hello!
>
> > Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
> > from the 'if(in_interrupt())' check.
>
> ksoftirqd will not be switched to before the first schedule
> or ret form syscall, when softirqs will be processed in any case.
> So, wake up in this case would be mistake.

Agreed.

> The _real_ problem is softirqs generated from another softirqs:
> additonal thread is made _not_ to speed up softirqs, but to _tame_
> them (if I understood Andres's explanations correctly).

Exactly.

Andrea