2005-12-12 10:55:50

by Thomas Gleixner

[permalink] [raw]
Subject: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches


The rebased version of the high resolution patches on top of the
hrtimers base patch is available from the new project home:

http://www.tglx.de/projetcs/hrtimers

The current patch is available here:

http://www.tglx.de/projects/hrtimers/2.6.15-rc5/patch-2.6.15-rc5-hrt2.patch

along with the broken out series

http://www.tglx.de/projects/hrtimers/2.6.15-rc5/patch-2.6.15-rc5-hrt2.patches.tar.bz2


Changes since the last 2.6.15-rc2-kthrt8 patch:

- rebased to hrtimers
- newest Generic Timeofday patch from John Stultz
- cleanups, bugfixes and improvements all over the place

Thanks to
- Roman Zippel for his help with ktime_t (see ktime.h), discussion and
suggestions.
- John Stultz for his timeofday work
- all others who provided testing, help and suggestions

tglx



2005-12-12 13:56:29

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

Thomas or Ingo,

Are these patches included in the 2.6.15-rc5-rt1?

Thanks,

-- Steve

On Mon, 2005-12-12 at 12:02 +0100, Thomas Gleixner wrote:
> The rebased version of the high resolution patches on top of the
> hrtimers base patch is available from the new project home:
>
> http://www.tglx.de/projetcs/hrtimers
>
> The current patch is available here:
>
> http://www.tglx.de/projects/hrtimers/2.6.15-rc5/patch-2.6.15-rc5-hrt2.patch
>
> along with the broken out series
>
> http://www.tglx.de/projects/hrtimers/2.6.15-rc5/patch-2.6.15-rc5-hrt2.patches.tar.bz2
>
>
> Changes since the last 2.6.15-rc2-kthrt8 patch:
>
> - rebased to hrtimers
> - newest Generic Timeofday patch from John Stultz
> - cleanups, bugfixes and improvements all over the place
>
> Thanks to
> - Roman Zippel for his help with ktime_t (see ktime.h), discussion and
> suggestions.
> - John Stultz for his timeofday work
> - all others who provided testing, help and suggestions
>
> tglx
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Steven Rostedt
Senior Programmer
Kihon Technologies
(607)786-4830

2005-12-13 21:05:41

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

Thomas and Ingo,

I found a bug in 2.6.15-rc5-rt1 that was due to a race in the hrtimers
code. This bug is most likely in the vanilla hrtimers code as well.

I added a HRTIMER_RUNNING state because there's a moment in the
run_hrtimer_queues that turns interrupts on and releases the base lock.
In this time, a remove_hrtimer can be called while the state is still
HRTIMER_PENDING_CALLBACK, but it has been removed off the list. The
remove_hrtimer will then try to remove this again.

Since I couldn't think of which state to use, I created the
HRTIMER_RUNNING, and used that instead.

I have a program (a simple jitter test) that, with out the patch,
reliably crashes the 2.6.15-rc5-rt1 on a slow UP machine. With the
patch, it runs solidly, so I know this is the reason for the crash.

-- Steve

Index: linux-2.6.15-rc5-rt1/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rc5-rt1.orig/kernel/hrtimer.c 2005-12-13 15:51:52.000000000 -0500
+++ linux-2.6.15-rc5-rt1/kernel/hrtimer.c 2005-12-13 15:53:36.000000000 -0500
@@ -846,6 +846,7 @@
data = timer->data;
set_curr_timer(base, timer);
list_del(&timer->list);
+ timer->state = HRTIMER_RUNNING;
spin_unlock_irq(&base->lock);

/*
@@ -904,6 +905,7 @@
data = timer->data;
set_curr_timer(base, timer);
__remove_hrtimer(timer, base);
+ timer->state = HRTIMER_RUNNING;
spin_unlock_irq(&base->lock);

/*
Index: linux-2.6.15-rc5-rt1/include/linux/hrtimer.h
===================================================================
--- linux-2.6.15-rc5-rt1.orig/include/linux/hrtimer.h 2005-12-13 15:51:40.000000000 -0500
+++ linux-2.6.15-rc5-rt1/include/linux/hrtimer.h 2005-12-13 15:55:21.000000000 -0500
@@ -41,6 +41,7 @@
enum hrtimer_state {
HRTIMER_INACTIVE, /* Timer is inactive */
HRTIMER_EXPIRED, /* Timer is expired */
+ HRTIMER_RUNNING, /* Timer is currently running */
HRTIMER_PENDING, /* Timer is pending */
HRTIMER_PENDING_CALLBACK, /* Timer is expired, callback pending */
};


2005-12-14 08:41:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches


* Steven Rostedt <[email protected]> wrote:

> Thomas and Ingo,
>
> I found a bug in 2.6.15-rc5-rt1 that was due to a race in the hrtimers
> code. This bug is most likely in the vanilla hrtimers code as well.
>
> I added a HRTIMER_RUNNING state because there's a moment in the
> run_hrtimer_queues that turns interrupts on and releases the base
> lock. In this time, a remove_hrtimer can be called while the state is
> still HRTIMER_PENDING_CALLBACK, but it has been removed off the list.
> The remove_hrtimer will then try to remove this again.
>
> Since I couldn't think of which state to use, I created the
> HRTIMER_RUNNING, and used that instead.
>
> I have a program (a simple jitter test) that, with out the patch,
> reliably crashes the 2.6.15-rc5-rt1 on a slow UP machine. With the
> patch, it runs solidly, so I know this is the reason for the crash.

hm, in that case it should be base->running that protects against this
case. Did you run an UP PREEMPT_RT kernel? In that case could you check
whether the fix below resolves the crash too?

Ingo

Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -118,7 +118,7 @@ static DEFINE_PER_CPU(struct hrtimer_bas
* Functions and macros which are different for UP/SMP systems are kept in a
* single place
*/
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(PREEMPT_RT)

#define set_curr_timer(b, t) do { (b)->curr_timer = (t); } while (0)

2005-12-14 08:44:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches


* Ingo Molnar <[email protected]> wrote:

> hm, in that case it should be base->running that protects against this
> case. Did you run an UP PREEMPT_RT kernel? In that case could you
> check whether the fix below resolves the crash too?

try the patch below instead. (Thomas pointed out that the correct
dependency in this case is PREEMPT_SOFTIRQS)

Ingo

Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -118,7 +118,7 @@ static DEFINE_PER_CPU(struct hrtimer_bas
* Functions and macros which are different for UP/SMP systems are kept in a
* single place
*/
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_SOFTIRQS)

#define set_curr_timer(b, t) do { (b)->curr_timer = (t); } while (0)

2005-12-14 13:48:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Wed, 2005-12-14 at 09:43 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
> > hm, in that case it should be base->running that protects against this
> > case. Did you run an UP PREEMPT_RT kernel? In that case could you
> > check whether the fix below resolves the crash too?
>
> try the patch below instead. (Thomas pointed out that the correct
> dependency in this case is PREEMPT_SOFTIRQS)
>
> Ingo
>
> Index: linux/kernel/hrtimer.c
> ===================================================================
> --- linux.orig/kernel/hrtimer.c
> +++ linux/kernel/hrtimer.c
> @@ -118,7 +118,7 @@ static DEFINE_PER_CPU(struct hrtimer_bas
> * Functions and macros which are different for UP/SMP systems are kept in a
> * single place
> */
> -#ifdef CONFIG_SMP
> +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_SOFTIRQS)
>
> #define set_curr_timer(b, t) do { (b)->curr_timer = (t); } while (0)
>

With the above patch, I get the following soft lockup.

BUG: jitter_inc:3085, possible softlockup detected on CPU#0!
[<c0104223>] dump_stack+0x23/0x30 (20)
[<c014c226>] softlockup_detected+0x46/0x60 (24)
[<c014c301>] softlockup_tick+0xc1/0xd0 (20)
[<c0126aaa>] update_process_times+0x8a/0x180 (32)
[<c013e79a>] handle_tick_update+0x5a/0x70 (16)
[<c013e609>] timer_interrupt+0x39/0x110 (24)
[<c014c77d>] handle_IRQ_event+0x7d/0x110 (52)
[<c014cbfd>] __do_IRQ+0xcd/0x170 (44)
[<c0105724>] do_IRQ+0x44/0x60 (28)
[<c0103ce6>] common_interrupt+0x1a/0x20 (80)
[<c0136b98>] hrtimer_cancel+0x18/0x20 (16)
[<c0339d6d>] schedule_hrtimer+0x4d/0xb0 (44)
[<c013723e>] hrtimer_nanosleep+0x5e/0x130 (104)
[<c0137383>] sys_nanosleep+0x73/0x80 (36)
[<c010329a>] syscall_call+0x7/0xb (-8116)
---------------------------
| preempt count: 00010001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0142b8c>] .... add_preempt_count+0x1c/0x20
.....[<c0136264>] .. ( <= lock_hrtimer_base+0x34/0x80)

The system is locked up with this periodically spitting out. My only
recourse is to hit the big P button (or the litter r one).

And going into gdb, I get:

(gdb) li *0xc0136b98
0xc0136b98 is in hrtimer_cancel (kernel/hrtimer.c:671).
666 int hrtimer_cancel(struct hrtimer *timer)
667 {
668 for (;;) {
669 int ret = hrtimer_try_to_cancel(timer);
670
671 if (ret >= 0)
672 return ret;
673 }
674 }
675

So it may not really be locked, and if I waited a couple of hours, it
might actually finish (the test usually takes a couple of minutes to
run, and I let it run here for about 20 minutes).

Yeah, the above code would be very bad if this happens after preempting
the running timer.

So the fix to this, in the case of preempting the softirq, that we need
to introduce some wait queue that will allow processes to wait for the
softirq to finish, and then the softirq will wake up all the processes.

Ingo, would something like that fly?

-- Steve


2005-12-14 13:54:28

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Wed, 2005-12-14 at 08:48 -0500, Steven Rostedt wrote:
> On Wed, 2005-12-14 at 09:43 +0100, Ingo Molnar wrote:
> > * Ingo Molnar <[email protected]> wrote:

> And going into gdb, I get:
>
> (gdb) li *0xc0136b98
> 0xc0136b98 is in hrtimer_cancel (kernel/hrtimer.c:671).
> 666 int hrtimer_cancel(struct hrtimer *timer)
> 667 {
> 668 for (;;) {
> 669 int ret = hrtimer_try_to_cancel(timer);
> 670
> 671 if (ret >= 0)
> 672 return ret;
> 673 }
> 674 }
> 675
>
> So it may not really be locked, and if I waited a couple of hours, it
> might actually finish (the test usually takes a couple of minutes to
> run, and I let it run here for about 20 minutes).
>
> Yeah, the above code would be very bad if this happens after preempting
> the running timer.
>
> So the fix to this, in the case of preempting the softirq, that we need
> to introduce some wait queue that will allow processes to wait for the
> softirq to finish, and then the softirq will wake up all the processes.

We had the waitqueue in the ktimer based -rt patches and did not add it
back.

tglx


2005-12-14 14:01:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches


On Wed, 14 Dec 2005, Thomas Gleixner wrote:

> >
> > So the fix to this, in the case of preempting the softirq, that we need
> > to introduce some wait queue that will allow processes to wait for the
> > softirq to finish, and then the softirq will wake up all the processes.
>
> We had the waitqueue in the ktimer based -rt patches and did not add it
> back.
>

Actually, to save on the memory, I was thinking of a general purpose wait
queue, that is either one for all softirqs, or one for each particular
softirq, that the softirq handler would wakeup when finished.

Thoughts?

-- Steve

2005-12-14 14:05:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Wed, 2005-12-14 at 08:48 -0500, Steven Rostedt wrote:

> And going into gdb, I get:
>
> (gdb) li *0xc0136b98
> 0xc0136b98 is in hrtimer_cancel (kernel/hrtimer.c:671).
> 666 int hrtimer_cancel(struct hrtimer *timer)
> 667 {
> 668 for (;;) {
> 669 int ret = hrtimer_try_to_cancel(timer);
> 670
> 671 if (ret >= 0)
> 672 return ret;
> 673 }
> 674 }
> 675
>
> So it may not really be locked, and if I waited a couple of hours, it
> might actually finish (the test usually takes a couple of minutes to
> run, and I let it run here for about 20 minutes).

Nope, looking at my test code again, I forgot that I set the program to
max prio, so this would never finish.

-- Steve


2005-12-14 15:44:15

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH -RT] Add softirq waitqueue for CONFIG_PREEMPT_SOFTIRQ (was: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 ...)

On Wed, 2005-12-14 at 15:01 +0100, Thomas Gleixner wrote:
> >
> > So the fix to this, in the case of preempting the softirq, that we need
> > to introduce some wait queue that will allow processes to wait for the
> > softirq to finish, and then the softirq will wake up all the processes.
>
> We had the waitqueue in the ktimer based -rt patches and did not add it
> back.

Ingo,

This patch fixes the problem I have. I've added a waitqueue per all
softirq threads IFF a CONFIG_PREEMPT_SOFTIRQ is set.

When a softirqd is running, it sets a flag saying it is (this can
probably be done better), and then executes what it needs to. Before
going back to sleep, it clears the flag and wakes up any processes that
might be waiting to be woken up when it is done.

I added the function wait_for_softirq(int softirq) that is a no-op when
CONFIG_PREEMPT_SOFTIRQ is off. This function will add the process to
the wait queue of the softirq that it is waiting to finish.

I really would like to add priority inheritance to waitqueues, so things
like this can benefit from PI as well.

Take a look, and let me know what you think.

Thanks,

-- Steve

This patch was applied on top of yours.

Index: linux-2.6.15-rc5-rt1/kernel/softirq.c
===================================================================
--- linux-2.6.15-rc5-rt1.orig/kernel/softirq.c 2005-12-14 10:17:47.000000000 -0500
+++ linux-2.6.15-rc5-rt1/kernel/softirq.c 2005-12-14 10:33:31.000000000 -0500
@@ -22,6 +22,7 @@
#include <linux/kallsyms.h>
#include <linux/syscalls.h>
#include <linux/delay.h>
+#include <linux/wait.h>

#include <asm/irq.h>
/*
@@ -53,10 +54,38 @@
int nr;
unsigned long cpu;
struct task_struct *tsk;
+#ifdef CONFIG_PREEMPT_SOFTIRQS
+ wait_queue_head_t wait;
+ int running;
+#endif
};

static DEFINE_PER_CPU(struct softirqdata, ksoftirqd[MAX_SOFTIRQ]);

+#ifdef CONFIG_PREEMPT_SOFTIRQS
+/*
+ * Preempting the softirq causes cases that would not be a
+ * problem when the softirq is not preempted. That is a
+ * process may have code to spin while waiting for a softirq
+ * to finish on another CPU. But if it happens that the
+ * process has preempted the softirq, this could cause a
+ * deadlock.
+ */
+void wait_for_softirq(int softirq)
+{
+ struct softirqdata *data = &__get_cpu_var(ksoftirqd[softirq]);
+ if (data->running) {
+ DECLARE_WAITQUEUE(wait, current);
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ add_wait_queue(&data->wait, &wait);
+ if (data->running)
+ schedule();
+ remove_wait_queue(&data->wait, &wait);
+ __set_current_state(TASK_RUNNING);
+ }
+}
+#endif
+
/*
* we cannot loop indefinitely here to avoid userspace starvation,
* but we also don't want to introduce a worst case 1/HZ latency
@@ -460,6 +489,10 @@
// set_user_nice(current, -10);
current->flags |= PF_NOFREEZE | PF_SOFTIRQ;

+#ifdef CONFIG_PREEMPT_SOFTIRQS
+ init_waitqueue_head(&data->wait);
+#endif
+
set_current_state(TASK_INTERRUPTIBLE);

while (!kthread_should_stop()) {
@@ -469,6 +502,9 @@
schedule();
preempt_disable();
}
+#ifdef CONFIG_PREEMPT_SOFTIRQS
+ data->running = 1;
+#endif
__set_current_state(TASK_RUNNING);

while (local_softirq_pending() & mask) {
@@ -493,6 +529,10 @@
cond_resched();
preempt_disable();
}
+#ifdef CONFIG_PREEMPT_SOFTIRQS
+ data->running = 0;
+ wake_up(&data->wait);
+#endif
preempt_enable();
set_current_state(TASK_INTERRUPTIBLE);
}
Index: linux-2.6.15-rc5-rt1/include/linux/interrupt.h
===================================================================
--- linux-2.6.15-rc5-rt1.orig/include/linux/interrupt.h 2005-12-12 16:31:25.000000000 -0500
+++ linux-2.6.15-rc5-rt1/include/linux/interrupt.h 2005-12-14 10:29:51.000000000 -0500
@@ -145,6 +145,11 @@
extern void FASTCALL(raise_softirq(unsigned int nr));
extern void wakeup_irqd(void);

+#ifdef CONFIG_PREEMPT_SOFTIRQS
+extern void wait_for_softirq(int softirq);
+#else
+# define wait_for_softirq(x) do {} while(0)
+#endif

/* Tasklets --- multithreaded analogue of BHs.

Index: linux-2.6.15-rc5-rt1/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rc5-rt1.orig/kernel/hrtimer.c 2005-12-14 10:17:53.000000000 -0500
+++ linux-2.6.15-rc5-rt1/kernel/hrtimer.c 2005-12-14 10:29:07.000000000 -0500
@@ -670,6 +670,7 @@

if (ret >= 0)
return ret;
+ wait_for_softirq(HRTIMER_SOFTIRQ);
}
}



2005-12-14 16:18:01

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH -RT] Add softirq waitqueue for CONFIG_PREEMPT_SOFTIRQ (was: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 ...)

On Wed, 2005-12-14 at 10:43 -0500, Steven Rostedt wrote:

> I really would like to add priority inheritance to waitqueues, so things
> like this can benefit from PI as well.
>

I submitted a generic PI patch a while back, but the design was not that
great. It had callback functions for getting the lock owner , and for
signaling when changing priorities ..

Here it is,
http://lkml.org/lkml/2005/5/31/288

I think it may be pretty trivial to add priority sorted waitqueues by
just adding a plist .

Daniel

2005-12-14 18:26:36

by Jeff Carr

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

Linux version 2.6.15-rc5-hrt2-051213 (root@ATCA14) (gcc version 3.3.5 (Debian 1:3.3.5-1)) #5 SMP PREEMPT Tue Dec 13 23:49:48 PST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
BIOS-e820: 000000007fff0000 - 000000007fffe000 (ACPI data)
BIOS-e820: 000000007fffe000 - 0000000080000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec86000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
On node 0 totalpages: 524272
DMA zone: 4096 pages, LIFO batch:0
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 294896 pages, LIFO batch:31
DMI 2.3 present.
ACPI: RSDP (v000 ACPIAM ) @ 0x000f78d0
ACPI: RSDT (v001 A M I OEMRSDT 0x06000517 MSFT 0x00000097) @ 0x7fff0000
ACPI: FADT (v002 A M I OEMFACP 0x06000517 MSFT 0x00000097) @ 0x7fff0200
ACPI: MADT (v001 A M I OEMAPIC 0x06000517 MSFT 0x00000097) @ 0x7fff0390
ACPI: MCFG (v001 A M I OEMMCFG 0x06000517 MSFT 0x00000097) @ 0x7fff0410
ACPI: OEMB (v001 A M I AMI_OEM 0x06000517 MSFT 0x00000097) @ 0x7fffe040
ACPI: DSDT (v001 0AAAA ATC5232 0x00000000 INTL 0x02002026) @ 0x00000000
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
Processor #7 15:4 APIC version 20
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec10000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec10000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
Detected 2793.270 MHz processor.
Built 1 zonelists
Kernel command line: root=/dev/hda1 ro console=ttyS0,115200
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec10000)
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Event source pit installed with caps set: 01
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2061876k/2097088k available (1750k kernel code, 33860k reserved, 617k data, 208k init, 1179584k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5588.01 BogoMIPS (lpj=2794006)
Security Framework v1.0.0 initialized
SELinux: Disabled at boot.
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5585.25 BogoMIPS (lpj=2792629)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Booting processor 2/6 eip 2000
Initializing CPU#2
Calibrating delay using timer specific routine.. 5585.26 BogoMIPS (lpj=2792630)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
CPU2: Intel P4/Xeon Extended MCE MSRs (24) available
CPU2: Thermal monitoring enabled
CPU2: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Booting processor 3/7 eip 2000
Initializing CPU#3
Calibrating delay using timer specific routine.. 5585.29 BogoMIPS (lpj=2792645)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#3.
CPU3: Intel P4/Xeon Extended MCE MSRs (24) available
CPU3: Thermal monitoring enabled
CPU3: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Total of 4 processors activated (22343.82 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Event source lapic installed with caps set: 0e
checking TSC synchronization across 4 CPUs: passed.
Event source lapic installed with caps set: 0e
Event source lapic installed with caps set: 0e
Event source lapic installed with caps set: 0e
Brought up 4 CPUs
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 5228k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=5
PCI: Using MMCONFIG
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI quirk: region 0400-047f claimed by ICH4 ACPI/GPIO/TCO
PCI quirk: region 0500-053f claimed by ICH4 GPIO
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.EPB0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 10 devices
PnPBIOS: Disabled by ACPI PNP
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
pnp: 00:05: ioport range 0x680-0x6ff has been reserved
PCI: Bridge: 0000:00:02.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:04.0
IO window: disabled.
MEM window: fe900000-feafffff
PREFETCH window: fd500000-fe5fffff
PCI: Bridge: 0000:00:06.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.0
IO window: d000-dfff
MEM window: fe800000-fe8fffff
PREFETCH window: fd400000-fd4fffff
PCI: Bridge: 0000:00:1e.0
IO window: c000-cfff
MEM window: fe700000-fe7fffff
PREFETCH window: disabled.
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:02.0 to 64
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:04.0 to 64
ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:06.0 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
highmem bounce pool size: 64 pages
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:02.0 to 64
Allocate Port Service[pcie00]
Allocate Port Service[pcie01]
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:04.0 to 64
Allocate Port Service[pcie00]
Allocate Port Service[pcie01]
ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:06.0 to 64
Allocate Port Service[pcie00]
Allocate Port Service[pcie01]
isapnp: Scanning for PnP cards...
Time: tsc clocksource has been installed.
check_monotonic_clock: monotonic inconsistency detected!
from 2e5d8586 (777880966) to 2bd16c47 (735145031).
Badness in check_monotonic_clock at kernel/time/timeofday.c:161
[<c0103dd7>] dump_stack+0x1e/0x22
[<c0134782>] check_monotonic_clock+0xd4/0xd9
[<c0134c95>] get_realtime_clock+0xc9/0x10b
[<c01338b3>] hrtimer_run_queues+0x4b/0x10b
[<c01258eb>] run_timer_softirq+0x1f/0x1c7
[<c01215bd>] __do_softirq+0xc1/0xd6
[<c012161d>] do_softirq+0x4b/0x4d
[<c0121707>] irq_exit+0x47/0x49
[<c0110fcd>] smp_apic_timer_interrupt+0x42/0x50
[<c01039a8>] apic_timer_interrupt+0x1c/0x24
[<c0100e08>] cpu_idle+0x89/0x9f
[<c010fcda>] start_secondary+0x17f/0x2ef
[<00000000>] _stext+0x3feffd68/0x49
[<dfaa3fb4>] 0xdfaa3fb4
BUG in read_tsc(): TSC went backward!
... Unsynced TSCs?
... [ from 000000003aeeffc4 to 000000003aeedc00 ]
hrtimers: Switched to high resolution mode CPU 0
hrtimers: Switched to high resolution mode CPU 1
hrtimers: Switched to high resolution mode CPU 3
hrtimers: Switched to high resolution mode CPU 2
isapnp: No Plug & Play device found
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:04: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 7, 524288 bytes)
TCP established hash table entries: 131072 (order: 9, 3145728 bytes)
TCP bind hash table entries: 65536 (order: 8, 1572864 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
TCP bic registered
NET: Registered protocol family 8
NET: Registered protocol family 20
Using IPI Shortcut mode
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 5228KiB [1 disk] into ram disk... |/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 208k freed
NET: Registered protocol family 1
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 177
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: HTS424040M9AT00, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
hda: max request size: 1024KiB
hda: 78140160 sectors (40007 MB) w/1739KiB Cache, CHS=16383/255/63, UDMA(33)
hda: cache flushes supported
hda: hda1 hda2
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding 208836k swap on /dev/hda2. Priority:-1 extents:1 across:208836k
EXT3 FS on hda1, internal journal
do_settimeofday() was called!
do_settimeofday() was called!
check_periodic_interval: short interval! 49999544.
bad calibration or timers may be broken.
Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:02:0d.0[A] -> GSI 25 (level, low) -> IRQ 185
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:02:0d.1[B] -> GSI 26 (level, low) -> IRQ 193
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:01:0e.0[A] -> GSI 18 (level, low) -> IRQ 177
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 201
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: irq 201, io mem 0xfebff800
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
check_periodic_interval: short interval! 49998755.
bad calibration or timers may be broken.
USB Universal Host Controller Interface driver v2.3
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 169, io base 0x0000ec00
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
i6300ESB timer: initialized (0xf88b2c00). heartbeat=30 sec (nowayout=0)
check_periodic_interval: short interval! 49998757.
bad calibration or timers may be broken.
hw_random: RNG not detected
ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
ib_mthca: Initializing 0000:03:00.0
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:03:00.0 to 64
ib_mthca 0000:03:00.0: HCA FW version 5.0.1 is old (5.1.0 is current).
ib_mthca 0000:03:00.0: If you have problems, try updating your HCA FW.
check_periodic_interval: short interval! 49997913.
bad calibration or timers may be broken.
Real Time Clock Driver v1.12
input: PC Speaker as /class/input/input0
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
ttyS1: LSR safety check engaged!
ttyS1: LSR safety check engaged!
do_settimeofday() was called!


Attachments:
config-2.6.15-rc5-hrt2-051213 (64.17 kB)
dmesg (16.61 kB)
Download all attachments

2005-12-14 19:28:45

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Wed, 2005-12-14 at 09:04 -0500, Steven Rostedt wrote:
> On Wed, 2005-12-14 at 08:48 -0500, Steven Rostedt wrote:
>
> > And going into gdb, I get:
> >
> > (gdb) li *0xc0136b98
> > 0xc0136b98 is in hrtimer_cancel (kernel/hrtimer.c:671).
> > 666 int hrtimer_cancel(struct hrtimer *timer)
> > 667 {
> > 668 for (;;) {
> > 669 int ret = hrtimer_try_to_cancel(timer);
> > 670
> > 671 if (ret >= 0)
> > 672 return ret;
> > 673 }
> > 674 }
> > 675
> >
> > So it may not really be locked, and if I waited a couple of hours, it
> > might actually finish (the test usually takes a couple of minutes to
> > run, and I let it run here for about 20 minutes).

Ingo,

Running my jitter test shows a slow down on the responsiveness of
nanosleep.

In 2.6.14-rt2 (the only 2.6.14-rtx laying around still) I get:

reported res = 0.000001000
starting calibrate
finished calibrate: 736.0509MHz 736050890

Requested time Max error
-------------- --- -----
0.000001000 0.000032218 0.000031218
0.000010000 0.000033196 0.000023196
0.000100000 0.000124324 0.000024324
0.001000000 0.001027797 0.000027797
0.010000000 0.010025151 0.000025151
0.100000000 0.100020185 0.000020185
1.000000000 1.000024266 0.000024266
1.000001000 1.000028153 0.000027153
1.000010000 1.000036685 0.000026685
1.000100000 1.000122415 0.000022415
1.001000000 1.001023463 0.000023463
1.010000000 1.010021942 0.000021942
1.100000000 1.100025656 0.000025656
2.000000000 2.000028559 0.000028559

errors off by 20 - 32 usecs

On 2.6.15-rc5-rt1 (with my patches) on the same machine I get:

reported res = 0.000001000
starting calibrate
finished calibrate: 736.0510MHz 736051025

Requested time Max error
-------------- --- -----
0.000001000 0.000089476 0.000088476
0.000010000 0.000045054 0.000035054
0.000100000 0.000135034 0.000035034
0.001000000 0.001041864 0.000041864
0.010000000 0.010042513 0.000042513
0.100000000 0.100031702 0.000031702
1.000000000 1.000041268 0.000041268
1.000001000 1.000044342 0.000043342
1.000010000 1.000055852 0.000045852
1.000100000 1.000157908 0.000057908
1.001000000 1.001055853 0.000055853
1.010000000 1.010036436 0.000036436
1.100000000 1.100040173 0.000040173
2.000000000 2.000060079 0.000060079


errors off by 35 - 89 usecs

Is this normal??

Attached is the jitter test that I ran (also the test that will crash
the system without my patches).

-- Steve


Attachments:
jitter_inc.c (2.43 kB)

2005-12-15 00:36:00

by Jeff Carr

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On 12/12/05 03:02, Thomas Gleixner wrote:
> The rebased version of the high resolution patches on top of the
> hrtimers base patch is available from the new project home:
>
> http://www.tglx.de/projetcs/hrtimers
>
> The current patch is available here:
>
> http://www.tglx.de/projects/hrtimers/2.6.15-rc5/patch-2.6.15-rc5-hrt2.patch

This is a simple module to start a hrtimer about 20k times a second. I
don't see a way to correctly restart a hrtimer or set one to be periodic
so this is inefficiently bouncing between two timers. Calling
restart_hrtimer() from within the hrtimer.function causes a panic. I was
wondering what the correct method would be.

I had to export some symbols from hrtimer.c so I could build this as a
module.



/*
*
* A simple module that starts a hrtimer ~20k/sec
*
* This software is available to you under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree.
*
*/

#include <asm/semaphore.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/pci.h>
#include <linux/time.h>
#include <linux/workqueue.h>
#include <asm/uaccess.h>
#include <linux/delay.h>
#include <linux/proc_fs.h>
#include <linux/hrtimer.h>

MODULE_AUTHOR("Jeff Carr");
MODULE_DESCRIPTION("simple hrtimer test");
MODULE_LICENSE("GPL");

static struct workqueue_struct *testwq;
static struct work_struct testwork;

static struct hrtimer restart_hrtimer;
static struct hrtimer trigger_hrtimer;

static int done = 0;
static int hrtimer_count = 0;

#define HT_TEST_PERIOD 20000; // In nanoseconds

static void do_test_work(void *data)
{
restart_hrtimer.expires.tv64 = (u64) HT_TEST_PERIOD;
hrtimer_start(&restart_hrtimer, restart_hrtimer.expires, HRTIMER_REL);

return;
}

static int do_restart_hrtimer(void *data)
{
if (!done) {
trigger_hrtimer.expires.tv64 = (u64) HT_TEST_PERIOD;
hrtimer_start(&trigger_hrtimer, trigger_hrtimer.expires,
HRTIMER_REL);
}
return 0;
}

static int do_trigger_hrtimer(void *data)
{
static struct timeval now;

++hrtimer_count;
do_gettimeofday(&now);
if (printk_ratelimit())
printk(KERN_DEBUG
"do_trigger_hrtimer() ran %d times (%li.%li)\n",
hrtimer_count, now.tv_sec, now.tv_usec);

if (!done) {
restart_hrtimer.expires.tv64 = (u64) HT_TEST_PERIOD;
hrtimer_start(&restart_hrtimer, restart_hrtimer.expires,
HRTIMER_REL);
}
return 0;
}

static int __init hrtimer_test_init(void)
{
testwq = create_singlethread_workqueue("simple_hrtimer_test");
if (!testwq)
return -1;

INIT_WORK(&testwork, do_test_work, NULL);

hrtimer_init(&restart_hrtimer, (const clockid_t)CLOCK_REALTIME);
restart_hrtimer.data = (unsigned long)NULL;
restart_hrtimer.function = do_restart_hrtimer;

hrtimer_init(&trigger_hrtimer, (const clockid_t)CLOCK_REALTIME);
trigger_hrtimer.data = (unsigned long)NULL;
trigger_hrtimer.function = do_trigger_hrtimer;

queue_work(testwq, &testwork);

return 0;
}

static void __exit hrtimer_test_cleanup(void)
{
done = 1;

flush_workqueue(testwq);
destroy_workqueue(testwq);

// not sure how to destroy this correctly
hrtimer_cancel(&restart_hrtimer);
remove_hrtimer(&restart_hrtimer);

// not sure how to destroy this correctly
hrtimer_cancel(&trigger_hrtimer);
remove_hrtimer(&trigger_hrtimer);
}

module_init(hrtimer_test_init);
module_exit(hrtimer_test_cleanup);


Attachments:
export_symbols.patch (1.91 kB)

2005-12-20 13:28:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -RT] Add softirq waitqueue for CONFIG_PREEMPT_SOFTIRQ (was: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 ...)


* Steven Rostedt <[email protected]> wrote:

> This patch fixes the problem I have. I've added a waitqueue per all
> softirq threads IFF a CONFIG_PREEMPT_SOFTIRQ is set.

thanks, applied.

Ingo

2006-01-10 23:59:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Wed, 2005-12-14 at 09:43 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
> > hm, in that case it should be base->running that protects against this
> > case. Did you run an UP PREEMPT_RT kernel? In that case could you
> > check whether the fix below resolves the crash too?
>
> try the patch below instead. (Thomas pointed out that the correct
> dependency in this case is PREEMPT_SOFTIRQS)
>
> Ingo
>
> Index: linux/kernel/hrtimer.c
> ===================================================================
> --- linux.orig/kernel/hrtimer.c
> +++ linux/kernel/hrtimer.c
> @@ -118,7 +118,7 @@ static DEFINE_PER_CPU(struct hrtimer_bas
> * Functions and macros which are different for UP/SMP systems are kept in a
> * single place
> */
> -#ifdef CONFIG_SMP
> +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_SOFTIRQS)
>
> #define set_curr_timer(b, t) do { (b)->curr_timer = (t); } while (0)
>

OK Thomas and Ingo, I hit the bug again! And this time I found the other
race. Again let me explain where in the code I have a problem, and I
feel really uncomfortable with the "implied logic" to protect the race.

In __remove_hrtimer we have:

#ifdef CONFIG_HIGH_RES_TIMERS
if (timer->state == HRTIMER_PENDING_CALLBACK) {
list_del(&timer->list);
return;
}
#endif

And in run_hrtimer_hres_queue (and something similar with
run_hrtimer_queue) we have:

set_curr_timer(base, timer);
list_del(&timer->list);
spin_unlock_irq(&base->lock);

__remove_hrtimer is protected with the base-lock, so in this case we
have the logic of set_curr_timer to protect against
hrtimer_try_to_cancel which has:

if (base->curr_timer != timer)
ret = remove_hrtimer(timer, base);

And this _is_ protected, but I just discovered that this does _not_
protect against hrtimer_start!

In hrtimer_start we have:

base = lock_hrtimer_base(timer, &flags);

/* Remove an active timer from the queue: */
remove_hrtimer(timer, base);

Which can be called after that spin_unlock_irq is done by the
run_hrtimer_queue, and we will hit a bug (as I did). This is not an
easy race to hit. Here's an example of a race for this problem:

In posix-timers.c: commen_timer_set:

If we get preempted between hrtimer_try_to_cancel and
hrtimer_restart.

Then a new thread adds the timer back (by a threaded program).

Now the timer goes off for the softirq, and the softirq runs
and just after it removes the timer and unlocks the base spinlock, it
gets preempted by some high prio task that blocks on the original
thread and boosts it's priority higher than the softirq.

Now the first thread runs and then calls hrtimer_start, and the timer
is now in the state PENDING_CALLBACK but is _not_ on the list. So
this will error when it tries to delete the timer again.

This is a really hard race to hit. But it is possible, and I hit it with
my kernel because I have some crazy dynamic scheduling going on.

Now the question is what's the right solution? Can hrtimer_start
schedule? Probably not, maybe we should add something to check this and
have hrtimer_start return -1 if it is running, and let who ever called
it figure out what to do? Maybe have a hrtimer_cancel_start atomic
operation? As well as a hrtimer_try_to_cancel_start?

-- Steve

2006-01-12 02:15:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

Finally! I did it. I have an updated timer_stress test at
http://www.kihontech.com/tests/rt/timer_stress.c
that triggers the deadlock that I have been mentioning (and hit once in
my kernel). But this time I hit it in 2.6.15-rt4-sr1 and got the
following output:

On my SMP machine I got this (this is even before I changed the test to
do intervals:

============================================
BUG: Unable to handle kernel paging request at virtual address 00100104
printing eip:
c0138ead
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: binfmt_misc nfsd exportfs lockd sunrpc lp capability commoncap sd_mod usb_storage scsi_mod usbhid usblp psmouse serio_raw snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq 3c59x mii floppy snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm ohci_hcd snd_timer snd_ac97_bus snd_page_alloc snd_util_mem snd_hwdep parport_pc parport snd soundcore rtc
ide_cd cdrom pcspkr shpchp pci_hotplug hw_random joydev evdev mousedev unix
CPU: 1
EIP: 0060:[<c0138ead>] Not tainted VLI
EFLAGS: 00010046 (2.6.15-rt4-sr1)
EIP is at __remove_hrtimer+0x3d/0x70
eax: 00200200 ebx: f738a424 ecx: f738a44c edx: 00100100
esi: c20201b4 edi: f738a424 ebp: f4d7deb0 esp: f4d7dea0
ds: 007b es: 007b ss: 0068 preempt: 00000002
Process timer_stress (pid: 6035, threadinfo=f4d7c000 task=f6fa2070 stack_left=7788 worst_left=-1)
Stack: c0138775 00000000 00000000 c20201b4 f4d7ded8 c0138f0f f738a424 c20201b4
00000286 00000000 00000286 00000000 3b9aca00 f4d7df98 f4d7def4 c013902e
f738a424 0000c350 00000000 00000001 00000001 f4d7df74 c0123fd3 f738a424
Call Trace:
[<c01040dc>] show_stack+0x9c/0xe0 (28)
[<c01042c4>] show_registers+0x184/0x240 (60)
[<c0104522>] die+0x102/0x190 (56)
[<c0115e7b>] do_page_fault+0x20b/0x5a0 (76)
[<c0103d5f>] error_code+0x4f/0x54 (76)
[<c0138f0f>] __hrtimer_start+0x2f/0x120 (40)
[<c013902e>] hrtimer_start+0x2e/0x30 (28)
[<c0123fd3>] do_setitimer+0x573/0x5c0 (128)
[<c0124062>] sys_setitimer+0x42/0xb0 (64)
[<c01031e5>] syscall_call+0x7/0xb (-8116)
Code: 83 7b 18 03 74 1c 39 5e 10 74 3b 8d 46 0c 89 1c 24 89 44 24 04 e8 f4 fc 0b
00 8b 5d f8 8b 75 fc c9 c3 8d 4b 28 8b 53 28 8b 41 04 <89> 42 04 89 10 c7 41 04 00 02 20 00 c7 43 28 00 01 10 00 8b 5d
NMI watchdog detected lockup on CPU#1 (50000/50000)

Pid: 6035, comm: timer_stress
EIP: 0060:[<c02fbce4>] CPU: 1
EIP is at _raw_spin_lock+0x14/0x20
EFLAGS: 00000082 Not tainted (2.6.15-rt4-sr1)
EAX: c20201b8 EBX: 00000000 ECX: a7fb63da EDX: f4d7c000
ESI: c20201d4 EDI: c20201b4 EBP: f4d7dd34 DS: 007b ES: 007b
CR0: 8005003b CR2: 00100104 CR3: 34d33000 CR4: 000006d0
[<c01012e2>] show_regs+0x162/0x190 (44)
=======================================

Notice the bad address: 00100104, that's a list poison. From hrtimers.


On my UP machine running with the following:

# ./timer_stress -t 30 -s -1 -P

I got this:

======================================
BUG: Unable to handle kernel paging request at virtual address 00100104
printing eip:
c0128519
*pde = 00000000
Oops: 0002 [#1]
PREEMPT
Modules linked in: ipv6 tsdev mousedev psmouse parport_pc parport floppy pcspkrxCPU: 0
EIP: 0060:[<c0128519>] Not tainted VLI
EFLAGS: 00010046 (2.6.15-rt4-sr1)
EIP is at __remove_hrtimer+0x19/0x50
eax: 00200200 ebx: c99eb474 ecx: c99eb49c edx: 00100100
esi: c02edaec edi: 1cd4d17f ebp: 0000005b esp: c99a7e2c
ds: 007b es: 007b ss: 0068 preempt: 00000002
Process timer_stress (pid: 2627, threadinfo=c99a6000 task=c13276f0 stack_left=7)Stack: c02edaec c99a7e48 c0128582 c99eb474 c02edaec 00016406 ffff007b 00000282
00000000 c99eb474 c99eb428 c1327b64 c0128651 c99eb474 1cd4d17f 0000005b
00000000 00000001 c0124ba3 c99eb474 1cd4d17f 0000005b 00000000 c99eb474
Call Trace:
[<c0128582>] __hrtimer_start+0x32/0xea (12)
[<c0128651>] hrtimer_start+0x17/0x1b (40)
[<c0124ba3>] schedule_next_timer+0x41/0x48 (24)
[<c0124be0>] do_schedule_next_timer+0x36/0x4b (44)
[<c011ddb9>] dequeue_signal+0x86/0xb0 (20)
[<c011f4db>] get_signal_to_deliver+0xc3/0x2e9 (28)
[<c0102b02>] do_signal+0x5f/0xe5 (40)
[<c0128bc6>] hrtimer_nanosleep+0xa9/0xf1 (56)
[<c0128c4b>] sys_nanosleep+0x3d/0x4f (104)
[<c0102baf>] do_notify_resume+0x27/0x38 (24)
[<c0102d90>] work_notifysig+0x13/0x1b (8)
Code: d2 c7 47 18 02 00 00 00 5f 5d 5e 89 d0 5b 5e 5f 5d c3 56 53 8b 5c 24 0c 8
<6>note: timer_stress[2627] exited with preempt_count 1
BUG: scheduling while atomic: timer_stress/0x00000001/2627
caller is do_exit+0x39d/0x3db
[<c02a0105>] __schedule+0x67/0x57d (8)
[<c0117708>] do_exit+0x39d/0x3db (8)
[<c01172eb>] exit_notify+0x6a6/0x726 (28)
[<c0117708>] do_exit+0x39d/0x3db (48)
[<c01035ec>] do_trap+0x0/0xce (24)
[<c01103c9>] do_page_fault+0x368/0x477 (32)
[<c0110061>] do_page_fault+0x0/0x477 (60)
[<c0102f97>] error_code+0x4f/0x54 (8)
[<c0128519>] __remove_hrtimer+0x19/0x50 (44)
[<c0128582>] __hrtimer_start+0x32/0xea (20)
[<c0128651>] hrtimer_start+0x17/0x1b (40)
[<c0124ba3>] schedule_next_timer+0x41/0x48 (24)
[<c0124be0>] do_schedule_next_timer+0x36/0x4b (44)
[<c011ddb9>] dequeue_signal+0x86/0xb0 (20)
[<c011f4db>] get_signal_to_deliver+0xc3/0x2e9 (28)
[<c0102b02>] do_signal+0x5f/0xe5 (40)
[<c0128bc6>] hrtimer_nanosleep+0xa9/0xf1 (56)
[<c0128c4b>] sys_nanosleep+0x3d/0x4f (104)
[<c0102baf>] do_notify_resume+0x27/0x38 (24)
[<c0102d90>] work_notifysig+0x13/0x1b (8)
==============================

So that protection is not working here.

-- Steve


2006-01-12 02:52:13

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH RT] fix or hrtimers (was: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches)

On Wed, 2006-01-11 at 21:14 -0500, Steven Rostedt wrote:
> Finally! I did it. I have an updated timer_stress test at
> http://www.kihontech.com/tests/rt/timer_stress.c
> that triggers the deadlock that I have been mentioning (and hit once in
> my kernel). But this time I hit it in 2.6.15-rt4-sr1 and got the
> following output:

OK, it's not like me to just show a problem, without at least having
some type of fix for it. Since my last fix, was turned down, and
looking into it further, I now understand why.

The patch below now makes hrtimer_start cancel the timer and lock the
base in one action. It also checks to see if the timer is running, and
if it is, it doesn't do anything. It basically, tests to see if it
should cancel the timer.

This is now included in my rt maintenance patches at:

http://home.stny.rr.com/rostedt/patches/patch-2.6.15-rt4-sr2

But I'll include this patch here too so that you can look at what I've
done.

My test at http://www.kihontech.com/tests/rt/timer_stress.c hasn't
killed this kernel yet. But I'll run it all night on both a UP machine
with the -P (posix timers) and without -P on a SMP machine (setitimer).

-- Steve

Index: linux-2.6.15-rt4/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rt4.orig/kernel/hrtimer.c 2006-01-11 14:46:30.000000000 -0500
+++ linux-2.6.15-rt4/kernel/hrtimer.c 2006-01-11 21:36:31.000000000 -0500
@@ -610,6 +610,35 @@
}

/**
+ * hrtimer_cancel_and_lock - deactivate a timer and lock its base.
+ *
+ * @timer: hrtimer to stop
+ * @flags: pointer to the flags argument
+ *
+ * Returns:
+ * base of the timer.
+ */
+static struct hrtimer_base *
+hrtimer_cancel_and_lock(struct hrtimer *timer, unsigned long *flags)
+{
+ struct hrtimer_base *base;
+
+retry:
+ base = lock_hrtimer_base(timer, flags);
+
+ if (base->curr_timer == timer) {
+ unlock_hrtimer_base(timer, flags);
+ hrtimer_wait_for_timer();
+ goto retry;
+ }
+
+ remove_hrtimer(timer, base);
+
+ return base;
+
+}
+
+/**
* hrtimer_start - (re)start an relative timer on the current CPU
*
* @timer: the timer to be added
@@ -628,10 +657,7 @@
unsigned long flags;
int ret;

- base = lock_hrtimer_base(timer, &flags);
-
- /* Remove an active timer from the queue: */
- remove_hrtimer(timer, base);
+ base = hrtimer_cancel_and_lock(timer, &flags);

/* Switch the timer base, if necessary: */
new_base = switch_hrtimer_base(timer, base);


2006-01-12 11:10:51

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH RT] fix or hrtimers (was: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches)

On Wed, 2006-01-11 at 21:51 -0500, Steven Rostedt wrote:
> On Wed, 2006-01-11 at 21:14 -0500, Steven Rostedt wrote:
> > Finally! I did it. I have an updated timer_stress test at
> > http://www.kihontech.com/tests/rt/timer_stress.c
> > that triggers the deadlock that I have been mentioning (and hit once in
> > my kernel). But this time I hit it in 2.6.15-rt4-sr1 and got the
> > following output:
>
> OK, it's not like me to just show a problem, without at least having
> some type of fix for it. Since my last fix, was turned down, and
> looking into it further, I now understand why.
>
> The patch below now makes hrtimer_start cancel the timer and lock the
> base in one action. It also checks to see if the timer is running, and
> if it is, it doesn't do anything. It basically, tests to see if it
> should cancel the timer.
>
> This is now included in my rt maintenance patches at:
>
> http://home.stny.rr.com/rostedt/patches/patch-2.6.15-rt4-sr2
>
> But I'll include this patch here too so that you can look at what I've
> done.
>
> My test at http://www.kihontech.com/tests/rt/timer_stress.c hasn't
> killed this kernel yet. But I'll run it all night on both a UP machine
> with the -P (posix timers) and without -P on a SMP machine (setitimer).

FYI,

The tests ran all night without a bug! Now I'm switching the tests
(without -P on UP and with -P on SMP).

-- Steve


2006-01-17 17:34:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ANNOUNCE] 2.6.15-rc5-hrt2 - hrtimers based high resolution patches

On Tue, 2006-01-10 at 18:59 -0500, Steven Rostedt wrote:
> And this _is_ protected, but I just discovered that this does _not_
> protect against hrtimer_start!
>
> In hrtimer_start we have:
>
> base = lock_hrtimer_base(timer, &flags);
>
> /* Remove an active timer from the queue: */
> remove_hrtimer(timer, base);
>
> Which can be called after that spin_unlock_irq is done by the
> run_hrtimer_queue, and we will hit a bug (as I did). This is not an
> easy race to hit.

Right, but there should be actually no use case where this happens.

For now I prefer to add a

BUG_ON(base->curr_timer == timer);

into hrtimer_start to find the real reason.

> Here's an example of a race for this problem:
> In posix-timers.c: commen_timer_set:
>
> If we get preempted between hrtimer_try_to_cancel and
> hrtimer_restart.
>
> Then a new thread adds the timer back (by a threaded program).

I don't see how this should happen.

sys_timer_settime()
timr = lock_timer(timer_id, flag);

Now the k_itimer, which is the container of the hrtimer is locked.

common_timer_set()
if (hrtimer_try_to_cancel() < 0)
return TIMER_RETRY;
....
hrtimer_start();
return;

back in sys_timer_settime()

unlock_timer(timr);

So how gets this timer added back, when the container lock is held
across the complete operation ?

> Now the question is what's the right solution? Can hrtimer_start
> schedule? Probably not, maybe we should add something to check this and
> have hrtimer_start return -1 if it is running, and let who ever called
> it figure out what to do? Maybe have a hrtimer_cancel_start atomic
> operation? As well as a hrtimer_try_to_cancel_start?

The posix timer code does already the Right Thing, as it uses
hrtimer_try_to_cancel() and reacts on the return value. This is
necessary to prevent a dead lock with the k_itimer lock.

I try to reproduce this with your test program on my test boxes.

tglx