2006-03-16 09:58:47

by Ingo Molnar

Subject: 2.6.16-rc6-rt7

i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
the usual place:

http://redhat.com/~mingo/realtime-preempt/

the main change in this release is the merge up to John Stultz's GTOD
-B20 patchset, and Thomas Gleixner's latest -hrt (high resolution
timers) queue. This, amongst many other fixes, resolves a system-time
(and uptime) anomaly observable under high load.

Changes since -rt4:

- merge to John Stultz's GTOD -B20 (Thomas Gleixner)

- merge to latest -hrt (Thomas Gleixner)

- zap_pte_range() latency breaker (Hugh Dickins)

- small latency tracer cleanups

to build a 2.6.16-rc6-rt7 tree, the following patches should be applied:

http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.16-rc6.bz2
http://redhat.com/~mingo/realtime-preempt/patch-2.6.16-rc6-rt7

Ingo


2006-03-16 17:39:52

by David Brown

Subject: Re: 2.6.16-rc6-rt7

> i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/
>
> the main change in this release is the merge up to John Stultz's GTOD
> -B20 patchset, and Thomas Gleixner's latest -hrt (high resolution
> timers) queue. This, amongst many other fixes, resolves a system-time
> (and uptime) anomaly observable under high load.
>
> Changes since -rt4:
>
> - merge to John Stultz's GTOD -B20 (Thomas Gleixner)
>
> - merge to latest -hrt (Thomas Gleixner)
>
> - zap_pte_range() latency breaker (Hugh Dickins)
>
> - small latency tracer cleanups
>
> to build a 2.6.16-rc6-rt7 tree, the following patches should be applied:
>
> http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.15.tar.bz2
> http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.16-rc6.bz2
> http://redhat.com/~mingo/realtime-preempt/patch-2.6.16-rc6-rt7
>
> Ingo
> -

I've been having issues with the realtime patch set when using scp
(specifically scp; wget, curl, git, cvs and everything else work fine). I
was wondering which extra debugging features would be helpful to build
into the kernel to help me nail down why this bug is happening.

Specifically, scp freezes my system. There haven't been any kernel
warnings or panics when running scp; it just freezes. Every other
application that uses the network seems to work just fine; so far it's
just been scp.

- David Brown

2006-03-16 17:50:20

by Thomas Gleixner

Subject: Re: 2.6.16-rc6-rt7

On Thu, 2006-03-16 at 09:39 -0800, David Brown wrote:
> I've been having issues with the realtime patch set and using scp
> (specifically scp, wget, curl, git, cvs everything else works fine). I
> was wondering what extra debugging features are helpful to have built
> into the kernel that could help me nail down why this bug is
> happening.
>
> Specifically what's happening is scp is freezing my system, there
> haven't been any kernel warnings or panics upon execution of scp, it
> just freezes, every other application that uses network seems to work
> just fine, so far it's just been scp.

Just found a problem in the highres timer merge. Can you try the patch
below?

tglx

Index: linux-2.6.16-rc6/include/linux/hrtimer.h
===================================================================
--- linux-2.6.16-rc6.orig/include/linux/hrtimer.h
+++ linux-2.6.16-rc6/include/linux/hrtimer.h
@@ -114,6 +114,8 @@ extern void hrtimer_clock_notify(void);
extern void clock_was_set(void);
extern int hrtimer_interrupt(void);

+#define hrtimer_cb_get_time(t) (t)->base->get_time()
+
/*
* The resolution of the clocks. The resolution value is returned in
* the clock_getres() system call to give application programmers an
@@ -136,6 +138,8 @@ extern int hrtimer_interrupt(void);
#define clock_was_set() do { } while (0)
#define hrtimer_clock_notify() do { } while (0)

+#define hrtimer_cb_get_time(t) (t)->base->softirq_time
+
#endif

# if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
Index: linux-2.6.16-rc6/kernel/hrtimer.c
===================================================================
--- linux-2.6.16-rc6.orig/kernel/hrtimer.c
+++ linux-2.6.16-rc6/kernel/hrtimer.c
@@ -970,8 +970,6 @@ static inline void run_hrtimer_hres_queu
{
spin_lock_irq(&base->lock);

- base->softirq_time = base->get_softirq_time();
-
while (!list_empty(&base->cb_pending)) {
struct hrtimer *timer;
int (*fn)(struct hrtimer *);
Index: linux-2.6.16-rc6/kernel/itimer.c
===================================================================
--- linux-2.6.16-rc6.orig/kernel/itimer.c
+++ linux-2.6.16-rc6/kernel/itimer.c
@@ -136,7 +136,7 @@ int it_real_fn(struct hrtimer *timer)
send_group_sig_info(SIGALRM, SEND_SIG_PRIV, sig->tsk);

if (sig->it_real_incr.tv64 != 0) {
- hrtimer_forward(timer, timer->base->softirq_time,
+ hrtimer_forward(timer, hrtimer_cb_get_time(timer),
sig->it_real_incr);
return HRTIMER_RESTART;
}
Index: linux-2.6.16-rc6/kernel/posix-timers.c
===================================================================
--- linux-2.6.16-rc6.orig/kernel/posix-timers.c
+++ linux-2.6.16-rc6/kernel/posix-timers.c
@@ -355,9 +355,10 @@ static int posix_timer_fn(struct hrtimer
if (timr->it.real.interval.tv64 != 0) {
timr->it_overrun +=
hrtimer_forward(timer,
- timer->base->softirq_time,
+ hrtimer_cb_get_time(timer),
timr->it.real.interval);
ret = HRTIMER_RESTART;
+ ++timr->it_requeue_pending;
}
}
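
The patch replaces the direct reads of timer->base->softirq_time in the itimer and POSIX timer callbacks with hrtimer_cb_get_time(). Judging by the two #define variants in the header hunk, that macro reads the clock through base->get_time() in one configuration (presumably high resolution mode) and uses softirq_time in the other. The value matters because it is passed as "now" to hrtimer_forward(), which moves the expiry of an interval timer forward until it lies in the future and returns the number of overruns. The sketch below is a simplified user-space model, not the kernel code: times are plain int64_t nanosecond counts, model_timer and model_forward() are hypothetical names, and the kernel's clamping of the interval to the clock resolution is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct hrtimer: only the expiry, in ns. */
struct model_timer {
	int64_t expires;
};

/*
 * Simplified model of hrtimer_forward(): push t->expires forward by
 * whole intervals until it lies strictly after 'now' and return the
 * number of intervals added (the overrun count). If 'now' is still
 * before the expiry, nothing changes and 0 is returned.
 */
uint64_t model_forward(struct model_timer *t, int64_t now, int64_t interval)
{
	int64_t delta = now - t->expires;
	uint64_t orun;

	assert(interval > 0);
	if (delta < 0)
		return 0;

	/* floor(delta / interval) + 1 intervals land strictly after now */
	orun = (uint64_t)(delta / interval) + 1;
	t->expires += (int64_t)orun * interval;
	return orun;
}
```

For example, a timer with expires = 9500 and a 1000 ns interval that is handed a current "now" of 10000 gets one overrun and a new expiry of 10500. Handed an outdated "now" of 5000, model_forward() returns 0 and leaves the expiry at 9500, so a callback that then returns HRTIMER_RESTART would re-arm the timer with an expiry that has already passed.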



2006-03-16 21:42:34

by Michal Piotrowski

Subject: Re: 2.6.16-rc6-rt7

Hi,

On 16/03/06, Ingo Molnar <[email protected]> wrote:
> i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> the usual place:
>

My system hangs with this:
http://www.stardust.webpages.pl/files/rt/2.6.16-rc6-rt7/oops-v1.jpg
http://www.stardust.webpages.pl/files/rt/2.6.16-rc6-rt7/oops-v2.jpg
Both look the same.

Here is the config: http://www.stardust.webpages.pl/files/rt/2.6.16-rc6-rt7/rt-config

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

2006-03-17 08:23:10

by Ingo Molnar

Subject: Re: 2.6.16-rc6-rt7


* Thomas Gleixner <[email protected]> wrote:

> On Thu, 2006-03-16 at 09:39 -0800, David Brown wrote:
> > I've been having issues with the realtime patch set and using scp
> > (specifically scp, wget, curl, git, cvs everything else works fine). I
> > was wondering what extra debugging features are helpful to have built
> > into the kernel that could help me nail down why this bug is
> > happening.
> >
> > Specifically what's happening is scp is freezing my system, there
> > haven't been any kernel warnings or panics upon execution of scp, it
> > just freezes, every other application that uses network seems to work
> > just fine, so far it's just been scp.
>
> Just found a problem in the highres timer merge. Can you try the patch
> below?

i have released -rt8 with this fix included.

Ingo

2006-03-17 23:36:38

by Tom Rini

Subject: Re: 2.6.16-rc6-rt7

On Thu, Mar 16, 2006 at 10:56:08AM +0100, Ingo Molnar wrote:

> i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> the usual place:
>
> http://redhat.com/~mingo/realtime-preempt/

I was wondering, is it normal for the nanosleep02 and alarm02 LTP tests
to fail? For some time I've seen these tests fail intermittently with
the -RT patch but not with the regular kernel.

--
Tom Rini
http://gate.crashing.org/~trini/

2006-03-18 08:59:50

by Ingo Molnar

Subject: Re: 2.6.16-rc6-rt7


* Tom Rini <[email protected]> wrote:

> On Thu, Mar 16, 2006 at 10:56:08AM +0100, Ingo Molnar wrote:
>
> > i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> > the usual place:
> >
> > http://redhat.com/~mingo/realtime-preempt/
>
> I was wondering, is it normal for the nanosleep02 and alarm02 LTP
> tests to fail? For sometime I've seen these tests fail from time to
> time with the -RT patch but not the regular kernel.

no, it's not normal. How repeatable is it?

Ingo

2006-03-18 10:37:12

by Thomas Gleixner

Subject: Re: 2.6.16-rc6-rt7

On Fri, 2006-03-17 at 16:36 -0700, Tom Rini wrote:
> On Thu, Mar 16, 2006 at 10:56:08AM +0100, Ingo Molnar wrote:
>
> > i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> > the usual place:
> >
> > http://redhat.com/~mingo/realtime-preempt/
>
> I was wondering, is it normal for the nanosleep02 and alarm02 LTP tests
> to fail? For sometime I've seen these tests fail from time to time with
> the -RT patch but not the regular kernel.

The nanosleep02 failure is a false positive caused by rounding errors in
the test code.

Requested time to sleep is 5.000009999 seconds (5 s 9999 ns).

Program flow is:

unsigned long req, rem, before, after, elapsed;

gettimeofday(&otime, NULL);
nanosleep(&timereq, &timerem);	/* interrupted by a signal */
gettimeofday(&ntime, NULL);

req = timereq.tv_sec * 1000 + timereq.tv_nsec / 1000000;
rem = timerem.tv_sec * 1000 + timerem.tv_nsec / 1000000;
before = otime.tv_sec * 1000 + otime.tv_usec / 1000;
after = ntime.tv_sec * 1000 + ntime.tv_usec / 1000;
elapsed = after - before;

if (rem - (req - elapsed) > 250)
	fail;

The error message is:
nanosleep02 1 FAIL : Remaining sleep time 3999 msec doesn't match with the expected 4000 msec time

rem: 3999 ms
req - elapsed: 4000 ms

The unsigned long subtraction 3999 - 4000 wraps around to ULONG_MAX, a
value > 250, whereas the real result is -1 ms.
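
To make the wraparound concrete, here is a small C sketch of the test's final comparison, fed with the ms values from the failing run (rem = 3999, req = 5000, elapsed = 1000). The function names are hypothetical; only the arithmetic is taken from the test code above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The test's check as written: all operands are unsigned long, so
 * 3999 - 4000 does not become -1 but wraps around to ULONG_MAX.
 */
bool check_fails_unsigned(unsigned long rem, unsigned long req,
			  unsigned long elapsed)
{
	return rem - (req - elapsed) > 250;
}

/* The same check with signed arithmetic: -1 > 250 is false. */
bool check_fails_signed(long rem, long req, long elapsed)
{
	return rem - (req - elapsed) > 250;
}
```

check_fails_unsigned(3999, 5000, 1000) reports a failure because 3999UL - 4000UL equals ULONG_MAX, while check_fails_signed() with the same arguments does not.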

Looking at the real values with usec resolution gives:

req: 5000009 usec
rem: 3999740 usec
elapsed: 1000452 usec
req - elapsed: 3999557 usec

rem - (req - elapsed) = 183 usec

Truncating the real values by the division used in the test code
results in:

req_ms = 5000009 / 1000 = 5000 ms
rem_ms = 3999740 / 1000 = 3999 ms
elapsed_ms = 1000452 / 1000 = 1000 ms
req_ms - elapsed_ms = 4000 ms
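
The truncation step can be checked with a few lines of C using the usec values listed above. The macro and function names are illustrative only; each value is truncated separately, as in the calculation above.

```c
#include <assert.h>

/* Values from the failing run, in microseconds (taken from the mail). */
#define REQ_US     5000009L
#define REM_US     3999740L
#define ELAPSED_US 1000452L

/* Integer division truncates, like the test's "/ 1000" steps. */
long us_to_ms(long us)
{
	assert(us >= 0);
	return us / 1000;
}

/* rem - (req - elapsed), with every value truncated to ms first. */
long deviation_ms(void)
{
	return us_to_ms(REM_US) - (us_to_ms(REQ_US) - us_to_ms(ELAPSED_US));
}

/* The same deviation at full usec resolution. */
long deviation_us(void)
{
	return REM_US - (REQ_US - ELAPSED_US);
}
```

deviation_ms() comes out at -1 ms (3999 - (5000 - 1000)), while deviation_us() gives +183 usec: the real deviation is positive and far below the 250 ms tolerance, and only the truncation turns it negative.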

This never happens on vanilla, because there the nanosleep is rounded up
to the next jiffy. -rt has high resolution timers, which are delivered
accurately, so the rounding errors in the test code surface.

tglx


2006-03-18 12:55:00

by Tom Rini

Subject: Re: 2.6.16-rc6-rt7

On Sat, Mar 18, 2006 at 09:57:25AM +0100, Ingo Molnar wrote:
>
> * Tom Rini <[email protected]> wrote:
>
> > On Thu, Mar 16, 2006 at 10:56:08AM +0100, Ingo Molnar wrote:
> >
> > > i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> > > the usual place:
> > >
> > > http://redhat.com/~mingo/realtime-preempt/
> >
> > I was wondering, is it normal for the nanosleep02 and alarm02 LTP
> > tests to fail? For sometime I've seen these tests fail from time to
> > time with the -RT patch but not the regular kernel.
>
> no, it's not normal. How repeatable is it?

With the February release of LTP it failed in 2 out of 2 runs
(./runltp -p -q -l /tmp/run.log -o /tmp/run.out). I've seen it with
previous releases as well, and I think I saw it in the one run I did of
the March release too.

--
Tom Rini
http://gate.crashing.org/~trini/

2006-03-18 12:54:27

by Tom Rini

Subject: Re: 2.6.16-rc6-rt7

On Sat, Mar 18, 2006 at 11:37:19AM +0100, Thomas Gleixner wrote:
> On Fri, 2006-03-17 at 16:36 -0700, Tom Rini wrote:
> > On Thu, Mar 16, 2006 at 10:56:08AM +0100, Ingo Molnar wrote:
> >
> > > i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> > > the usual place:
> > >
> > > http://redhat.com/~mingo/realtime-preempt/
> >
> > I was wondering, is it normal for the nanosleep02 and alarm02 LTP tests
> > to fail? For sometime I've seen these tests fail from time to time with
> > the -RT patch but not the regular kernel.
>
> The nanosleep02 failure is incorrect due to rounding errors in the test
> code.
[snip]
> This never happens on vanilla, as the nanosleep is rounded to the next
> jiffie. -rt has high resolution timers which are delivered accurate, so
> the rounding errors of the testcode surface.

Thanks! Any ideas about the alarm02 test?

--
Tom Rini
http://gate.crashing.org/~trini/

2006-03-18 12:57:52

by Thomas Gleixner

Subject: Re: 2.6.16-rc6-rt7

On Sat, 2006-03-18 at 05:54 -0700, Tom Rini wrote:
> On Sat, Mar 18, 2006 at 11:37:19AM +0100, Thomas Gleixner wrote:
> > On Fri, 2006-03-17 at 16:36 -0700, Tom Rini wrote:
> > > On Thu, Mar 16, 2006 at 10:56:08AM +0100, Ingo Molnar wrote:
> > >
> > > > i have released the 2.6.16-rc6-rt7 tree, which can be downloaded from
> > > > the usual place:
> > > >
> > > > http://redhat.com/~mingo/realtime-preempt/
> > >
> > > I was wondering, is it normal for the nanosleep02 and alarm02 LTP tests
> > > to fail? For sometime I've seen these tests fail from time to time with
> > > the -RT patch but not the regular kernel.
> >
> > The nanosleep02 failure is incorrect due to rounding errors in the test
> > code.
> [snip]
> > This never happens on vanilla, as the nanosleep is rounded to the next
> > jiffie. -rt has high resolution timers which are delivered accurate, so
> > the rounding errors of the testcode surface.
>
> Thanks! Any ideas about the alarm02 test?

Yes. It's due to an (unsigned int) -> long conversion and a missing check.

That one affects mainline as well. I'm fixing this one right now. Patch
follows.
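
The patch itself is not part of this thread. As a purely hypothetical illustration of what a missing check around an (unsigned int) -> long conversion can look like: alarm() takes an unsigned int, and on a system with a 32-bit long (modelled here with int32_t) values above INT32_MAX do not fit a signed seconds field. The helper name clamp_alarm_seconds() and the clamping policy are assumptions for this sketch, not the actual fix.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp an alarm() argument so that it still fits
 * a signed 32-bit seconds field (modelling a 32-bit long). Without the
 * clamp, values above INT32_MAX cannot be represented and would
 * typically turn negative on conversion.
 */
int32_t clamp_alarm_seconds(unsigned int seconds)
{
	if (seconds > (unsigned int)INT32_MAX)
		return INT32_MAX;
	return (int32_t)seconds;
}
```

Small values pass through unchanged, while UINT_MAX is clamped to INT32_MAX instead of becoming a negative timeout.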

tglx