2007-05-01 13:32:04

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Mark,

On Mon, 2007-04-30 at 20:02 -0400, Mark Lord wrote:
> When it does boot with the modified .config file,
> the log looks like the one attached to this email.

Can you please send me the output of /proc/timer_list ?

tglx



2007-05-01 13:33:21

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Thomas Gleixner wrote:
> Mark,
>
> On Mon, 2007-04-30 at 20:02 -0400, Mark Lord wrote:
>> When it does boot with the modified .config file,
>> the log looks like the one attached to this email.
>
> Can you please send me the output of /proc/timer_list ?

Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 3232363876949 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1178023146733550511 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f59f9f08>, tick_sched_timer, S:01
# expires at 3232378000000 nsecs [in 14123051 nsecs]
#1: <f59f9f08>, hrtimer_wakeup, S:01
# expires at 3232578819129 nsecs [in 214942180 nsecs]
#2: <f59f9f08>, hrtimer_wakeup, S:01
# expires at 3232837692081 nsecs [in 473815132 nsecs]
#3: <f59f9f08>, hrtimer_wakeup, S:01
# expires at 3232838775084 nsecs [in 474898135 nsecs]
#4: <f59f9f08>, it_real_fn, S:01
# expires at 3233259312183 nsecs [in 895435234 nsecs]
#5: <f59f9f08>, it_real_fn, S:01
# expires at 3239124320054 nsecs [in 6760443105 nsecs]
#6: <f59f9f08>, it_real_fn, S:01
# expires at 3258112325550 nsecs [in 25748448601 nsecs]
.expires_next : 3232378000000 nsecs
.hres_active : 1
.nr_events : 599408
.nohz_mode : 2
.idle_tick : 3232362000000 nsecs
.tick_stopped : 1
.idle_jiffies : 2932361
.idle_calls : 1273043
.idle_sleeps : 992602
.idle_entrytime : 3232361030233 nsecs
.idle_sleeptime : 2944519354264 nsecs
.last_jiffies : 2932361
.next_jiffies : 2932378
.idle_expires : 3232378000000 nsecs
jiffies: 2932363

cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1178023146733550511 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f59f9f08>, tick_sched_timer, S:01
# expires at 3232364000000 nsecs [in 123051 nsecs]
#1: <f59f9f08>, hrtimer_wakeup, S:01
# expires at 3233992031032 nsecs [in 1628154083 nsecs]
.expires_next : 3232364000000 nsecs
.hres_active : 1
.nr_events : 291166
.nohz_mode : 2
.idle_tick : 3232355000000 nsecs
.tick_stopped : 0
.idle_jiffies : 2932354
.idle_calls : 619531
.idle_sleeps : 478589
.idle_entrytime : 3232354988977 nsecs
.idle_sleeptime : 3049656284766 nsecs
.last_jiffies : 2932354
.next_jiffies : 2932358
.idle_expires : 3232358000000 nsecs
jiffies: 2932363


Tick Device: mode: 1
Clock Event Device: hpet
max_delta_ns: 2147483647
min_delta_ns: 3352
mult: 61496110
shift: 32
mode: 3
next_event: 3232378000000 nsecs
set_next_event: hpet_next_event
set_mode: hpet_set_mode
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000001


Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 807210821
min_delta_ns: 1443
mult: 44633684
shift: 32
mode: 1
next_event: 3232378000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 807210821
min_delta_ns: 1443
mult: 44633684
shift: 32
mode: 3
next_event: 3232365000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
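
A note on reading the "Clock Event Device" fields above: mult and shift are the fixed-point scale factors used to turn a nanosecond delta into device cycles, roughly cycles = (ns * mult) >> shift. The following is only a userspace sketch of that arithmetic, not kernel code; the helper names ns_to_cycles() and device_hz() are made up for this illustration. Feeding in the hpet values from the dump (mult 61496110, shift 32) gives a rate of about 14.3 MHz, and the lapic values give roughly 10.4 MHz.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the ns -> cycles scaling implied by the
 * mult/shift fields in /proc/timer_list. ns_to_cycles() and
 * device_hz() are hypothetical helper names, not kernel functions.
 */
static uint64_t ns_to_cycles(uint64_t delta_ns, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	/*
	 * Assumption: delta_ns stays within max_delta_ns (or one second),
	 * so the 64-bit product cannot overflow for the values shown.
	 */
	return (delta_ns * mult) >> shift;
}

/* Cycles in one second, i.e. the device frequency implied by mult/shift. */
static uint64_t device_hz(uint32_t mult, uint32_t shift)
{
	return ns_to_cycles(1000000000ULL, mult, shift);
}
```

With these helpers, min_delta_ns and max_delta_ns can be read as cycle counts: the lapic max_delta_ns of 807210821 corresponds to a little under 2^23 cycles.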

2007-05-01 13:35:14

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Mark Lord wrote:
> Thomas Gleixner wrote:
>> Mark,
>>
>> On Mon, 2007-04-30 at 20:02 -0400, Mark Lord wrote:
>>> When it does boot with the modified .config file,
>>> the log looks like the one attached to this email.
>>
>> Can you please send me the output of /proc/timer_list ?
>
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 3232363876949 nsecs
...

Oh, in case it matters any:
that /proc/timer_list is from the system with CFS-V7 also patched in.

-ml

2007-05-01 13:45:54

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Tue, 2007-05-01 at 09:34 -0400, Mark Lord wrote:
> > now at 3232363876949 nsecs
> ...
>
> Oh, in case it matters any:
> that /proc/timer_list is from the system with CFS-V7 also patched in.

It should not. What happens when you disable HPET ?

tglx


2007-05-01 14:13:53

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Thomas Gleixner wrote:
> On Tue, 2007-05-01 at 09:34 -0400, Mark Lord wrote:
>>> now at 3232363876949 nsecs
>> ...
>>
>> Oh, in case it matters any:
>> that /proc/timer_list is from the system with CFS-V7 also patched in.
>
> It should not. What happens when you disable HPET ?

Booting with clocksource=tsc still hangs in what appears
to be the exact same place.

Screenshots for that are available here: http://rtr.ca/hrtimers/

Of possible interest is that the bottom of the 25line screen capture
differs somewhat from the 50line capture.. see for yourself.
This is 100% consistent from boot to boot.

Using CONFIG_DETECT_SOFTLOCKUP=y eliminates the problem,
so that's really got to be a huge clue, somehow ?

Meanwhile, I'm going to try a libata patch I just saw posted by Tejun.

2007-05-01 14:26:17

by Michal Piotrowski

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On 01/05/07, Mark Lord <[email protected]> wrote:
> Thomas Gleixner wrote:
> > On Tue, 2007-05-01 at 09:34 -0400, Mark Lord wrote:
> >>> now at 3232363876949 nsecs
> >> ...
> >>
> >> Oh, in case it matters any:
> >> that /proc/timer_list is from the system with CFS-V7 also patched in.
> >
> > It should not. What happens when you disable HPET ?
>
> Booting with clocksource=tsc still hangs in what appears
> to be the exact same place.
>
> Screenshots for that are available here: http://rtr.ca/hrtimers/

404 Not Found

The requested URL /hrtimers/ was not found on this server.

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

2007-05-01 14:41:18

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Michal Piotrowski wrote:
> On 01/05/07, Mark Lord <[email protected]> wrote:
>> Thomas Gleixner wrote:
>> > On Tue, 2007-05-01 at 09:34 -0400, Mark Lord wrote:
>> >>> now at 3232363876949 nsecs
>> >> ...
>> >>
>> >> Oh, in case it matters any:
>> >> that /proc/timer_list is from the system with CFS-V7 also patched in.
>> >
>> > It should not. What happens when you disable HPET ?
>>
>> Booting with clocksource=tsc still hangs in what appears
>> to be the exact same place.
>>
>> Screenshots for that are available here: http://rtr.ca/hrtimers/
>
> 404 Not Found

Fixed. Sorry about that!

2007-05-01 15:22:08

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Tue, 2007-05-01 at 10:13 -0400, Mark Lord wrote:
> Thomas Gleixner wrote:
> > On Tue, 2007-05-01 at 09:34 -0400, Mark Lord wrote:
> >>> now at 3232363876949 nsecs
> >> ...
> >>
> >> Oh, in case it matters any:
> >> that /proc/timer_list is from the system with CFS-V7 also patched in.
> >
> > It should not. What happens when you disable HPET ?
>
> Booting with clocksource=tsc still hangs in what appears
> to be the exact same place.

I meant disable HPET via: hpet=disable on the commandline

> Screenshots for that are available here: http://rtr.ca/hrtimers/
>
> Of possible interest is that the bottom of the 25line screen capture
> differs somewhat from the 50line capture.. see for yourself.
> This is 100% consistent from boot to boot.
>
> Using CONFIG_DETECT_SOFTLOCKUP=y eliminates the problem,
> so that's really got to be a huge clue, somehow ?

Unfortunately not. Other than it runs the watchdog task once a second it
does not affect the time/clockevent... mechanisms at all

tglx


2007-05-01 15:55:23

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Tue, 2007-05-01 at 10:13 -0400, Mark Lord wrote:
> Of possible interest is that the bottom of the 25line screen capture
> differs somewhat from the 50line capture.. see for yourself.
> This is 100% consistent from boot to boot.
>
> Using CONFIG_DETECT_SOFTLOCKUP=y eliminates the problem,
> so that's really got to be a huge clue, somehow ?

I twisted my brain, why the watchdog thread might change the problem and
I think I have a rough idea of the scenario.

Can you apply the following patch, which prints out the CPU on which the
kernel messages are generated and upload the screenshot when the hang
happens ? Oh, please enable CONFIG_PRINTK_TIME or add "time" to the
kernel commandline.

tglx

Index: linux-2.6.21/kernel/printk.c
===================================================================
--- linux-2.6.21.orig/kernel/printk.c
+++ linux-2.6.21/kernel/printk.c
@@ -567,10 +567,11 @@ asmlinkage int vprintk(const char *fmt,
 				t = printk_clock();
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
-						"<%c>[%5lu.%06lu] ",
+						"<%c>[%5lu.%06lu] %d ",
 						loglev_char,
 						(unsigned long)t,
-						nanosec_rem/1000);
+						nanosec_rem/1000,
+						printk_cpu);
 
 				for (tp = tbuf; tp < tbuf + tlen; tp++)
 					emit_log_char(*tp);


2007-05-01 16:31:35

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Thomas Gleixner wrote:
>
> I meant disable HPET via: hpet=disable on the commandline
>
>> Screenshots for that are available here: http://rtr.ca/hrtimers/

hpet=disable fails, without the "switched to high resolution timer" message(s).
New snapshot of it now at the link above.

I also applied Tejun's early libata/SCSI race fix,
and that made no difference either.

Cheers

2007-05-01 16:44:12

by Daniel Walker

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Tue, 2007-05-01 at 12:31 -0400, Mark Lord wrote:
> Thomas Gleixner wrote:
> >
> > I meant disable HPET via: hpet=disable on the commandline
> >
> >> Screenshots for that are available here: http://rtr.ca/hrtimers/
>
> hpet=disable fails, without the "switched to high resolution timer" message(s).
> New snapshot of it now at the link above.
>
> I also applied Tejun's early libata/SCSI race fix,
> and that made no difference either.
>
> Cheers

Have you tried !SMP or adding the boot arg "maxcpus=1" .. Hadn't seen
that in prior emails.

Daniel

2007-05-01 16:52:40

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Thomas Gleixner wrote:
> On Tue, 2007-05-01 at 10:13 -0400, Mark Lord wrote:
>> Of possible interest is that the bottom of the 25line screen capture
>> differs somewhat from the 50line capture.. see for yourself.
>> This is 100% consistent from boot to boot.
>>
>> Using CONFIG_DETECT_SOFTLOCKUP=y eliminates the problem,
>> so that's really got to be a huge clue, somehow ?
>
> I twisted my brain, why the watchdog thread might change the problem and
> I think I have a rough idea of the scenario.
>
> Can you apply the following patch, which prints out the CPU on which the
> kernel messages are generated and upload the screenshot when the hang
> happens ? Oh, please enable CONFIG_PRINTK_TIME or add "time" to the
> kernel commandline.

Done, and done.
And I managed to capture more of the boot messages, too.
This new capture is in the "sequence" subdir at the previous link.

Cheers

2007-05-02 07:44:57

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Mark,

On Tue, 2007-05-01 at 12:52 -0400, Mark Lord wrote:
> Done, and done.
> And I managed to capture more of the boot messages, too.
> This new capture is in the "sequence" subdir at the previous link.

thanks. Can you apply this patch please:
http://lkml.org/lkml/2007/4/13/190

It somehow did not make it into 2.6.21.

tglx


2007-05-02 08:06:14

by Andrew Morton

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Wed, 02 May 2007 09:47:15 +0200 Thomas Gleixner <[email protected]> wrote:

> Mark,
>
> On Tue, 2007-05-01 at 12:52 -0400, Mark Lord wrote:
> > Done, and done.
> > And I managed to capture more of the boot messages, too.
> > This new capture is in the "sequence" subdir at the previous link.
>
> thanks. Can you apply this patch please:
> http://lkml.org/lkml/2007/4/13/190
>
> It somehow did not make it into 2.6.21.
>

Alas, poor me. I ain't going to merge a contention-reduction patch when
we're at -rc6. If a patch fixes a bug, please tell me!

2007-05-02 12:53:22

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

re: [PATCH] highres/dyntick: prevent xtime lock contention

|mark> I have a new notebook (Dell Inspiron 9400) with Core2-Duo T7400 @ 2.1Ghz.
|mark> When either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS is used,
|mark> the 2.6.21 kernel hangs on startup just after printing one/both of these:
|mark>
|mark> kernel: switched to high resolution mode on cpu 1
|mark> kernel: switched to high resolution mode on cpu 0
|
|thomas> Can you apply this patch please:
|thomas> http://lkml.org/lkml/2007/4/13/190
|thomas> It somehow did not make it into 2.6.21.
|
|andrew> Alas, poor me. I ain't going to merge a contention-reduction patch when
|andrew> we're at -rc6. If a patch fixes a bug, please tell me!

Okay, patch applied to 2.6.21.1, and I'm typing this email
from thunderbird while running the patched kernel (woo-hoo!).

So I guess it boots, works, and we need this patch in 2.6.21.2 & 2.6.22.

>Subject [PATCH] highres/dyntick: prevent xtime lock contention
>From Thomas Gleixner <>
>Date Fri, 13 Apr 2007 21:05:57 +0200
>
>While the !highres/!dyntick code assigns the duty of the do_timer() call
>to one specific CPU, this was dropped in the highres/dyntick part during
>development.
>
>Steven Rostedt discovered the xtime lock contention on highres/dyntick
>due to several CPUs trying to update jiffies.
>
>Add the single CPU assignment back. In the dyntick case this needs to
>be handled carefully, as the CPU which has the do_timer() duty must drop
>the assignment and let it be grabbed by another CPU, which is active.
>Otherwise the do_timer() calls would not happen during the long sleep.
>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Mark Lord <[email protected]>

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index bfda3f7..a96ec9a 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -31,7 +31,7 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device);
  */
 ktime_t tick_next_period;
 ktime_t tick_period;
-static int tick_do_timer_cpu = -1;
+int tick_do_timer_cpu __read_mostly = -1;
 DEFINE_SPINLOCK(tick_device_lock);
 
 /*
@@ -295,6 +295,12 @@ static void tick_shutdown(unsigned int *cpup)
 		clockevents_exchange_device(dev, NULL);
 		td->evtdev = NULL;
 	}
+	/* Transfer the do_timer job away from this cpu */
+	if (*cpup == tick_do_timer_cpu) {
+		int cpu = first_cpu(cpu_online_map);
+
+		tick_do_timer_cpu = (cpu != NR_CPUS) ? cpu : -1;
+	}
 	spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index c9d203b..5645e6a 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -5,6 +5,7 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 extern spinlock_t tick_device_lock;
 extern ktime_t tick_next_period;
 extern ktime_t tick_period;
+extern int tick_do_timer_cpu __read_mostly;
 
 extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
 extern void tick_handle_periodic(struct clock_event_device *dev);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 51556b9..3a8e524 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -221,6 +221,18 @@ void tick_nohz_stop_sched_tick(void)
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
 		}
+
+		/*
+		 * If this cpu is the one which updates jiffies, then
+		 * give up the assignment and let it be taken by the
+		 * cpu which runs the tick timer next, which might be
+		 * this cpu as well. If we don't drop this here the
+		 * jiffies might be stale and do_timer() never
+		 * invoked.
+		 */
+		if (cpu == tick_do_timer_cpu)
+			tick_do_timer_cpu = -1;
+
 		/*
 		 * calculate the expiry time for the next timer wheel
 		 * timer
@@ -338,12 +350,24 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 	struct pt_regs *regs = get_irq_regs();
+	int cpu = smp_processor_id();
 	ktime_t now = ktime_get();
 
 	dev->next_event.tv64 = KTIME_MAX;
 
+	/*
+	 * Check if the do_timer duty was dropped. We don't care about
+	 * concurrency: This happens only when the cpu in charge went
+	 * into a long sleep. If two cpus happen to assign themself to
+	 * this duty, then the jiffies update is still serialized by
+	 * xtime_lock.
+	 */
+	if (unlikely(tick_do_timer_cpu == -1))
+		tick_do_timer_cpu = cpu;
+
 	/* Check, if the jiffies need an update */
-	tick_do_update_jiffies64(now);
+	if (tick_do_timer_cpu == cpu)
+		tick_do_update_jiffies64(now);
 
 	/*
 	 * When we are idle and the tick is stopped, we have to touch
@@ -431,9 +455,23 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	struct hrtimer_cpu_base *base = timer->base->cpu_base;
 	struct pt_regs *regs = get_irq_regs();
 	ktime_t now = ktime_get();
+	int cpu = smp_processor_id();
+
+#ifdef CONFIG_NO_HZ
+	/*
+	 * Check if the do_timer duty was dropped. We don't care about
+	 * concurrency: This happens only when the cpu in charge went
+	 * into a long sleep. If two cpus happen to assign themself to
+	 * this duty, then the jiffies update is still serialized by
+	 * xtime_lock.
+	 */
+	if (unlikely(tick_do_timer_cpu == -1))
+		tick_do_timer_cpu = cpu;
+#endif
 
 	/* Check, if the jiffies need an update */
-	tick_do_update_jiffies64(now);
+	if (tick_do_timer_cpu == cpu)
+		tick_do_update_jiffies64(now);
 
 	/*
 	 * Do not call, when we are not in irq context and have
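
For readers following the changelog above, the do_timer() hand-off can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: struct tick_model, model_stop_tick() and model_tick() are hypothetical names, the model ignores locking (the real jiffies update is serialized by xtime_lock), and it advances jiffies by exactly one per tick, whereas the real update accounts for however many periods have elapsed.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CPU (-1)

/* Hypothetical userspace model of the duty hand-off; not kernel code. */
struct tick_model {
	int do_timer_cpu;	/* stands in for tick_do_timer_cpu */
	uint64_t jiffies;	/* stands in for the jiffies counter */
};

/*
 * A CPU stops its tick before a long idle sleep. If it owns the
 * do_timer duty it drops it, so that an active CPU can pick it up.
 */
static void model_stop_tick(struct tick_model *m, int cpu)
{
	assert(cpu >= 0);
	if (m->do_timer_cpu == cpu)
		m->do_timer_cpu = NO_CPU;
}

/*
 * Tick handler running on 'cpu': claim the duty if nobody owns it,
 * then update jiffies only when this CPU is the owner.
 * Returns 1 if jiffies was updated, 0 otherwise.
 */
static int model_tick(struct tick_model *m, int cpu)
{
	assert(cpu >= 0);
	if (m->do_timer_cpu == NO_CPU)
		m->do_timer_cpu = cpu;
	if (m->do_timer_cpu != cpu)
		return 0;
	m->jiffies++;
	return 1;
}
```

In this model, if CPU 0 owns the duty and calls model_stop_tick(), the next model_tick() on CPU 1 claims the duty and keeps jiffies moving, which is the behaviour the changelog describes for the dyntick case.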

2007-05-02 15:38:36

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Wed, 2007-05-02 at 01:05 -0700, Andrew Morton wrote:
> > thanks. Can you apply this patch please:
> > http://lkml.org/lkml/2007/4/13/190
> >
> > It somehow did not make it into 2.6.21.
> >
>
> Alas, poor me. I ain't going to merge a contention-reduction patch when
> we're at -rc6. If a patch fixes a bug, please tell me!

It was not a bug fix at this point, I merely tried to solve the lock
contention problem, which was reported by Steven Rostedt.

The confusing thing on Mark's config bisecting was that the softlock
detection fixed his problem. The only difference of the softlock
detection is that it starts the watchdog thread and a one second timer.
I was looking into the code and noticed that this patch did not make it.

Honestly it was a shot into the dark and I'd really like to know why
this makes the problem go away.

Mark, did you ever enable CONFIG_PROVE_LOCKING as a single add on to
your not working config ? If not, can you please try ?

tglx


2007-05-02 22:29:28

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Oh crap. It's back.

I don't know what's different from before,
but the system now locks up again exactly the same way,
even with the lock contention fix applied.

With or without CONFIG_PROVE_LOCKING=y in the config.

I've gotta get some work done here (this is my primary development machine),
so it's back to CONFIG_DETECT_SOFTLOCKUP=y for a while.

If anyone has a patch to dump out state/whatever just before that
last message that precedes all lockups, then pass it along and
I'll queue it up for testing as soon as I can.

Cheers

2007-05-02 22:38:55

by Thomas Gleixner

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

On Wed, 2007-05-02 at 18:29 -0400, Mark Lord wrote:
> Oh crap. It's back.
>
> I don't know what's different from before,
> but the system now locks up again exactly the same way,
> even with the lock contention fix applied.

Which is less surprising and confusing than the previous result. It
points to some subtle race condition somewhere, which got affected by
the slight timing change.

> With or without CONFIG_PROVE_LOCKING=y in the config.

Ok.

> I've gotta get some work done here (this is my primary development machine),
> so it's back to CONFIG_DETECT_SOFTLOCKUP=y for a while.
>
> If anyone has a patch to dump out state/whatever just before that
> last message that precedes all lockups, then pass it along and
> I'll queue it up for testing as soon as I can.

I try to come up with something, but I'm travelling tomorrow, so it
might be not before end of week.

tglx


2007-05-03 02:29:18

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Thomas Gleixner wrote:
> ..
> I try to come up with something, but I'm travelling tomorrow, so it
> might be not before end of week.

Thanks, Thomas.

I believe we definitely want to nail this down before 2.6.22-final,
but there's a good workaround in the interim (CONFIG_DETECT_SOFTLOCKUP=y)
and we've got at least a couple of months 'till then.

I think I may have fiddled with the RTC config, so I'll compare with my
old config and see what changed.

Cheers

2007-05-08 18:46:38

by Mark Lord

Subject: Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

Mark Lord wrote:
> Thomas Gleixner wrote:
>> ..
>> I try to come up with something, but I'm travelling tomorrow, so it
>> might be not before end of week.
>
> Thanks, Thomas.
>
> I believe we definitely want to nail this down before 2.6.22-final,
> but there's a good workaround in the interim (CONFIG_DETECT_SOFTLOCKUP=y)
> and we've got at least a couple of months 'till then.
>
> I think I may have fiddled with the RTC config, so I'll compare with my
> old config and see what changed.

Okay. I turned on CONFIG_COMPAT_VDSO=y and now the system fails to start up
in the same way/place as before.

Rebuilding again with CONFIG_COMPAT_VDSO not set, and all is as "well" as
before -- system starts up so long as CONFIG_DETECT_SOFTLOCKUP=y is given.

Cheers