2007-05-25 21:08:54

by Rui Nuno Capela

Subject: 2.6.21-rt2..8 troubles

Hi Ingo et al.

It's been quite a while since I last complained about the -rt
kernel patch series. This time I'm afraid I have a nasty one that I've
been trying to figure out and isolate, but with no definitive results.

The fact is that since 2.6.21-rt2, and still on the latest -rt8, I'm
seeing troubled behavior while running on a Core2 T7200 laptop (SMP).
Somehow, sooner or later, the whole system starts crawling to death. It
just slows down to some kind of Big Freeze, with no evidence on the
console whatsoever, so that I'm ultimately left with a brick on my hands.

This behavior is consistent and occurs every time after a while. It
does not occur on 2.6.21-rt1 and earlier. Even stranger, it does
not occur on another, older [email protected] desktop (HT/SMP) where a nearly
identical system image is deployed (openSUSE 10.2 i386, gcc 4.1.2, KDE
3.5.7).

I wish I could give you more details, but the fact is I don't know where
to look. The machine just freezes silently, again and again, with all
kernels from -rt2 to -rt8 inclusive, with no traceable evidence, at
least to my knowledge. The only symptom that I can point to is that,
from some moment on, the system cannot start any new process anymore,
or otherwise takes forever to launch one.

A sample dmesg output:
http://www.rncbc.org/datahub/dmesg-2.6.21-rt5.0
The corresponding .config:
http://www.rncbc.org/datahub/config-2.6.21-rt5.0

Again, there's no logged evidence of the problem, which is as nasty as
it is repeatable after each boot. Unfortunately, it's not
deterministically reproducible, this behavior of turning into an
unresponsive brick ;) It's just a matter of time, or so I think. That's
why I have no clues.

Is there anything I can do to help figure out this issue? As this is a
modern laptop, a serial console is unavailable, but perhaps it would be
possible to track things down over netconsole?
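
For reference, capturing netconsole output on a second machine only
needs something listening for UDP datagrams. A minimal sketch, assuming
the laptop is booted with a netconsole= parameter that targets this
listener; 6666 is netconsole's default target port, and the function
names here are made up for illustration:

```python
import socket


def format_record(payload: bytes, sender: tuple) -> str:
    """Render one netconsole datagram as a single log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{sender[0]}: {text}"


def listen(port: int = 6666, bind_addr: str = "0.0.0.0") -> None:
    """Print every kernel message received over UDP until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            payload, sender = sock.recvfrom(65535)
            # Flush immediately so the last lines before a freeze are kept.
            print(format_record(payload, sender), flush=True)
```

Calling listen() on the receiving machine and redirecting its output to
a file would keep whatever the kernel manages to print before the
freeze.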

I just need some bright and nice directions now ;) Hope someone finds
this worthy of attention too. Meanwhile, I'll be happy with 2.6.21-rt1 :)

Cheers.
--
rncbc aka Rui Nuno Capela
[email protected]


2007-05-26 16:08:34

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Fri, 2007-05-25 at 21:58 +0100, Rui Nuno Capela wrote:
> Is there anything I can do better to help myself figuring out this
> issue? As this is a modern laptop such things like a serial console are
> unavailable, but it would be nice to track things up over netconsole
> perhaps?
>
> I just need some bright and nice directions now ;) Hope someone finds
> this worth of attention too. Meanwhile, I'll be happy with 2.6.21-rt1 :)

Can you boot with "hpet=disable" on the command line ?

If that does not help, please provide the output of /proc/timer_list.

Thanks,

tglx


2007-05-26 21:33:18

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles

Thomas Gleixner wrote:
> On Fri, 2007-05-25 at 21:58 +0100, Rui Nuno Capela wrote:
>> Is there anything I can do better to help myself figuring out this
>> issue? As this is a modern laptop such things like a serial console are
>> unavailable, but it would be nice to track things up over netconsole
>> perhaps?
>>
>> I just need some bright and nice directions now ;) Hope someone finds
>> this worth of attention too. Meanwhile, I'll be happy with 2.6.21-rt1 :)
>
> Can you boot with "hpet=disable" on the command line ?
>

Nope. It doesn't seem to have any significant effect. Same time-bomb
behavior: after an indeterminate period of uptime, the system stops
responding and cannot spawn new processes (currently running ones still
live on, strangely).

> If that does not help, please provide the output of /proc/timer_list.
>

This is with my latest iteration:
http://www.rncbc.org/datahub/config-2.6.21.1-rt8.0

Normal boot on which it behaves as badly as reported:
http://www.rncbc.org/datahub/dmesg-2.6.21.1-rt8.0

# cat /proc/timer_list
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 131736771907 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1180213690448299114 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <ed7c4ef4>, tick_sched_timer, S:01
# expires at 131737000000 nsecs [in 228093 nsecs]
#1: <ed7c4ef4>, it_real_fn, S:01
# expires at 131751277843 nsecs [in 14505936 nsecs]
#2: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 131802703679 nsecs [in 65931772 nsecs]
#3: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 131802705006 nsecs [in 65933099 nsecs]
#4: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 132412838830 nsecs [in 676066923 nsecs]
#5: <ed7c4ef4>, it_real_fn, S:01
# expires at 137026607454 nsecs [in 5289835547 nsecs]
#6: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 141381493725 nsecs [in 9644721818 nsecs]
#7: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 170796028701 nsecs [in 39059256794 nsecs]
.expires_next : 131737000000 nsecs
.hres_active : 1
.nr_events : 40634
.nohz_mode : 2
.idle_tick : 131724000000 nsecs
.tick_stopped : 0
.idle_jiffies : 4294799020
.idle_calls : 178848
.idle_sleeps : 133212
.idle_entrytime : 131736069830 nsecs
.idle_sleeptime : 100895567465 nsecs
.last_jiffies : 4294799033
.next_jiffies : 4294799039
.idle_expires : 131736000000 nsecs
jiffies: 4294799033

cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1180213690448299114 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 131737067173 nsecs [in 295266 nsecs]
#1: <ed7c4ef4>, tick_sched_timer, S:01
# expires at 131737250000 nsecs [in 478093 nsecs]
#2: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 139151071745 nsecs [in 7414299838 nsecs]
#3: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 139151133755 nsecs [in 7414361848 nsecs]
#4: <ed7c4ef4>, hrtimer_wakeup, S:01
# expires at 139151154005 nsecs [in 7414382098 nsecs]
.expires_next : 131737067173 nsecs
.hres_active : 1
.nr_events : 31510
.nohz_mode : 2
.idle_tick : 131734250000 nsecs
.tick_stopped : 0
.idle_jiffies : 4294799030
.idle_calls : 151213
.idle_sleeps : 107018
.idle_entrytime : 131735193036 nsecs
.idle_sleeptime : 108256832194 nsecs
.last_jiffies : 4294799032
.next_jiffies : 4294799040
.idle_expires : 131743000000 nsecs
jiffies: 4294799033


Tick Device: mode: 1
Clock Event Device: hpet
max_delta_ns: 2147483647
min_delta_ns: 3352
mult: 61496110
shift: 32
mode: 3
next_event: 131737000000 nsecs
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000001


Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 806914928
min_delta_ns: 1442
mult: 44650051
shift: 32
mode: 1
next_event: 131737000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 806914928
min_delta_ns: 1442
mult: 44650051
shift: 32
mode: 3
next_event: 131737067173 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
--
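
The mult and shift values in the dump above encode each device's tick
rate: the clockevents code converts a nanosecond delta to device ticks
as (ns * mult) >> shift, so the frequency can be recovered from them. A
small sketch; the helper names are just for illustration:

```python
def device_hz(mult: int, shift: int) -> float:
    """Ticks per second implied by a clock event device's mult/shift pair."""
    # ticks = (ns * mult) >> shift, so one second (1e9 ns) is this many ticks.
    return mult * 1_000_000_000 / (1 << shift)


def ns_to_ticks(ns: int, mult: int, shift: int) -> int:
    """The same integer conversion applied to a nanosecond delta."""
    return (ns * mult) >> shift


# Values copied from the timer_list output above.
print(f"hpet:  {device_hz(61496110, 32) / 1e6:.3f} MHz")
print(f"lapic: {device_hz(44650051, 32) / 1e6:.3f} MHz")
# The lapic max_delta_ns expressed in device ticks.
print(hex(ns_to_ticks(806914928, 44650051, 32)))
```

The hpet pair works out to roughly 14.318 MHz, the usual HPET rate, so
the programmed conversion factors themselves look sane.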


Alternate boot with hpet=disable as suggested, but no better results:
http://www.rncbc.org/datahub/dmesg-2.6.21.1-rt8.0-hpet_disabled

# cat /proc/timer_list
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 269529706096 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1180214106093436428 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <ed2a2ef4>, tick_sched_timer, S:01
# expires at 269530000000 nsecs [in 293904 nsecs]
#1: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 269554568320 nsecs [in 24862224 nsecs]
#2: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 269585566924 nsecs [in 55860828 nsecs]
#3: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 269822782823 nsecs [in 293076727 nsecs]
#4: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 272726158017 nsecs [in 3196451921 nsecs]
#5: <ed2a2ef4>, it_real_fn, S:01
# expires at 278007767018 nsecs [in 8478060922 nsecs]
#6: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 283716431029 nsecs [in 14186724933 nsecs]
#7: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 283716456168 nsecs [in 14186750072 nsecs]
#8: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 295789281627 nsecs [in 26259575531 nsecs]
.expires_next : 269530000000 nsecs
.hres_active : 1
.nr_events : 63228
.nohz_mode : 2
.idle_tick : 269527000000 nsecs
.tick_stopped : 0
.idle_jiffies : 4294936823
.idle_calls : 217590
.idle_sleeps : 168323
.idle_entrytime : 269528785728 nsecs
.idle_sleeptime : 230915526366 nsecs
.last_jiffies : 4294936825
.next_jiffies : 4294936840
.idle_expires : 269543000000 nsecs
jiffies: 4294936826

cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1180214106093436428 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <ed2a2ef4>, tick_sched_timer, S:01
# expires at 269530250000 nsecs [in 543904 nsecs]
#1: <ed2a2ef4>, it_real_fn, S:01
# expires at 269546379364 nsecs [in 16673268 nsecs]
#2: <ed2a2ef4>, hrtimer_wakeup, S:01
# expires at 283723356553 nsecs [in 14193650457 nsecs]
.expires_next : 269530250000 nsecs
.hres_active : 1
.nr_events : 64947
.nohz_mode : 2
.idle_tick : 269527250000 nsecs
.tick_stopped : 0
.idle_jiffies : 4294936824
.idle_calls : 172684
.idle_sleeps : 111081
.idle_entrytime : 269529298565 nsecs
.idle_sleeptime : 234502295072 nsecs
.last_jiffies : 4294936826
.next_jiffies : 4294936833
.idle_expires : 269536000000 nsecs
jiffies: 4294936826


Tick Device: mode: 1
Clock Event Device: pit
max_delta_ns: 27461866
min_delta_ns: 12571
mult: 5124677
shift: 32
mode: 3
next_event: 269530250000 nsecs
set_next_event: pit_next_event
set_mode: init_pit_timer
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000002


Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 807031401
min_delta_ns: 1443
mult: 44643607
shift: 32
mode: 3
next_event: 269530000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 807031401
min_delta_ns: 1443
mult: 44643607
shift: 32
mode: 1
next_event: 269530250000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
--
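
Comparing the two dumps by eye is tedious. A rough sketch of pulling out
just the clock event devices and their handlers, assuming the v0.3
layout shown above (the function name is hypothetical):

```python
def clock_event_devices(timer_list: str) -> list:
    """Return (device name, event_handler) pairs in order of appearance."""
    pairs = []
    name = None
    for raw in timer_list.splitlines():
        line = raw.strip()
        if line.startswith("Clock Event Device:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("event_handler:") and name is not None:
            pairs.append((name, line.split(":", 1)[1].strip()))
            name = None
    return pairs


# Excerpt from the first dump above (other fields omitted).
SAMPLE = """\
Tick Device: mode: 1
Clock Event Device: hpet
set_next_event: hpet_legacy_next_event
event_handler: tick_handle_oneshot_broadcast

Tick Device: mode: 1
Clock Event Device: lapic
set_next_event: lapic_next_event
event_handler: hrtimer_interrupt
"""

for device, handler in clock_event_devices(SAMPLE):
    print(f"{device:6s} -> {handler}")
```

Run over both dumps, this shows the broadcast device changing from hpet
to pit while the per-CPU lapic devices keep hrtimer_interrupt in both
cases.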

Thanks for the hints.

Cheers.
--
rncbc aka Rui Nuno Capela
[email protected]

2007-05-31 15:57:22

by Steven Rostedt

Subject: Re: 2.6.21-rt2..8 troubles


On Fri, May 25, 2007 at 09:58:12PM +0100, Rui Nuno Capela wrote:
>
> I wish I could give you more details, but the fact is I don't know where
> to look. The machine just freezes silently, again and again, with all
> kernels from -rt2 to -rt8 inclusive, with no traceable evidence, at
> least to my knowledge. The only symptom that I can come about is that,
> from some moment on and ever since, the system cannot start any new
> process anymore, or otherwise takes forever to realize and launch any
> new started process thread.
>

I have a box that looks like it's doing the same thing. Unfortunately
for now it's being used to test other things.

But I did do a show-task and saw a bunch of processes in D state. I'll
post that output when I get that box free again.
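
For anyone without SysRq at hand, a rough user-space approximation is to
scan /proc for tasks in state D. This is only a sketch: it lists the
processes but not their kernel stacks, the function names are made up,
and it would have to be running already, since the affected system can
no longer start new processes:

```python
import os


def parse_stat(stat_line: str) -> tuple:
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name is parenthesised and may itself contain spaces or
    # parentheses, so locate it via the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root: str = "/proc") -> list:
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited while we were looking
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling uninterruptible_tasks() in a loop from a long-running session
would show when the D-state tasks start piling up.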

-- Steve

2007-06-06 00:45:43

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles

Rui Nuno Capela wrote:
> Thomas Gleixner wrote:
>> On Fri, 2007-05-25 at 21:58 +0100, Rui Nuno Capela wrote:
>>> Is there anything I can do better to help myself figuring out this
>>> issue? As this is a modern laptop such things like a serial console are
>>> unavailable, but it would be nice to track things up over netconsole
>>> perhaps?
>>>
>>> I just need some bright and nice directions now ;) Hope someone finds
>>> this worth of attention too. Meanwhile, I'll be happy with 2.6.21-rt1 :)
>> Can you boot with "hpet=disable" on the command line ?
>>
>
> Nope. It doesn't seem to have significant effect. Same time-bomb
> behavior: after an indeterminate period of uptime, the system stops
> responding and cannot spawn new processes (current running ones still
> live on, strange).
>
>> If that does not help, please provide the output of /proc/timer_list.
>>
>
> This is with my latest iteration:
> http://www.rncbc.org/datahub/config-2.6.21.1-rt8.0
>
> Normal boot on which it behaves as badly as reported:
> http://www.rncbc.org/datahub/dmesg-2.6.21.1-rt8.0
>
> # cat /proc/timer_list
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 131736771907 nsecs
>
> cpu: 0
> clock 0:
> .index: 0
> .resolution: 1 nsecs
> .get_time: ktime_get_real
> .offset: 1180213690448299114 nsecs
> active timers:
> clock 1:
> .index: 1
> .resolution: 1 nsecs
> .get_time: ktime_get
> .offset: 0 nsecs
> active timers:
> #0: <ed7c4ef4>, tick_sched_timer, S:01
> # expires at 131737000000 nsecs [in 228093 nsecs]
> #1: <ed7c4ef4>, it_real_fn, S:01
> # expires at 131751277843 nsecs [in 14505936 nsecs]
> #2: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 131802703679 nsecs [in 65931772 nsecs]
> #3: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 131802705006 nsecs [in 65933099 nsecs]
> #4: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 132412838830 nsecs [in 676066923 nsecs]
> #5: <ed7c4ef4>, it_real_fn, S:01
> # expires at 137026607454 nsecs [in 5289835547 nsecs]
> #6: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 141381493725 nsecs [in 9644721818 nsecs]
> #7: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 170796028701 nsecs [in 39059256794 nsecs]
> .expires_next : 131737000000 nsecs
> .hres_active : 1
> .nr_events : 40634
> .nohz_mode : 2
> .idle_tick : 131724000000 nsecs
> .tick_stopped : 0
> .idle_jiffies : 4294799020
> .idle_calls : 178848
> .idle_sleeps : 133212
> .idle_entrytime : 131736069830 nsecs
> .idle_sleeptime : 100895567465 nsecs
> .last_jiffies : 4294799033
> .next_jiffies : 4294799039
> .idle_expires : 131736000000 nsecs
> jiffies: 4294799033
>
> cpu: 1
> clock 0:
> .index: 0
> .resolution: 1 nsecs
> .get_time: ktime_get_real
> .offset: 1180213690448299114 nsecs
> active timers:
> clock 1:
> .index: 1
> .resolution: 1 nsecs
> .get_time: ktime_get
> .offset: 0 nsecs
> active timers:
> #0: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 131737067173 nsecs [in 295266 nsecs]
> #1: <ed7c4ef4>, tick_sched_timer, S:01
> # expires at 131737250000 nsecs [in 478093 nsecs]
> #2: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 139151071745 nsecs [in 7414299838 nsecs]
> #3: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 139151133755 nsecs [in 7414361848 nsecs]
> #4: <ed7c4ef4>, hrtimer_wakeup, S:01
> # expires at 139151154005 nsecs [in 7414382098 nsecs]
> .expires_next : 131737067173 nsecs
> .hres_active : 1
> .nr_events : 31510
> .nohz_mode : 2
> .idle_tick : 131734250000 nsecs
> .tick_stopped : 0
> .idle_jiffies : 4294799030
> .idle_calls : 151213
> .idle_sleeps : 107018
> .idle_entrytime : 131735193036 nsecs
> .idle_sleeptime : 108256832194 nsecs
> .last_jiffies : 4294799032
> .next_jiffies : 4294799040
> .idle_expires : 131743000000 nsecs
> jiffies: 4294799033
>
>
> Tick Device: mode: 1
> Clock Event Device: hpet
> max_delta_ns: 2147483647
> min_delta_ns: 3352
> mult: 61496110
> shift: 32
> mode: 3
> next_event: 131737000000 nsecs
> set_next_event: hpet_legacy_next_event
> set_mode: hpet_legacy_set_mode
> event_handler: tick_handle_oneshot_broadcast
> tick_broadcast_mask: 00000003
> tick_broadcast_oneshot_mask: 00000001
>
>
> Tick Device: mode: 1
> Clock Event Device: lapic
> max_delta_ns: 806914928
> min_delta_ns: 1442
> mult: 44650051
> shift: 32
> mode: 1
> next_event: 131737000000 nsecs
> set_next_event: lapic_next_event
> set_mode: lapic_timer_setup
> event_handler: hrtimer_interrupt
>
> Tick Device: mode: 1
> Clock Event Device: lapic
> max_delta_ns: 806914928
> min_delta_ns: 1442
> mult: 44650051
> shift: 32
> mode: 3
> next_event: 131737067173 nsecs
> set_next_event: lapic_next_event
> set_mode: lapic_timer_setup
> event_handler: hrtimer_interrupt
> --
>
>
> Alternate boot with hpet=disabled as suggested, but no better results:
> http://www.rncbc.org/datahub/dmesg-2.6.21.1-rt8.0-hpet_disabled
>
> # cat /proc/timer_list
> Timer List Version: v0.3
> HRTIMER_MAX_CLOCK_BASES: 2
> now at 269529706096 nsecs
>
> cpu: 0
> clock 0:
> .index: 0
> .resolution: 1 nsecs
> .get_time: ktime_get_real
> .offset: 1180214106093436428 nsecs
> active timers:
> clock 1:
> .index: 1
> .resolution: 1 nsecs
> .get_time: ktime_get
> .offset: 0 nsecs
> active timers:
> #0: <ed2a2ef4>, tick_sched_timer, S:01
> # expires at 269530000000 nsecs [in 293904 nsecs]
> #1: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 269554568320 nsecs [in 24862224 nsecs]
> #2: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 269585566924 nsecs [in 55860828 nsecs]
> #3: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 269822782823 nsecs [in 293076727 nsecs]
> #4: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 272726158017 nsecs [in 3196451921 nsecs]
> #5: <ed2a2ef4>, it_real_fn, S:01
> # expires at 278007767018 nsecs [in 8478060922 nsecs]
> #6: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 283716431029 nsecs [in 14186724933 nsecs]
> #7: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 283716456168 nsecs [in 14186750072 nsecs]
> #8: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 295789281627 nsecs [in 26259575531 nsecs]
> .expires_next : 269530000000 nsecs
> .hres_active : 1
> .nr_events : 63228
> .nohz_mode : 2
> .idle_tick : 269527000000 nsecs
> .tick_stopped : 0
> .idle_jiffies : 4294936823
> .idle_calls : 217590
> .idle_sleeps : 168323
> .idle_entrytime : 269528785728 nsecs
> .idle_sleeptime : 230915526366 nsecs
> .last_jiffies : 4294936825
> .next_jiffies : 4294936840
> .idle_expires : 269543000000 nsecs
> jiffies: 4294936826
>
> cpu: 1
> clock 0:
> .index: 0
> .resolution: 1 nsecs
> .get_time: ktime_get_real
> .offset: 1180214106093436428 nsecs
> active timers:
> clock 1:
> .index: 1
> .resolution: 1 nsecs
> .get_time: ktime_get
> .offset: 0 nsecs
> active timers:
> #0: <ed2a2ef4>, tick_sched_timer, S:01
> # expires at 269530250000 nsecs [in 543904 nsecs]
> #1: <ed2a2ef4>, it_real_fn, S:01
> # expires at 269546379364 nsecs [in 16673268 nsecs]
> #2: <ed2a2ef4>, hrtimer_wakeup, S:01
> # expires at 283723356553 nsecs [in 14193650457 nsecs]
> .expires_next : 269530250000 nsecs
> .hres_active : 1
> .nr_events : 64947
> .nohz_mode : 2
> .idle_tick : 269527250000 nsecs
> .tick_stopped : 0
> .idle_jiffies : 4294936824
> .idle_calls : 172684
> .idle_sleeps : 111081
> .idle_entrytime : 269529298565 nsecs
> .idle_sleeptime : 234502295072 nsecs
> .last_jiffies : 4294936826
> .next_jiffies : 4294936833
> .idle_expires : 269536000000 nsecs
> jiffies: 4294936826
>
>
> Tick Device: mode: 1
> Clock Event Device: pit
> max_delta_ns: 27461866
> min_delta_ns: 12571
> mult: 5124677
> shift: 32
> mode: 3
> next_event: 269530250000 nsecs
> set_next_event: pit_next_event
> set_mode: init_pit_timer
> event_handler: tick_handle_oneshot_broadcast
> tick_broadcast_mask: 00000003
> tick_broadcast_oneshot_mask: 00000002
>
>
> Tick Device: mode: 1
> Clock Event Device: lapic
> max_delta_ns: 807031401
> min_delta_ns: 1443
> mult: 44643607
> shift: 32
> mode: 3
> next_event: 269530000000 nsecs
> set_next_event: lapic_next_event
> set_mode: lapic_timer_setup
> event_handler: hrtimer_interrupt
>
> Tick Device: mode: 1
> Clock Event Device: lapic
> max_delta_ns: 807031401
> min_delta_ns: 1443
> mult: 44643607
> shift: 32
> mode: 1
> next_event: 269530250000 nsecs
> set_next_event: lapic_next_event
> set_mode: lapic_timer_setup
> event_handler: hrtimer_interrupt
> --
>

Just as a heads-up, I'm still suffering from this same illness, and
it seems even worse (the big freeze happens earlier) on 2.6.21.3-rt9.

There's no way around it. On one box it works flawlessly (desktop,
[email protected]) while on the patient one (laptop, core2 T7200) it bricks
silently.

Shrugs:)
--
rncbc aka Rui Nuno Capela
[email protected]

2007-06-08 15:47:21

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Wed, 2007-06-06 at 01:44 +0100, Rui Nuno Capela wrote:
> Just for the heads-up, I'm still suffering from this same illness, and
> it seems even worse (big freeze happens earlier) on 2.6.21.3-rt9.
>
> There's no way around. On one box it works flawlessly (desktop,
> [email protected]) while on the patient one (laptop, core2 T7200) it bricks
> silently.

Sorry for responding late. To get some idea where the breakage comes
from, can you please try

http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt5.patch

and see whether it shows the same behaviour?

Thanks,

tglx


2007-06-08 18:23:56

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles

Hi Thomas,

On Fri, June 8, 2007 16:47, Thomas Gleixner wrote:
> On Wed, 2007-06-06 at 01:44 +0100, Rui Nuno Capela wrote:
>
>> Just for the heads-up, I'm still suffering from this same illness, and
>> it seems even worse (big freeze happens earlier) on 2.6.21.3-rt9.
>>
>> There's no way around. On one box it works flawlessly (desktop,
>> [email protected]) while on the patient one (laptop, core2 T7200) it bricks
>> silently.
>
> Sorry for responding late. To have some idea where the breakage comes
> from, can you please try
>
> http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt5.patch
>
> whether it has the same behaviour.
>

Just built from linux-2.6.22-rc4.tar.bz2, with patch-2.6.22-rc4-hrt5.
All's apparently working nicely on this offending machine (laptop, Intel
Core2 T7200). In fact, I'm writing this very reply under it and through
the ipw3945 wifi module, which was never this usable on -rt2..9 ;)

Nevertheless, this is not preempt-realtime (-rt), is it? And I never
complained about vanilla.

Is this good news though?
--
rncbc aka Rui Nuno Capela
[email protected]

2007-06-08 18:50:59

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Fri, 2007-06-08 at 19:21 +0100, Rui Nuno Capela wrote:
> >> There's no way around. On one box it works flawlessly (desktop,
> >> [email protected]) while on the patient one (laptop, core2 T7200) it bricks
> >> silently.
> >
> > Sorry for responding late. To have some idea where the breakage comes
> > from, can you please try
> >
> > http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt5.patch
> >
> > whether it has the same behaviour.
> >
>
> Just built from linux-2.6.22-rc4.tar.bz2, with patch-2.6.22-rc4-hrt5.
> All's apparently working nicely on this offending machine (laptop, Intel
> core2 T7200). In fact, I'm writing this very reply under it and through
> ipw3945 wifi module--which never was so pragmatic on -rt2..9 ;)
>
> Nevertheless, this is not preempt-realtime (-rt), is it? And I never
> complained about vanilla.
>
> Is this good news though?

Well, the patch carries the same high resolution timer fixes as -rt, so
I just wanted to exclude those. Thanks for testing.

I'm spinning -rt10 with a couple of fixes. Should be out sometime
tomorrow. If the problem persists, we need to dig deeper.

Thanks,

tglx


2007-06-11 19:38:26

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles

Thomas Gleixner wrote:
> On Fri, 2007-06-08 at 19:21 +0100, Rui Nuno Capela wrote:
>> Just built from linux-2.6.22-rc4.tar.bz2, with patch-2.6.22-rc4-hrt5.
>> All's apparently working nicely on this offending machine (laptop, Intel
>> core2 T7200). In fact, I'm writing this very reply under it and through
>> ipw3945 wifi module--which never was so pragmatic on -rt2..9 ;)
>>
>> Nevertheless, this is not preempt-realtime (-rt), is it? And I never
>> complained about vanilla.
>>
>> Is this good news though?
>
> Well, the patch carries the same high resolution timer fixes as -rt, so
> I just wanted to exclude those. Thanks for testing.
>
> I'm spinning -rt10 with a couple of fixes. Should be out sometime
> tomorrow. If the problem persists, we need to dig deeper.
>

Uh-oh. I'm sorry to say that the problem is still creeping in on
2.6.21.4-rt11 and -rt12 :(

So sorry.
--
rncbc aka Rui Nuno Capela
[email protected]

2007-06-11 19:45:35

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Mon, 2007-06-11 at 20:36 +0100, Rui Nuno Capela wrote:
> > I'm spinning -rt10 with a couple of fixes. Should be out sometime
> > tomorrow. If the problem persists, we need to dig deeper.
> >
>
> Uhoh. I'm sorry to tell, but the problem is still creeping on
> 2.6.21.4-rt11 and -rt12 :(
>
> So sorry.

Hmm. Does it happen when you boot with maxcpus=1 on the kernel
command line?

tglx


2007-06-11 19:58:57

by Daniel Walker

Subject: Re: 2.6.21-rt2..8 troubles

On Mon, 2007-06-11 at 21:45 +0200, Thomas Gleixner wrote:
> On Mon, 2007-06-11 at 20:36 +0100, Rui Nuno Capela wrote:
> > > I'm spinning -rt10 with a couple of fixes. Should be out sometime
> > > tomorrow. If the problem persists, we need to dig deeper.
> > >
> >
> > Uhoh. I'm sorry to tell, but the problem is still creeping on
> > 2.6.21.4-rt11 and -rt12 :(
> >
> > So sorry.
>
> Hmm. Does it happen, when you boot with maxcpus=1 on the kernel
> commandline ?

I think 2.6.21-rt2 had some apic updates also (along with hpet updates),
so testing with "noapic" on the command line might be helpful too.
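
With several boot parameters in play (hpet=disable, maxcpus=1, noapic),
it helps to confirm which ones the running kernel actually received. A
minimal sketch that parses the contents of /proc/cmdline; the function
name is illustrative, and quoted values containing spaces are not
handled:

```python
def parse_cmdline(cmdline: str) -> dict:
    """Map each boot parameter to its value (None for bare flags)."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line of the currently running kernel."""
    with open(path) as handle:
        return parse_cmdline(handle.read())
```

For example, "maxcpus" in read_cmdline() tells whether the single-CPU
test boot really took effect before the results are compared.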

Daniel

2007-06-11 20:53:58

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles

Daniel Walker wrote:
> On Mon, 2007-06-11 at 21:45 +0200, Thomas Gleixner wrote:
>> On Mon, 2007-06-11 at 20:36 +0100, Rui Nuno Capela wrote:
>>>> I'm spinning -rt10 with a couple of fixes. Should be out sometime
>>>> tomorrow. If the problem persists, we need to dig deeper.
>>>>
>>> Uhoh. I'm sorry to tell, but the problem is still creeping on
>>> 2.6.21.4-rt11 and -rt12 :(
>>>
>>> So sorry.
>> Hmm. Does it happen, when you boot with maxcpus=1 on the kernel
>> commandline ?
>
> I think 2.6.21-rt2 had some apic updates also, (along with hpet updates)
> so testing with "noapic" on the command line might be helpful too ..
>

Thomas,

Yes, "maxcpus=1" seems to keep it running, but then my Core2 is left
just half-baked ;)

Daniel,

No, "noapic" does not seem to help at all.

HTH
--
rncbc aka Rui Nuno Capela
[email protected]

2007-06-11 21:14:45

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Mon, 2007-06-11 at 21:50 +0100, Rui Nuno Capela wrote:
> Thomas,
>
> Yes, "maxcpus=1" seems to keep it running, but then I render my Core2
> just half-baked ;)

Yes, I know :(

/me goes into desperate mode

Is this a DELL laptop ?

tglx


2007-06-11 21:29:17

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles

Thomas Gleixner wrote:
> On Mon, 2007-06-11 at 21:50 +0100, Rui Nuno Capela wrote:
>> Thomas,
>>
>> Yes, "maxcpus=1" seems to keep it running, but then I render my Core2
>> just half-baked ;)
>
> Yes, I know :(
>
> /me goes into desperate mode
>
> Is this a DELL laptop ?
>

Nope. It's a Fujitsu-Siemens Amilo Si 1520 -- Intel Core2 Duo [email protected].

Works great with 2.6.21-rt1, and 2.6.22-rc4-hrt5, but that you already
know :)

Cheers.
--
rncbc aka Rui Nuno Capela
[email protected]

2007-06-11 21:43:16

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Mon, 2007-06-11 at 22:25 +0100, Rui Nuno Capela wrote:
> Nope. It's a Fujitsu-Siemens Amilo Si 1520 -- Intel Core2 Duo [email protected].

Yeah, there are Dell ones which have similar or worse symptoms.

> Works great with 2.6.21-rt1, and 2.6.22-rc4-hrt5, but that you already
> know :)

OK. I'll go back and figure out which differences we have between
2.6.21-rt>8 and the -hrt queue.

tglx


2007-06-11 22:38:09

by Daniel Walker

Subject: Re: 2.6.21-rt2..8 troubles

On Mon, 2007-06-11 at 23:42 +0200, Thomas Gleixner wrote:
> On Mon, 2007-06-11 at 22:25 +0100, Rui Nuno Capela wrote:
> > Nope. It's a Fujitsu-Siemens Amilo Si 1520 -- Intel Core2 Duo [email protected].
>
> Yeah, there are Dell ones which have similar or worse symptoms.
>
> > Works great with 2.6.21-rt1, and 2.6.22-rc4-hrt5, but that you already
> > know :)
>
> Ok. I go back and figure out which differences we have between
> 2.6.21-rt>8 and the -hrt queue.

Are you sure it's strictly an HRT issue? I didn't see a
!CONFIG_HIGH_RES_TIMERS test ..

Daniel

2007-06-11 23:09:56

by Thomas Gleixner

Subject: Re: 2.6.21-rt2..8 troubles

On Mon, 2007-06-11 at 15:34 -0700, Daniel Walker wrote:
> On Mon, 2007-06-11 at 23:42 +0200, Thomas Gleixner wrote:
> > On Mon, 2007-06-11 at 22:25 +0100, Rui Nuno Capela wrote:
> > > Nope. It's a Fujitsu-Siemens Amilo Si 1520 -- Intel Core2 Duo [email protected].
> >
> > Yeah, there are Dell ones which have similar or worse symptoms.
> >
> > > Works great with 2.6.21-rt1, and 2.6.22-rc4-hrt5, but that you already
> > > know :)
> >
> > Ok. I go back and figure out which differences we have between
> > 2.6.21-rt>8 and the -hrt queue.
>
> Are you sure it's strictly an HRT issue? I didn't see a
> !CONFIG_HIGH_RES_TIMERS test ..

The main difference between -rt1 and -rt2 was the update of -hrt, which
not only affects CONFIG_HIGH_RES_TIMERS. There are enough
CONFIG_HIGH_RES_TIMERS=n related changes to clock events and friends as
well.

tglx


2007-06-12 10:11:46

by Rui Nuno Capela

Subject: Re: 2.6.21-rt2..8 troubles


On Tue, June 12, 2007 00:08, Thomas Gleixner wrote:
> On Mon, 2007-06-11 at 15:34 -0700, Daniel Walker wrote:
>
>> On Mon, 2007-06-11 at 23:42 +0200, Thomas Gleixner wrote:
>>
>>> On Mon, 2007-06-11 at 22:25 +0100, Rui Nuno Capela wrote:
>>>
>>>> Nope. It's a Fujitsu-Siemens Amilo Si 1520 -- Intel Core2 Duo
>>>> [email protected].
>>>>
>>>
>>> Yeah, there are Dell ones which have similar or worse symptoms.
>>>
>>>
>>>> Works great with 2.6.21-rt1, and 2.6.22-rc4-hrt5, but that you
>>>> already know :)
>>>
>>> Ok. I go back and figure out which differences we have between
>>> 2.6.21-rt>8 and the -hrt queue.
>>>
>>
>> Are you sure it's strictly an HRT issue? I didn't see a
>> !CONFIG_HIGH_RES_TIMERS test ..
>>
>
> The main difference between -rt1 and -rt2 was the update of -hrt, which
> not only affects CONFIG_HIGH_RES_TIMERS. There are enough
> CONFIG_HIGH_RES_TIMERS=n related changes to clock events and friends as
> well.
>

Indeed, FWIW and IIRC, I can confirm that the show-stopper problem was
still present when I tried with CONFIG_HIGH_RES_TIMERS not set (=n).
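
To take the "IIRC" out of such a report, the option can be checked in
the .config the kernel was built from. A small sketch, with a
hypothetical helper name; note that a disabled option appears in .config
as a "# ... is not set" comment rather than as "=n":

```python
def config_option(config_text: str, name: str):
    """Return the option's value, or None if it is unset or absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {name} is not set":
            return None
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


# Illustrative fragment only, not taken from the configs linked above.
SAMPLE_CONFIG = """\
CONFIG_SMP=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_HZ=1000
"""

print(config_option(SAMPLE_CONFIG, "CONFIG_HIGH_RES_TIMERS"))
print(config_option(SAMPLE_CONFIG, "CONFIG_SMP"))
```

The same function can be pointed at the text of any of the config files
posted earlier in the thread.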

Bye now.
--
rncbc aka Rui Nuno Capela
[email protected]