2007-10-12 02:04:45

by Steven Rostedt

Subject: 2.6.23-rt1

We are pleased to announce the 2.6.23-rt1 tree, which can be
downloaded from the location:

http://www.kernel.org/pub/linux/kernel/projects/rt/

Changes since 2.6.23-rc9-rt2

- updated to 2.6.23

- spin_trylock_irqsave macro fix (Sébastien Dugué)

- move rcu_preempt_boost init earlier (Steven Rostedt)

- rt task send IPI condition update (Mike Kravetz)


To build a 2.6.23-rt1 tree, the following patches should be applied:

http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.23-rt1.bz2

The broken out patches are also available.
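
For readers who want to script the steps above, here is a minimal sketch that is not part of the original announcement. It assumes the two URLs listed above are still reachable, that a `patch` binary is on the PATH, that the tarball unpacks into a `linux-2.6.23` directory, and that `-p1` is the right strip level (the usual convention for kernel patches). The function names are made up for illustration.

```python
import bz2
import subprocess
import tarfile
import urllib.request
from pathlib import Path

# URLs copied from the announcement above.
KERNEL_URL = "http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2"
RT_PATCH_URL = "http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.23-rt1.bz2"


def local_name(url: str) -> str:
    """Return the file-name part of a URL (hypothetical helper)."""
    return url.rsplit("/", 1)[-1]


def fetch(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir unless the file is already present."""
    target = dest_dir / local_name(url)
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def build_tree(work_dir: Path) -> Path:
    """Unpack the vanilla tarball and apply the -rt patch on top of it."""
    tarball = fetch(KERNEL_URL, work_dir)
    patch_file = fetch(RT_PATCH_URL, work_dir)

    with tarfile.open(tarball, "r:bz2") as tar:
        tar.extractall(work_dir)
    tree = work_dir / "linux-2.6.23"  # assumed top-level directory of the tarball

    # Decompress the patch in memory and feed it to patch(1) on stdin.
    patch_bytes = bz2.decompress(patch_file.read_bytes())
    subprocess.run(["patch", "-p1"], input=patch_bytes, cwd=tree, check=True)
    return tree
```

Calling `build_tree(Path.cwd())` would download both files into the current directory and return the path of the patched tree; configuring and compiling the kernel is left out on purpose.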

-- Steve


2007-10-15 11:02:00

by Rui Nuno Capela

Subject: 2.6.23-rt1 trouble

On Fri, October 12, 2007 03:04, Steven Rostedt wrote:
> We are pleased to announce the 2.6.23-rt1 tree, which can be
> downloaded from the location:
>
> http://www.kernel.org/pub/linux/kernel/projects/rt/
>
> Changes since 2.6.23-rc9-rt2
>
> - updated to 2.6.23
>
> - spin_trylock_irqsave macro fix (Sébastien Dugué)
>
> - move rcu_preempt_boost init earlier (Steven Rostedt)
>
> - rt task send IPI condition update (Mike Kravetz)
>

I am experiencing some highly annoying but intermittent freezing on a
Pentium 4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1.

The same crippling behavior does not occur on a Core 2 Duo T7200 2.0G SMP,
so I suspect it's something specific to the SMT scheduling support
(Hyper-Threading). But I can't tell for sure, obviously :)

The symptoms show up primarily as intermittent X/GUI freezing: at first
only one application, then several, and ultimately the whole X desktop
becomes completely unresponsive. It looks like a scheduling problem.
There is a hint that switching to a spare console terminal (via
Ctrl+Alt+Fn) might help it recover later. But it's just a question of
time before it happens again and again, with several applications
becoming temporarily frozen one after another; only by luck does the
system get back to normal, probably due to some incidental shake-up :)
At other times nothing seems to help, leaving no alternative to the
power-reset switch.

I could not find any evidence on dmesg or in the system logs, of any
apparent trouble. No BUGs, no oops, no panics, no nothing. It just
freezes, this and that, now and then. It just makes it all unworkable and
obviously subject to ditching.

Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with
the same 2.6.23-rt1 and the very same kernel configuration, it shows no
illness and runs quite fine.

Remember one report I had about a similar freezing behavior? Now it's
happening the other way around: the core2 is OK, the pentium4 is KO.

One naive suspicion is that the new rcu-preempt code is to blame, since
I don't remember having this or any other trouble with 2.6.23-rc8-rt1.

Until now this post looks more like a rant than a proper bug report, I
know. The question is how I can help figure this all out. Advice on
debugging, tracing or statistics collection would be really appreciated,
as would any hint for pinpointing the issue, of course.

Help?
--
rncbc aka Rui Nuno Capela
[email protected]

2007-10-17 17:39:32

by Rui Nuno Capela

Subject: Re: 2.6.23-rt1 trouble

On Mon, October 15, 2007 11:49, Rui Nuno Capela wrote:
> On Fri, October 12, 2007 03:04, Steven Rostedt wrote:
>
>> We are pleased to announce the 2.6.23-rt1 tree, which can be
>> downloaded from the location:
>>
>> http://www.kernel.org/pub/linux/kernel/projects/rt/
>>
>> Changes since 2.6.23-rc9-rt2
>>
>> - updated to 2.6.23
>>
>> - spin_trylock_irqsave macro fix (Sébastien Dugué)
>>
>> - move rcu_preempt_boost init earlier (Steven Rostedt)
>>
>> - rt task send IPI condition update (Mike Kravetz)
>>
>
> I am experiencing some highly annoying but intermitent freezing on a
> pentium4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1.
>
>
> The same crippling behavior does not occur on a Core 2 Due T7200 2.0G
> SMP, so I suspect it's something due specific to the SMT scheduling
> support (Hyper-Threading). But can't tell for sure, obviously :)
>

I was wrong. After several trials the same behavior also occurs on the
Core2 Duo T7200. It just took longer to show its nasty side.


> The symptoms are noticeable primarily as some X/GUI intermitent freezing,
> sometimes only one application, then several and ultimately the whole X
> desktop becomes completely unresponsive. It looks like scheduling
> problems. There is this hint that switching to a spare console terminal
> (via Ctrl+Alt+Fn) might cause later recovery. But its just a question of
> some more time for it just happens again and again, one after another,
> several applications becoming temporarily frozen and just by luck the
> system gets back to normal, probably due to some incidental shake-up :)
> but there are other times that nothing seems to help with no alternative
> to the power-reset switch.
>
> I could not find any evidence on dmesg or in the system logs, of any
> apparent trouble. No BUGs, no oops, no panics, no nothing. It just
> freezes, this and that, now and then. It just makes it all unworkable
> and obviously subject to ditching.
>
> Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with
> same 2.6.23-rt1 with the very same kernel configuration, it does not show
> any illness and is running quite fine.
>

False. It used to run fine, until the creeps happened for the first time :(


> Remember one report I had about a similar freezing behavior? Now it's
> happening the other way around: the core2 is OK, the pentium4 is KO.
>

Now it applies to all 2.6.23-rt1 images I could test upon.


> One naive suspicion goes like the new rcu-preempt code is to blame, since
> I don't remember having this or any other trouble with 2.6.23-rc8-rt1.
>

Not so sure anymore, but this still seems to be a valid assumption.

Just in case someone wants to try reproducing this showstopper, the
kernel .config is available here:

http://www.rncbc.org/datahub/config-2.6.23-rt1.0

dmesg output right after init:

http://www.rncbc.org/datahub/dmesg-2.6.23-rt1.0

which doesn't really tell where to look :)


Cheers.
--
rncbc aka Rui Nuno Capela
[email protected]

2007-10-27 14:45:18

by Rui Nuno Capela

Subject: Re: 2.6.23-rt4 (was 2.6.23-rt1 trouble)

> On Mon, October 15, 2007 11:49, Rui Nuno Capela wrote:
>>
>> I am experiencing some highly annoying but intermitent freezing on a
>> pentium4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1.
>>
>>
>> The same crippling behavior does not occur on a Core 2 Due T7200 2.0G
>> SMP, so I suspect it's something due specific to the SMT scheduling
>> support (Hyper-Threading). But can't tell for sure, obviously :)
>>
>
> I was wrong. After several trials the same behavior also occurs on the
> Core2 Duo T7200. It just took longer to show its nasty.
>
>
>> The symptoms are noticeable primarily as some X/GUI intermitent freezing,
>> sometimes only one application, then several and ultimately the whole X
>> desktop becomes completely unresponsive. It looks like scheduling
>> problems. There is this hint that switching to a spare console terminal
>> (via Ctrl+Alt+Fn) might cause later recovery. But its just a question of
>> some more time for it just happens again and again, one after another,
>> several applications becoming temporarily frozen and just by luck the
>> system gets back to normal, probably due to some incidental shake-up :)
>> but there are other times that nothing seems to help with no alternative
>> to the power-reset switch.
>>
>> I could not find any evidence on dmesg or in the system logs, of any
>> apparent trouble. No BUGs, no oops, no panics, no nothing. It just
>> freezes, this and that, now and then. It just makes it all unworkable
>> and obviously subject to ditching.
>>
>> Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with
>> same 2.6.23-rt1 with the very same kernel configuration, it does not show
>> any illness and is running quite fine.
>>
>
> False. It used to run fine, until the creeps happen first time :(
>
>
>> Remember one report I had about a similar freezing behavior? Now it's
>> happening the other way around: the core2 is OK, the pentium4 is KO.
>>
>
> Now it applies to all 2.6.23-rt1 images I could test upon.
>
>
>> One naive suspicion goes like the new rcu-preempt code is to blame, since
>> I don't remember having this or any other trouble with 2.6.23-rc8-rt1.
>>
>
> Not be sure anymore, but this seems to be still a valid assumption.
>

just to let you know that the same trouble still persists with 2.6.23.1-rt4

.config can be found here:
http://www.rncbc.org/datahub/config-2.6.23.1-rt4.0


cheers.
--
rncbc aka Rui Nuno Capela
[email protected]

2007-10-27 16:06:56

by Steven Rostedt

Subject: Re: 2.6.23-rt4 (was 2.6.23-rt1 trouble)

--

On Sat, 27 Oct 2007, Rui Nuno Capela wrote:

> > On Mon, October 15, 2007 11:49, Rui Nuno Capela wrote:
> >>
> >> I am experiencing some highly annoying but intermitent freezing on a
> >> pentium4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1.
> >>
> >>
> >> The same crippling behavior does not occur on a Core 2 Due T7200 2.0G
> >> SMP, so I suspect it's something due specific to the SMT scheduling
> >> support (Hyper-Threading). But can't tell for sure, obviously :)
> >>
> >
> > I was wrong. After several trials the same behavior also occurs on the
> > Core2 Duo T7200. It just took longer to show its nasty.
> >
> >
> >> The symptoms are noticeable primarily as some X/GUI intermitent freezing,
> >> sometimes only one application, then several and ultimately the whole X
> >> desktop becomes completely unresponsive. It looks like scheduling

When things start to freeze, could you capture the output of a sysrq-t?
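
Besides the keyboard combination (Alt+SysRq+T), the same request can be made from user space by writing the command character to /proc/sysrq-trigger as root. The sketch below is an illustration only, with a made-up function name; on a machine that is already frozen the script may never get scheduled, so the keyboard or serial console remains the fallback.

```python
from pathlib import Path

# Kernel interface for issuing SysRq commands from user space (needs root).
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def trigger_sysrq(key: str, trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write one SysRq command character to the trigger file.

    The trigger path is a parameter only so the function can be
    exercised against an ordinary file.
    """
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    trigger.write_text(key)
```

`trigger_sysrq("t")` asks the kernel to dump the task states to the kernel log, which can then be read with dmesg or from the serial console.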


> >> problems. There is this hint that switching to a spare console terminal
> >> (via Ctrl+Alt+Fn) might cause later recovery. But its just a question of
> >> some more time for it just happens again and again, one after another,
> >> several applications becoming temporarily frozen and just by luck the
> >> system gets back to normal, probably due to some incidental shake-up :)
> >> but there are other times that nothing seems to help with no alternative
> >> to the power-reset switch.
> >>
> >> I could not find any evidence on dmesg or in the system logs, of any
> >> apparent trouble. No BUGs, no oops, no panics, no nothing. It just
> >> freezes, this and that, now and then. It just makes it all unworkable
> >> and obviously subject to ditching.
> >>
> >> Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with
> >> same 2.6.23-rt1 with the very same kernel configuration, it does not show
> >> any illness and is running quite fine.
> >>
> >
> > False. It used to run fine, until the creeps happen first time :(
> >
> >
> >> Remember one report I had about a similar freezing behavior? Now it's
> >> happening the other way around: the core2 is OK, the pentium4 is KO.
> >>
> >
> > Now it applies to all 2.6.23-rt1 images I could test upon.
> >
> >
> >> One naive suspicion goes like the new rcu-preempt code is to blame, since
> >> I don't remember having this or any other trouble with 2.6.23-rc8-rt1.
> >>
> >
> > Not be sure anymore, but this seems to be still a valid assumption.
> >
>
> just to let you know that still the same trouble persists with 2.6.23.1-rt4
>
> .config can be found here:
> http://www.rncbc.org/datahub/config-2.6.23.1-rt4.0
>

I have a P4HT laptop (unfortunately with no serial). I use it as one of my
main machines, so it will suck for me when it freezes ;-). I'll take your
config and try it out.

I'll most likely do this on Monday since process Wife has the highest
priority over the weekend ;-)

-- Steve

2007-10-27 20:07:48

by Rui Nuno Capela

Subject: Re: 2.6.23-rt4 (was 2.6.23-rt1 trouble)

Steven Rostedt wrote:
> --
>
> On Sat, 27 Oct 2007, Rui Nuno Capela wrote:
>
>>> On Mon, October 15, 2007 11:49, Rui Nuno Capela wrote:
>>>> I am experiencing some highly annoying but intermitent freezing on a
>>>> pentium4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1.
>>>>
>>>>
>>>> The same crippling behavior does not occur on a Core 2 Due T7200 2.0G
>>>> SMP, so I suspect it's something due specific to the SMT scheduling
>>>> support (Hyper-Threading). But can't tell for sure, obviously :)
>>>>
>>> I was wrong. After several trials the same behavior also occurs on the
>>> Core2 Duo T7200. It just took longer to show its nasty.
>>>
>>>
>>>> The symptoms are noticeable primarily as some X/GUI intermitent freezing,
>>>> sometimes only one application, then several and ultimately the whole X
>>>> desktop becomes completely unresponsive. It looks like scheduling
>
> When things start to freeze, could you capture the output of a sysrq-t.
>

yes, you can find a complete serial console capture here, which includes
the final SysRq-T output:
http://www.rncbc.org/datahub/console-2.6.23.1-rt4.1-1.log

the corresponding .config is also there:
http://www.rncbc.org/datahub/config-2.6.23.1-rt4.1

this is a new kernel build because of a shot in the dark from nick
mainsbridge, who suggested that leaving out the ntfs module build
(CONFIG_NTFS_FS is not set) would mitigate similar freezes.

indeed the general feel is that it seems to run longer and is less prone
to those incidents, but that is just a gut feeling, nothing more.

hope this is of some use.

>
>>>> problems. There is this hint that switching to a spare console terminal
>>>> (via Ctrl+Alt+Fn) might cause later recovery. But its just a question of
>>>> some more time for it just happens again and again, one after another,
>>>> several applications becoming temporarily frozen and just by luck the
>>>> system gets back to normal, probably due to some incidental shake-up :)
>>>> but there are other times that nothing seems to help with no alternative
>>>> to the power-reset switch.
>>>>
>>>> I could not find any evidence on dmesg or in the system logs, of any
>>>> apparent trouble. No BUGs, no oops, no panics, no nothing. It just
>>>> freezes, this and that, now and then. It just makes it all unworkable
>>>> and obviously subject to ditching.
>>>>
>>>> Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with
>>>> same 2.6.23-rt1 with the very same kernel configuration, it does not show
>>>> any illness and is running quite fine.
>>>>
>>> False. It used to run fine, until the creeps happen first time :(
>>>
>>>
>>>> Remember one report I had about a similar freezing behavior? Now it's
>>>> happening the other way around: the core2 is OK, the pentium4 is KO.
>>>>
>>> Now it applies to all 2.6.23-rt1 images I could test upon.
>>>
>>>
>>>> One naive suspicion goes like the new rcu-preempt code is to blame, since
>>>> I don't remember having this or any other trouble with 2.6.23-rc8-rt1.
>>>>
>>> Not be sure anymore, but this seems to be still a valid assumption.
>>>
>> just to let you know that still the same trouble persists with 2.6.23.1-rt4
>>
>> .config can be found here:
>> http://www.rncbc.org/datahub/config-2.6.23.1-rt4.0
>>
>
> I have a P4HT laptop (unfortunately with no serial). I use it as one of my
> main machines, so it will suck for me when it freezes ;-). I'll take your
> config and try it out.
>
> I'll most likely do this on Monday since process Wife has the highest
> priority over the weekend ;-)
>

that is also true here :)

otoh, as reported before, the freezes are not exclusive to the P4HT, which
is the main box I've been reporting on here (the one which has a good old
serial port anyway), but also occur on my other laptop, a core2 duo
t7200.

bye
--
rncbc aka Rui Nuno Capela
[email protected]

2007-10-30 19:13:37

by Rui Nuno Capela

Subject: Re: 2.6.23.1-rt5 (was 2.6.23-rt1 trouble)

Rui Nuno Capela wrote:
> Steven Rostedt wrote:
>>
>> When things start to freeze, could you capture the output of a sysrq-t.
>>
>
> yes, you can find a complete serial console capture here, where it holds
> the final SysRq-T output:
> http://www.rncbc.org/datahub/console-2.6.23.1-rt4.1-1.log
>
> the corresponding .config is also there:
> http://www.rncbc.org/datahub/config-2.6.23.1-rt4.1
>
> the reason this is a new kernel build, following a shot in the dark from
> nick mainsbridge, which let out the ntfs module build (CONFIG_NTFS_FS is
> not set) suggesting that would mitigate similar freezes.
>
> in deed the general feel is that it seems to run longer and less prone
> to those incidents, but that is just a gut feeling, nothing more.
>
> hope this be any useful.
>
>> I have a P4HT laptop (unfortunately with no serial). I use it as one of my
>> main machines, so it will suck for me when it freezes ;-). I'll take your
>> config and try it out.
>>
>> I'll most likely do this on Monday since process Wife has the highest
>> priority over the weekend ;-)
>>
>
> that is also true here :)
>
> otoh, as reported before, the freezes are not exclusive to P4HT, which
> is the main box I'm been reporting here (the one which has a good old
> serial port anyway), but also applies to my other laptop, an core2 duo
> t7200.
>

still in trouble with 2.6.23.1-rt5 :(

.config:
http://www.rncbc.org/datahub/config-2.6.23.1-rt5.0

serial console capture:
http://www.rncbc.org/datahub/console-2.6.23.1-rt5.0-1.log

one thing about SysRq-T: in the middle of a freeze it just doesn't
dump anything, just the 'SysRq : Show State' line. you can see that
several times in the console output above.

only when it is somewhat recovering, as said, after some last-resort
measure like SysRq-E ('SysRq: terminate All Tasks') for instance,
will SysRq-T dump something. my guess is that this later dump will
be moot :/
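
To put a number on how often SysRq-T came back empty, a console capture can be scanned for 'SysRq : Show State' lines that are followed by nothing but further SysRq lines. The sketch below is a rough heuristic, not a parser for the real log format: it assumes every SysRq request line contains the string "SysRq" and that any other non-blank line after a request counts as dump output.

```python
def count_empty_show_state(lines, marker="SysRq : Show State"):
    """Count Show State requests that produced no output at all.

    A request counts as empty when the next non-blank line is another
    SysRq line, or when the log ends right after the request.
    """
    empty = 0
    pending = False  # True while the last Show State request has no output yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "SysRq" in line:
            if pending:
                empty += 1
            # Only Show State requests are tracked; other SysRq keys reset the state.
            pending = marker in line
        else:
            # Any other kernel output is treated as belonging to the dump.
            pending = False
    if pending:
        empty += 1
    return empty
```

Passing an open file works because the function only iterates over lines, for example `count_empty_show_state(open("console.log"))`, where the file name stands for a locally saved copy of the capture.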

back to production with 2.6.22.1-rt9 ;)
--
rncbc aka Rui Nuno Capela
[email protected]

2007-10-30 19:33:33

by Steven Rostedt

Subject: Re: 2.6.23.1-rt5 (was 2.6.23-rt1 trouble)


--
On Tue, 30 Oct 2007, Rui Nuno Capela wrote:

>
> still in trouble with 2.6.23.1-rt5 :(
>
> .config:
> http://www.rncbc.org/datahub/config-2.6.23.1-rt5.0
>
> serial console capture:
> http://www.rncbc.org/datahub/console-2.6.23.1-rt5.0-1.log

"NVRM: loading NVIDIA UNIX x86 Kernel Module 100.14.19 Wed Sep 12
14:12:24 PDT 2007"

Is that the nVidia module? Showing a date of Sept 12th also makes this
look suspicious.

If this is the case, could you run without that module to see if we get
the same freezes?
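
One way to confirm from user space whether a proprietary module has tainted the running kernel is the bitmask in /proc/sys/kernel/tainted, where bit 0 is the proprietary-module ('P') flag. This is a minimal sketch with made-up function names, not something posted in the thread.

```python
from pathlib import Path

TAINT_FILE = Path("/proc/sys/kernel/tainted")
TAINT_PROPRIETARY_MODULE = 1 << 0  # 'P': a proprietary module has been loaded


def has_proprietary_taint(mask: int) -> bool:
    """Return True if the proprietary-module bit is set in the taint mask."""
    return bool(mask & TAINT_PROPRIETARY_MODULE)


def read_taint(path: Path = TAINT_FILE) -> int:
    """Read the kernel taint bitmask as an integer."""
    return int(path.read_text().strip())
```

On a running system, `has_proprietary_taint(read_taint())` returning False would indicate that no proprietary module has been loaded since boot.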

Thanks,

-- Steve

>
> one thing about SysRq-T: while in the middle of a freeze it just doesn't
> dump anything, just the 'SysRq : Show State' line. you can see that
> several times on the console output above.
>
> only when it is somewhat recovering, as said, by making some last resort
> measures like SysRq-E as in 'SysRq: terminate All Tasks' for instance,
> then SysRq-T will dump something. my guess is that this later dump will
> be moot :/
>
> back to production with 2.6.22.1-rt9 ;)
> --
> rncbc aka Rui Nuno Capela
> [email protected]
>
>

2007-10-30 19:56:53

by Rui Nuno Capela

Subject: Re: 2.6.23.1-rt5 (was 2.6.23-rt1 trouble)

Steven Rostedt wrote:
> --
> On Tue, 30 Oct 2007, Rui Nuno Capela wrote:
>
>> still in trouble with 2.6.23.1-rt5 :(
>>
>> .config:
>> http://www.rncbc.org/datahub/config-2.6.23.1-rt5.0
>>
>> serial console capture:
>> http://www.rncbc.org/datahub/console-2.6.23.1-rt5.0-1.log
>
> "NVRM: loading NVIDIA UNIX x86 Kernel Module 100.14.19 Wed Sep 12
> 14:12:24 PDT 2007"
>
> Is that the nVidia module? Showing a date of Sept 12th also makes this
> look suspicious.
>
> If this is the case, could you run without that module to see if we get
> the same freezes.
>

yes, running without the *cough* nvidia module is under way. but do you
remember that the freezes also happen on my other laptop, and that one
doesn't have a single proprietary module in it? (same .config btw) the
problem is that, like most modern laptops, it doesn't come with a serial
port, so I can't give you serial console evidence from it ...

thanks for the reminder, anyway ;)

> Thanks,
>
> -- Steve
>
>> one thing about SysRq-T: while in the middle of a freeze it just doesn't
>> dump anything, just the 'SysRq : Show State' line. you can see that
>> several times on the console output above.
>>
>> only when it is somewhat recovering, as said, by making some last resort
>> measures like SysRq-E as in 'SysRq: terminate All Tasks' for instance,
>> then SysRq-T will dump something. my guess is that this later dump will
>> be moot :/
>>
>> back to production with 2.6.22.1-rt9 ;)

bye now
--
rncbc aka Rui Nuno Capela
[email protected]

2007-10-30 20:07:32

by Steven Rostedt

Subject: Re: 2.6.23.1-rt5 (was 2.6.23-rt1 trouble)


--
> >
>
> yes, running without the *cough* nvidia module is under way. but do you
> remember that on my other laptop the freezes also happen and on that one
> there's no single proprietary modules in it? (same .config btw) problem
> is, like most modern laptops, it doesn't come bundled with a serial port
> so that can't give you a serial console evidence ...
>
> thanks for the reminder, anyway ;)
>

hmm, it could also be a separate issue. This other laptop is the one that
took a while to freeze. Correct?

Was it only on X apps? Or could it possibly be something we could trigger
from a command line, so that we could see the vga output (if any)?

-- Steve

2007-10-30 21:04:51

by Rui Nuno Capela

Subject: Re: 2.6.23.1-rt5 (was 2.6.23-rt1 trouble)

Steven Rostedt wrote:
> --
>> yes, running without the *cough* nvidia module is under way. but do you
>> remember that on my other laptop the freezes also happen and on that one
>> there's no single proprietary modules in it? (same .config btw) problem
>> is, like most modern laptops, it doesn't come bundled with a serial port
>> so that can't give you a serial console evidence ...
>>
>> thanks for the reminder, anyway ;)
>>
>
> hmm, it could also be a separate issue. This other laptop is the one that
> took a while to freeze. Correct?
>

yes, it seems that was the case, but I'm not sure it still applies. the
timing of the freezes is rather random; sometimes on the (tainted)
desktop one it also takes a while before it starts to hiccup.

> Was it only on X apps? Or could it possible be something we could do via a
> command line and then we could see vga output (if any).
>

most of the time, yes, it's on X applications. if I get into a tty (eg.
via Alt-Ctrl-F1) it seems to recover from the intermittent frozen state
more easily, but it's just a matter of time before it's also dead on the
console, sooner or later. once it starts freezing the first time
I know the whole system is, how should i say? doomed ...

never saw a bug, oops or panic message. it just drops dead, until
SysRq-B gets hit in despair :)

cheers
--
rncbc aka Rui Nuno Capela
[email protected]

2007-11-03 18:22:42

by Gabriel C

Subject: Re: 2.6.23.1-rt5 (was 2.6.23-rt1 trouble)

>> Steven Rostedt wrote:
>>> When things start to freeze, could you capture the output of a sysrq-t.
>>>

Hi,

I also have the same problem on my SMP box[1] (i686): random hard freezes, after which I can only hard reset the box.

I'm not able to capture any output :/ The keyboard does not work, netconsole does not show any output when the freeze occurs,
nor do I have any 'BUG:', 'Oops:' or similar messages in any logs.
The only thing I noticed is:

...

Clocksource tsc unstable (delta = 4686944957 ns)
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ <-- may be some sort data corruption :/

...

as the last kernel message before most freezes.

Booting with a different clocksource does not make any difference.
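
For scale, the delta in the message above is about 4.69 seconds (4686944957 ns). The sketch below shows one way to pull that figure out of a log line and to see which clocksource is active and which alternatives exist, using the standard sysfs files under /sys/devices/system/clocksource/clocksource0. The function names and the regular expression are illustrative assumptions based only on the single line quoted above.

```python
import re
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

# Pattern modelled on the one message quoted in this thread.
UNSTABLE_RE = re.compile(r"Clocksource (\w+) unstable \(delta = (-?\d+) ns\)")


def parse_unstable(line: str):
    """Return (clocksource name, delta in seconds), or None if no match."""
    match = UNSTABLE_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)) / 1e9


def clocksources(base: Path = CLOCKSOURCE_DIR):
    """Return the active clocksource and the list of available ones."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

`clocksources()` can be used after boot to double-check that a clocksource chosen on the kernel command line actually took effect.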

I do not use nvidia; in fact I don't use any external kernel module on that box.

Anyway, it seems to work fine on non-SMP; my laptops also do not have that problem with the same kernel.

If needed I can post the config, but I tested a lot of different configs with the same result.

Is there any rt-git tree I can test?


Regards,

Gabriel


[1] http://194.231.229.228/lara/lara.html
http://194.231.229.228/lara/lara.lspci