Hello,
I didn't tested -mm1 but -mm2 has definitely too many problems currently,
let's start:
1- reiser3 oopsed[1] twice while suspending to ram. It seems
reproducible (have some activity on the fs and suspend)
2- I had already written this email once, but the box completely
froze, nothing in the logs, only mouse and X activity. I suspect
again of reiser3.
3- This laptop experienced 2 long stalls (20~25 sec) during boot,
apparently after scanning usb_storage devices and starting portmap.
I logged the call traces (sysrq+t) during this time, I don't know if
it is useful[2].
Is it time for me to learn to git bisect? (Tomorrow morning I'll try
(CET) if plain 2.6.15 also shows the same stalls).
4- I'm also affected by the ACPI Misaligned resource pointer error.
5- That's an older problem I never reported (never tracked to be a
reiser4 problem): reiser4 shows a very bad slowness. Use case: backup
my ~/ (rsync)
a- from reiser4 to xfs rsync stalls for some seconds from time to
time while building the file list (call trace during the stall[3])
Even using mutt and editing a file with vim causes short freezes)
b- from xfs to reiser4 after finishing the copy, sync-ing takes ages,
gkrellm disk monitor shows 1MB/s
[1]: http://oioio.altervista.org/linux/dsc03133.jpg
[2]: http://oioio.altervista.org/linux/boot-2.6.15-mm2.3
[3]: http://oioio.altervista.org/linux/dmesg_reiser4_stalls
The reiser oops seems reproducible by suspending with some dirty cache
(I've been able to suspend/resume cycle 3 times without reiser crashing
but I also didn't have big activities on that partition).
If really necessary I can try to reproduce it (oh, poor filesystem).
Other than that are ther suggestions/patches to start with?
Oh, my .config here:
http://oioio.altervista.org/linux/config-2.6.15-mm2-2
ciao
--
mattia
:wq!
Mattia Dongili <[email protected]> wrote:
>
> Hello,
>
> I didn't tested -mm1 but -mm2 has definitely too many problems currently,
> let's start:
Thanks for testing and reporting - it really helps.
> 1- reiser3 oopsed[1] twice while suspending to ram. It seems
> reproducible (have some activity on the fs and suspend)
No significant reiser3 changes in there, so I'd be suspecting something
else has gone haywire.
> 2- I had already written this email once, but the box completely
> froze, nothing in the logs, only mouse and X activity. I suspect
> again of reiser3.
Yes, that sounds like a filesystem failed while holding locks.
> 3- This laptop experienced 2 long stalls (20~25 sec) during boot,
> apparently after scanning usb_storage devices and starting portmap.
You mean before starting portmap?
> I logged the call traces (sysrq+t) during this time, I don't know if
> it is useful[2].
Hard to see anything in there. If you set CONFIG_FRAME_POINTER=y you'll
get better traces.
> Is it time for me to learn to git bisect? (Tomorrow morning I'll try
> (CET) if plain 2.6.15 also shows the same stalls).
Please test the next Linus git tree (2.6.15-git7) and see if we've
propagated it into there too.
There's not much point in fiddling with -mm2. If git7 is OK then please
test the next -mm and if it still fails then yes, doing a bisection would
really help.
<types madly>
See http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> 4- I'm also affected by the ACPI Misaligned resource pointer error.
ACPI cc'ed
> 5- That's an older problem I never reported (never tracked to be a
> reiser4 problem): reiser4 shows a very bad slowness. Use case: backup
> my ~/ (rsync)
> a- from reiser4 to xfs rsync stalls for some seconds from time to
> time while building the file list (call trace during the stall[3])
> Even using mutt and editing a file with vim causes short freezes)
> b- from xfs to reiser4 after finishing the copy, sync-ing takes ages,
> gkrellm disk monitor shows 1MB/s
Don't know, sorry.
> [1]: http://oioio.altervista.org/linux/dsc03133.jpg
> [2]: http://oioio.altervista.org/linux/boot-2.6.15-mm2.3
> [3]: http://oioio.altervista.org/linux/dmesg_reiser4_stalls
>
> The reiser oops seems reproducible by suspending with some dirty cache
> (I've been able to suspend/resume cycle 3 times without reiser crashing
> but I also didn't have big activities on that partition).
> If really necessary I can try to reproduce it (oh, poor filesystem).
> Other than that are ther suggestions/patches to start with?
Pavel, have you heard of anything like this??
Hi!
> Thanks for testing and reporting - it really helps.
>
> > 1- reiser3 oopsed[1] twice while suspending to ram. It seems
> > reproducible (have some activity on the fs and suspend)
>
> No significant reiser3 changes in there, so I'd be suspecting something
> else has gone haywire.
Suspend to *RAM*? That really does not do anything that should kill
the filesystems. Has it ever worked before? When? Any SATA?
> > The reiser oops seems reproducible by suspending with some dirty cache
> > (I've been able to suspend/resume cycle 3 times without reiser crashing
> > but I also didn't have big activities on that partition).
> > If really necessary I can try to reproduce it (oh, poor filesystem).
> > Other than that are ther suggestions/patches to start with?
>
> Pavel, have you heard of anything like this??
No, never seen this before. With suspend to disk and wrong setup,
"kill my filesystem" is easy; but suspend to RAM? Best bet would be
disk driver doing something really stupid.
Pavel
--
Thanks, Sharp!
Hi!
> > [1]: http://oioio.altervista.org/linux/dsc03133.jpg
Looking at the picture... I'd try again without preempt. Any taints?
Is machine rock solid otherwise?
Pavel
--
Thanks, Sharp!
On Wed, January 11, 2006 11:00 am, Pavel Machek said:
> Hi!
>
>> Thanks for testing and reporting - it really helps.
>>
>> > 1- reiser3 oopsed[1] twice while suspending to ram. It seems
>> > reproducible (have some activity on the fs and suspend)
>>
>> No significant reiser3 changes in there, so I'd be suspecting something
>> else has gone haywire.
>
> Suspend to *RAM*? That really does not do anything that should kill
> the filesystems. Has it ever worked before? When? Any SATA?
yes, s2ram! it's a somewhat old laptop (ICH3 chipset), no SATA and it's
been working since quite a long time (don't know exactely I stopped
testing s2ram long time ago and retried only on 2.6.14), lspci follows:
0000:00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge
(rev 02)
0000:00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge
(rev 02)
0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1)
(rev 01)
0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2)
(rev 01)
0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3)
(rev 01)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 41)
0000:00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 01)
0000:00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 (rev 01)
0000:00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 01)
0000:00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM
AC'97 Audio Controller (rev 01)
0000:00:1f.6 Modem: Intel Corporation 82801CA/CAM AC'97 Modem Controller
(rev 01)
0000:01:00.0 VGA compatible controller: ATI Technologies Inc Radeon
Mobility M6 LY
0000:02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AA22 IEEE-1394
Controller (PHY/Link Integrated) (rev 02)
0000:02:05.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
0000:02:05.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 80)
0000:02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3)
PRO/100 VE (LOM) Ethernet Controller (rev 41)
It's one of the sony PCG-GR214EP/GR214MP/GR215MP/GR314MP/GR315MP/7K/9K
(I'm sorry for line wraps or amenities, I only have a webmail client
available at work...)
Pavel, soon you'll receive a successful s2ram report for
Documentation/power :)
--
mattia
:wq!
On Wed, January 11, 2006 2:00 am, Andrew Morton said:
> Mattia Dongili <[email protected]> wrote:
[...]
>> 3- This laptop experienced 2 long stalls (20~25 sec) during boot,
>> apparently after scanning usb_storage devices and starting portmap.
>
> You mean before starting portmap?
well, _while_ starting portmap. As you can see from the traces I have
portmap sleeping in sys_poll, consider my reflexes are not that fast
so the trace might be well more than 10 secs after the /etc/init.d/portmap
was run.
Trying to stop and start it again didn't have any delay.
>> Is it time for me to learn to git bisect? (Tomorrow morning I'll try
>> (CET) if plain 2.6.15 also shows the same stalls).
>
> Please test the next Linus git tree (2.6.15-git7) and see if we've
> propagated it into there too.
>
> There's not much point in fiddling with -mm2. If git7 is OK then please
> test the next -mm and if it still fails then yes, doing a bisection would
> really help.
>
> <types madly>
>
> See http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
ooooh :) really really thanks!!
I was starting to script something that just some hours later revealed to
be like stGit (well at least had the same base idea).
Anyway I'm currently using -git7 and building -mm3, -git7 is OK:
no stalls, no reiser3 oops (yet) and no ACPI misaligned pointer.
Will report on -mm3 as soon as possible
--
mattia
:wq!
Hello,
I gave -mm3 a run:
On Tue, Jan 10, 2006 at 05:00:37PM -0800, Andrew Morton wrote:
> Mattia Dongili <[email protected]> wrote:
[...]
> > 1- reiser3 oopsed[1] twice while suspending to ram. It seems
> > reproducible (have some activity on the fs and suspend)
>
> No significant reiser3 changes in there, so I'd be suspecting something
> else has gone haywire.
It's still there. But I caught it during normal runtime, a couple more
pictures (yes, I'm going to try to setup a netconsole):
the oops (sorry, it sucks, probably useless):
http://oioio.altervista.org/linux/dsc03136.jpg
a screen with the call traces, the keyboard was still active and sysrq
combos worked. It showed many processes stuck within reiser's log writer:
http://oioio.altervista.org/linux/dsc03138.jpg
going to revert the
> > 3- This laptop experienced 2 long stalls (20~25 sec) during boot,
> > apparently after scanning usb_storage devices and starting portmap.
umph... it's still here call traces:
http://oioio.altervista.org/linux/portmap_stall_trace1
[...]
> There's not much point in fiddling with -mm2. If git7 is OK then please
> test the next -mm and if it still fails then yes, doing a bisection would
> really help.
>
> <types madly>
>
> See http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
give me some hours for this.
Anyway I'd like to try to revert some reiser3 changes first and see if
the oops goes (I can leave with a longer boot but I'm worried about my
root fs :))
> > 4- I'm also affected by the ACPI Misaligned resource pointer error.
still here:
# grep -e 'Jan 11.*Misaligned.*' /var/log/syslog
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cff4b7e2 Type 07 Len 0
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cff4b7e2 Type 07 Len 0
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cff4b2e2 Type 07 Len 0
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cf657902 Type 07 Len 0
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cf657602 Type 07 Len 0
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cf657402 Type 07 Len 0
Jan 11 18:07:10 inferi kernel: **** SET: Misaligned resource pointer: cf657402 Type 07 Len 0
Jan 11 18:18:53 inferi kernel: **** SET: Misaligned resource pointer: cf657d02 Type 07 Len 0
Jan 11 18:21:14 inferi kernel: **** SET: Misaligned resource pointer: cf657d02 Type 07 Len 0
DSDT and lspci can be found here if useful:
http://oioio.altervista.org/linux/DSDT.aml
http://oioio.altervista.org/linux/DSDT.dsl
http://oioio.altervista.org/linux/lspci-v
On Wed, Jan 11, 2006 at 11:00:16AM +0100, Pavel Machek wrote:
[...]
> > > 1- reiser3 oopsed[1] twice while suspending to ram. It seems
> > > reproducible (have some activity on the fs and suspend)
[...]
> Suspend to *RAM*? That really does not do anything that should kill
Sorry, it seems it has nothing to do with s2ram.
--
mattia
:wq!
[cleaned up some Cc as this is not interesting to all MLs]
Andrew,
first bisection spotted the cause of the stalls at boot (happening while
starting portmap and after usb-storage scan):
time-clocksource-infrastructure.patch
time-generic-timekeeping-infrastructure.patch
time-i386-conversion-part-1-move-timer_pitc-to-i8253c.patch
time-i386-conversion-part-2-rework-tsc-support.patch
time-i386-conversion-part-3-enable-generic-timekeeping.patch
time-i386-conversion-part-4-remove-old-timer_opts-code.patch
time-i386-conversion-part-5-acpi-pm-variable-renaming-and-config-change.patch
time-i386-clocksource-drivers.patch
time-fix-cpu-frequency-detection.patch
Cc-ed john stultz
actually git bisect[1] pointed to time-fix-cpu-frequency-detection.patch
but it's clearly wrong. Reverting all the above patches (I suppose they
are somewhat related) fixes the stalls I experience. I can test
corrections if necessary.
eh, see: during the multiple boots I noticed that just hitting sysrq
make the boot process go on when stuck
...
sda: assuming drive cache: write through
sda: sda1
sd 0:0:0:0: Attached scsi removable disk sda
/etc/rcS.d/S04udev: line 224: 2347 Quit sleep 0.2
...
Starting portmap daemon: portmap.
/etc/rcS.d/S43portmap: line 24: 2678 Quit sleep 1
...
I'll re-bisect the thing tomorrow to spot what's causing reiserfs oops
[1]: I imported the patches into git by means of:
while [ $(quilt next) ] ; do
quilt push && git add && git commit -a -m "$(quilt top)" ;
done
--
mattia
:wq!
On Thu, 2006-01-12 at 23:08 +0100, Mattia Dongili wrote:
> [cleaned up some Cc as this is not interesting to all MLs]
>
> Andrew,
>
> first bisection spotted the cause of the stalls at boot (happening while
> starting portmap and after usb-storage scan):
>
> time-clocksource-infrastructure.patch
> time-generic-timekeeping-infrastructure.patch
> time-i386-conversion-part-1-move-timer_pitc-to-i8253c.patch
> time-i386-conversion-part-2-rework-tsc-support.patch
> time-i386-conversion-part-3-enable-generic-timekeeping.patch
> time-i386-conversion-part-4-remove-old-timer_opts-code.patch
> time-i386-conversion-part-5-acpi-pm-variable-renaming-and-config-change.patch
> time-i386-clocksource-drivers.patch
> time-fix-cpu-frequency-detection.patch
>
> Cc-ed john stultz
>
> actually git bisect[1] pointed to time-fix-cpu-frequency-detection.patch
> but it's clearly wrong. Reverting all the above patches (I suppose they
> are somewhat related) fixes the stalls I experience. I can test
> corrections if necessary.
Hmmm. I'm not quite understanding. Does reverting just
time-fix-cpu-frequency-detection.patch change anything? I just sent out
a fix for an error case that patch, but I doubt you'd be hitting it.
Looking at the log here:
http://oioio.altervista.org/linux/boot-2.6.15-mm2.3
I'm curious if you're getting cpufreq effects during interval while the
TSC is being used as a clocksource before we switch to using the acpi_pm
clocksource.
After the system boots up, does it keep accurate time? Time doesn't
obviously move too fast or to slow compared to a watch?
Few things to try (independently):
1. Does booting w/ idle=poll change the behavior?
2. Does booting w/ clocksource=jiffies change the behavior?
3. After booting up, run:
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
And check that the system keeps accurate time.
Thanks for the great testing and feedback!
-john
Hello,
sorry, it took me a while to come back to this issue.
On Thu, Jan 12, 2006 at 03:26:01PM -0800, john stultz wrote:
> On Thu, 2006-01-12 at 23:08 +0100, Mattia Dongili wrote:
> > [cleaned up some Cc as this is not interesting to all MLs]
> >
> > Andrew,
> >
> > first bisection spotted the cause of the stalls at boot (happening while
> > starting portmap and after usb-storage scan):
> >
> > time-clocksource-infrastructure.patch
> > time-generic-timekeeping-infrastructure.patch
> > time-i386-conversion-part-1-move-timer_pitc-to-i8253c.patch
> > time-i386-conversion-part-2-rework-tsc-support.patch
> > time-i386-conversion-part-3-enable-generic-timekeeping.patch
> > time-i386-conversion-part-4-remove-old-timer_opts-code.patch
> > time-i386-conversion-part-5-acpi-pm-variable-renaming-and-config-change.patch
> > time-i386-clocksource-drivers.patch
> > time-fix-cpu-frequency-detection.patch
> >
> > Cc-ed john stultz
> >
> > actually git bisect[1] pointed to time-fix-cpu-frequency-detection.patch
> > but it's clearly wrong. Reverting all the above patches (I suppose they
> > are somewhat related) fixes the stalls I experience. I can test
> > corrections if necessary.
>
> Hmmm. I'm not quite understanding. Does reverting just
> time-fix-cpu-frequency-detection.patch change anything? I just sent out
no, that's why I reverted the full thing.
> a fix for an error case that patch, but I doubt you'd be hitting it.
>
> Looking at the log here:
> http://oioio.altervista.org/linux/boot-2.6.15-mm2.3
>
> I'm curious if you're getting cpufreq effects during interval while the
> TSC is being used as a clocksource before we switch to using the acpi_pm
> clocksource.
What should I expect? I didn't notice anything in particular and
actually the box stays alive for just a few minutes, then reiserfs
explodes so I have no chance to notice anything in the long run.
> After the system boots up, does it keep accurate time? Time doesn't
> obviously move too fast or to slow compared to a watch?
yes, during the short time it stays alive there's no difference between
my watch and my laptop.
> Few things to try (independently):
> 1. Does booting w/ idle=poll change the behavior?
yes, no more stalls
> 2. Does booting w/ clocksource=jiffies change the behavior?
yes, same as above
> 3. After booting up, run:
> echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> And check that the system keeps accurate time.
didn't try as there seems to be no problem in timekeeping
thanks
--
mattia
:wq!
[CC-in relevant people/ML]
Hello!
second bisection result!
On Tue, Jan 10, 2006 at 05:00:37PM -0800, Andrew Morton wrote:
> Mattia Dongili <[email protected]> wrote:
[...]
> > 1- reiser3 oopsed[1] twice while suspending to ram. It seems
> > reproducible (have some activity on the fs and suspend)
>
> No significant reiser3 changes in there, so I'd be suspecting something
> else has gone haywire.
you're right: git-xfs.patch is the bad guy.
Unfortunately netconsole isn't helpful in capturing the oops (no serial
ports here) but I have two more shots (more readable):
http://oioio.altervista.org/linux/dsc03148.jpg
http://oioio.altervista.org/linux/dsc03149.jpg
BTW: I'm building -mm4 right now, will report tomorrow if the oops
persists.
--
mattia
:wq!
On Sun, Jan 15, 2006 at 11:14:58PM +0100, Mattia Dongili wrote:
> [CC-in relevant people/ML]
>
> Hello!
>
> second bisection result!
>
> On Tue, Jan 10, 2006 at 05:00:37PM -0800, Andrew Morton wrote:
> > Mattia Dongili <[email protected]> wrote:
> [...]
> > > 1- reiser3 oopsed[1] twice while suspending to ram. It seems
> > > reproducible (have some activity on the fs and suspend)
> >
> > No significant reiser3 changes in there, so I'd be suspecting something
> > else has gone haywire.
>
> you're right: git-xfs.patch is the bad guy.
>
> Unfortunately netconsole isn't helpful in capturing the oops (no serial
> ports here) but I have two more shots (more readable):
> http://oioio.altervista.org/linux/dsc03148.jpg
> http://oioio.altervista.org/linux/dsc03149.jpg
Hmm, thats odd. It seems to be coming from:
reiserfs_commit_page -> reiserfs_add_ordered_list -> __add_jh(inline)
I guess XFS may have left a buffer_head in an unusual state (with some
private flag/b_private set), somehow, and perhaps that buffer_head has
later been allocated for a page in a reiserfs write. Does this patch,
below, help at all?
I see one BUG check in __add_jh for non-NULL b_private, but can't see
the top of your console output from the photos - is there a preceding
line with "kernel BUG at ..." in it?
cheers.
--
Nathan
--- fs/buffer.c.orig 2006-01-16 10:15:01.332010750 +1100
+++ fs/buffer.c 2006-01-16 10:18:15.276131500 +1100
@@ -1027,7 +1027,7 @@ try_again:
/* Link the buffer to its page */
set_bh_page(bh, page, offset);
- bh->b_end_io = NULL;
+ init_buffer(bh, NULL, NULL);
}
return head;
/*
On Mon, Jan 16, 2006 at 09:48:17AM +1100, Nathan Scott wrote:
> On Sun, Jan 15, 2006 at 11:14:58PM +0100, Mattia Dongili wrote:
[...]
> > you're right: git-xfs.patch is the bad guy.
> >
> > Unfortunately netconsole isn't helpful in capturing the oops (no serial
> > ports here) but I have two more shots (more readable):
> > http://oioio.altervista.org/linux/dsc03148.jpg
> > http://oioio.altervista.org/linux/dsc03149.jpg
>
> Hmm, thats odd. It seems to be coming from:
> reiserfs_commit_page -> reiserfs_add_ordered_list -> __add_jh(inline)
>
> I guess XFS may have left a buffer_head in an unusual state (with some
> private flag/b_private set), somehow, and perhaps that buffer_head has
> later been allocated for a page in a reiserfs write. Does this patch,
> below, help at all?
I won't be able to test and report until tomorrow afternoon (CET),
please be patient.
> I see one BUG check in __add_jh for non-NULL b_private, but can't see
> the top of your console output from the photos - is there a preceding
> line with "kernel BUG at ..." in it?
this is another shot of the same oops caught some days ago
http://oioio.altervista.org/linux/dsc03133.jpg
unfortunately it happened while running X so that's all I currently
have... and I can't remember now about the BUG.
oh, and BTW I have / and /usr on reiserfs while /home is xfs and I can
easily reproduce the oops by starting X (with a simple user so fiddling
in /home) and then installing and removing software in /usr.
thanks.
--
mattia
:wq!
Thanks Nathan, Mattia, and Andrew.
vs, can you or Jeff look at whether our buffer head inits are
sufficiently hardened by next Monday (I know that vs has a lot of mail
to catch up to)? Jeff, if you have time before then, it would be nice
if you could handle it, I know hardening V3 is an interest of yours.
Thanks,
Hans
Nathan Scott wrote:
>On Sun, Jan 15, 2006 at 11:14:58PM +0100, Mattia Dongili wrote:
>
>
>>[CC-in relevant people/ML]
>>
>>Hello!
>>
>>second bisection result!
>>
>>On Tue, Jan 10, 2006 at 05:00:37PM -0800, Andrew Morton wrote:
>>
>>
>>>Mattia Dongili <[email protected]> wrote:
>>>
>>>
>>[...]
>>
>>
>>>>1- reiser3 oopsed[1] twice while suspending to ram. It seems
>>>> reproducible (have some activity on the fs and suspend)
>>>>
>>>>
>>>No significant reiser3 changes in there, so I'd be suspecting something
>>>else has gone haywire.
>>>
>>>
>>you're right: git-xfs.patch is the bad guy.
>>
>>Unfortunately netconsole isn't helpful in capturing the oops (no serial
>>ports here) but I have two more shots (more readable):
>>http://oioio.altervista.org/linux/dsc03148.jpg
>>http://oioio.altervista.org/linux/dsc03149.jpg
>>
>>
>
>Hmm, thats odd. It seems to be coming from:
>reiserfs_commit_page -> reiserfs_add_ordered_list -> __add_jh(inline)
>
>I guess XFS may have left a buffer_head in an unusual state (with some
>private flag/b_private set), somehow, and perhaps that buffer_head has
>later been allocated for a page in a reiserfs write. Does this patch,
>below, help at all?
>
>I see one BUG check in __add_jh for non-NULL b_private, but can't see
>the top of your console output from the photos - is there a preceding
>line with "kernel BUG at ..." in it?
>
>cheers.
>
>
>
On Mon, Jan 16, 2006 at 01:29:58AM -0800, Hans Reiser wrote:
> Thanks Nathan, Mattia, and Andrew.
>
> vs, can you or Jeff look at whether our buffer head inits are
> sufficiently hardened by next Monday (I know that vs has a lot of mail
> to catch up to)? Jeff, if you have time before then, it would be nice
> if you could handle it, I know hardening V3 is an interest of yours.
Chris Mason just sent a patch to -fsdevel that initialized bh->b_private
in reiserfs. Mattia, I'll bounce you the patch privately, could you try
to see if it helps?
[...]
> Nathan Scott wrote:
>>On Sun, Jan 15, 2006 at 11:14:58PM +0100, Mattia Dongili wrote:
[...]
>>>you're right: git-xfs.patch is the bad guy.
>>>
>>>Unfortunately netconsole isn't helpful in capturing the oops (no serial
>>>ports here) but I have two more shots (more readable):
>>>http://oioio.altervista.org/linux/dsc03148.jpg
>>>http://oioio.altervista.org/linux/dsc03149.jpg
>>>
>>>
>>
>>Hmm, thats odd. It seems to be coming from:
>>reiserfs_commit_page -> reiserfs_add_ordered_list -> __add_jh(inline)
>>
>>I guess XFS may have left a buffer_head in an unusual state (with some
>>private flag/b_private set), somehow, and perhaps that buffer_head has
>>later been allocated for a page in a reiserfs write. Does this patch,
>>below, help at all?
Yes, it does help, good catch! I'm currently running -mm4, which presented
the same behaviour, with your one liner and I can't reproduce the oops
anymore (fortunately for my root FS!).
thanks
--
mattia
:wq!
On Mon, January 16, 2006 12:48 pm, Christoph Hellwig said:
> On Mon, Jan 16, 2006 at 01:29:58AM -0800, Hans Reiser wrote:
>> Thanks Nathan, Mattia, and Andrew.
>>
>> vs, can you or Jeff look at whether our buffer head inits are
>> sufficiently hardened by next Monday (I know that vs has a lot of mail
>> to catch up to)? Jeff, if you have time before then, it would be nice
>> if you could handle it, I know hardening V3 is an interest of yours.
>
> Chris Mason just sent a patch to -fsdevel that initialized bh->b_private
> in reiserfs. Mattia, I'll bounce you the patch privately, could you try
> to see if it helps?
It does help, as does Nathan's patch.
BTW: I'm testing it on -mm4 (I reproduced the same oops here)
Thanks
--
mattia
:wq!
On Sat, 2006-01-14 at 13:08 +0100, Mattia Dongili wrote:
> On Thu, Jan 12, 2006 at 03:26:01PM -0800, john stultz wrote:
> > On Thu, 2006-01-12 at 23:08 +0100, Mattia Dongili wrote:
> > > [cleaned up some Cc as this is not interesting to all MLs]
> > > first bisection spotted the cause of the stalls at boot (happening while
> > > starting portmap and after usb-storage scan):
> > >
> > > time-clocksource-infrastructure.patch
> > > time-generic-timekeeping-infrastructure.patch
> > > time-i386-conversion-part-1-move-timer_pitc-to-i8253c.patch
> > > time-i386-conversion-part-2-rework-tsc-support.patch
> > > time-i386-conversion-part-3-enable-generic-timekeeping.patch
> > > time-i386-conversion-part-4-remove-old-timer_opts-code.patch
> > > time-i386-conversion-part-5-acpi-pm-variable-renaming-and-config-change.patch
> > > time-i386-clocksource-drivers.patch
> > > time-fix-cpu-frequency-detection.patch
> > >
> > > Cc-ed john stultz
> > >
> > > actually git bisect[1] pointed to time-fix-cpu-frequency-detection.patch
> > > but it's clearly wrong. Reverting all the above patches (I suppose they
> > > are somewhat related) fixes the stalls I experience. I can test
> > > corrections if necessary.
[snip]
> > Looking at the log here:
> > http://oioio.altervista.org/linux/boot-2.6.15-mm2.3
> >
> > I'm curious if you're getting cpufreq effects during interval while the
> > TSC is being used as a clocksource before we switch to using the acpi_pm
> > clocksource.
>
> What should I expect? I didn't notice anything in particular and
> actually the box stays alive for just a few minutes, then reiserfs
> explodes so I have no chance to notice anything in the long run.
>
> > After the system boots up, does it keep accurate time? Time doesn't
> > obviously move too fast or to slow compared to a watch?
>
> yes, during the short time it stays alive there's no difference between
> my watch and my laptop.
Ok, that's good. It means the ACPI PM clocksource is doing the right
thing.
> > Few things to try (independently):
> > 1. Does booting w/ idle=poll change the behavior?
>
> yes, no more stalls
Ok, this points to the TSC is changing frequency (likely due to C3
halting).
> > 2. Does booting w/ clocksource=jiffies change the behavior?
>
> yes, same as above
Ok, good, interrupts are getting there at the right frequency.
> > 3. After booting up, run:
> > echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > And check that the system keeps accurate time.
>
> didn't try as there seems to be no problem in timekeeping
Well, it would have re-inforced the TSC being the issue, but I'm fairly
confident that that is the case.
My theory: The stalls are due to the TSC frequency not being consistent
for the small window at boot between when it is installed and when the
ACPI PM clocksource is installed.
I'll try to narrow that window down a bit and see if that doesn't
resolve the issue.
thanks
-john
On Mon, Jan 16, 2006 at 12:16:21PM -0800, john stultz wrote:
> On Sat, 2006-01-14 at 13:08 +0100, Mattia Dongili wrote:
> > On Thu, Jan 12, 2006 at 03:26:01PM -0800, john stultz wrote:
[...]
> > > Few things to try (independently):
> > > 1. Does booting w/ idle=poll change the behavior?
> >
> > yes, no more stalls
>
> Ok, this points to the TSC is changing frequency (likely due to C3
> halting).
humm... no C3 state here :) did you mean this C3?
# cat /proc/acpi/processor/CPU0/power
active state: C2
max_cstate: C8
bus master activity: 00000000
states:
C1: type[C1] promotion[C2] demotion[--] latency[000] usage[00007790]
*C2: type[C2] promotion[--] demotion[C1] latency[010] usage[02310093]
> > > 2. Does booting w/ clocksource=jiffies change the behavior?
> >
> > yes, same as above
>
> Ok, good, interrupts are getting there at the right frequency.
>
> > > 3. After booting up, run:
> > > echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > And check that the system keeps accurate time.
> >
> > didn't try as there seems to be no problem in timekeeping
>
> Well, it would have re-inforced the TSC being the issue, but I'm fairly
> confident that that is the case.
Now that filesystem problems have gone I have longer uptimes. OhOhOh,
very funnny, echoing tsc into the mentioned sysfs entry makes the clock
almost completely stop:
# date
Mon Jan 16 21:29:26 CET 2006
now count with me 1..2..3..4..5
# date
Mon Jan 16 21:29:27 CET 2006
I'd say the time is far from being accurate.
> My theory: The stalls are due to the TSC frequency not being consistent
> for the small window at boot between when it is installed and when the
> ACPI PM clocksource is installed.
seems you're right, issuing a 'sleep 1' takes ages before it returns and
it's exactely where the boot process got stuck.
> I'll try to narrow that window down a bit and see if that doesn't
> resolve the issue.
I'll be happy to test new patches if necessary (I'm running -mm4)
--
mattia
:wq!
On Mon, 2006-01-16 at 21:40 +0100, Mattia Dongili wrote:
> On Mon, Jan 16, 2006 at 12:16:21PM -0800, john stultz wrote:
> > On Sat, 2006-01-14 at 13:08 +0100, Mattia Dongili wrote:
> > > On Thu, Jan 12, 2006 at 03:26:01PM -0800, john stultz wrote:
> [...]
> > > > Few things to try (independently):
> > > > 1. Does booting w/ idle=poll change the behavior?
> > >
> > > yes, no more stalls
> >
> > Ok, this points to the TSC is changing frequency (likely due to C3
> > halting).
>
> humm... no C3 state here :) did you mean this C3?
> # cat /proc/acpi/processor/CPU0/power
> active state: C2
> max_cstate: C8
> bus master activity: 00000000
> states:
> C1: type[C1] promotion[C2] demotion[--] latency[000] usage[00007790]
> *C2: type[C2] promotion[--] demotion[C1] latency[010] usage[02310093]
Hrmm. Interesting. I'm not aware of C2 causing TSC stalls. This may be
in part why we don't disable the TSC earlier.
> > > > 2. Does booting w/ clocksource=jiffies change the behavior?
> > >
> > > yes, same as above
> >
> > Ok, good, interrupts are getting there at the right frequency.
> >
> > > > 3. After booting up, run:
> > > > echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > > And check that the system keeps accurate time.
> > >
> > > didn't try as there seems to be no problem in timekeeping
> >
> > Well, it would have re-inforced the TSC being the issue, but I'm fairly
> > confident that that is the case.
>
> Now that filesystem problems have gone I have longer uptimes. OhOhOh,
> very funnny, echoing tsc into the mentioned sysfs entry makes the clock
> almost completely stop:
> # date
> Mon Jan 16 21:29:26 CET 2006
> now count with me 1..2..3..4..5
> # date
> Mon Jan 16 21:29:27 CET 2006
>
> I'd say the time is far from being accurate.
Ok, as I thought. Although 5:1 seems a bit large for cpufreq scaling. It
does appear that the TSC is halting.
> > My theory: The stalls are due to the TSC frequency not being consistent
> > for the small window at boot between when it is installed and when the
> > ACPI PM clocksource is installed.
>
> seems you're right, issuing a 'sleep 1' takes ages before it returns and
> it's exactely where the boot process got stuck.
>
> > I'll try to narrow that window down a bit and see if that doesn't
> > resolve the issue.
>
> I'll be happy to test new patches if necessary (I'm running -mm4)
Below is a patch that should provide a bit more debug info. If you
could, boot up w/ it (be sure to enable CONFIG_PARANOID_GENERIC_TIME)
and send me your dmesg.
thanks
-john
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index cab2546..5291722 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -61,6 +61,15 @@ static inline int check_tsc_unstable(voi
void mark_tsc_unstable(void)
{
+#ifdef CONFIG_PARANOID_GENERIC_TIME
+ /* give a clue as to why the TSC is unstable */
+ static int warncount=0;
+ if(warncount < 10) {
+ warncount++;
+ printk("TSC: Marked unstable\n");
+ dump_stack();
+ }
+#endif
tsc_unstable = 1;
}
EXPORT_SYMBOL_GPL(mark_tsc_unstable);
@@ -317,15 +326,62 @@ core_initcall(cpufreq_tsc);
static unsigned long current_tsc_khz = 0;
static int tsc_update_callback(void);
+#ifdef CONFIG_PARANOID_GENERIC_TIME
+/* This will hurt performance! */
+static DEFINE_SPINLOCK(checktsc_lock);
+static cycle_t last_tsc;
+
static cycle_t read_tsc(void)
{
+ static int once = 1;
+
+ unsigned long flags;
cycle_t ret;
+ spin_lock_irqsave(&checktsc_lock, flags);
+
rdtscll(ret);
+ if (once && ret < last_tsc) {
+ once = 0;
+ spin_unlock_irqrestore(&checktsc_lock, flags);
+ printk("BUG in read_tsc(): TSC went backward!\n");
+ if (num_online_cpus() > 1)
+ printk("... Unsynced TSCs?\n");
+ printk("... [ from %016Lx to %016Lx ]\n", last_tsc, ret);
+
+ } else {
+ last_tsc = ret;
+ spin_unlock_irqrestore(&checktsc_lock, flags);
+ }
+
return ret;
}
+static struct clocksource clocksource_tsc;
+
+void tsc_print_debuginfo(void)
+{
+ printk("TSC clocksource: %lu khz, rating=%lu mult=%lu shift=%lu\n",
+ current_tsc_khz, clocksource_tsc.rating,
+ clocksource_tsc.mult, clocksource_tsc.shift);
+}
+
+#else /* CONFIG_PARANOID_GENERIC_TIME */
+
+
+static cycle_t read_tsc(void)
+{
+ cycle_t ret;
+
+ rdtscll(ret);
+
+ return ret;
+}
+
+#define tsc_print_debuginfo()
+#endif /* CONFIG_PARANOID_GENERIC_TIME */
+
static struct clocksource clocksource_tsc = {
.name = "tsc",
.rating = 300,
@@ -341,6 +397,10 @@ static int tsc_update_callback(void)
{
int change = 0;
+ static int dbug_count;
+ if (!(dbug_count++ % 1000))
+ tsc_print_debuginfo();
+
/* check to see if we should switch to the safe clocksource: */
if (clocksource_tsc.rating != 50 && check_tsc_unstable()) {
clocksource_tsc.rating = 50;
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 400c2b4..ffab419 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -39,4 +39,14 @@
#endif
#endif
+#define WARN_ON_ONCE(condition) \
+do { \
+ static int warn_once = 1; \
+ \
+ if (condition) { \
+ warn_once = 0; \
+ WARN_ON(1); \
+ } \
+} while (0);
+
#endif
diff --git a/kernel/time/timeofday.c b/kernel/time/timeofday.c
index 300a8ce..4be96a8 100644
--- a/kernel/time/timeofday.c
+++ b/kernel/time/timeofday.c
@@ -122,6 +122,112 @@ static nsec_t suspend_start;
*/
static struct timer_list timeofday_timer;
+
+#ifdef CONFIG_PARANOID_GENERIC_TIME
+/* This will hurt performance! */
+static DEFINE_SPINLOCK(check_monotonic_lock);
+static ktime_t last_monotonic_ktime;
+
+static ktime_t get_check_value(void)
+{
+ unsigned long flags;
+ ktime_t ret;
+
+ spin_lock_irqsave(&check_monotonic_lock, flags);
+ ret = last_monotonic_ktime;
+ spin_unlock_irqrestore(&check_monotonic_lock, flags);
+
+ return ret;
+}
+
+static void check_monotonic_clock(ktime_t prev, ktime_t now)
+{
+ unsigned long flags;
+
+ /* check for monotonic inconsistencies */
+ if (now.tv64 < prev.tv64) {
+ static int warn = 1;
+
+ if (warn) {
+ warn = 0;
+
+ printk("check_monotonic_clock: monotonic inconsistency"
+ " detected!\n");
+ printk(" from %16Lx (%llu) to %16Lx (%llu).\n",
+ ktime_to_ns(prev),
+ ktime_to_ns(prev),
+ ktime_to_ns(now),
+ ktime_to_ns(now));
+ WARN_ON(1);
+ }
+ }
+ spin_lock_irqsave(&check_monotonic_lock, flags);
+ last_monotonic_ktime = now;
+ spin_unlock_irqrestore(&check_monotonic_lock, flags);
+}
+
+/* timespec version */
+#define check_monotonic_clock_ts(prev, now) \
+ check_monotonic_clock(prev, timespec_to_ktime(now))
+
+/* Call holding atleast a readlock on system_time_lock */
+void verify_timekeeping_state(void)
+{
+ /* ensure all the timespec and ktime values are consistent: */
+ WARN_ON_ONCE(system_time.tv64 != timespec_to_ktime(mono_time_ts).tv64);
+ WARN_ON_ONCE(ktime_add(system_time, wall_time_offset).tv64 !=
+ timespec_to_ktime(wall_time_ts).tv64);
+ WARN_ON_ONCE(wall_time_offset.tv64 !=
+ timespec_to_ktime(monotonic_time_offset_ts).tv64);
+}
+
+static void check_periodic_interval(cycle_t now)
+{
+ static cycle_t last;
+
+ cycle_t delta;
+ nsec_t ns_offset;
+
+ if (last != 0 && now != 0) {
+ delta = (now - last)& clock->mask;
+
+ ns_offset = cyc2ns(clock, ntp_adj, delta);
+
+ if (ns_offset > (nsec_t)2*PERIODIC_INTERVAL_MS *1000000) {
+ static int warn_count = 10;
+ if (warn_count > 0) {
+ warn_count--;
+ printk("check_periodic_interval: "
+ "Long interval! %llu ns.\n",
+ ns_offset);
+ printk(" Something may "
+ "be blocking interrupts.\n");
+ }
+ }
+ if (ns_offset < (nsec_t)PERIODIC_INTERVAL_MS *1000000) {
+ static int warn_count = 10;
+ if (warn_count > 0) {
+ warn_count--;
+ printk("check_periodic_interval: "
+ "short interval! %llu ns.\n",
+ ns_offset);
+ printk(" bad calibration "
+ "or timers may be broken.\n");
+ }
+ }
+ }
+ last = now;
+}
+
+#else /* CONFIG_PARANOID_GENERIC_TIME */
+ /* XXX can we optimize this out? */
+# define get_check_value(void) ktime_set(0,0)
+# define check_monotonic_clock(x,y) do { } while (0)
+# define check_monotonic_clock_ts(x,ts) do { } while (0)
+# define verify_timekeeping_state() do { } while (0)
+# define check_periodic_interval(x) do { } while (0)
+#endif /* CONFIG_PARANOID_GENERIC_TIME */
+
/**
* update_legacy_time_values - sync legacy time values
*
@@ -188,7 +294,14 @@ static inline nsec_t __get_nsec_offset(v
static ktime_t __get_monotonic_clock(void)
{
nsec_t offset = __get_nsec_offset();
- return ktime_add_ns(system_time, offset);
+#ifdef CONFIG_PARANOID_GENERIC_TIME
+ ktime_t check = get_check_value();
+#endif
+ ktime_t ret;
+
+ ret = ktime_add_ns(system_time, offset);
+ check_monotonic_clock(check,ret);
+ return ret;
}
/**
@@ -269,6 +382,9 @@ ktime_t get_realtime_offset(void)
*/
void get_monotonic_clock_ts(struct timespec *ts)
{
+#ifdef CONFIG_PARANOID_GENERIC_TIME
+ ktime_t check = get_check_value();
+#endif
unsigned long seq;
nsec_t offset;
@@ -280,6 +396,7 @@ void get_monotonic_clock_ts(struct times
} while (read_seqretry(&system_time_lock, seq));
timespec_add_ns(ts, offset);
+ check_monotonic_clock_ts(check, *ts);
}
/**
@@ -362,7 +479,9 @@ int do_settimeofday(struct timespec *tv)
update_legacy_time_values();
write_sequnlock_irqrestore(&system_time_lock, flags);
-
+#ifdef CONFIG_PARANOID_GENERIC_TIME
+ printk("do_settimeofday() was called!\n");
+#endif
/* signal hrtimers about time change */
clock_was_set();
@@ -518,6 +637,7 @@ static void timeofday_periodic_hook(unsi
/* read time source & calc time since last call: */
cycle_now = read_clocksource(clock);
+ check_periodic_interval(cycle_now);
cycle_delta = (cycle_now - cycle_last) & clock->mask;
delta_nsec = cyc2ns_fixed_rem(ts_interval, &cycle_delta, &remainder);
@@ -562,6 +682,7 @@ static void timeofday_periodic_hook(unsi
ntp_adj = 0;
remainder = 0;
something_changed = 1;
+ check_periodic_interval(0);
}
/*
@@ -611,6 +732,8 @@ static void timeofday_periodic_hook(unsi
update_legacy_time_values();
+ verify_timekeeping_state();
+
write_sequnlock_irqrestore(&system_time_lock, flags);
/* notify the posix timers if wall_time_offset changed */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a314e66..3616828 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -46,6 +46,11 @@ config LOG_BUF_SHIFT
13 => 8 KB
12 => 4 KB
+config PARANOID_GENERIC_TIME
+ default y
+ depends on GENERIC_TIME
+ bool "Paraniod Timekeeping Checks"
+
config DETECT_SOFTLOCKUP
bool "Detect Soft Lockups"
depends on DEBUG_KERNEL
john stultz <[email protected]> writes:
> > bus master activity: 00000000
> > states:
> > C1: type[C1] promotion[C2] demotion[--] latency[000] usage[00007790]
> > *C2: type[C2] promotion[--] demotion[C1] latency[010] usage[02310093]
>
> Hrmm. Interesting. I'm not aware of C2 causing TSC stalls. This may be
> in part why we don't disable the TSC earlier.
On the dual core athlons C1 occasionally loses some ticks (it's not a real stall) when going
in/out of HLT. Since the different cores have different HLT patterns depending on load
that causes them to drift against slowly each other, and it adds up over longer runtime.
Instead of adding lots of ugly checking code I would just check the CPUs like I do
in x86-64 and not use the TSC if the test fails. I believe the logic currently in there
handles all modern hardware that is 64bit capable correctly.
-Andi
On Mon, 2006-01-16 at 23:19 +0100, Andi Kleen wrote:
> john stultz <[email protected]> writes:
> > > bus master activity: 00000000
> > > states:
> > > C1: type[C1] promotion[C2] demotion[--] latency[000] usage[00007790]
> > > *C2: type[C2] promotion[--] demotion[C1] latency[010] usage[02310093]
> >
> > Hrmm. Interesting. I'm not aware of C2 causing TSC stalls. This may be
> > in part why we don't disable the TSC earlier.
>
> On the dual core athlons C1 occasionally loses some ticks (it's not a real stall) when going
> in/out of HLT. Since the different cores have different HLT patterns depending on load
> that causes them to drift against slowly each other, and it adds up over longer runtime.
Yes, AMD SMP systems already mark the TSC as unstable w/ my code.
Unfortunately in this case, we're dealing w/ a single 1Ghz PIII where
the TSC is slowing down (due to what seems to be stalls, but could be
cpufreq scaling).
> Instead of adding lots of ugly checking code I would just check the CPUs like I do
> in x86-64 and not use the TSC if the test fails. I believe the logic currently in there
> handles all modern hardware that is 64bit capable correctly.
If you're suggesting disabling the TSC based off of the results of the
unsynchronized_tsc() check in arch/x86-64/kerenl/time.c, I actually
already do almost the same thing (very much inspired by your code).
Although let me know if you mean something different.
thanks
-john
On Mon, 2006-01-16 at 21:40 +0100, Mattia Dongili wrote:
> On Mon, Jan 16, 2006 at 12:16:21PM -0800, john stultz wrote:
> > I'll try to narrow that window down a bit and see if that doesn't
> > resolve the issue.
>
> I'll be happy to test new patches if necessary (I'm running -mm4)
See if this patch doesn't resolve the issue. Its a bit hackish, but
basically I'm just holding off installing any clocksources until later
on at boot. This avoids some of the clocksource churn.
You will still see the TSC be installed briefly, since in this cases it
doesn't realize its a bad choice until after its being used for one
interval. However it should be much shorter (only ~50ms) of a window.
thanks
-john
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -51,13 +51,27 @@ static LIST_HEAD(clocksource_list);
static DEFINE_SPINLOCK(clocksource_lock);
static char override_name[32];
+static int finished_booting;
+
+/* clocksource_done_booting - Called near the end of bootup
+ *
+ * Hack to avoid lots of clocksource churn at boot time
+ */
+static int clocksource_done_booting(void)
+{
+ finished_booting = 1;
+ return 0;
+}
+
+late_initcall(clocksource_done_booting);
+
/**
* get_next_clocksource - Returns the selected clocksource
*/
struct clocksource *get_next_clocksource(void)
{
spin_lock(&clocksource_lock);
- if (next_clocksource) {
+ if (next_clocksource && finished_booting) {
curr_clocksource = next_clocksource;
next_clocksource = NULL;
}
Christoph Hellwig wrote:
>On Mon, Jan 16, 2006 at 01:29:58AM -0800, Hans Reiser wrote:
>
>
>>Thanks Nathan, Mattia, and Andrew.
>>
>>vs, can you or Jeff look at whether our buffer head inits are
>>sufficiently hardened by next Monday (I know that vs has a lot of mail
>>to catch up to)? Jeff, if you have time before then, it would be nice
>>if you could handle it, I know hardening V3 is an interest of yours.
>>
>>
>
>Chris Mason just sent a patch to -fsdevel that initialized bh->b_private
>in reiserfs. Mattia, I'll bounce you the patch privately, could you try
>to see if it helps?
>
>
>
>
>
I remember wondering if that fixed it, but was too lazy to go back and
look at whether it was exactly equivalent.;-)
Hans
On Mon, Jan 16, 2006 at 01:42:43PM -0800, john stultz wrote:
> On Mon, 2006-01-16 at 21:40 +0100, Mattia Dongili wrote:
[...]
> > # date
> > Mon Jan 16 21:29:26 CET 2006
> > now count with me 1..2..3..4..5
> > # date
> > Mon Jan 16 21:29:27 CET 2006
> >
> > I'd say the time is far from being accurate.
>
> Ok, as I thought. Although 5:1 seems a bit large for cpufreq scaling. It
> does appear that the TSC is halting.
it's far worse than 5:1 actually I wasn't only counting to 5 :)
[...]
> > I'll be happy to test new patches if necessary (I'm running -mm4)
>
> Below is a patch that should provide a bit more debug info. If you
> could, boot up w/ it (be sure to enable CONFIG_PARANOID_GENERIC_TIME)
> and send me your dmesg.
done, I did put it here[1] and added some comments at the end of the
file to better show what was going on.
[1]: http://oioio.altervista.org/linux/tsc_debug.html
and also included at the end of this mail as plain text w/o
comments
Will test the other patch now.
Oh, by the way, after I caught the above dmesg I got this:
[ 556.512000] TSC: Marked unstable
[ 556.512000] [<c0103bb3>] show_trace+0x13/0x20
[ 556.512000] [<c0103bde>] dump_stack+0x1e/0x20
[ 556.512000] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 556.512000] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 556.512000] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 556.512000] [<c029b551>] cpufreq_notify_transition+0xb1/0x170
[ 556.512000] [<d120f39d>] speedstep_target+0xad/0xe0 [speedstep_ich]
[ 556.512000] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 556.512000] [<d11de66e>] do_dbs_timer+0x1be/0x200 [cpufreq_ondemand]
[ 556.512000] [<c012b1da>] run_workqueue+0x9a/0x140
[ 556.512000] [<c012b477>] worker_thread+0x107/0x140
[ 556.512000] [<c012eaa9>] kthread+0xd9/0xe0
[ 556.512000] [<c0101005>] kernel_thread_helper+0x5/0x10
[ 556.532000] check_periodic_interval: Long interval! 135987395 ns.
[ 556.532000] Something may be blocking interrupts.
[ 563.016000] TSC: Marked unstable
[ 563.016000] [<c0103bb3>] show_trace+0x13/0x20
[ 563.016000] [<c0103bde>] dump_stack+0x1e/0x20
[ 563.016000] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 563.016000] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 563.016000] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 563.016000] [<c029b5dc>] cpufreq_notify_transition+0x13c/0x170
[ 563.016000] [<d120f3c0>] speedstep_target+0xd0/0xe0 [speedstep_ich]
[ 563.016000] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 563.016000] [<d11de5ee>] do_dbs_timer+0x13e/0x200 [cpufreq_ondemand]
[ 563.016000] [<c012b1da>] run_workqueue+0x9a/0x140
[ 563.016000] [<c012b477>] worker_thread+0x107/0x140
[ 563.016000] [<c012eaa9>] kthread+0xd9/0xe0
[ 563.016000] [<c0101005>] kernel_thread_helper+0x5/0x10
[ 563.032000] check_periodic_interval: Long interval! 140156640 ns.
[ 563.032000] Something may be blocking interrupts.
[ 576.756000] Time: jiffies clocksource has been installed.
--
mattia
:wq!
On Mon, Jan 16, 2006 at 01:42:43PM -0800, john stultz wrote:
[...]
> and send me your dmesg.
ops, forgot it, here it is as plain text
[ 0.000000] Linux version 2.6.15-mm4-3 (mattia@inferi) (gcc version 4.0.3 20051201 (prerelease) (Debian 4.0.2-5)) #4 PREEMPT Tue Jan 17 16:00:41 CET 2006
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[ 0.000000] BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000c0000 - 00000000000d0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000d8000 - 00000000000e0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
[ 0.000000] BIOS-e820: 000000000fef0000 - 000000000feff000 (ACPI data)
[ 0.000000] BIOS-e820: 000000000feff000 - 000000000ff00000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000000ff00000 - 000000000ff80000 (usable)
[ 0.000000] BIOS-e820: 000000000ff80000 - 0000000010000000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
[ 0.000000] BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
[ 0.000000] 255MB LOWMEM available.
[ 0.000000] On node 0 totalpages: 65408
[ 0.000000] DMA zone: 4096 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 0 pages, LIFO batch:0
[ 0.000000] Normal zone: 61312 pages, LIFO batch:15
[ 0.000000] HighMem zone: 0 pages, LIFO batch:0
[ 0.000000] DMI present.
[ 0.000000] ACPI: RSDP (v000 PTLTD ) @ 0x000f6cd0
[ 0.000000] ACPI: RSDT (v001 SONY C0 0x20010809 PTL 0x00000000) @ 0x0fefa88f
[ 0.000000] ACPI: FADT (v001 SONY C0 0x20010809 PTL 0x01000000) @ 0x0fefef64
[ 0.000000] ACPI: BOOT (v001 SONY C0 0x20010809 PTL 0x00000001) @ 0x0fefefd8
[ 0.000000] ACPI: DSDT (v001 SONY C0 0x20010809 PTL 0x0100000d) @ 0x00000000
[ 0.000000] ACPI: PM-Timer IO Port: 0x1008
[ 0.000000] Allocating PCI resources starting at 20000000 (gap: 10000000:ef800000)
[ 0.000000] Detected 994.334 MHz processor.
[ 16.592942] Built 1 zonelists
[ 16.592947] Local APIC disabled by BIOS -- reenabling.
[ 16.592952] Found and enabled local APIC!
[ 16.592958] mapped APIC to ffffd000 (fee00000)
[ 16.592965] Enabling fast FPU save and restore... done.
[ 16.592970] Enabling unmasked SIMD FPU exception support... done.
[ 16.592976] Initializing CPU#0
[ 16.592988] Kernel command line: root=/dev/hda1 ro vga=extended video=radeonfb:800x600-32@60 fbcon=font:Acorn8x8 lapic resume=/dev/hda2
[ 16.593550] CPU 0 irqstacks, hard=c0416000 soft=c0417000
[ 16.593557] PID hash table entries: 1024 (order: 10, 16384 bytes)
[ 17.544379] Console: colour VGA+ 80x50
[ 17.545836] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 17.546478] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 17.561178] Memory: 255112k/261632k available (2067k kernel code, 5900k reserved, 913k data, 152k init, 0k highmem)
[ 17.561225] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[ 17.640678] Calibrating delay using timer specific routine.. 1989.94 BogoMIPS (lpj=3979887)
[ 17.640821] Mount-cache hash table entries: 512
[ 17.641031] CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
[ 17.641045] CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
[ 17.641061] CPU: L1 I cache: 16K, L1 D cache: 16K
[ 17.641134] CPU: L2 cache: 512K
[ 17.641176] CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
[ 17.641188] Intel machine check architecture supported.
[ 17.641233] Intel machine check reporting enabled on CPU#0.
[ 17.641286] mtrr: v2.0 (20020519)
[ 17.641331] CPU: Intel(R) Pentium(R) III Mobile CPU 1000MHz stepping 01
[ 17.641437] Checking 'hlt' instruction... OK.
[ 17.658974] tbxface-0109 [02] load_tables : ACPI Tables successfully acquired
[ 17.670609] Parsing all Control Methods:
[ 17.671138] Table [DSDT](id 0005) - 555 Objects with 54 Devices 191 Methods 21 Regions
[ 17.671230] ACPI Namespace successfully loaded at root c04491f8
[ 17.705135] evxfevnt-0091 [03] enable : Transition to ACPI mode successful
[ 17.813101] NET: Registered protocol family 16
[ 17.813176] ACPI: bus type pci registered
[ 17.816737] PCI: PCI BIOS revision 2.10 entry at 0xfd9aa, last bus=2
[ 17.816789] PCI: Using configuration type 1
[ 17.817061] ACPI: Subsystem revision 20051216
[ 17.818967] evgpeblk-0941 [06] ev_create_gpe_block : GPE 00 to 0F [_GPE] 2 regs on int 0x9
[ 17.819111] evgpeblk-0941 [06] ev_create_gpe_block : GPE 10 to 1F [_GPE] 2 regs on int 0x9
[ 17.820643] evgpeblk-1037 [05] ev_initialize_gpe_bloc: Found 6 Wake, Enabled 0 Runtime GPEs in this block
[ 17.821507] evgpeblk-1037 [05] ev_initialize_gpe_bloc: Found 1 Wake, Enabled 0 Runtime GPEs in this block
[ 17.826779] Completing Region/Field/Buffer/Package initialization:..............................................................
[ 17.833828] Initialized 20/21 Regions 0/0 Fields 25/25 Buffers 17/27 Packages (564 nodes)
[ 17.833917] Executing all Device _STA and_INI methods:..........................................................
[ 17.843774] 58 Devices found containing: 3 _STA, 3 _INI methods
[ 17.843883] ACPI: Interpreter enabled
[ 17.843925] ACPI: Using PIC for interrupt routing
[ 17.845783] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 17.845829] PCI: Probing PCI hardware (bus 00)
[ 17.883395] PCI quirk: region 1000-107f claimed by ICH4 ACPI/GPIO/TCO
[ 17.883443] PCI quirk: region 1180-11bf claimed by ICH4 GPIO
[ 17.883527] PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
[ 17.883771] Boot video device is 0000:01:00.0
[ 17.884009] PCI: Transparent bridge - 0000:00:1e.0
[ 17.884128] PCI: Bus #03 (-#06) may be hidden behind transparent bridge #02 (-#02) (try 'pci=assign-busses')
[ 17.884237] PCI: Bus #07 (-#0a) may be hidden behind transparent bridge #02 (-#02) (try 'pci=assign-busses')
[ 17.884324] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 17.902237] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGP_._PRT]
[ 17.905223] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB_._PRT]
[ 17.912405] ACPI: Embedded Controller [EC0] (gpe 28) interrupt mode.
[ 17.956448] ACPI: PCI Interrupt Link [LNKA] (IRQs *9)
[ 17.957856] ACPI: PCI Interrupt Link [LNKB] (IRQs 9) *0, disabled.
[ 17.959316] ACPI: PCI Interrupt Link [LNKC] (IRQs 9) *0, disabled.
[ 17.960988] ACPI: PCI Interrupt Link [LNKD] (IRQs *9)
[ 17.962597] ACPI: PCI Interrupt Link [LNKE] (IRQs *9)
[ 17.964195] ACPI: PCI Interrupt Link [LNKF] (IRQs 9) *0
[ 17.965645] ACPI: PCI Interrupt Link [LNKG] (IRQs 9) *0, disabled.
[ 17.967108] ACPI: PCI Interrupt Link [LNKH] (IRQs 9) *0, disabled.
[ 17.967877] Linux Plug and Play Support v0.97 (c) Adam Belay
[ 17.968022] PCI: Using ACPI for IRQ routing
[ 17.968067] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 17.973247] PCI: Bridge: 0000:00:01.0
[ 17.973292] IO window: 3000-3fff
[ 17.973336] MEM window: d0100000-d01fffff
[ 17.973379] PREFETCH window: d8000000-dfffffff
[ 17.973440] PCI: Bus 3, cardbus bridge: 0000:02:05.0
[ 17.973484] IO window: 00004400-000044ff
[ 17.973528] IO window: 00004800-000048ff
[ 17.973571] PREFETCH window: 20000000-21ffffff
[ 17.973616] MEM window: 24000000-25ffffff
[ 17.973659] PCI: Bus 7, cardbus bridge: 0000:02:05.1
[ 17.973703] IO window: 00004c00-00004cff
[ 17.973746] IO window: 00001400-000014ff
[ 17.973790] PREFETCH window: 22000000-23ffffff
[ 17.973835] MEM window: 26000000-27ffffff
[ 17.973878] PCI: Bridge: 0000:00:1e.0
[ 17.973920] IO window: 4000-4fff
[ 17.973964] MEM window: d0200000-d02fffff
[ 17.974009] PREFETCH window: 20000000-23ffffff
[ 17.974067] PCI: Setting latency timer of device 0000:00:1e.0 to 64
[ 17.974130] **** SET: Misaligned resource pointer: cff4a7e2 Type 07 Len 0
[ 17.975437] ACPI: PCI Interrupt Link [LNKF] enabled at IRQ 9
[ 17.975484] PCI: setting IRQ 9 as level-triggered
[ 17.975489] ACPI: PCI Interrupt 0000:02:05.0[A] -> Link [LNKF] -> GSI 9 (level, low) -> IRQ 9
[ 17.975605] PCI: Setting latency timer of device 0000:02:05.0 to 64
[ 17.975653] **** SET: Misaligned resource pointer: cff4a7e2 Type 07 Len 0
[ 17.976947] ACPI: PCI Interrupt Link [LNKG] enabled at IRQ 9
[ 17.976992] ACPI: PCI Interrupt 0000:02:05.1[B] -> Link [LNKG] -> GSI 9 (level, low) -> IRQ 9
[ 17.977107] PCI: Setting latency timer of device 0000:02:05.1 to 64
[ 17.977203] Simple Boot Flag at 0x36 set to 0x1
[ 17.978505] Initializing Cryptographic API
[ 17.978554] io scheduler noop registered
[ 17.978631] io scheduler anticipatory registered
[ 17.978708] io scheduler deadline registered
[ 17.978795] io scheduler cfq registered
[ 17.979079] acpi_bus-0201 [01] bus_set_power : Device is not power manageable
[ 17.979240] **** SET: Misaligned resource pointer: cff4a2e2 Type 07 Len 0
[ 17.980662] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 9
[ 17.980708] ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 9 (level, low) -> IRQ 9
[ 17.992388] radeonfb: Retrieved PLL infos from BIOS
[ 17.992434] radeonfb: Reference=27.00 MHz (RefDiv=60) Memory=166.00 Mhz, System=166.00 MHz
[ 17.992488] radeonfb: PLL min 12000 max 35000
[ 18.020366] Time: tsc clocksource has been installed.
[ 18.020412] TSC clocksource: 994334 khz, rating=300 mult=4218204 shift=22
[ 18.927606] Non-DDC laptop panel detected
[ 19.922774] radeonfb: Monitor 1 type LCD found
[ 19.922818] radeonfb: Monitor 2 type no found
[ 19.922864] radeonfb: panel ID string: Samsung LTN150P1-L02
[ 19.922909] radeonfb: detected LVDS panel size from BIOS: 1400x1050
[ 19.922954] radeondb: BIOS provided dividers will be used
[ 20.158575] radeonfb: Dynamic Clock Power Management enabled
[ 20.430358] Console: switching to colour frame buffer device 100x75
[ 20.431228] radeonfb (0000:01:00.0): ATI Radeon LY
[ 20.433773] ACPI: AC Adapter [ACAD] (on-line)
[ 20.442738] ACPI: Battery Slot [BAT1] (battery present)
[ 20.445059] ACPI: Battery Slot [BAT2] (battery absent)
[ 20.445181] ACPI: Lid Switch [LID]
[ 20.445268] ACPI: Power Button (CM) [PWRB]
[ 20.445603] ACPI: CPU0 (power states: C1[C1] C2[C2])
[ 20.452669] ACPI: Thermal Zone [ATF0] (49 C)
[ 20.452793] isapnp: Scanning for PnP cards...
[ 20.806346] isapnp: No Plug & Play device found
[ 20.842860] PNP: No PS/2 controller found. Probing ports directly.
[ 20.844641] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 20.844831] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 20.845099] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 20.845226] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 20.845420] ICH3M: IDE controller at PCI slot 0000:00:1f.1
[ 20.845536] PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
[ 20.845656] pci_irq-0384 [02] pci_irq_derive : Unable to derive IRQ for device 0000:00:1f.1
[ 20.845880] ACPI: PCI Interrupt 0000:00:1f.1[A]: no GSI
[ 20.846005] ICH3M: chipset revision 1
[ 20.846080] ICH3M: not 100% native mode: will probe irqs later
[ 20.846202] ide0: BM-DMA at 0x1860-0x1867, BIOS settings: hda:DMA, hdb:pio
[ 20.846377] ide1: BM-DMA at 0x1868-0x186f, BIOS settings: hdc:pio, hdd:pio
[ 20.846547] Probing IDE interface ide0...
[ 20.855671] check_periodic_interval: short interval! 37827221 ns.
[ 20.861705] bad calibration or timers may be broken.
[ 20.867883] check_periodic_interval: short interval! 12223630 ns.
[ 20.873956] bad calibration or timers may be broken.
[ 20.880240] check_periodic_interval: short interval! 12367068 ns.
[ 20.886412] bad calibration or timers may be broken.
[ 20.892744] check_periodic_interval: short interval! 12514412 ns.
[ 20.898987] bad calibration or timers may be broken.
[ 20.905372] check_periodic_interval: short interval! 12638672 ns.
[ 20.911666] bad calibration or timers may be broken.
[ 20.918195] hda: FUJITSU MHV2080AH, ATA DISK drive
[ 20.924698] check_periodic_interval: short interval! 19340529 ns.
[ 20.931112] bad calibration or timers may be broken.
[ 20.937686] check_periodic_interval: short interval! 12999194 ns.
[ 20.944178] bad calibration or timers may be broken.
[ 20.950834] check_periodic_interval: short interval! 13158854 ns.
[ 20.957396] bad calibration or timers may be broken.
[ 20.964062] check_periodic_interval: short interval! 13238120 ns.
[ 20.970600] bad calibration or timers may be broken.
[ 20.977278] check_periodic_interval: short interval! 13227073 ns.
[ 20.983785] bad calibration or timers may be broken.
[ 20.994991] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 21.001511] Probing IDE interface ide1...
[ 21.011841] Probing IDE interface ide1...
[ 21.022153] hda: max request size: 128KiB
[ 21.028919] hda: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
[ 21.035484] hda: cache flushes supported
[ 21.041980] hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
[ 21.049415] mice: PS/2 mouse device common for all mice
[ 21.055969] NET: Registered protocol family 2
[ 21.063402] input: AT Translated Set 2 keyboard as /class/input/input0
[ 21.070101] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[ 21.076947] TCP established hash table entries: 16384 (order: 4, 65536 bytes)
[ 21.083747] TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
[ 21.090516] TCP: Hash tables configured (established 16384 bind 16384)
[ 21.097112] TCP reno registered
[ 21.103773] TCP bic registered
[ 21.110268] NET: Registered protocol family 1
[ 21.116704] NET: Registered protocol family 17
[ 21.123072] Using IPI Shortcut mode
[ 21.154137] ACPI wakeup devices:
[ 21.160382] PWRB USB1 USB2 USB3 CRD0 CRD1 LAN EC0 COMA MODE
[ 21.166805] ACPI: (supports S0 S3 S4 S5)
[ 21.173777] ReiserFS: hda1: found reiserfs format "3.6" with standard journal
[ 21.182730] ReiserFS: hda1: using ordered data mode
[ 21.189268] ReiserFS: hda1: journal params: device hda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
[ 21.204655] ReiserFS: hda1: checking transaction log (hda1)
[ 21.212097] ReiserFS: hda1: Using r5 hash to sort names
[ 21.218665] VFS: Mounted root (reiserfs filesystem) readonly.
[ 21.225419] Freeing unused kernel memory: 152k freed
[ 22.826555] hw_random hardware driver 1.0.0 loaded
[ 22.863045] usbcore: registered new driver usbfs
[ 22.870147] usbcore: registered new driver hub
[ 22.880784] Linux agpgart interface v0.101 (c) Dave Jones
[ 22.894656] agpgart: Detected an Intel 830M Chipset.
[ 22.916452] agpgart: AGP aperture is 256M @ 0xe0000000
[ 22.939765] USB Universal Host Controller Interface driver v3.0
[ 22.946632] ACPI: PCI Interrupt 0000:00:1d.0[A] -> Link [LNKA] -> GSI 9 (level, low) -> IRQ 9
[ 22.953532] PCI: Setting latency timer of device 0000:00:1d.0 to 64
[ 22.953540] uhci_hcd 0000:00:1d.0: UHCI Host Controller
[ 22.960618] uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1
[ 22.967503] uhci_hcd 0000:00:1d.0: irq 9, io base 0x00001800
[ 22.974540] usb usb1: configuration #1 chosen from 1 choice
[ 22.981477] hub 1-0:1.0: USB hub found
[ 22.988320] hub 1-0:1.0: 2 ports detected
[ 23.095630] **** SET: Misaligned resource pointer: cf4f3c02 Type 07 Len 0
[ 23.103937] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 9
[ 23.110791] ACPI: PCI Interrupt 0000:00:1d.1[B] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9
[ 23.117689] PCI: Setting latency timer of device 0000:00:1d.1 to 64
[ 23.117697] uhci_hcd 0000:00:1d.1: UHCI Host Controller
[ 23.124540] uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 2
[ 23.131384] uhci_hcd 0000:00:1d.1: irq 9, io base 0x00001820
[ 23.138329] usb usb2: configuration #1 chosen from 1 choice
[ 23.145151] hub 2-0:1.0: USB hub found
[ 23.151959] hub 2-0:1.0: 2 ports detected
[ 23.259401] **** SET: Misaligned resource pointer: cf4f3d02 Type 07 Len 0
[ 23.267849] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 9
[ 23.274658] ACPI: PCI Interrupt 0000:00:1d.2[C] -> Link [LNKC] -> GSI 9 (level, low) -> IRQ 9
[ 23.281570] PCI: Setting latency timer of device 0000:00:1d.2 to 64
[ 23.281578] uhci_hcd 0000:00:1d.2: UHCI Host Controller
[ 23.288508] uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3
[ 23.295415] uhci_hcd 0000:00:1d.2: irq 9, io base 0x00001840
[ 23.302388] usb usb3: configuration #1 chosen from 1 choice
[ 23.309274] hub 3-0:1.0: USB hub found
[ 23.316089] hub 3-0:1.0: 2 ports detected
[ 23.423175] PCI: Enabling device 0000:00:1f.5 (0000 -> 0001)
[ 23.430113] **** SET: Misaligned resource pointer: cff382e2 Type 07 Len 0
[ 23.438391] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 9
[ 23.445206] ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 9 (level, low) -> IRQ 9
[ 23.452147] PCI: Setting latency timer of device 0000:00:1f.5 to 64
[ 23.530922] usb 2-1: new low speed USB device using uhci_hcd and address 2
[ 23.716804] usb 2-1: configuration #1 chosen from 1 choice
[ 23.966556] usb 3-1: new full speed USB device using uhci_hcd and address 2
[ 24.026538] intel8x0_measure_ac97_clock: measured 52457 usecs
[ 24.033536] intel8x0: clocking to 48000
[ 24.150626] usb 3-1: configuration #1 chosen from 1 choice
[ 24.257221] ACPI: PCI Interrupt 0000:02:05.0[A] -> Link [LNKF] -> GSI 9 (level, low) -> IRQ 9
[ 24.264352] Yenta: CardBus bridge found at 0000:02:05.0 [104d:80e7]
[ 24.296380] e100: Intel(R) PRO/100 Network Driver, 3.4.14-k4-NAPI
[ 24.303496] e100: Copyright(c) 1999-2005 Intel Corporation
[ 24.406897] Yenta: ISA IRQ mask 0x0cb8, PCI irq 9
[ 24.413893] Socket status: 30000006
[ 24.420837] pcmcia: parent PCI bridge I/O window: 0x4000 - 0x4fff
[ 24.427833] cs: IO port probe 0x4000-0x4fff: clean.
[ 24.435270] pcmcia: parent PCI bridge Memory window: 0xd0200000 - 0xd02fffff
[ 24.442308] pcmcia: parent PCI bridge Memory window: 0x20000000 - 0x23ffffff
[ 24.450316] acpi_bus-0201 [01] bus_set_power : Device is not power manageable
[ 24.457455] **** SET: Misaligned resource pointer: cf515ca2 Type 07 Len 0
[ 24.465863] ACPI: PCI Interrupt Link [LNKE] enabled at IRQ 9
[ 24.473011] ACPI: PCI Interrupt 0000:02:08.0[A] -> Link [LNKE] -> GSI 9 (level, low) -> IRQ 9
[ 24.502383] e100: eth0: e100_probe: addr 0xd0204000, irq 9, MAC addr 08:00:46:26:50:59
[ 24.509668] ACPI: PCI Interrupt 0000:02:05.1[B] -> Link [LNKG] -> GSI 9 (level, low) -> IRQ 9
[ 24.517039] Yenta: CardBus bridge found at 0000:02:05.1 [104d:80e7]
[ 24.650695] Yenta: ISA IRQ mask 0x0cb8, PCI irq 9
[ 24.658069] Socket status: 30000006
[ 24.665379] pcmcia: parent PCI bridge I/O window: 0x4000 - 0x4fff
[ 24.672771] cs: IO port probe 0x4000-0x4fff: clean.
[ 24.680545] pcmcia: parent PCI bridge Memory window: 0xd0200000 - 0xd02fffff
[ 24.688072] pcmcia: parent PCI bridge Memory window: 0x20000000 - 0x23ffffff
[ 25.099689] input: Logitech USB Mouse as /class/input/input1
[ 25.114390] input: USB HID v1.10 Mouse [Logitech USB Mouse] on usb-0000:00:1d.1-1
[ 25.121976] usbcore: registered new driver usbhid
[ 25.129424] drivers/usb/input/hid-core.c: v2.6:USB HID core driver
[ 25.217824] SCSI subsystem initialized
[ 25.275368] Initializing USB Mass Storage driver...
[ 25.286225] scsi0 : SCSI emulation for USB Mass Storage devices
[ 25.294165] usb-storage: device found at 2
[ 25.294169] usb-storage: waiting for device to settle before scanning
[ 25.294185] usbcore: registered new driver usb-storage
[ 25.302507] USB Mass Storage support registered.
[ 25.435207] Vendor: Sony Model: MSC-U02 Rev: 1.00
[ 25.443398] Type: Direct-Access ANSI SCSI revision: 00
[ 25.460353] usb-storage: device scan complete
[ 25.492546] SCSI device sda: 7904 512-byte hdwr sectors (4 MB)
[ 25.501255] sda: Write Protect is off
[ 25.508991] sda: Mode Sense: 00 6a 10 00
[ 25.508995] sda: assuming drive cache: write through
[ 25.521716] SCSI device sda: 7904 512-byte hdwr sectors (4 MB)
[ 25.530498] sda: Write Protect is off
[ 25.538129] sda: Mode Sense: 00 6a 10 00
[ 25.538134] sda: assuming drive cache: write through
[ 25.545770] sda: sda1
[ 25.586782] sd 0:0:0:0: Attached scsi removable disk sda
[ 25.808408] Adding 248996k swap on /dev/hda2. Priority:-1 extents:1 across:248996k
[ 26.050205] input: PC Speaker as /class/input/input2
[ 26.104604] speedstep: frequency transition measured seems out of range (4800 nSec), falling back to a safe one of 500000 nSec.
[ 26.137841] sonypi: Sony Programmable I/O Controller Driver v1.26.
[ 26.149857] sonypi: detected type2 model, verbose = 0, fnkeyinit = off, camera = off, compat = off, mask = 0xffffffff, useinput = on, acpi = on
[ 26.163316] sonypi: enabled at irq=11, port1=0x1080, port2=0x1084
[ 26.170087] sonypi: device allocated minor is 63
[ 26.191119] input: Sony Vaio Jogdial as /class/input/input3
[ 26.226397] input: Sony Vaio Keys as /class/input/input4
[ 26.281565] tun: Universal TUN/TAP device driver, 1.6
[ 26.292148] tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
[ 26.320519] ACPI Sony Notebook Control Driver v0.2 successfully installed
[ 26.347144] Real Time Clock Driver v1.12ac
[ 26.392679] Synaptics Touchpad, model: 1, fw: 5.9, id: 0x2548b1, caps: 0x804753/0x0
[ 26.408654] input: SynPS/2 Synaptics TouchPad as /class/input/input5
[ 26.434670] do_settimeofday() was called!
[ 26.452811] do_settimeofday() was called!
[ 26.623365] ReiserFS: hda3: found reiserfs format "3.6" with standard journal
[ 26.643990] ReiserFS: hda3: using ordered data mode
[ 26.650660] ReiserFS: hda3: journal params: device hda3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
[ 26.665746] ReiserFS: hda3: checking transaction log (hda3)
[ 26.672893] ReiserFS: hda3: Using r5 hash to sort names
[ 26.701880] SGI XFS with ACLs, security attributes, no debug enabled
[ 26.710619] XFS mounting filesystem hda6
[ 26.721647] Ending clean XFS mount for filesystem: hda6
[ 27.245644] TSC clocksource: 994334 khz, rating=300 mult=4218204 shift=22
[ 27.465933] TSC clocksource: 994334 khz, rating=300 mult=4218204 shift=22
[ 27.681885] TSC clocksource: 994334 khz, rating=300 mult=4218204 shift=22
[ 27.897878] TSC clocksource: 994334 khz, rating=300 mult=4218204 shift=22
[ 28.190949] do_settimeofday() was called!
[ 39.544076] TSC: Marked unstable
[ 39.550209] [<c0103bb3>] show_trace+0x13/0x20
[ 39.556348] [<c0103bde>] dump_stack+0x1e/0x20
[ 39.562434] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 39.568578] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 39.574709] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 39.580821] [<c029b5dc>] cpufreq_notify_transition+0x13c/0x170
[ 39.586921] [<d120f3c0>] speedstep_target+0xd0/0xe0 [speedstep_ich]
[ 39.593001] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 39.599034] [<d11de5ee>] do_dbs_timer+0x13e/0x200 [cpufreq_ondemand]
[ 39.605099] [<c012b1da>] run_workqueue+0x9a/0x140
[ 39.611147] [<c012b477>] worker_thread+0x107/0x140
[ 39.617097] [<c012eaa9>] kthread+0xd9/0xe0
[ 39.622948] [<c0101005>] kernel_thread_helper+0x5/0x10
[ 243.600000] Time: acpi_pm clocksource has been installed.
[ 246.108000] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 246.236000] ip_conntrack version 2.4 (2044 buckets, 16352 max) - 232 bytes per conntrack
[ 246.636000] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
[ 252.524000] Installing knfsd (copyright (C) 1996 [email protected]).
[ 556.512000] TSC: Marked unstable
[ 556.512000] [<c0103bb3>] show_trace+0x13/0x20
[ 556.512000] [<c0103bde>] dump_stack+0x1e/0x20
[ 556.512000] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 556.512000] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 556.512000] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 556.512000] [<c029b551>] cpufreq_notify_transition+0xb1/0x170
[ 556.512000] [<d120f39d>] speedstep_target+0xad/0xe0 [speedstep_ich]
[ 556.512000] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 556.512000] [<d11de66e>] do_dbs_timer+0x1be/0x200 [cpufreq_ondemand]
[ 556.512000] [<c012b1da>] run_workqueue+0x9a/0x140
[ 556.512000] [<c012b477>] worker_thread+0x107/0x140
[ 556.512000] [<c012eaa9>] kthread+0xd9/0xe0
[ 556.512000] [<c0101005>] kernel_thread_helper+0x5/0x10
[ 556.532000] check_periodic_interval: Long interval! 135987395 ns.
[ 556.532000] Something may be blocking interrupts.
[ 563.016000] TSC: Marked unstable
[ 563.016000] [<c0103bb3>] show_trace+0x13/0x20
[ 563.016000] [<c0103bde>] dump_stack+0x1e/0x20
[ 563.016000] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 563.016000] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 563.016000] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 563.016000] [<c029b5dc>] cpufreq_notify_transition+0x13c/0x170
[ 563.016000] [<d120f3c0>] speedstep_target+0xd0/0xe0 [speedstep_ich]
[ 563.016000] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 563.016000] [<d11de5ee>] do_dbs_timer+0x13e/0x200 [cpufreq_ondemand]
[ 563.016000] [<c012b1da>] run_workqueue+0x9a/0x140
[ 563.016000] [<c012b477>] worker_thread+0x107/0x140
[ 563.016000] [<c012eaa9>] kthread+0xd9/0xe0
[ 563.016000] [<c0101005>] kernel_thread_helper+0x5/0x10
[ 563.032000] check_periodic_interval: Long interval! 140156640 ns.
[ 563.032000] Something may be blocking interrupts.
[ 576.756000] Time: jiffies clocksource has been installed.
[ 878.020000] TSC: Marked unstable
[ 878.020000] [<c0103bb3>] show_trace+0x13/0x20
[ 878.020000] [<c0103bde>] dump_stack+0x1e/0x20
[ 878.020000] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 878.020000] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 878.020000] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 878.020000] [<c029b551>] cpufreq_notify_transition+0xb1/0x170
[ 878.020000] [<d120f39d>] speedstep_target+0xad/0xe0 [speedstep_ich]
[ 878.020000] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 878.020000] [<d11de66e>] do_dbs_timer+0x1be/0x200 [cpufreq_ondemand]
[ 878.020000] [<c012b1da>] run_workqueue+0x9a/0x140
[ 878.020000] [<c012b477>] worker_thread+0x107/0x140
[ 878.020000] [<c012eaa9>] kthread+0xd9/0xe0
[ 878.020000] [<c0101005>] kernel_thread_helper+0x5/0x10
[ 880.020000] TSC: Marked unstable
[ 880.020000] [<c0103bb3>] show_trace+0x13/0x20
[ 880.020000] [<c0103bde>] dump_stack+0x1e/0x20
[ 880.020000] [<c0108b77>] mark_tsc_unstable+0x27/0x40
[ 880.020000] [<c01090fb>] time_cpufreq_notifier+0x19b/0x1b0
[ 880.020000] [<c0127e3d>] notifier_call_chain+0x2d/0x50
[ 880.020000] [<c029b5dc>] cpufreq_notify_transition+0x13c/0x170
[ 880.020000] [<d120f3c0>] speedstep_target+0xd0/0xe0 [speedstep_ich]
[ 880.020000] [<c029b1d7>] __cpufreq_driver_target+0x77/0x80
[ 880.020000] [<d11de5ee>] do_dbs_timer+0x13e/0x200 [cpufreq_ondemand]
[ 880.020000] [<c012b1da>] run_workqueue+0x9a/0x140
[ 880.020000] [<c012b477>] worker_thread+0x107/0x140
[ 880.020000] [<c012eaa9>] kthread+0xd9/0xe0
[ 880.020000] [<c0101005>] kernel_thread_helper+0x5/0x10
--
mattia
:wq!
On Mon, Jan 16, 2006 at 04:49:18PM -0800, john stultz wrote:
> On Mon, 2006-01-16 at 21:40 +0100, Mattia Dongili wrote:
> > On Mon, Jan 16, 2006 at 12:16:21PM -0800, john stultz wrote:
> > > I'll try to narrow that window down a bit and see if that doesn't
> > > resolve the issue.
> >
> > I'll be happy to test new patches if necessary (I'm running -mm4)
>
> See if this patch doesn't resolve the issue. Its a bit hackish, but
> basically I'm just holding off installing any clocksources until later
> on at boot. This avoids some of the clocksource churn.
With the patch applied the boot went smooth.
thanks
--
mattia
:wq!
On Tue, 2006-01-17 at 18:49 +0100, Mattia Dongili wrote:
> On Mon, Jan 16, 2006 at 04:49:18PM -0800, john stultz wrote:
> > On Mon, 2006-01-16 at 21:40 +0100, Mattia Dongili wrote:
> > > On Mon, Jan 16, 2006 at 12:16:21PM -0800, john stultz wrote:
> > > > I'll try to narrow that window down a bit and see if that doesn't
> > > > resolve the issue.
> > >
> > > I'll be happy to test new patches if necessary (I'm running -mm4)
> >
> > See if this patch doesn't resolve the issue. Its a bit hackish, but
> > basically I'm just holding off installing any clocksources until later
> > on at boot. This avoids some of the clocksource churn.
>
> With the patch applied the boot went smooth.
Great. Thanks for the testing. I'll send that one off to Andrew.
thanks again,
-john
On Tue, Jan 17, 2006 at 11:11:30AM -0800, john stultz wrote:
> On Tue, 2006-01-17 at 18:49 +0100, Mattia Dongili wrote:
> > On Mon, Jan 16, 2006 at 04:49:18PM -0800, john stultz wrote:
> > > On Mon, 2006-01-16 at 21:40 +0100, Mattia Dongili wrote:
> > > > On Mon, Jan 16, 2006 at 12:16:21PM -0800, john stultz wrote:
> > > > > I'll try to narrow that window down a bit and see if that doesn't
> > > > > resolve the issue.
> > > >
> > > > I'll be happy to test new patches if necessary (I'm running -mm4)
> > >
> > > See if this patch doesn't resolve the issue. Its a bit hackish, but
> > > basically I'm just holding off installing any clocksources until later
> > > on at boot. This avoids some of the clocksource churn.
> >
> > With the patch applied the boot went smooth.
>
> Great. Thanks for the testing. I'll send that one off to Andrew.
I'm sorry, my very bad... it seems I somewhat forgot to remove
clocksource=jiffies from the command line and that fooled me.
the stall is still there, tomorrow I'll reapply your debug-patch and
report the full dmesg (with the finished_booting-hack enabled).
Sorry again.
--
mattia
:wq!
On Tue, January 17, 2006 11:49 pm, Mattia Dongili said:
[...]
> the stall is still there, tomorrow I'll reapply your debug-patch and
> report the full dmesg (with the finished_booting-hack enabled).
Full dmesg attached.
Stalls happened between in the 16.39-16.55 interval, in 16.70-16.96 and
17.89-18.64. They were all much longer than stated in the log timestamp
(I'd say ~10:1 ratio).
Sorry again for my previous false notice about the bug being solved...
--
mattia
:wq!
On Wed, 2006-01-18 at 11:47 +0100, Mattia Dongili wrote:
> On Tue, January 17, 2006 11:49 pm, Mattia Dongili said:
> [...]
> > the stall is still there, tomorrow I'll reapply your debug-patch and
> > report the full dmesg (with the finished_booting-hack enabled).
>
> Full dmesg attached.
> Stalls happened between in the 16.39-16.55 interval, in 16.70-16.96 and
> 17.89-18.64. They were all much longer than stated in the log timestamp
> (I'd say ~10:1 ratio).
>
> Sorry again for my previous false notice about the bug being solved...
Hey Mattia,
Sorry I've been so quiet recently, I'm still working on this one. The
difficult spot is that if the cpufreq notification driver is a module,
then there will always be a window between the point at which we start
using the TSC to the point where we find out that the TSC is changing
frequency. Not sure what to do here just yet.
Although I'm curious: Did the recent changes in 2.6.16-rc1-mm2 had any
effect on this issue?
thanks
-john
On Tue, Jan 24, 2006 at 02:27:14PM -0800, john stultz wrote:
> On Wed, 2006-01-18 at 11:47 +0100, Mattia Dongili wrote:
> > On Tue, January 17, 2006 11:49 pm, Mattia Dongili said:
> > [...]
> > > the stall is still there, tomorrow I'll reapply your debug-patch and
> > > report the full dmesg (with the finished_booting-hack enabled).
> >
> > Full dmesg attached.
> > Stalls happened between in the 16.39-16.55 interval, in 16.70-16.96 and
> > 17.89-18.64. They were all much longer than stated in the log timestamp
> > (I'd say ~10:1 ratio).
> >
> > Sorry again for my previous false notice about the bug being solved...
>
> Hey Mattia,
> Sorry I've been so quiet recently, I'm still working on this one. The
No problem.
> difficult spot is that if the cpufreq notification driver is a module,
> then there will always be a window between the point at which we start
> using the TSC to the point where we find out that the TSC is changing
> frequency. Not sure what to do here just yet.
I was wondering if you could force an do_gettimeofday call quite early
in order to lower tsc priority as soon as possible, but maybe I'm not
entirely into that code :)
> Although I'm curious: Did the recent changes in 2.6.16-rc1-mm2 had any
> effect on this issue?
no, I'm currently running it and the same behaviour still applies.
ciao
--
mattia
:wq!
On Wed, 2006-01-25 at 00:04 +0100, Mattia Dongili wrote:
> On Tue, Jan 24, 2006 at 02:27:14PM -0800, john stultz wrote:
> > difficult spot is that if the cpufreq notification driver is a module,
> > then there will always be a window between the point at which we start
> > using the TSC to the point where we find out that the TSC is changing
> > frequency. Not sure what to do here just yet.
>
> I was wondering if you could force an do_gettimeofday call quite early
> in order to lower tsc priority as soon as possible, but maybe I'm not
> entirely into that code :)
Well, it isn't do_gettimeofday that needs to be called, but we need a
way to decide if we should call tsc_mark_unstable(). Currently we do
that when we get a cpufreq transition notification if the cpu's TSC is
not constant. The problem being: on your system, that notification
isn't called until after the cpufreq driver module loads. This is of
course, after we've started to use the TSC.
If the cpufreq driver loaded earlier, or we had some other way of
checking if the TSC was not constant, we could call tsc_mark_unstable()
then.
We'll probably have to do a manual check like what the cpufreq driver
does early on so we can have this info before we install the TSC
clocksource. I'll let you know when I have a patch to try.
> > Although I'm curious: Did the recent changes in 2.6.16-rc1-mm2 had any
> > effect on this issue?
>
> no, I'm currently running it and the same behaviour still applies.
Drat. Well, thanks for testing.
-john
On Tue, 2006-01-24 at 15:48 -0800, john stultz wrote:
> On Wed, 2006-01-25 at 00:04 +0100, Mattia Dongili wrote:
> > On Tue, Jan 24, 2006 at 02:27:14PM -0800, john stultz wrote:
> > > difficult spot is that if the cpufreq notification driver is a module,
> > > then there will always be a window between the point at which we start
> > > using the TSC to the point where we find out that the TSC is changing
> > > frequency. Not sure what to do here just yet.
> >
> > I was wondering if you could force an do_gettimeofday call quite early
> > in order to lower tsc priority as soon as possible, but maybe I'm not
> > entirely into that code :)
>
> Well, it isn't do_gettimeofday that needs to be called, but we need a
> way to decide if we should call tsc_mark_unstable(). Currently we do
> that when we get a cpufreq transition notification if the cpu's TSC is
> not constant. The problem being: on your system, that notification
> isn't called until after the cpufreq driver module loads. This is of
> course, after we've started to use the TSC.
>
> If the cpufreq driver loaded earlier, or we had some other way of
> checking if the TSC was not constant, we could call tsc_mark_unstable()
> then.
>
> We'll probably have to do a manual check like what the cpufreq driver
> does early on so we can have this info before we install the TSC
> clocksource. I'll let you know when I have a patch to try.
Mattia,
Just to verify I'm not barking up the wrong tree here, could you try
building w/ CONFIG_X86_SPEEDSTEP_ICH=y instead of as a module?
This should force the cpufreq driver to load earlier, and hopefully
we'll get a notification earlier as well.
thanks
-john
On Tue, Jan 24, 2006 at 04:12:09PM -0800, john stultz wrote:
> On Tue, 2006-01-24 at 15:48 -0800, john stultz wrote:
[...]
> > Well, it isn't do_gettimeofday that needs to be called, but we need a
> > way to decide if we should call tsc_mark_unstable(). Currently we do
> > that when we get a cpufreq transition notification if the cpu's TSC is
> > not constant. The problem being: on your system, that notification
> > isn't called until after the cpufreq driver module loads. This is of
> > course, after we've started to use the TSC.
Ah!! the driver doesn't notify the transitions happening at
initialization as it doesn't use the cpufreq interfaces to do so and
booting with the performance governor just keep the frequency at it
highest limit.
The first cpufreq_set_policy call (and subsequent notification) in my
case is performed by the ondemand governor pretty late (when the
cpufrequtils script is run).
[...]
> Just to verify I'm not barking up the wrong tree here, could you try
> building w/ CONFIG_X86_SPEEDSTEP_ICH=y instead of as a module?
done, as expected things remain the same and tsc is marked unstable only
when cpufreutils loads the ondemand that immediately switches to the
lowest available frequency.
Dmesg is attached (CONFIG_PARANOID_GENERIC_TIME=y), as always I had
stalls at the known places (the 2 sleep calls done by init-scripts) at
20.88 and 22.09
thanks!
--
mattia
:wq!