2007-01-02 14:01:43

by Remy Bohmer

Subject: [BUG-RT] RTC has been stopped-> long delay during boot, soft reboot->GRUB fails to call getrtsecs()

Hello Ingo,

I have discovered 3 problems that are probably all related to the same
root cause, which is likely in the RT kernel.
I use the 2.6.19-rt15 kernel, with the configuration attached to this mail.
It is running on a standard x86 (i945, Celeron 2.93 GHz, uniprocessor) with Fedora Core 6.

So, I have set the following options:
CONFIG_HIGH_RES_TIMERS=y
CONFIG_NO_HZ=y

The problems:
1. During (cold and warm) boot, synchronising the system time from the
hardware clock often takes a very long time, up to approx. 30 seconds.
(This is the call: /sbin/hwclock --hctosys --localtime)
2. After a reboot, the next boot into GRUB hangs in the getrtsecs() call
(in the GRUB 0.97 code). (The RTC does not run, or has been stopped by
the previous boot.)
3. After the same reboot, the time-of-day clock shown in the BIOS is
stopped as well (system clock).
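Problems 2 and 3 both point to an RTC that has stopped counting. As an illustration only (this check is not from the thread), one way to confirm that from Linux is to read the RTC twice, a couple of seconds apart, and see whether the reading changes. The Python sketch below assumes the Linux RTC character-device interface (the RTC_RD_TIME ioctl, request number 0x80247009 on x86) and a device node at /dev/rtc0, which is an assumption; kernels using the legacy driver may expose /dev/rtc instead. The helper names read_rtc and rtc_is_ticking are hypothetical.

```python
import fcntl
import struct
import time

# Assumed: RTC_RD_TIME = _IOR('p', 0x09, struct rtc_time) on Linux/x86.
# struct rtc_time is nine C ints (36 bytes), giving request 0x80247009.
RTC_RD_TIME = 0x80247009
# tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year, tm_wday, tm_yday, tm_isdst
RTC_TIME_FMT = "9i"


def read_rtc(path="/dev/rtc0"):
    """Hypothetical helper: return (sec, min, hour, mday, mon, year) from the RTC."""
    buf = bytearray(struct.calcsize(RTC_TIME_FMT))
    with open(path, "rb") as dev:
        # The kernel fills the mutable buffer in place.
        fcntl.ioctl(dev.fileno(), RTC_RD_TIME, buf)
    fields = struct.unpack(RTC_TIME_FMT, buf)
    return fields[:6]


def rtc_is_ticking(reader, wait=2.0, sleep=time.sleep):
    """Read the clock twice, `wait` seconds apart; a running clock must change."""
    first = reader()
    sleep(wait)
    second = reader()
    return second != first
```

On a live system (usually as root), rtc_is_ticking(read_rtc) returning False would match the stopped clock that GRUB and the BIOS run into. The reader is passed in as a parameter so the check can be exercised without real hardware.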

Do you recognise this problem?
Can it be related to the CONFIG_NO_HZ option?

Note that we do not call "hwclock --systohc" on shutdown or reboot
to sync the hardware clock. Does that call normally also restart the
RTC? (If it does, I would find that strange: I believe that if the
kernel itself stops the RTC, the kernel should restart it as well.
Otherwise it is probably not allowed to execute this command during
normal runtime.)

Kind Regards,

Remy Bohmer


Attachments:
(No filename) (1.26 kB)
kernel-i686.config (35.58 kB)

2007-01-02 16:19:21

by Ingo Molnar

Subject: Re: [BUG-RT] RTC has been stopped-> long delay during boot, soft reboot->GRUB fails to call getrtsecs()


* Remy Bohmer <[email protected]> wrote:

> Hello Ingo,
>
> I have discovered 3 problems that are probably all related to the same
> root cause, which is likely in the RT kernel.
> I use the 2.6.19-rt15 kernel, with the configuration attached to this mail.
> It is running on a standard x86 (i945, Celeron 2.93 GHz, uniprocessor) with Fedora Core 6.
>
> So, I have set the following options:
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_NO_HZ=y
>
> The problems:
> 1. During (cold and warm) boot, synchronising the system time from the
> hardware clock often takes a very long time, up to approx. 30 seconds.
> (This is the call: /sbin/hwclock --hctosys --localtime)

I tried this on a recent -rt kernel and there's no delay:

[root@europe ~]# time /sbin/hwclock --hctosys --localtime

real 0m0.756s
user 0m0.754s
sys 0m0.002s

Could you try a more recent kernel like 2.6.20-rc2-rt3? We fixed a good
number of high-res-timer-related bugs that could result in similar
hangs. But it may still be unfixed; it's just a guess.

Ingo

2007-01-02 16:38:54

by Remy Bohmer

Subject: Re: [BUG-RT] RTC has been stopped-> long delay during boot, soft reboot->GRUB fails to call getrtsecs()

Hello Ingo,

In the meantime I have done more testing on this problem:
1. Once the RTC gets stopped during shutdown, it is **NEVER**
going to run again.
I continuously see the problems with GRUB and the BIOS
time-of-day clock. In the end I had to remove the battery from the
motherboard to reset the RTC (nothing else worked, not even a complete
power-off). Now it is ticking again... until...
2. Using a kernel without CONFIG_NO_HZ seems to solve all the RTC
problems; even hwclock behaves normally.

With CONFIG_NO_HZ enabled, the hwclock problem **only** occurs the
first time it is called by the init scripts, directly after kernel
boot. Every subsequent call to hwclock shows normal timings, like your
measurement.


Best Regards and also a Happy New Year,

Remy

2007-01-05 12:43:29

by Remy Bohmer

Subject: Re: [BUG-RT] RTC has been stopped-> long delay during boot, soft reboot->GRUB fails to call getrtsecs()

Hello Dries,

Thanks for your reply. Although it looks a lot like the same problem, I
want to point out that we do NOT have a Dell system here. It is a
Fujitsu Siemens i945 motherboard. I also saw the "long delays during
boot caused by hwclock" problem on an old i845 Kontron motherboard
running the same installation/kernel as I mentioned; we had been using
that board for years and it never showed these problems until we
started using this kernel. The stopped RTC has only been seen twice,
on this Fujitsu Siemens board, and not on any other.

So, since it is not a Dell system and therefore has a different BIOS,
I doubt it is BIOS related. Furthermore, I discovered that the problem
also occurred on a system that is not tickless. So I can now rule out
the CONFIG_NO_HZ option as the cause, despite what I wrote in my
previous mail.

In my case it was not a battery problem either, but removing the battery
was the only way to reset the RTC and get it ticking again.

We have enabled HPET in BIOS and kernel.

I have tested Ingo's 2.6.20rc3-rt0 kernel (a more recent one, as he
suggested) by booting it a few times, and so far we have not seen this
problem, but only time will tell whether it is really gone.


Kind Regards,

Remy Böhmer



2007/1/5, Dries Kimpe <[email protected]>:
> In-Reply-To: <[email protected]>
>
> I found this mail on the LKML.org list, and didn't want to bother to
> subscribe to the list, so I post this directly. Sorry ;-)
>
> I suspect the problem is not related to the rt-kernel at all.
>
> This looks like a well-known DELL BIOS problem (with no real solution,
> as far as I know).
>
> * Somehow, the RTC gets corrupted and stops counting.
> * On recent Dell laptops (D420, among others) the BIOS sometimes checks
> the clock (every time a thorough BIOS check is done) and just stops
> with the message "time-of-day clock stopped" (search Google for this).
> On some systems, one can enter the BIOS setup at this point, causing
> the BIOS to reset the clock and solving the problem. On others (like
> the D420), the only way out is to make the BIOS reinitialize the
> clock.
>
> * Once the clock is corrupted, it never runs again (some say a reboot
> into XP can solve it). It is NOT a battery problem: simply
> disconnecting the battery, which causes the BIOS to reinitialize
> NVRAM, solves the problem.
>
> I used to have this problem on my D420, and it seemed to go away by:
> - disabling the RTC interrupt in the kernel
> - enabling the HPET timer RTC emulation
>
> More info:
> http://www.ubuntuforums.org/showthread.php?t=176954
> http://www.ubuntuforums.org/showthread.php?t=149565
> https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/43745
>
> Hope this helps.
>
> Greetings,
> Dries
>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>

2007-01-05 16:28:03

by Remy Bohmer

Subject: Re: [BUG-RT] RTC has been stopped-> long delay during boot, soft reboot->GRUB fails to call getrtsecs()

Hello All,

Thanks to Dries's hint, I discovered that there was a difference
between my kernel configuration and Ingo's. I aligned my configuration
with Ingo's, and now the slow-boot issue is completely gone. I do not
understand the exact relation, but it works for me.

Thanks to you all.

FYI: attached are the configuration changes I had to make to get it working.


Kind Regards,

Remy Bohmer


Attachments:
(No filename) (3.38 kB)
config.diff (1.87 kB)