2003-06-24 16:13:44

by James Bottomley

Subject: Large backwards time steps panic 2.5.73

I've got one of those fun machines with a failing BIOS battery that
always boots up with the BIOS clock about a year into the future.

2.5.73 always panics around the time ntpdate sets the clock back to its
normal value with:

kernel BUG at kernel/timer.c:377!
Kernel addresses on the stack:
[<10129510>] .L1641+0x0/0x38
[<10105be4>] dump_stack+0x18/0x24
[<101309fc>] .L996+0x0/0x58
[<10147e7c>] .L1188+0x18/0x48
[<10131130>] run_timer_softirq+0x15c/0x1b8
[<10131210>] do_timer+0x68/0x118
[<1012ccec>] __crc_scsi_put_command+0x8d/0xe9
[<101738cc>] locate_fd+0xd0/0x15c
[<1010737c>] do_cpu_irq_mask+0x90/0xf0
[<10334f24>] start_kernel+0x4/0x1fc
[<1010a068>] intr_return+0x0/0x14
[<10161a7c>] get_empty_filp+0x64/0xfc
[<101107a8>] .L894+0x14/0x18
[<10161a7c>] get_empty_filp+0x64/0xfc
[<10160044>] .L1853+0x94/0xcc
[<101738cc>] locate_fd+0xd0/0x15c
[<10334f24>] start_kernel+0x4/0x1fc
[<1011077c>] .L892+0x0/0x18
[<103350dc>] start_kernel+0x1bc/0x1fc

Reverting the patch

ChangeSet 1.1348.6.16 2003/06/20 22:13:39 [email protected]
[PATCH] revert adjtimex changes

From: John Stultz, George Anzinger, Eric Piel

fixes the problem for me.

The above trace is from a HP PA-RISC machine running 2.5.73-pa1.

James



2003-06-24 19:44:15

by john stultz

Subject: Re: Large backwards time steps panic 2.5.73

On Tue, 2003-06-24 at 09:26, James Bottomley wrote:
> I've got one of those fun machines with a failing BIOS battery that
> always boots up with the BIOS clock about a year into the future.
>
> 2.5.73 always panics around the time ntpdate sets the clock back to its
> normal value with:
>
> kernel BUG at kernel/timer.c:377!
> Kernel addresses on the stack:

[snip]

> Reverting the patch
>
> ChangeSet 1.1348.6.16 2003/06/20 22:13:39 [email protected]
> [PATCH] revert adjtimex changes
>
> From: John Stultz, George Anzinger, Eric Piel
>
> fixes the problem for me.
>
> The above trace is from a HP PA-RISC machine running 2.5.73-pa1.

Hmm. Odd. What is the HZ frequency on this machine?

thanks
-john


2003-06-24 19:53:18

by James Bottomley

Subject: Re: Large backwards time steps panic 2.5.73

On Tue, 2003-06-24 at 14:50, john stultz wrote:
> On Tue, 2003-06-24 at 09:26, James Bottomley wrote:
> > The above trace is from a HP PA-RISC machine running 2.5.73-pa1.
>
> Hmm. Odd. What is the HZ frequency on this machine?

On the kernel with the panic, 100. If I build a 64-bit kernel (which I
haven't done for .73 yet), I'll get 1000.

James



2003-06-24 20:01:02

by john stultz

Subject: Re: Large backwards time steps panic 2.5.73

On Tue, 2003-06-24 at 13:07, James Bottomley wrote:
> On Tue, 2003-06-24 at 14:50, john stultz wrote:
> > On Tue, 2003-06-24 at 09:26, James Bottomley wrote:
> > > The above trace is from a HP PA-RISC machine running 2.5.73-pa1.
> >
> > Hmm. Odd. What is the HZ frequency on this machine?
>
> On the kernel with the panic, 100. If I build a 64-bit kernel (which I
> haven't done for .73 yet), I'll get 1000.

OK, I'd be curious whether it occurs there as well.

The only bits the patch should touch are used in adjtimex, and adjtimex
is very limited in how much it can adjust time. If you're a year off or
so, it's more likely that ntpdate is calling stime/settimeofday.

Could you boot w/o ntp starting up, then manually run "ntpdate -b
<server>" to see if that causes it as well?
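
For reference, the step-versus-slew distinction can be sketched roughly
as below. This is only an illustration, not ntpdate's actual code: it
assumes ntpdate's documented default of stepping the clock with
settimeofday() when the error exceeds 0.5 seconds and slewing with
adjtime() otherwise, with -b forcing a step. The names
STEP_THRESHOLD_S, ONE_YEAR_S and correction_method are hypothetical.

```python
# Illustrative sketch only; all names here are hypothetical.
# Assumption: ntpdate steps the clock when the error exceeds 0.5 s and
# slews it otherwise, and "-b" forces a step regardless of the error.
STEP_THRESHOLD_S = 0.5

# Approximate size of the error described in the report (clock a year ahead).
ONE_YEAR_S = 365 * 24 * 60 * 60


def correction_method(offset_s, force_step=False):
    """Return which mechanism would correct a clock error of offset_s seconds."""
    if force_step or abs(offset_s) > STEP_THRESHOLD_S:
        return "step (settimeofday)"
    return "slew (adjtime)"


if __name__ == "__main__":
    cases = [
        ("clock a year ahead", -ONE_YEAR_S, False),
        ("clock 0.2 s ahead", -0.2, False),
        ("clock 0.2 s ahead, -b", -0.2, True),
    ]
    for label, offset, force in cases:
        print(f"{label}: {correction_method(offset, force)}")
```

Under these assumptions a year-sized error always takes the step path,
so the slew path is never involved in the initial correction.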

thanks
-john


2003-06-25 15:34:59

by James Bottomley

Subject: Re: Large backwards time steps panic 2.5.73

On Tue, 2003-06-24 at 15:06, john stultz wrote:
> The only bits the patch should touch are used in adjtimex, and adjtimex
> is very limited in how much it can adjust time. If you're a year off or
> so, it's more likely that ntpdate is calling stime/settimeofday.
>
> Could you boot w/o ntp starting up, then manually run "ntpdate -b
> <server>" to see if that causes it as well?

I can't seem to reproduce this with any sort of regularity. The only
data points I have are:

- It doesn't occur when the adjtimex reversion is backed out.
- It seems to occur shortly after the machine is rebooted with the clock
  set to the future.
- It reproduces much more readily if ntpd is running.

I've stuck some debugging code in there and find that the ->base for the
timer is NULL and the timer function is igmp_ifc_timer_expire.

I'll continue looking at this.

James