2001-10-29 20:31:12

by jjs

Subject: Nasty surprise with uptime

Hi guys,

This weekend I checked our mail/dns servers (on kernel 2.2.17-pre4)
and received a nasty surprise. The uptime, which had been 496+ days
on Friday, was back down to a few hours. I was ready to lart somebody
with great vigor when I realized the uptime counter had simply wrapped
around.

So, I thought to myself, at least the 2.4 kernels on our new boxes won't
have that silly, irritating limitation - or will they?

I checked include/linux/kernel.h on my workstation, which is running
2.4.14-pre3, and found that the uptime field in struct sysinfo is exactly
the same as that in the 2.2 kernel on the mailservers, i.e.:

--- snip ---
struct sysinfo {
long uptime; /* Seconds since boot */
--- snip ---

Say it ain't so! maybe I'm a bit dense, but is the 2.4 kernel also going
to wrap around after 497 days uptime? I'd be glad if someone would
point out the error in my understanding.

Thanks,

jjs



2001-10-29 20:33:52

by Alan

Subject: Re: Nasty surprise with uptime

> and received a nasty surprise. The uptime, which had been 496+ days
> on Friday, was back down to a few hours. I was ready to lart somebody
> with great vigor when I realized the uptime counter had simply wrapped
> around.
>
> So, I thought to myself, at least the 2.4 kernels on our new boxes won't

It wraps at 496 days. The drivers are aware of it and don't crash the box

Alan

2001-10-29 20:39:32

by jjs

Subject: Re: Nasty surprise with uptime

Alan Cox wrote:

> > and received a nasty surprise. The uptime, which had been 496+ days
> > on Friday, was back down to a few hours. I was ready to lart somebody
> > with great vigor when I realized the uptime counter had simply wrapped
> > around.
> >
> > So, I thought to myself, at least the 2.4 kernels on our new boxes won't
>
> It wraps at 496 days. The drivers are aware of it and dont crash the box

Yes, and these boxes are still running fine - other
than showing some processes that were started
in the year 2003... but DAMN, what an eyesore -
uptime ruined as far as anybody can tell, times
and dates no longer making any sense.

So, is there an implicit Linux policy to upgrade
the distro, or at least the kernel, every 496 days
whether it needs it or not?

;-)

cu

jjs

2001-10-29 20:48:22

by Matthew Dharm

Subject: Re: Nasty surprise with uptime

No, but there are a couple of applicable Linux policies:
(1) If it breaks, you get to keep both halves.
(2) If it's broken, fix it yourself.

:)

Matt

On Mon, Oct 29, 2001 at 12:39:37PM -0800, J Sloan wrote:
> Alan Cox wrote:
>
> > > and received a nasty surprise. The uptime, which had been 496+ days
> > > on Friday, was back down to a few hours. I was ready to lart somebody
> > > with great vigor when I realized the uptime counter had simply wrapped
> > > around.
> > >
> > > So, I thought to myself, at least the 2.4 kernels on our new boxes won't
> >
> > It wraps at 496 days. The drivers are aware of it and dont crash the box
>
> Yes, and these boxes are still running fine - other
> than showing some processes that were started
> in the year 2003... but DAMN, what an eyesore -
> uptime ruined as far as anybody can tell, times
> and dates no longer making any sense.
>
> So, is there an implicit Linux policy to upgrade
> the distro, or at least the kernel, every 496 days
> whether it needs it or not?
>
> ;-)
>
> cu
>
> jjs
>

--
Matthew Dharm Home: [email protected]
Maintainer, Linux USB Mass Storage Driver

NYET! The evil stops here!
-- Pitr
User Friendly, 6/22/1998



2001-10-29 20:52:22

by jjs

Subject: Re: Nasty surprise with uptime

Matthew Dharm wrote:

> No, but there are a couple of applicable Linux policies:
> (1) If it breaks, you get to keep both halves.
> (2) If it's broken, fix it yourself.

hmm, maybe I'll send in an uptime patch and
see what sort of feedback I get...

Gotta get those 1000 day uptimes to silence
the bsd bigots!

cu

jjs

2001-10-29 22:26:01

by David Relson

Subject: Re: Nasty surprise with uptime

Let's assume you have the counter changed to 32 bits - RIGHT NOW
(tm). Build a kernel, install it, reboot. It'll be over a year (approx
Jan 2003) before the change will be noticeable...

Methinks that's a long time to wait for a result :-)

David


At 04:52 PM 10/29/01, J Sloan wrote:
>Matthew Dharm wrote:
>
> > No, but there are a couple of applicable Linux policies:
> > (1) If it breaks, you get to keep both halves.
> > (2) If it's broken, fix it yourself.
>
>hmm, maybe I'll send in an uptime patch and
>see what sort of feedback I get...
>
>Gotta get those 1000 day uptimes to silence
>the bsd bigots!
>
>cu
>
>jjs

2001-10-29 23:10:20

by Mike Fedyk

Subject: Re: Nasty surprise with uptime

On Mon, Oct 29, 2001 at 12:31:12PM -0800, J Sloan wrote:
> Say it ain't so! maybe I'm a bit dense, but is the 2.4 kernel also going
> to wrap around after 497 days uptime? I'd be glad if someone would
> point out the error in my understanding.

Ahh, so that's why there haven't been any reports of higher uptimes... ;)

2001-10-29 23:20:31

by jjs

Subject: Re: Nasty surprise with uptime

Mike Fedyk wrote:

> On Mon, Oct 29, 2001 at 12:31:12PM -0800, J Sloan wrote:
> > Say it ain't so! maybe I'm a bit dense, but is the 2.4 kernel also going
> > to wrap around after 497 days uptime? I'd be glad if someone would
> > point out the error in my understanding.
>
> Ahh, so that's why there haven't been any reports of higher uptimes... ;)

Yes, it all makes sense now -

Say, if the uptime field were unsigned it could
reach 995 days uptime before wraparound -

Surely nobody would mind having to upgrade
their kernel after 994+ days....

Well strictly speaking an upgrade isn't
forced, but if the (perceived) uptime is down
the tubes anyway, might as well update the
kernel, or the distro level for that matter.

cu

jjs





2001-10-29 23:28:00

by Mike Fedyk

Subject: Re: Nasty surprise with uptime

On Mon, Oct 29, 2001 at 03:20:02PM -0800, J Sloan wrote:
> Mike Fedyk wrote:
>
> > On Mon, Oct 29, 2001 at 12:31:12PM -0800, J Sloan wrote:
> > > Say it ain't so! maybe I'm a bit dense, but is the 2.4 kernel also going
> > > to wrap around after 497 days uptime? I'd be glad if someone would
> > > point out the error in my understanding.
> >
> > Ahh, so that's why there haven't been any reports of higher uptimes... ;)
>
> Yes, it all makes sense now -
>

Just imagine the headline:

Kernel 2.6 now allows uptimes higher than 1.47 years!

_Click_Here_

*Click*

.........

The change was originally included in 2.5.4, which was originally known as
2.4.14...

;)

2001-10-29 23:29:40

by Zan Lynx

Subject: Re: Nasty surprise with uptime

A 32 bit uptime patch should also include a new kernel parameter that
could be passed from LILO: uptime. Then you could test the uptime patch
by passing uptime=4294967295

Or make /proc/uptime writable.

David Relson wrote:

> Let's assume you have the counter changed to 32 bits - RIGHT NOW
> (tm). Build a kernel, install it, reboot. It'll be over a year
> (approx Jan 2003) before the change will be noticeable...
>
> Methinks that's a long time to wait for a result :-)
>
> David
>


2001-10-30 07:36:43

by Neale Banks

Subject: Re: Nasty surprise with uptime

On Mon, 29 Oct 2001, Alan Cox wrote:

> > and received a nasty surprise. The uptime, which had been 496+ days
> > on Friday, was back down to a few hours. I was ready to lart somebody
> > with great vigor when I realized the uptime counter had simply wrapped
> > around.
> >
> > So, I thought to myself, at least the 2.4 kernels on our new boxes won't
>
> It wraps at 496 days. The drivers are aware of it and dont crash the box

You mean there was a time when uptime>496days would crash a system?

If so, approximately when did that get fixed?

(I'm thinking back to an as yet unexplained crash of a 2.0.38 system at
~496days uptime :-( )

Thanks,
Neale.

2001-10-30 07:46:03

by Mike Fedyk

Subject: Re: Nasty surprise with uptime

On Tue, Oct 30, 2001 at 06:46:03PM +1100, Neale Banks wrote:
> On Mon, 29 Oct 2001, Alan Cox wrote:
>
> > > and received a nasty surprise. The uptime, which had been 496+ days
> > > on Friday, was back down to a few hours. I was ready to lart somebody
> > > with great vigor when I realized the uptime counter had simply wrapped
> > > around.
> > >
> > > So, I thought to myself, at least the 2.4 kernels on our new boxes won't
> >
> > It wraps at 496 days. The drivers are aware of it and dont crash the box
>
> You mean there was a time when uptime>496days would crash a system?
>
> If so, approximtely when did that get fixed?
>
> (I'm thinking back to an as yet unexplained crash of a 2.0.38 system at
> ~496days uptime :-( )
>

AFAIK, the system didn't crash, but the uptime counter went down to zero.

2001-10-30 08:15:04

by Ville Herva

Subject: Re: Nasty surprise with uptime

On Mon, Oct 29, 2001 at 11:46:15PM -0800, you [Mike Fedyk] claimed:
> On Tue, Oct 30, 2001 at 06:46:03PM +1100, Neale Banks wrote:
> >
> > You mean there was a time when uptime>496days would crash a system?
> >
> > If so, approximtely when did that get fixed?
> >
> > (I'm thinking back to an as yet unexplained crash of a 2.0.38 system at
> > ~496days uptime :-( )
>
> AFAIK, the system didn't crash, but the uptime counter went down to zero.

Oh yes, sometimes a 2.0 kernel would crash at 497.1 days [1]. I guess it depends
on what you were doing at the time and what drivers and options you were
using. I think most of the jiffies wraparound bugs were cleaned up in the
2.1.x timeframe (so I have been told).

(I've experienced one such crash, I'm not sure whether it was 2.0.36 or
2.0.38.)


-- v --

[email protected]

[1] It is 497.10 days or 2^32 seconds, not 496 days.

2001-10-30 08:21:35

by Ville Herva

Subject: Re: Nasty surprise with uptime

On Mon, Oct 29, 2001 at 03:20:02PM -0800, you [J Sloan] claimed:
> Mike Fedyk wrote:
>
> > On Mon, Oct 29, 2001 at 12:31:12PM -0800, J Sloan wrote:
> > > Say it ain't so! maybe I'm a bit dense, but is the 2.4 kernel also going
> > > to wrap around after 497 days uptime? I'd be glad if someone would
> > > point out the error in my understanding.
> >
> > Ahh, so that's why there haven't been any reports of higher uptimes... ;)
>
> Yes, it all makes sense now -
>
> Say, if the uptime field were unsigned it could
> reach 995 days uptime before wraparound -

AFAIK, the jiffies field _is_ unsigned already. In fact 2.0 kernels had some
problems at 2^31 jiffies as well. (Stuff like select misbehaving, and some
procps utils giving incorrect results).

2^32 jiffies is 2^32/100 seconds is 2^32/100/3600/24 = 497.1 days.
2^31 jiffies is 2^31/100 seconds is 2^31/100/3600/24 = 248.55 days.

(HZ=100 by default on x86 etc., i.e. one jiffy is 1/100 s; it is 1024 or
1000 at least on alpha.)

You need 64 bit jiffies for longer uptimes.

BTW, on win95 the HZ is 1024, which caused it to _always_ crash if it ever
reached 48.5 days of uptime. I've seen NT4 SMP crash at the same point as
well (though it doesn't always do it).


-- v --

[email protected]

2001-10-30 08:22:18

by Ville Herva

Subject: Re: Nasty surprise with uptime

On Tue, Oct 30, 2001 at 10:15:08AM +0200, you [Ville Herva] claimed:
>
> [1] It is 497.10 days or 2^32 seconds, not 496 days.

Sorry, 2^32 / 100 seconds.


-- v --

[email protected]

2001-10-30 08:30:46

by George Anzinger

Subject: Re: Nasty surprise with uptime

J Sloan wrote:
>
> Alan Cox wrote:
>
> > > and received a nasty surprise. The uptime, which had been 496+ days
> > > on Friday, was back down to a few hours. I was ready to lart somebody
> > > with great vigor when I realized the uptime counter had simply wrapped
> > > around.
> > >
> > > So, I thought to myself, at least the 2.4 kernels on our new boxes won't
> >
> > It wraps at 496 days. The drivers are aware of it and dont crash the box
>
> Yes, and these boxes are still running fine - other
> than showing some processes that were started
> in the year 2003... but DAMN, what an eyesore -
> uptime ruined as far as anybody can tell, times
> and dates no longer making any sense.
>
> So, is there an implicit Linux policy to upgrade
> the distro, or at least the kernel, every 496 days
> whether it needs it or not?

Time for a plug for the High-res-timers project. We have expanded
jiffies to 64 bits. It can be read as the CLOCK_MONOTONIC via the new
POSIX timers interface (part of high-res-timers). Haven't fixed uptime
yet, but hey, I got 496 days to do it :)

Find our latest patch here:
https://sourceforge.net/projects/high-res-timers/

George

2001-10-30 08:36:47

by jdow

Subject: Re: Nasty surprise with uptime

From: "Ville Herva" <[email protected]>

> Oh yes, sometimes 2.0 kernel would crash at 497.1 days. I guess it depends
> on what you were doing at the time and what drivers and options you were
> using. I think most of the jiffies wraparound bugs were cleaned at the
> 2.1.x time (so I have been told.)
>
> (I've experienced one such crash, I'm not sure whether it was 2.0.36 or
> 2.0.38.)

Thanks for the information.

Gee, now I don't feel so bad about bringing down my firewall machine that
had been up 454.25 days before I brought it down to reconfigure it with
two NICs to support the new DSL connection. To think I was only 43 days
from a possible crash anyway takes some of the sting out of the event.

{^_-} Joanne Dow, [email protected]

2001-10-30 08:59:07

by George Anzinger

Subject: Re: Nasty surprise with uptime

J Sloan wrote:
>
> Mike Fedyk wrote:
>
> > On Mon, Oct 29, 2001 at 12:31:12PM -0800, J Sloan wrote:
> > > Say it ain't so! maybe I'm a bit dense, but is the 2.4 kernel also going
> > > to wrap around after 497 days uptime? I'd be glad if someone would
> > > point out the error in my understanding.
> >
> > Ahh, so that's why there haven't been any reports of higher uptimes... ;)
>
> Yes, it all makes sense now -
>
> Say, if the uptime field were unsigned it could
> reach 995 days uptime before wraparound -

Actually 497 days comes from the maximum jiffies value in an unsigned int.
Uptime converts this to seconds... (HZ = 100; jiffies units are 1/HZ seconds.)

George

>
> Surely nobody would mind having to upgrade
> their kernel after 994+ days....
>
> Well strictly speaking an upgrade isn't
> forced, but if the (perceived) uptime is down
> the tubes anyway, might as well update the
> kernel, or the distro level for that matter.
>
> cu
>
> jjs
>

2001-10-30 09:00:17

by Alan

Subject: Re: Nasty surprise with uptime

> Say, if the uptime field were unsigned it could
> reach 995 days uptime before wraparound -

Only on a 33bit processor - and those are kind of rare

2001-10-30 09:28:07

by Alan

Subject: Re: Nasty surprise with uptime

> > If so, approximtely when did that get fixed?
> >
> > (I'm thinking back to an as yet unexplained crash of a 2.0.38 system at
> > ~496days uptime :-( )
> >
> AFAIK, the system didn't crash, but the uptime counter went down to zero.

Some drivers used to get the wrap handling wrong and then break. Also, in
the older kernels the alpha floppy driver only worked for the first 25 of
each 50 days.

2001-10-30 09:47:38

by bert hubert

Subject: Re: Nasty surprise with uptime

On Mon, Oct 29, 2001 at 12:39:37PM -0800, J Sloan wrote:

> So, is there an implicit Linux policy to upgrade
> the distro, or at least the kernel, every 496 days
> whether it needs it or not?

Having huge uptimes is, by the way, not advisable operational policy
according to many. Chances are you will be in for a nasty surprise when you
reboot - do you remember after a year which daemons you 'started by hand'
and how?

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
Trilab The Technology People
Netherlabs BV / Rent-a-Nerd.nl - Nerd Available -
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

2001-10-30 10:37:23

by Mike Fedyk

Subject: Re: Nasty surprise with uptime

On Tue, Oct 30, 2001 at 10:47:51AM +0100, bert hubert wrote:
> On Mon, Oct 29, 2001 at 12:39:37PM -0800, J Sloan wrote:
>
> > So, is there an implicit Linux policy to upgrade
> > the distro, or at least the kernel, every 496 days
> > whether it needs it or not?
>
> Having huge uptimes is by the way not adviseable operational policy
> according to many. Chances are you will be in for a nasty surprise when you
> reboot - do you remember after a year which daemons you 'started by hand'
> and how?
>

Very, very true. This has happened to me a couple times with only a couple
months uptime... :(

My configs have since stabilized so that hasn't been a problem for me
recently...

Mike

2001-10-30 11:05:15

by George Anzinger

Subject: Re: Nasty surprise with uptime

Jonathan Briggs wrote:
>
> A 32 bit uptime patch should also include a new kernel parameter that
> could be passed from LILO: uptime. Then you could test the uptime patch
> by passing uptime=4294967295
>
> Or make /proc/uptime writable.

NO NO NO!

First uptime is a conversion of jiffies. Second, the POSIX standard
wants a CLOCK_MONOTONIC which, by definition, can not be set. Jiffies
is the most reasonable source for this clock. I am afraid you will have
to accumulate "real" time for uptime :)

George


>
> David Relson wrote:
>
> > Let's assume you have the counter changed to 32 bits - RIGHT NOW
> > (tm). Build a kernel, install it, reboot. It'll be over a year
> > (approx Jan 2003) before the change will be noticeable...
> >
> > Methinks that's a long time to wait for a result :-)
> >
> > David
> >
>

2001-10-30 13:50:38

by Tim Walberg

Subject: Re: Nasty surprise with uptime

Wouldn't it be fairly simple for the kernel to just remember the (wall
clock) time at boot, and uptime just subtract that from the current
(wall clock) time? It would be another variable in the kernel requiring
storage but not having a whole lot of use, but what's another 8 bytes
these days? (ok, maybe it would be more critical in embedded and other
such space critical applications, but not for general desktop/server
use...)


tw



On 10/30/2001 00:53 -0800, george anzinger wrote:
>> Jonathan Briggs wrote:
>> >
>> > A 32 bit uptime patch should also include a new kernel parameter that
>> > could be passed from LILO: uptime. Then you could test the uptime patch
>> > by passing uptime=4294967295
>> >
>> > Or make /proc/uptime writable.
>>
>> NO NO NO!
>>
>> First uptime is a conversion of jiffies. Second, the POSIX standard
>> wants a CLOCK_MONOTONIC which, by definition, can not be set. Jiffies
>> is the most reasonable source for this clock. I am afraid you will have
>> to accumulate "real" time for uptime :)
>>
>> George
>>
>>
>> >
>> > David Relson wrote:
>> >
>> > > Let's assume you have the counter changed to 32 bits - RIGHT NOW
>> > > (tm). Build a kernel, install it, reboot. It'll be over a year
>> > > (approx Jan 2003) before the change will be noticeable...
>> > >
>> > > Methinks that's a long time to wait for a result :-)
>> > >
>> > > David
>> > >
>> >
End of included message



--
[email protected]



2001-10-30 14:47:23

by GOMBAS Gabor

Subject: Re: Nasty surprise with uptime

On Tue, Oct 30, 2001 at 07:50:43AM -0600, Tim Walberg wrote:

> Wouldn't it be fairly simple for the kernel to just remember the (wall
> clock) time at boot, and uptime just subtract that from the current
> (wall clock) time?

So everyone with faulty CMOS batteries would have 30+ years of
uptime. And if the CMOS date is ahead of the real one and the admin
sets it back, you will get negative uptimes etc. If you want such
amusements, it is far easier to write an uptime program that just calls
random() instead of asking the kernel :)

Gabor

--
Gabor Gombas Eotvos Lorand University
E-mail: [email protected] Hungary

2001-10-30 15:39:24

by Tim Walberg

Subject: Re: Nasty surprise with uptime

Hmm... ever hear of NTP? My general rule of thumb:
Never trust any CMOS clock; let the kernel keep track
of time and periodically update the CMOS clock so
that you (hopefully) get a reasonable starting point
when you boot. Trusting any clock with a cheap power
source to provide accurate time-keeping is an exercise
in futility... (and it's not necessarily the power
source's fault - even an outrageously expensive power
source doesn't guarantee good time-keeping). I think
of a CMOS clock as kind of a book mark. If the book
mark gets lost, I can still find where I left off,
it just takes a little more work.


tw

On 10/30/2001 15:47 +0100, GOMBAS Gabor wrote:
>> On Tue, Oct 30, 2001 at 07:50:43AM -0600, Tim Walberg wrote:
>>
>> > Wouldn't it be fairly simple for the kernel to just remember the (wall
>> > clock) time at boot, and uptime just subtract that from the current
>> > (wall clock) time?
>>
>> So every people with faulty CMOS batteries would have 30+ years of
>> uptime. And if the CMOS date is ahead of the real one and the admin
>> sets it back, you will get negative uptimes etc. If you want such
>> amusements, it is far easier to write an uptime program that just calls
>> random() instead of asking the kernel :)
>>
>> Gabor
>>
>> --
>> Gabor Gombas Eotvos Lorand University
>> E-mail: [email protected] Hungary
End of included message



--
[email protected]



2001-10-30 15:40:43

by Chris Meadors

Subject: Re: Nasty surprise with uptime

On Tue, 30 Oct 2001, bert hubert wrote:

> Having huge uptimes is by the way not adviseable operational policy
> according to many. Chances are you will be in for a nasty surprise when you
> reboot - do you remember after a year which daemons you 'started by hand'
> and how?

While this isn't exactly on topic for l-k, I just thought that I would
share my recent pain as it does fit in this thread.

I had a box that I inherited. It just did its job. Actually most of my
co-workers didn't even know which box performed this function, and when
they saw the physical box, they didn't know what it did.

It was running a 2.0.x kernel. One day after running for a long time (I'm
guessing close to 500 days) it just went wacky. I tried getting into the
machine by ssh, but nothing was really working right at all. So I figured
a reboot was in order.

This was a crappy 486/66, with no reset button. So a power cycle was
called for. When I started the machine back up, it did the "file system
has not been checked in a long time" thing, and it started the fsck.

After a bit I saw read errors start to spew to the screen, tons of bad
blocks. Then the machine squealed for a few seconds, clicked, and then
all was silent, except for the steady stream of errors on the console.

A second power cycle confirmed what I already knew. The BIOS reported a
failure in the disk controller (the drive would spin up for 2 seconds
squeal and click a little bit as it spun back down).

This machine was configured to do one task, just forward messages to a
paging terminal. Its configuration was never changed. It had one of
those floppy-tape drives in it. I knew where the backup tape was; it was
made 3 years ago when the machine was first put into action.

Of course the tape was unreadable at this point. So the installation and
configuration was recreated from my memory. Luckily I have a good memory,
but it did take me 2 days to get everything running right again.

So the moral of the story is: reboot every so often. Set your fsck to
run at around 2 months; x number of reboots is good too. I like to
stagger my partitions with 5 reboots between each, even on journaled
filesystems. And verify your backups even if the machine isn't changing.

I usually follow those rules, but the cute little 486 in the corner with
the 240MB hard drive, 16MB of RAM, and the monster uptimes was just too
much fun to brag about. I'm not bragging anymore, and it is disassembled
on the floor in my office.

-Chris
--
Two penguins were walking on an iceberg. The first penguin said to the
second, "you look like you are wearing a tuxedo." The second penguin
said, "I might be..." --David Lynch, Twin Peaks

2001-10-30 16:18:04

by GOMBAS Gabor

Subject: Re: Nasty surprise with uptime

On Tue, Oct 30, 2001 at 09:39:13AM -0600, Tim Walberg wrote:

> Hmm... ever hear of NTP?

Do you want to include an NTP daemon in the kernel? The timestamp
you suggested is taken way before any user mode daemon starts. Sure,
you can do the timestamp in userspace if you do not mind losing a few
minutes of precision (or whatever time the NTP daemon needs to
synchronize), but then we could just get rid of /proc/uptime and
claim that the whole thing is a userspace issue.

And what about my home machine if I do not want to dial in to my ISP right
after boot? You say that uptime should not be calculated if there are no
NTP servers reachable?

Gabor

--
Gabor Gombas Eotvos Lorand University
E-mail: [email protected] Hungary

2001-10-30 16:22:05

by Laurent de Segur

Subject: Re: Nasty surprise with uptime

NTP? Oh, this thing I once installed and that was hanging my machine at boot
time for a couple of minutes until it timed out when my laptop was not
connected to the network?
Thanks, but no thanks.

Laurent

> From: Chris Meadors <[email protected]>
> Date: Tue, 30 Oct 2001 10:56:30 -0500 (EST)
> To: bert hubert <[email protected]>
> Cc: Linux kernel <[email protected]>
> Subject: Re: Nasty suprise with uptime
>
> On Tue, 30 Oct 2001, bert hubert wrote:
>
>> Having huge uptimes is by the way not adviseable operational policy
>> according to many. Chances are you will be in for a nasty surprise when you
>> reboot - do you remember after a year which daemons you 'started by hand'
>> and how?
>
> While this isn't exactly on topic for l-k, I just thought that I would
> share my recent pain as it does fit in this thread.
>
> I had a box that I inherited. It just did its job. Actually most of my
> co-workers didn't even know which box performed this function, and when
> they saw the physical box, they didn't know what it did.
>
> It was running a 2.0.x kernel. One day after running for a long time (I'm
> guessing close to 500 days) it just went wacky. I tried getting into the
> machine by ssh, but nothing was really working right at all. So I figured
> a reboot was in order.
>
> This was a crappy 486/66, with no reset button. So a power cycle was
> called for. When I started the machine back up, it did the file system
> has not been checked in a long time thing, and it started the fsck.
>
> After a bit I saw read errors start to spew to the screen, tons of bad
> blocks. Then the machine squealed for a few seconds, clicked, and then
> all was silent, except for the steady stream of errors on the console.
>
> A second power cycle confermed what I already knew. The BIOS reported a
> failure in the disk controller (the drive would spin up for 2 seconds
> squeal and click a little bit as it spun back down).
>
> This machine was configured to do a task, just forward messages to a
> paging terminal. It's configuration was never changed. It had a one of
> those floppy-tape drives in it. I knew were the backup tape was, it was
> made 3 years ago when the machine was first put into action.
>
> Of course the tape was unreadble at this point. So the installation and
> configuration was recreated from my memory. Luckly I have a good memory,
> but it did take me 2 days to get everything running right again.
>
> So the moral of the story is. Reboot every-so-often. Set your fsck to
> run at around 2 months, x number of reboots is good too. I like to
> stagger my partitions with 5 reboots between each, even on journaled
> filesystems. And verify your backups even if the machine isn't changing.
>
> I usually follow those rules, but the cute little 486 in the corner with
> the 240MB hard drive, 16MB of RAM, and the monster uptimes was just too
> much fun to brag about. I'm not bragging anymore, and it is disassembled
> on the floor in my office.
>
> -Chris
> --
> Two penguins were walking on an iceberg. The first penguin said to the
> second, "you look like you are wearing a tuxedo." The second penguin
> said, "I might be..." --David Lynch, Twin Peaks
>
>

2001-10-30 16:25:45

by Matt Bernstein

Subject: Re: Nasty surprise with uptime

At 18:46 +1100 Neale Banks wrote:

>> It wraps at 496 days. The drivers are aware of it and dont crash the box
>
>You mean there was a time when uptime>496days would crash a system?

Amusingly I had to fight at an old workplace of mine to upgrade to a 2.1
kernel. I built 2.1.103 with a snapshot of a compiler which eventually
became egcs-1.0.3 (IIRC) which hit that bug (long after I'd left the job).

I believe the fix went in at 2.1.106 or so :-/

2001-10-30 16:35:35

by Jesse Pollard

Subject: Re: Nasty surprise with uptime

Chris Meadors <[email protected]>:
>This was a crappy 486/66, with no reset button. So a power cycle was
>called for. When I started the machine back up, it did the file system
>has not been checked in a long time thing, and it started the fsck.
...
>I usually follow those rules, but the cute little 486 in the corner with
>the 240MB hard drive, 16MB of RAM, and the monster uptimes was just too
>much fun to brag about. I'm not bragging anymore, and it is disassembled
>on the floor in my office.

Hey now, I have one of those (8MB instead of 16) and it's still working
fine... HOWEVER, I did replace the disk controller and disk drive during
one of the upgrade/update cycles with a leftover 2GB drive... Dumped the
128MB ftape, and 540 MB disk.

I decided that the backups were easier to do over the net...
Installs are easier that way too...
The Slackware two disk install lets me restore across the net... (haven't
had to do it though).

And it functions just fine as a firewall handling a DSL connection, e-mail
port, and http port. It has been happy for last 73 days (house lost power
for a couple of hours.. UPS only good for 30min). Now that I have a spare
CDROM, it's time to look into another upgrade to migrate from disk drives
to CD boot (if I can coerce the BIOS to boot a CD.. otherwise I'll just
have to leave a boot floppy in).
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-10-30 18:17:46

by jjs

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

george anzinger wrote:

> J Sloan wrote:
>
> > Say, if the uptime field were unsigned it could
> > reach 995 days uptime before wraparound -
>
> Actually 497 days is from the max jiffies in an unsigned int. Up time
> converts this to seconds... (HZ = 100) jiffies units are 1/HZ.

Yes, I see now you are right -

Once I bothered to do the arithmetic,
I see it's already being treated as an
unsigned long, so just changing the
type in the struct won't buy us anything....
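
The arithmetic behind the 497-day figure can be checked with a short
sketch. It assumes a 32-bit tick counter and takes the tick rate as a
parameter; the name wrap_days is made up for illustration and is not a
kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): days until a 32-bit
 * tick counter wraps when it advances hz times per second. */
static double wrap_days(uint32_t hz)
{
    const double ticks = 4294967296.0;       /* 2^32 distinct counter values */
    const double seconds_per_day = 86400.0;

    assert(hz > 0);                          /* a zero tick rate is meaningless */
    return ticks / (double)hz / seconds_per_day;
}
```

With hz = 100 this comes to roughly 497 days, which matches the wrap
seen on the mail servers.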

So much for quick fixes - I wonder what
FreeBSD does here...

cu

jjs

2001-10-30 19:19:25

by jjs

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

Alan Cox wrote:

> > Say, if the uptime field were unsigned it could
> > reach 995 days uptime before wraparound -
>
> Only on a 33bit processor - and those are kind of rare

Yes (DOH) - I was fooled when I saw the
long data type - when I actually did the
arithmetic to make sure, I saw that was
a red herring.
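
To see why the field type is a red herring, compare the largest uptime
a 32-bit jiffies counter can produce with what a signed 32-bit seconds
field can hold. A rough sketch, assuming a 32-bit long and a tick rate
passed in by the caller; both function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Largest uptime, in whole seconds, that can be derived from a
 * 32-bit tick counter running at hz ticks per second. */
static uint32_t max_uptime_seconds(uint32_t hz)
{
    assert(hz > 0);
    return UINT32_MAX / hz;
}

/* Nonzero if that largest uptime still fits in a signed 32-bit
 * seconds field, i.e. the field itself is not the limiting factor. */
static int fits_in_signed_field(uint32_t hz)
{
    return max_uptime_seconds(hz) <= (uint32_t)INT32_MAX;
}
```

At 100 ticks per second the counter tops out near 43 million seconds,
far below the roughly 2.1 billion a signed 32-bit field can hold, so
changing the field to unsigned would not move the wrap point.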

cu

jjs


2001-10-30 19:42:57

by Oden Eriksson

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

On Tuesday 30 October 2001 17.25, Matt Bernstein wrote:
> At 18:46 +1100 Neale Banks wrote:
> >> It wraps at 496 days. The drivers are aware of it and dont crash the box
> >
> >You mean there was a time when uptime>496days would crash a system?

I think this thread is pretty stupid..., who with a little common sense ever
treasures uptime? What's important is the total downtime..., and why...

I liked one idea that popped up here where one would use a random uptime...,
that would get ignorant people to be amazed and wonder how _that_ could
be....

--
Oden Eriksson

2001-10-30 21:52:43

by jjs

[permalink] [raw]
Subject: [OT] Re: Nasty suprise with uptime

bert hubert wrote:

> On Mon, Oct 29, 2001 at 12:39:37PM -0800, J Sloan wrote:
>
> > So, is there an implicit Linux policy to upgrade
> > the distro, or at least the kernel, every 496 days
> > whether it needs it or not?
>
> Having huge uptimes is by the way not adviseable operational policy
> according to many.

Well, it is certainly a calling card for reliability.

> Chances are you will be in for a nasty surprise when you
> reboot -

not bloody likely - if such a case did exist, big
brother would let us know immediately that a
service is missing.

> do you remember after a year which daemons you 'started by hand'
> and how?

All daemons are started with init scripts. If init
scripts do not exist for a particular service, they
are created and activated with chkconfig.

In addition, all installed programs are in rpm
format, and are installed in the most standard,
vanilla configuration possible.

cu

jjs

2001-10-30 22:53:34

by Mike Castle

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

On Tue, Oct 30, 2001 at 12:53:20AM -0800, george anzinger wrote:
> Jonathan Briggs wrote:
> > A 32 bit uptime patch should also include a new kernel parameter that
> > could be passed from LILO: uptime. Then you could test the uptime patch
> > by passing uptime=4294967295
>
> NO NO NO!
>
> First uptime is a conversion of jiffies. Second, the POSIX standard
> wants a CLOCK_MONOTONIC which, by definition, can not be set. Jiffies

I believe that at least some SVR4 systems allow you to set lbolt, either
during runtime or at boot.

You have to be able to do that to test things. Like drivers (last I checked,
NCR still crashes upon wrap around), or utilities that monitor the uptime
so they can remind you that a reboot is necessary soon (so that you don't
crash).
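
Code that survives a wrap usually compares tick values by signed
difference instead of by magnitude, and the practical way to exercise
it is to start the counter just short of the wrap. A minimal sketch of
that idea for a 32-bit counter; tick_after and the other names are made
up here, and this is not a claim about what any particular driver does.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if tick value a is later than b, tolerating one wrap of the
 * 32-bit counter. Only valid while the two values are less than 2^31
 * ticks apart. Relies on the usual two's-complement conversion. */
static int tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Naive magnitude comparison, shown only to demonstrate the failure. */
static int naive_after(uint32_t a, uint32_t b)
{
    return a > b;
}

/* Simulates a counter preset just before the wrap point. */
static void wrap_self_check(void)
{
    uint32_t before = UINT32_MAX - 5;   /* six ticks short of the wrap */
    uint32_t after  = before + 10;      /* wraps around to 4 */

    assert(tick_after(after, before));   /* signed difference gets it right */
    assert(!naive_after(after, before)); /* magnitude compare gets it wrong */
}
```

Presetting the counter this way is what makes the wrap testable in
minutes instead of after more than a year of uptime.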

mrc
--
Mike Castle [email protected] http://www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

2001-10-30 22:53:14

by Jan Dvorak

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

On Mon, Oct 29, 2001 at 12:39:37PM -0800, J Sloan wrote:
> So, is there an implicit Linux policy to upgrade
> the distro, or at least the kernel, every 496 days
> whether it needs it or not?

Rather, you should think about your poor hw. It's nice to sit down at least once
a year to clean your box of those spider/ant feudalistic colonies and bug
airports, to check connectors, upgrade some components, and do other such things
which you can't risk doing online on a 32-bit platform. You know, there are
some x86s which weren't designed to even LAST as long as one year :-)

Jan

2001-10-30 23:25:14

by jjs

[permalink] [raw]
Subject: [OT] Re: Nasty suprise with uptime

Jan Dvorak wrote:

> On Mon, Oct 29, 2001 at 12:39:37PM -0800, J Sloan wrote:
> > So, is there an implicit Linux policy to upgrade
> > the distro, or at least the kernel, every 496 days
> > whether it needs it or not?
>
> Rather, you should think about your poor hw. It's nice to sit down at least once
> a year, to clean up your box of that spider/ant feudalistic colonies, bug
> airports, to check connectors, upgrade some components, and other such things
> which you can't risk doing online at 32bit platform. You know, there are
> some x86s which wasn't projected to even LAST as long as one year :-)

Certainly a point -

It's not too unreasonable to bring down a
server for maintenance every 16 months.

However this is good, expensive hardware...

Consider HP-UX 10.20, a 32-bit, 1996 vintage
commercial unix, in many ways somewhat
primitive compared to Linux:

root@zinc:/root# uname -a
HP-UX zinc B.10.20 U 9000/800 2003576880 unlimited-user license
root@zinc:/root# uptime
3:24pm up 681 days, 6:43, 12 users, load average: 1.17, 1.15, 1.15

So clearly, it's not rocket science....

cu

jjs

2001-10-31 22:36:38

by J Sloan

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

Vile Hernia wrote:

> BTW, on win95 the HZ is 1024, which caused it to _always_ crash if it ever
> reached 48.5 days of uptime. I've seen NT4 SMP to to crash at same point as
> well (though it doesn't do it always).

It's funny that windoze went for years
without anybody ever realizing about
the 49 day crash - heck, one crash
every 49 days is lost in the noise on
a windoze pee cee - no wonder they
never noticed.

OTOH, when our Linux uptimes went back
to zero at 497 days, I noticed immediately,
and screamed bloody murder until I found
it was just a timer wraparound.

cu

jjs


2001-11-01 00:47:04

by Gerhard Mack

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

On Wed, 31 Oct 2001, J Sloan wrote:

> Vile Hernia wrote:
>
> > BTW, on win95 the HZ is 1024, which caused it to _always_ crash if it ever
> > reached 48.5 days of uptime. I've seen NT4 SMP to to crash at same point as
> > well (though it doesn't do it always).
>
> It's funny that windoze went for years
> without anybody ever realizing about
> the 49 day crash - heck, one crash
> every 49 days is lost in the noise on
> a windoze pee cee - no wonder they
> never noticed.
>
> OTOH, when our Linux uptimes went back
> to zero at 497 days, I noticed immediately,
> and screamed bloody murder until I found
> it was just a timer wraparound.
>
Seems to be a cultural difference between windows and linux users.

Probably something for ESR or whoever to do a paper on ;)

Gerhard


--
Gerhard Mack

[email protected]

<>< As a computer I find your faith in technology amusing.

2001-11-09 00:46:21

by Dr. Kelsey Hudson

[permalink] [raw]
Subject: Re: Nasty suprise with uptime

On Mon, 29 Oct 2001, J Sloan wrote:

> Gotta get those 1000 day uptimes to silence
> the bsd bigots!

Silencing the BSD bigots would be akin to silencing the microsoft
bigots... The slime that promotes bsd is almost as intolerable :)

Kelsey Hudson [email protected]
Software Engineer
Compendium Technologies, Inc (619) 725-0771
---------------------------------------------------------------------------