2008-01-08 16:07:53

by Tuomo Valkonen

Subject: The ext3 way of journalling


The ext3 journalling code can be summarised as:

superblock->last_checked = random();
sync(superblock)

I hate it: every time Linux crashes, e.g. due to power failure, it takes
almost an hour to boot, because the kernel has decided to corrupt the
superblock to indicate that it's been years since last file system
check. And obviously the crappy init system provides no simple way to
stop the checking, to put it in the background, or whatever. The FOSS
herd is totally concentrated on creating a WIMP idiot box -- a cheap
plastic clone of Windows -- instead of fixing such fundamental problems.
Windows, by the way, boots like a blaze compared to woeful Linux crap
(even without the very definition of pure shit: udev, which the crap
known as Linux practically requires these days).

A partial contributor to the slow fsck process is:

hde: ST3160023AS, ATA DISK drive
hde: applying pessimistic Seagate errata fix

# hdparm -t /dev/hde
/dev/hde:
Timing buffered disk reads: 48 MB in 3.01 seconds = 15.96 MB/sec

Thank you very much. The disk worked perfectly well without that "fix"
in earlier (2.2 or was it some 2.4?) kernels and, in Windows too. That
raw timing is worse than the _encrypted_ transfer rate I get from other
disks.

One should always indicate the version of software when complaining. Well,

$ uname -a
Linux noi 2.6.14 #1 PREEMPT Sun Oct 30 20:18:48 EET 2005 i686 GNU/Linux

I've tried upgrading, and failed: the megatonne monolith with a gazillion
hidden options (and totally worthless make oldconfig) is impossible to
compile these days, and the distros' stock kernel are utter and total crap
that load drivers in wrong order etc., and are difficult to configure
(demanding crap that demands udev to edit their initrds). Not to even
speak of the udev-demanding scsi-mapping insanity of SATA etc. devices
these days.

I've had it with Linux. It's no longer for power users. It's so complex
that it's only for idiot users that are content with the shoddy defaults,
and (paid) developers.

--
Tuomo


2008-01-08 16:35:26

by Jan Engelhardt

Subject: Re: The ext3 way of journalling


On Jan 8 2008 16:07, Tuomo Valkonen wrote:
>
>One should always indicate the version of software when complaining. Well,
>
> $ uname -a
> Linux noi 2.6.14 #1 PREEMPT Sun Oct 30 20:18:48 EET 2005 i686 GNU/Linux
>
>I've tried upgrading, and failed: the megatonne monolith with a gazillion
>hidden options (and totally worthless make oldconfig) is impossible to
>compile these days,

Do it step-by-step.

git checkout v2.6.15
make oldconfig
git checkout v2.6.16
make oldconfig
...
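
The step-by-step approach above could be scripted roughly as follows. This is only a sketch: the helper name `upgrade_steps` and the end point 2.6.23 are assumptions for illustration, and the commands are printed rather than executed so they can be reviewed before running them in a kernel git tree.

```python
def upgrade_steps(first_minor: int, last_minor: int) -> list[str]:
    """Return the shell commands for walking 2.6.x tags one release at a time."""
    steps = []
    for minor in range(first_minor, last_minor + 1):
        # Check out the next release, then let oldconfig ask only about
        # the options that are new relative to the previous .config.
        steps.append(f"git checkout v2.6.{minor}")
        steps.append("make oldconfig")
    return steps


if __name__ == "__main__":
    # Starting from a 2.6.14 .config; stopping at 2.6.23 is an assumption.
    for command in upgrade_steps(15, 23):
        print(command)
```

Moving one release at a time keeps each `make oldconfig` run short, because only the options introduced in that release need answering.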

>and the distros' stock kernel are utter and total crap

I can recommend that you try another distribution then.

>that load drivers in wrong order etc.,

What specific modules and which order do you need for the disks?
There is also kernel-side loading order coming up:
http://lwn.net/Articles/260856/

>and are difficult to configure

I do not really have to configure anything on my machine. Then again,
yours might be vastly different. I can seamlessly switch distro and
self-built kernels, with the only extra that I have to call mkinitrd
(on opensuse that works without arguments even) for the self-built one.

>(demanding crap that demands udev to edit their initrds).

mkinitrd should take care of that.

2008-01-08 16:39:48

by John Stoffel

Subject: Re: The ext3 way of journalling

>>>>> "Tuomo" == Tuomo Valkonen <[email protected]> writes:

Tuomo> The ext3 journalling code can be summarised as:

Tuomo> superblock->last_checked = random();
Tuomo> sync(superblock)

Tuomo> I hate it: every time Linux crashes, e.g. due to power failure,
Tuomo> it takes almost an hour to boot, because the kernel has decided
Tuomo> to corrupt the superblock to indicate that it's been years
Tuomo> since last file system check.

Bullshit. But first, why don't you post some bootup message logs so
people can actually look at the problem, instead of your space wasting
rant? Oh wait... I just did waste the time. Sigh...

Look at your filesystems, using 'tune2fs' and see if the ext3 journal
is actually turned on and used. If it's not, then I can see why
you're having problems on reboots.

Tuomo> And obviously the crappy init system provides no simple way to
Tuomo> stop the checking, to put it in the background, or
Tuomo> whatever. The FOSS herd is totally concentrated on creating a
Tuomo> WIMP idiot box -- a cheap plastic clone of Windows -- instead
Tuomo> of fixing such fundamental problems. Windows, by the way,
Tuomo> boots like a blaze compared to woeful Linux crap (even without
Tuomo> the very definition of pure shit: udev, which the crap known as
Tuomo> Linux practically requires these days).

<troll> And I fell for it.

Tuomo> A partial contributor to the slow fsck process is:

Tuomo> hde: ST3160023AS, ATA DISK drive
Tuomo> hde: applying pessimistic Seagate errata fix

Tuomo> # hdparm -t /dev/hde
Tuomo> /dev/hde:
Tuomo> Timing buffered disk reads: 48 MB in 3.01 seconds = 15.96 MB/sec

Tuomo> Thank you very much. The disk worked perfectly well without
Tuomo> that "fix" in earlier (2.2 or was it some 2.4?) kernels and, in
Tuomo> Windows too. That raw timing is worse than the _encrypted_
Tuomo> transfer rate I get from other disks.

So go back to 2.4 then, no one is stopping you. But I'd rather have a
slower disk that didn't corrupt my data behind my back...

Tuomo> One should always indicate the version of software when
Tuomo> complaining. Well,

Tuomo> $ uname -a
Tuomo> Linux noi 2.6.14 #1 PREEMPT Sun Oct 30 20:18:48 EET 2005 i686 GNU/Linux

What CPU are you using? Chipset? Output of lspci? dmesg output?

Tuomo> I've tried upgrading, and failed: the megatonne monolith with a
Tuomo> gazillion hidden options (and totally worthless make oldconfig)
Tuomo> is impossible to compile these days, and the distros' stock
Tuomo> kernel are utter and total crap that load drivers in wrong
Tuomo> order etc., and are difficult to configure (demanding crap that
Tuomo> demands udev to edit their initrds). Not to even speak of the
Tuomo> udev-demanding scsi-mapping insanity of SATA etc. devices these
Tuomo> days.

What are you talking about?

Tuomo> I've had it with Linux. It's no longer for power users. It's so
Tuomo> complex that it's only for idiot users that are content with
Tuomo> the shoddy defaults, and (paid) developers.

"It's so complex it's for Idiot users" is really funny to read.


John

2008-01-08 16:53:17

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-08, Jan Engelhardt <[email protected]> wrote:
> Do it step-by-step.

Still too much work.

> I can recommend that you try another distribution then.

They all suck.

>>that load drivers in wrong order etc.,
>
> What specific modules and which order do you need for the disks?
> There is also kernel-side loading order coming up:
> http://lwn.net/Articles/260856/
>
>>and are difficult to configure
>
> I do not really have to configure anything on my machine. Then again,
> yours might be vastly different.

Typically distros' stock kernels load the intolerable integrated buzz-chip
as the first sound card, and the wrong network adapter as eth0. SATA and
USB disks appear as some randomly ordered scsi nodes, and so on. This
could be configured with udev if one were willing to learn -- for a
fundamentally very trivial task -- yet another wheel-reinventing unnecessarily
cryptic piece of shit, to tolerate distros breaking your cryptic config
constantly, and to tolerate its intolerable slowness at boot. (/dev should
still be the UI for modifying dynamic device mappings, reacting to ln, mv,
chmod, etc., instead of being reduced into a race-condition ridden tmpfs
shadow, that loses your normally created symlinks and permissions at boot.)

But the stock kernels also take ages at boot in (trying to) load a zillion
unnecessary drivers. They need to be configured to not load them. But
_obviously_ they won't use a _simple_ listing of the modules to load
(/etc/modules), but demand complex initrd editing. And all of those tools
that I've seen and have not required too much work, have again demanded
udev. No thanks.

--
Tuomo

2008-01-08 16:53:30

by Andi Kleen

Subject: Re: The ext3 way of journalling

Tuomo Valkonen <[email protected]> writes:

> The ext3 journalling code can be summarised as:
>
> superblock->last_checked = random();
> sync(superblock)
>
> I hate it: every time Linux crashes, e.g. due to power failure, it takes
> almost an hour to boot,

tune2fs -i0 -c0 device for each file system

Yes that should be default, unfortunately it is not. It's one
of the first things I do on new machines.
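
Applying that to every file system could look something like the sketch below. The device names are hypothetical, and the commands are only printed so they can be checked first. `-c 0` and `-i 0` are the tune2fs options for the mount-count and time-interval checks.

```python
import shlex


def disable_periodic_fsck_cmd(device: str) -> list[str]:
    """Build the tune2fs command that turns off both periodic checks."""
    # -c 0: no forced check after N mounts; -i 0: no forced check after M days.
    return ["tune2fs", "-c", "0", "-i", "0", device]


# Hypothetical ext3 partitions; substitute the real ones.
devices = ["/dev/hde1", "/dev/hde2"]

for dev in devices:
    # Printed as a dry run; pass the list to subprocess.run() to apply it.
    print(shlex.join(disable_periodic_fsck_cmd(dev)))
```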

> Thank you very much. The disk worked perfectly well without that "fix"

fsck is actually seek bound, most likely it won't make much difference
for fsck. Seeky disk IO is always slow on a spinning disk.

There's actually been a patchkit recently to make fsck much faster
by clustering metadata better so it can be reached with less seeks,
but that hasn't reached mainline yet and will unfortunately
require freshly created file systems.

-Andi

2008-01-08 16:59:26

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-08, John Stoffel <[email protected]> wrote:
> Look at your filesystems, using 'tune2fs' and see if the ext3 journal
> is actually turned on and used. If it's not, then I can see why
> you're having problems on reboots.

Journalling is on, but it's no use because the superblock always has
corrupted last-checked time at boot. "File system check forced: 31352
days since last check" or so.

> What CPU are you using? Chipset? Output of lspci? dmesg output?

Athlon XP 2500+, SiI 3112 (the obsoleted driver that makes the disk
appear as the predictable hde, not the random scsi mapping driver).

As for the rest... I'm on Windows, because I can't be arsed waiting
for an hour for Linux to boot.

--
Tuomo

2008-01-08 17:05:24

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-08, Andi Kleen <[email protected]> wrote:
> tune2fs -i0 -c0 device for each file system
>
> Yes that should be default, unfortunately it is not. It's one
> of the first things I do on new machines.

I have ages ago increased those counts, but I don't want to
completely disable them. The problem is that the superblock
is corrupted to indicate absurd "31352 days since last check".
Who knows, maybe it would even corrupt those settings.

--
Tuomo

2008-01-08 17:18:52

by Jan Engelhardt

Subject: Re: The ext3 way of journalling


On Jan 8 2008 16:52, Tuomo Valkonen wrote:
>
>> I can recommend that you try another distribution then.
>
>They all suck.

Roll your own.

>Typically distros' stock kernels load the intolerable integrated buzz-chip
>as the first sound card,

While that is true, configuration tools such as, ads aside, yast2
(designed for those "idiots" you referred to with 'WIMP idiot box') have
a setting for which card should be loaded first. "Power users" may still
use the index= option of sound card modules and wire it up in
/etc/modprobe.d if they prefer.

>and the wrong network adapter as eth0.

You can guess my answer: udev will fix it. And actually, udev will
record the MAC address and the device name the first time it encounters
a new device, hence will always use the same interface name for a MAC
address. So the MAC--interface mapping may be wrong on the first
install (until you 'fix' the mapping so that it is to your liking),
but will remain the same afterwards.
Exempt are some nvidia-based weirdo chips which assign a random
MAC at every boot.
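
The record-on-first-sight behaviour described above can be modelled in a few lines. This is a toy model, not udev's actual implementation: the class name, the in-memory dictionary, and the MAC addresses are all made up for illustration, whereas udev keeps its record in a rules file on disk.

```python
class PersistentNames:
    """Toy model: remember the first name given to each MAC address."""

    def __init__(self) -> None:
        self._by_mac: dict[str, str] = {}

    def name_for(self, mac: str) -> str:
        mac = mac.lower()
        if mac not in self._by_mac:
            # First encounter: hand out the next free ethN and record it.
            self._by_mac[mac] = f"eth{len(self._by_mac)}"
        return self._by_mac[mac]


names = PersistentNames()

# First boot: cards are named in the order they happen to be probed.
first_boot = [names.name_for(m) for m in ("00:11:22:33:44:55", "66:77:88:99:aa:bb")]

# Later boot: the probe order is reversed, but the recorded names win.
second_boot = {m: names.name_for(m) for m in ("66:77:88:99:aa:bb", "00:11:22:33:44:55")}

print(first_boot)
print(second_boot)
```

The same model also shows why a chip that presents a random MAC at every boot defeats the scheme: each new address looks like a new device.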

>SATA and
>USB disks appear as some randomly ordered scsi nodes, and so on.

Well what do you expect of it? The kernel does not keep USB port <->
SCSI device mappings. Neither USB device <-> SCSI device mapping,
because not all USB ports or USB devices are mass-storage devices.
It just is not the kernel's job.

Now that you mention that,
/dev/disk/by-id/ata-DIAMOND_250G_2B5400_030400026 always has had the
contents I expected it to have. Wonder how that comes!? Don't tell me
you are using those old-fashioned /dev/sda - that would be negligent.
Some/most(?) distros do not follow /dev/disk consistently yet, so you
are free to blame them. Not to forget that udev makes this possible :)
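
To see which kernel node each persistent name currently points at, something like the following sketch can be used. The function name `stable_disk_names` is made up here; the only thing assumed about the system is that /dev/disk/by-id contains symlinks, as described above.

```python
import os


def stable_disk_names(by_id_dir: str = "/dev/disk/by-id") -> dict[str, str]:
    """Map each persistent name in by_id_dir to the device node it points at."""
    if not os.path.isdir(by_id_dir):
        # No udev-managed directory on this system: nothing to report.
        return {}
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ata-<model>_<serial> -> ../../sda
        mapping[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return mapping


if __name__ == "__main__":
    for name, node in stable_disk_names().items():
        print(f"{name} -> {node}")
```

The left-hand names stay the same across boots even when the right-hand /dev/sdX nodes get shuffled, which is why they are the safer thing to put in configuration files.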

>This could be configured with udev if one were willing to learn -- for
>a fundamentally very trivial task --

Nothing to configure, this is standard udev configuration file
boilerplate and comes prepackaged. Upgrade udev to version 114 at least.

>yet another wheel-reinventing unnecessarily
>cryptic piece of shit, to tolerate distros breaking your cryptic config
>constantly,

Mine plays very well.

$ cat /etc/crypttab
home /dev/disk/by-id/ata-DIAMOND_250G_2B5400_030400026-part2 none
cipher=aes-cbc-essiv:sha256

and /dev/mapper/home is a fixed name.

>and to tolerate its intolerable slowness at boot. (/dev should
>still be the UI for modifying dynamic device mappings, reacting to ln, mv,
>chmod, etc., instead of being reduced into a race-condition ridden tmpfs
>shadow, that loses your normally created symlinks and permissions at boot.)

May I remind you that the kernel also "loses" all your network interface
configuration, routes, firewalling rules and all sysctl settings at
boot (sic: reboot & powerdown).

>But the stock kernels also take ages at boot in (trying to) load a zillion
>unnecessary drivers.

Distros have to decide whether to

- not autoload a zillion of modules, potentially generating lots of
crying "idiot" users

- autoload a zillion of modules, potentially firing you up.


>They need to be configured to not load them. But
>_obviously_ they won't use a _simple_ listing of the modules to load
>(/etc/modules), but demand complex initrd editing.

Nonsense. The kernel notifies udev about all available hardware and udev
will load modules. It has nothing to do with initrd, in fact, this very
step of loading a gazillion of modules is done after initrd has passed
control on to /sbin/init. At least, in opensuse.

2008-01-08 17:20:27

by Andre Noll

Subject: Re: The ext3 way of journalling

On 16:07, Tuomo Valkonen wrote:

> I hate it: every time Linux crashes, e.g. due to power failure, it takes
> almost an hour to boot, because the kernel has decided to corrupt the
> superblock to indicate that it's been years since last file system
> check.

Use tune2fs to deactivate checking.

> And obviously the crappy init system provides no simple way to
> stop the checking, to put it in the background, or whatever.

Modify the init scripts or use another distro.

> The FOSS herd is totally concentrated on creating a WIMP idiot box --
> a cheap plastic clone of Windows -- instead of fixing such fundamental
> problems.
> Windows, by the way, boots like a blaze compared to woeful Linux crap
> (even without the very definition of pure shit: udev, which the crap
> known as Linux practically requires these days).

Don't use udev then. Good old static dev works fine if you have a fixed
set of devices.

> A partial contributor to the slow fsck process is:
>
> hde: ST3160023AS, ATA DISK drive
> hde: applying pessimistic Seagate errata fix
>
> # hdparm -t /dev/hde
> /dev/hde:
> Timing buffered disk reads: 48 MB in 3.01 seconds = 15.96 MB/sec

You're using the sil3112 driver? Edit its blacklist and remove the
entry for your drive. That gives you the usual speed.

> Linux noi 2.6.14 #1 PREEMPT Sun Oct 30 20:18:48 EET 2005 i686 GNU/Linux
>
> I've tried upgrading, and failed: the megatonne monolith with a gazillion
> hidden options (and totally worthless make oldconfig)

Gradually upgrade to 2.6.15, 2.6.16...

> is impossible to compile these days,

Check your tool-chain. Many people compile recent kernels with no problems.

> and the distros' stock kernel are utter and total crap
> that load drivers in wrong order etc., and are difficult to configure
> (demanding crap that demands udev to edit their initrds).

Use a kernel.org version.

> Not to even speak of the udev-demanding scsi-mapping insanity of SATA
> etc. devices these days.

Nobody forces you to use udev. Moreover, you can write your own udev
rules that match your expectations.

> I've had it with Linux. It's no longer for power users. It's so complex
> that it's only for idiot users that are content with the shoddy defaults,
> and (paid) developers.

You're not ranting about Linux but about your Distro. Complain on
the corresponding distro-specific mailing list, use another distro
and stop whining.

Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe



2008-01-08 17:49:01

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-08, Jan Engelhardt <[email protected]> wrote:
> Roll your own.

Nah, too much work, and I want all distros to perish.

> "Power users" may still
> use the index= option of sound card modules and wire it up in
> /etc/modprobe.d if they prefer.

Another very cryptic directory whose contents say nothing to me.
Configuration files should be self-documenting and editable,
instead having to be created based on long documentation.
The simple /etc/modules -- which at least Debian's stock kernels
do not use -- qualifies, but few other files these days do.

> You can guess my answer: udev will fix it.

And break everything else, such as my symlinks, permissions, etc.
I'm not going to learn its cryptic special-case config files for
such trivial tasks as creating a fucking symlink or change the
permissions of a file, for which exist general purpose methods:
chmod, chown, ln -s.

> Well what do you expect of it? The kernel does not keep USB port <->
> SCSI device mappings. Neither USB device <-> SCSI device mapping,
> because not all USB ports or USB devices are mass-storage devices.
> It just is not the kernel's job.

Mapping everything to scsi nodes is brain damaged. The old hda, hdb,
etc. mappings had a somewhat clear correspondence to physical
device addresses, and were easy to use without such complicated crap
as udev. Of course, I'd prefer just device unique IDs being used,
where possible... but I'm not going to suffer udev for that.

> Mine plays very well.

I was not talking of encrypted disks but cryptic udev config files,
that the distros are going to break on every upgrade.

> May I remind you that the kernel also "loses" all your network interface
> configuration, routes, firewalling rules and all sysctl settings at
> boot (sic: reboot & powerdown).

But traditional /dev does not lose permissions and symlinks. udev
tmpfs shadow brain damage does. You have to illogically and
inconveniently edit udev's cryptic config files instead, and yet
it in no way stops /dev from being modified.

> Distros have to decide whether to
>
> - not autoload a zillion of modules, potentially generating lots of
> crying "idiot" users
>
> - autoload a zillion of modules, potentially firing you up.

And most of them cater for idiot users, and the rest for
developers who want to do it all from scratch. Mere power users
who want to tune the system to their needs, but easily and without
WIMPshit (through _simple_ self-documenting configuration files),
are forgotten these days.

> Nonsense. The kernel notifies udev about all available hardware and udev
> will load modules. It has nothing to do with initrd, in fact, this very
> step of loading a gazillion of modules is done after initrd has passed
> control on to /sbin/init. At least, in opensuse.

I've never seen a system that would do so. And I won't use udev.

--
Tuomo

2008-01-08 17:55:24

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-08, Andre Noll <[email protected]> wrote:
> Use tune2fs to deactivate checking.

So, a workaround is the answer to a clear bug. Typical FOSS.

> Modify the init scripts or use another distro.

Another typical FOSS answer. "You have the source, you can fix it."
With what time?

> Don't use udev then. Good old static dev works fine if you have a fixed
> set of devices.

It doesn't, with the unpredictable SCSI mapping insanity.

> You're using the sil3112 driver? Edit its blacklist and remove the
> entry for your drive. That gives you the usual speed.

Recompiling? No thanks. Compiling the Linux kernel is too painful.

> Check your tool-chain. Many people compile recent kernels with no problems.

And recompile and recompile and recompile ad infinitum, because always
some option was missing or wrong, there being far too many of them and
hidden all over the place.

> Nobody forces you to use udev. Moreover, you can write your own udev
> rules that match your expectations.

See above on having time to learn over-cryptic systems.

> You're not ranting about Linux but about your Distro. Complain on
> the corresponding distro-specific mailing list, use another distro
> and stop whining.

I don't use a distro kernel. I use a kernel I compiled myself over
two years ago. I have tried compiling newer ones, but it's too much
work to get all the options right. And then there's the problem that
the "good" driver for my SATA disk may not be there anymore in the
latest kernels, and so on.

--
Tuomo

Subject: Re: The ext3 way of journalling

On 1/8/08, Tuomo Valkonen <[email protected]> wrote:
> On 2008-01-08, Andre Noll <[email protected]> wrote:
> > Use tune2fs to deactivate checking.
>
> So, a workaround is the answer to a clear bug. Typical FOSS.

It isn't a bug. It is a feature; Think about silent corruption that
may damage your filesystem; Only an FSCK can see how much of damage is
done and fix it. Journaling will help you to a degree, but not all the
way.

> > Modify the init scripts or use another distro.
>
> Another typical FOSS answer. "You have the source, you can fix it."
> With what time?

heh, the way I look at it, you either have the time, or money. Use one
and get rid of your problem. If you have them and aren't willing to
commit to fix your problem, maybe your problem isn't important enough
to you. If you have no time and no money, ranting here doesn't help
you either.

cheers,
Masoud

> > Don't use udev then. Good old static dev works fine if you have a fixed
> > set of devices.
>
> It doesn't, with the unpredictable SCSI mapping insanity.
>
> > You're using the sil3112 driver? Edit its blacklist and remove the
> > entry for your drive. That gives you the usual speed.
>
> Recompiling? No thanks. Compiling the Linux kernel is too painful.
>
> > Check your tool-chain. Many people compile recent kernels with no problems.
>
> And recompile and recompile and recompile ad infinitum, because always
> some option was missing or wrong, there being far too many of them and
> hidden all over the place.
>
> > Nobody forces you to use udev. Moreover, you can write your own udev
> > rules that match your expectations.
>
> See above on having time to learn over-cryptic systems.
>
> > You're not ranting about Linux but about your Distro. Complain on
> > the corresponding distro-specific mailing list, use another distro
> > and stop whining.
>
> I don't use a distro kernel. I use a kernel I compiled myself over
> two years ago. I have tried compiling newer ones, but it's too much
> work to get all the options right. And then there's the problem that
> the "good" driver for my SATA disk may not be there anymore in the
> latest kernels, and so on.
>
> --
> Tuomo

2008-01-08 18:11:31

by Jan Engelhardt

Subject: Re: The ext3 way of journalling


On Jan 8 2008 17:52, Tuomo Valkonen wrote:
>On 2008-01-08, Andre Noll <[email protected]> wrote:
>> Use tune2fs to deactivate checking.
>
>So, a workaround is the answer to a clear bug. Typical FOSS.

Well if it is a problem for you, why don't you come and fix it?

>> Modify the init scripts or use another distro.
>
>Another typical FOSS answer. "You have the source, you can fix it."
>With what time?

If you do not like spending time yourself, hire someone.

>> Check your tool-chain. Many people compile recent kernels with no problems.
>
>And recompile and recompile and recompile ad infinitum, because always
>some option was missing or wrong, there being far too many of them and
>hidden all over the place.

Yes. Either you compile or you use a distro kernel. But you do not want
either, so that kinda narrows it down.

>> Nobody forces you to use udev. Moreover, you can write your own udev
>> rules that match your expectations.
>
>See above on having time to learn over-cryptic systems.

http://linux.oneandoneis2.org/LNW.htm . Replace Windows by <whatever
favorite OS you wanted to originally have>.

>> You're not ranting about Linux but about your Distro. Complain on
>> the corresponding distro-specific mailing list, use another distro
>> and stop whining.
>
>I don't use a distro kernel. I use a kernel I compiled myself over
>two years ago. I have tried compiling newer ones, but it's too much
>work to get all the options right. And then there's the problem that
>the "good" driver for my SATA disk may not be there anymore in the
>latest kernels, and so on.

I did the same previously. As soon as there was more than three
machines to administer, I stopped building kernels the typical way
for production machines and instead built one central RPM (sort of
distro kernel). I never look back.

2008-01-08 18:15:43

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Tue, Jan 08, 2008 at 05:01:30PM +0000, Tuomo Valkonen wrote:
> On 2008-01-08, Andi Kleen <[email protected]> wrote:
> > tune2fs -i0 -c0 device for each file system
> >
> > Yes that should be default, unfortunately it is not. It's one
> > of the first things I do on new machines.
>
> I have ages ago increased those counts, but I don't want to
> completely disable them. The problem is that the superblock
> is corrupted to indicate absurd "31352 days since last check".
> Who knows, maybe it would even corrupt those settings.

Newer e2fsprogs display a better message, and you can set an
/etc/e2fsck.conf setting:

[options]
buggy_init_scripts = 1

That will fix this issue. The problem you are facing is that you
have your hardware clock set to ticking localtime, instead of GMT.
Windows ticks localtime, which is a mistake carried over from the
1970's and MS-DOS. Ticking localtime has all sorts of problems, among
which is if you reboot around the transition between Summer Time (or
Daylight Savings Time, depending on your country) and normal time, the
OS has no idea whether the DST adjustment has been applied or not.

It gets even worse if you have multiple operating systems, because
then one OS may have made the adjustment, and the other one may not have
made the adjustment. It's for that reason that if you reboot around
the right time of year, Windows throws up a big dialog box asking you
what the correct time should be. Genius!
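
The ambiguity can be made concrete with a small example. It is a sketch under assumptions: Finnish time is used only because the uname output earlier in the thread mentions EET, and the two offsets are hard-coded rather than looked up in a time zone database. On 2007-10-28 clocks there went back from 04:00 summer time to 03:00 normal time, so a local reading of 03:30 occurred twice.

```python
from datetime import datetime, timedelta, timezone

EEST = timezone(timedelta(hours=3), "EEST")  # Finnish summer time, UTC+3
EET = timezone(timedelta(hours=2), "EET")    # Finnish normal time, UTC+2

# A reading from a hardware clock that ticks local time carries no zone
# information, so the OS cannot tell which of the two 03:30s this is.
wall_clock = datetime(2007, 10, 28, 3, 30)

as_summer = wall_clock.replace(tzinfo=EEST).astimezone(timezone.utc)
as_normal = wall_clock.replace(tzinfo=EET).astimezone(timezone.utc)

print(as_summer.isoformat())
print(as_normal.isoformat())
print("ambiguity:", as_normal - as_summer)
```

A clock that ticks GMT has no such problem, because the conversion to local time happens only when a time is displayed.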

The problem on the Linux side is that some distributions, and Ubuntu
is the worst offender, but probably not the only one, do not correctly
set the system clock before they run fsck. And if you live east of
GMT, such that your localtime offset is positive instead of negative,
then time can appear to go backwards and e2fsck can't trust the last
superblock check time. Old versions of e2fsprogs display a funny
large time interval due to an integer overflow bug; that's since been
fixed. (This bug doesn't affect people in the US, because of our
time zone offset, but it tends to affect people in Europe who are
dual-booting with Windows and hence have their hardware clock tick
localtime.)
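
How a clock that appears to run backwards turns into an absurd interval can be sketched as follows. This illustrates the general class of bug, not the actual e2fsprogs code: the function name and the 32-bit unsigned arithmetic are assumptions, and the figure it produces is not meant to match the number quoted earlier in the thread.

```python
SECONDS_PER_DAY = 86400


def days_since_check_u32(now: int, last_check: int) -> int:
    """Elapsed days when the subtraction wraps like a 32-bit unsigned integer."""
    # Masking to 32 bits mimics unsigned arithmetic: a small negative
    # difference becomes a value just below 2**32 seconds.
    return ((now - last_check) & 0xFFFFFFFF) // SECONDS_PER_DAY


now = 1199808000  # 2008-01-08 16:00:00 UTC, as seconds since the epoch

# Normal case: the last check was three days ago.
sane = days_since_check_u32(now, now - 3 * SECONDS_PER_DAY)

# Clock problem: the superblock timestamp is two hours ahead of the
# time the checker sees, so the difference is negative before wrapping.
wrapped = days_since_check_u32(now, now + 2 * 3600)

print("sane:", sane, "days")
print("wrapped:", wrapped, "days")
```

A two-hour discrepancy is enough to make the interval look like more than a century, which is far beyond any configured check interval and therefore forces a full check.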

Now, there are good reasons for doing periodic checks every N mounts
and after M months. And it has to do with PC class hardware. (Ted's
aphorism: "PC class hardware is cr*p"). Windows users don't notice it
much because they generally blame the occasional blue screen of death
or corrupted file as an OS bug. But very often, it is a hardware
issue, particularly on the cheaper PC class machines with no ECC
memory, and cheapest, unshielded hard drive cables from Taiwan that
the manufacturers can find. Hence, the default is to do periodic
checks, since if you don't a random corruption can cause massive
filesystem corruption leading to massive data loss.

But, if you're confident in your hardware, you can turn that off.
tune2fs -c 0 will disable the number of mounts check, and tune2fs -i 0
will turn off the periodic time-based check. And given that you have a
Linux distribution with buggy init scripts, that is one way of working
around the problem.

You could also simply change your CMOS/hardware clock to use GMT time,
and not localtime. But that doesn't work well when you need to
dual-boot with Windows, since Windows doesn't support GMT time for the
hardware clock.

Another approach would involve using the /etc/e2fsck.conf settings
described above, but that will require possibly upgrading the version
of e2fsprogs that you have. This will be the preferred mechanism
going forward, but perhaps not for the version of e2fsprogs you have
installed on your system.

Finally, I'm sorry this has obviously caused you so much stress. If
you're happier using some other OS, please use whatever OS you find
makes you happiest. I find that other deficiencies in Windows caused
my blood pressure to boil when I was forced (for a previous job) to
work on making programs run on Windows. I consider the fact that I
can spend full-time working on Linux to be a blessing. But if you
don't feel that way, my condolences, and please do what you need to do
so you can stay in your happy place.

Best regards,

- Ted

2008-01-08 18:17:13

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-08, Masoud Sharbiani "مسعود شربیانی" <[email protected]> wrote:
> It isn't a bug. It is a feature;

To me, it seems to be a rather clear bug when the last-checked field
contains an absurd value of years ago, on _all_ disks, and yet there's
no complaint of other superblock corruption.

> heh, the way I look at it, you either have the time, or money.

Bullshit. There may be time to fix one thing, but there are a million
things broken in FOSS these days, and it's constantly getting worse.

--
Tuomo

2008-01-08 18:21:16

by Jan Engelhardt

Subject: Re: The ext3 way of journalling


On Jan 8 2008 17:48, Tuomo Valkonen wrote:
>
>> You can guess my answer: udev will fix it.
>
>And break everything else, such as my symlinks, permissions, etc.
>I'm not going to learn its cryptic special-case config files for
>such trivial tasks as creating a fucking symlink or change the
>permissions of a file, for which exist general purpose methods:
>chmod, chown, ln -s.

So create /sdev and fill it with all the device nodes that you deem
static and worthy of chowning. As far as sound goes, /dev/dsp and
/dev/snd/* is made writable by something called resmgr+hal+hal_resmgr
(yes, more clutter!). I have not worried about device permissions
in a long time. The only exception is vmware, but that is its very
own problem.

>> Mine plays very well.
>
>I was not talking of encrypted disks but cryptic udev config files,
>that the distros are going to break on every upgrade.

If yours does, then that is bad luck. Mine (again) does not.
That is what a distro is supposed to bring: a system that does not
break ("as much", if you like the extra) on an upgrade as if you
did upgrade software and such by hand.

>> May I remind you that the kernel also "loses" all your network interface
>> configuration, routes, firewalling rules and all sysctl settings at
>> boot (sic: reboot & powerdown).
>
>But traditional /dev does not lose permissions and symlinks. udev
>tmpfs shadow brain damage does. You have to illogically and
>inconveniently edit udev's cryptic config files instead, and yet
>it in no way stops /dev from being modified.

If you dislike udev so much, why don't you just add

chmod 666 /dev/*

to /etc/init.d/boot.local...

>> Distros have to decide whether to
>>
>> - not autoload a zillion of modules, potentially generating lots of
>> crying "idiot" users
>>
>> - autoload a zillion of modules, potentially firing you up.
>
>And most of them cater for idiot users, and the rest for
>developers who want to do it all from scratch. Mere power users
>who want to tune the system to their needs, but easily and without
>WIMPshit (through _simple_ self-documenting configuration files),
>are forgotten these days.

ISTR that most Windows programs do not even have a configuration
file, but store their bits and pieces in the *even more cryptic*
thing they call Registry. Do you prefer _that_?

>> Nonsense. The kernel notifies udev about all available hardware and udev
>> will load modules. It has nothing to do with initrd, in fact, this very
>> step of loading a gazillion of modules is done after initrd has passed
>> control on to /sbin/init. At least, in opensuse.
>
>I've never seen a system that would do so.

You just did not use the right distribution yet.

2008-01-08 18:25:15

by Alan

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Tue, 8 Jan 2008 18:16:49 +0000 (UTC)
Tuomo Valkonen <[email protected]> wrote:

> On 2008-01-08, Masoud Sharbiani "مسعود شربیانی" <[email protected]> wrote:
> > It isn't a bug. It is a feature;
>
> To me, it seems to be a rather clear bug when the last-checked field
> contains an absurd value of years ago, on _all_ disks, and yet there's
> no complaint of other superblock corruption.

So report it to your distribution vendor, and they can help you work out
why you are about the only person on the planet reporting that problem.

Alan

2008-01-08 18:25:37

by Tuomo Valkonen

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On 2008-01-08, Jan Engelhardt <[email protected]> wrote:
> http://linux.oneandoneis2.org/LNW.htm . Replace Windows by <whatever
> favorite OS you wanted to originally have>.

Linux is too much like Windows, and that's a big part of the problem.
People are obsessed with providing WIMPshit interfaces to everything,
and the underlying layers are allowed to become complex and cluttered,
being designed according to the worse-is-better fallacy, and are only
suitable for developers with the time to study them, not power users,
who are completely left out.

--
Tuomo

2008-01-08 18:29:41

by Andre Noll

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On 17:52, Tuomo Valkonen wrote:
> On 2008-01-08, Andre Noll <[email protected]> wrote:
> > Use tune2fs to deactivate checking.
>
> So, a workaround is the answer to a clear bug. Typical FOSS.

It's not a workaround. The ext3 maintainers argue that every file
system should be checked from time to time. Therefore it's the
default. You do not agree with them, so change the default and be
happy. Or use another fs which claims that no check is necessary.

> Another typical FOSS answer. "You have the source, you can fix it."

Indeed, and I think it's a very good answer.

> With what time?

Use the time you are currently using for whining on mailing lists.

> > Don't use udev then. Good old static dev works fine if you have a fixed
> > set of devices.
>
> It doesn't, with the unpredictable SCSI mapping insanity.

Then write a set of udev rules for your SCSI devices. It's easy.

> > Check your tool-chain. Many people compile recent kernels with no problems.
>
> And recompile and recompile and recompile ad infinitum, because always
> some option was missing or wrong, there being far too many of them and
> hidden all over the place.

If you're not willing to compile, you'll have to use what other
people provide. It's your choice, but at least there _is_ a choice.

> > You're not ranting about Linux but about your Distro. Complain on
> > the corresponding distro-specific mailing list, use another distro
> and stop whining.
>
> I don't use a distro kernel. I use a kernel I compiled myself over
> two years ago. I have tried compiling newer ones, but it's too much
> work to get all the options right.

I tend to disagree. It's not hard at all to configure a kernel if
you know the hardware. It's even easier if you already have a working
config for some kernel version. Just use "make oldconfig" to upgrade
from one version to the next as already suggested by others and me.

> And then there's the problem that the "good" driver for my SATA disk
> may not be there anymore in the latest kernels, and so on.

That is clearly a regression and I'm sure Jeff and other maintainers
of the driver would be interested in details on this matter.

Andre
--
The only person who always got his work done by Friday was Robinson Crusoe



2008-01-08 18:33:08

by Diego Calleja

[permalink] [raw]
Subject: Re: The ext3 way of journalling

http://freebsd.org
http://netbsd.org
http://openbsd.org
http://opensolaris.org


There are so many options that wasting your time arguing with people who
think you're a troll is pointless.

2008-01-08 18:40:39

by Tuomo Valkonen

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On 2008-01-08, Andre Noll <[email protected]> wrote:
> It's not a workaround. The ext3 maintainers argue that every file
> system should be checked from time to time. Therefore it's the
> default. You do not agree with them, so change the default and be
> happy.

The thing is, I agree with them (although the default intervals could
be a bit longer), but it gets confused and thinks it's been years since
last check, when it hasn't. I have my doubts that Theodore Tso's reply
is the problem here, because I didn't use to have this problem; it
appeared relatively recently. But maybe old versions of e2fsck were
smarter...

> Use the time you are currently using for whining on mailing lists.

That's doable on time you'd spend idling anyway. There are only so
many hours of the day that you can do work that demands considerable
thinking.

> Then write a set of udev rules for your SCSI devices. It's easy.

It isn't. There's no simple pre-existing setting to edit. And besides,
it's the wrong approach from the POV of clean consistent design. It's
the kludged worse-is-better approach that results in unusable
clusterfucks.

> If you're not willing to compile, you'll have to use what other
> people provide. It's your choice, but at least there _is_ a choice.

I'd compile, if the thing to be compiled weren't made uncompilable.
I'd use pre-compiled shit, if it wasn't shit.

> I tend to disagree. It's not hard at all to configure a kernel if
> you know the hardware. It's even easier if you already have a working
> config for some kernel version. Just use "make oldconfig" to upgrade
> from one version to the next as already suggested by others and me.

It didn't use to be hard for 1.2 or so, but it's been constantly getting
worse, along with all the bloat.

>> And then there's the problem that the "good" driver for my SATA disk
>> may not be there anymore in the latest kernels, and so on.
>
> That is clearly a regression and I'm sure Jeff and other maintainers
> of the driver would be interested in details on this matter.

I don't know the details; all I know is that I've heard that the old
SATA drivers that appear as /dev/hd$PREDICTABLE are being obsoleted in
favour of those that appear as /dev/sd$RANDOM along with all the USB
devices and whatnot.

--
Tuomo

2008-01-08 18:45:18

by Tuomo Valkonen

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On 2008-01-08, Diego Calleja <[email protected]> wrote:
> http://freebsd.org
> http://netbsd.org
> http://openbsd.org
> http://opensolaris.org
>
> There are so many options that wasting your time arguing with people who
> think you're a troll is pointless.

Unfortunately they do not support my hardware, and also switching OS distro
on a full system is painful. (Although I heard previously commercial OSS is
freely available now, so there might now be support for my sound card..
At some point at least FreeBSD was missing support for my SATA controller,
though...)

--
Tuomo

2008-01-08 18:49:34

by Alan

[permalink] [raw]
Subject: Re: The ext3 way of journalling

> be a bit longer), but it gets confused and thinks it's been years since
> last check, when it hasn't. I have my doubts that Theodore Tso's reply
> is the problem here, because I didn't use to have this problem; it
> appeared relatively recently. But maybe old versions of e2fsck were

This is a bug in your system or distribution. We used to see it years ago
when the disk caches didn't get flushed before power off. The last write
would vanish, and on a shutdown that was almost always the superblock time
update.

Alan

2008-01-08 20:52:13

by Andi Kleen

[permalink] [raw]
Subject: Re: The ext3 way of journalling

Theodore Tso <[email protected]> writes:
>
> Now, there are good reasons for doing periodic checks every N mounts
> and after M months. And it has to do with PC class hardware. (Ted's
> aphorism: "PC class hardware is cr*p").

If these reasons are good ones (some skepticism here) then the correct
way to really handle this would be to do regular background scrubbing
during runtime; ideally with metadata checksums so that you can actually
detect all corruption.

But since fsck is so slow and disks are so big this whole thing
is a ticking time bomb now. e.g. it is not uncommon to require tens
of minutes or even hours of fsck time and some server that reboots
only every few months will eat that when it happens to reboot.
This means you get a quite long downtime.

-Andi

2008-01-08 21:03:28

by Ondrej Zary

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Tuesday 08 January 2008 21:51:53 Andi Kleen wrote:
> Theodore Tso <[email protected]> writes:
> > Now, there are good reasons for doing periodic checks every N mounts
> > and after M months. And it has to do with PC class hardware. (Ted's
> > aphorism: "PC class hardware is cr*p").
>
> If these reasons are good ones (some skepticism here) then the correct
> way to really handle this would be to do regular background scrubbing
> during runtime; ideally with metadata checksums so that you can actually
> detect all corruption.
>
> But since fsck is so slow and disks are so big this whole thing
> is a ticking time bomb now. e.g. it is not uncommon to require tens
> of minutes or even hours of fsck time and some server that reboots
> only every few months will eat that when it happens to reboot.
> This means you get a quite long downtime.

That's why I always do "tune2fs -c 0 -i 0" on any new filesystem. It probably
should be default.
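
A minimal sketch of that command in context. The device name /dev/hde1 is
only a placeholder, and the run() helper and DRY_RUN variable are invented
for this example so that the command is printed rather than executed unless
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: turn off the periodic (mount-count and time-based) ext2/ext3 checks.
# DEV is a placeholder; DRY_RUN and run() are helpers invented for this example.
DEV="${DEV:-/dev/hde1}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_periodic_checks() {
    # -c 0: never force a check because of the mount count
    # -i 0: never force a check because of elapsed time
    run tune2fs -c 0 -i 0 "$1"
}

disable_periodic_checks "$DEV"
```

With both checks disabled, e2fsck still runs when the filesystem is marked
as having errors; only the periodic trigger goes away.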

>
> -Andi



--
Ondrej Zary

2008-01-08 21:49:53

by John Stoffel

[permalink] [raw]
Subject: Re: The ext3 way of journalling

>>>>> "Tuomo" == Tuomo Valkonen <[email protected]> writes:

Tuomo> On 2008-01-08, John Stoffel <[email protected]> wrote:
>> Look at your filesystems, using 'tune2fs' and see if the ext3 journal
>> is actually turned on and used. If it's not, then I can see why
>> you're having problems on reboots.

Tuomo> Journalling is on, but it's no use because the superblock always has
Tuomo> corrupted last-checked time at boot. "File system check forced: 31352
Tuomo> days since last check" or so.

As Andy says, reset the counts using tune2fs, then make sure they are
actually reset. I've been using ext3 for a long time and even with
crashes, it's been good about coming up and replaying the journal
nicely.
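
One way to check that the counts really were reset is to look at the
relevant superblock fields in the output of "tune2fs -l". The sketch below
only filters that output; the sample text fed to it is made up for
illustration, and show_check_fields is a name invented here.

```shell
#!/bin/sh
# Sketch: pull the check-related fields out of `tune2fs -l` output.
# show_check_fields reads that output on stdin, so it can be tried on saved text.
show_check_fields() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval):'
}

# Made-up sample in the format tune2fs -l prints; on a real system use
#   tune2fs -l /dev/XXX | show_check_fields
show_check_fields <<'EOF'
Filesystem state:         clean
Mount count:              3
Maximum mount count:      -1
Last checked:             Tue Jan  8 18:00:00 2008
Check interval:           0 (<none>)
EOF
```

If "Last checked" shows a date that is wildly off right after a reset, the
problem is more likely the system clock than the filesystem.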

Again, we can't tell much without boot logs.

>> What CPU are you using? Chipset? Output of lspci? dmesg output?

Tuomo> Athlon XP 2500+, SiI 3112 (the obsoleted driver that makes the
Tuomo> disk appear as the predictable hde, not the random scsi mapping
Tuomo> driver).

I use a Sil 3112 as well for some of my disks (I've got six, two each
of SCSI, PATA and SATA) and it all works well. So does an ancient but
up-to-date Debian Unstable install running 2.6.24-rc6, so it's not
impossible to install new kernels on old systems.

Get rid of initrd and you should be all set. But again, without
details we can't really help.

Tuomo> As for the rest... I'm on Windows, because I can't be arsed waiting
Tuomo> for an hour for Linux to boot.

So reboot it before you go to bed tonight and tell us what it says in
the morning. Esp in terms of the filesystems and their counts.

Hmm... but thinking about it, you're running 2.4.x something, and
there were bugs back then with ext3, so you just might be hitting some
of those bugs. Can you go to the latest 2.4.x release?

John

2008-01-08 21:57:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Tue, Jan 08, 2008 at 09:51:53PM +0100, Andi Kleen wrote:
> Theodore Tso <[email protected]> writes:
> >
> > Now, there are good reasons for doing periodic checks every N mounts
> > and after M months. And it has to do with PC class hardware. (Ted's
> > aphorism: "PC class hardware is cr*p").
>
> If these reasons are good ones (some skepticism here) then the correct
> way to really handle this would be to do regular background scrubbing
> during runtime; ideally with metadata checksums so that you can actually
> detect all corruption.

That's why we're adding various checksums to ext4...

And yes, I agree that background scrubbing is a good idea. Larry
McVoy a while back told me the results of using a fast CRC to get
checksums on all of his archived data files, and then periodically
recalculating the CRC's and checking them against the stored checksum
values. The surprising thing was that once every so often (and the
fact that it happens at all is disturbing), he would find that a file
had a broken checksum even though it had apparently never been
intentionally modified (it was in an archived file set, the modtime of
the file hadn't changed, etc.)

And the fact that disk manufacturers on their high end enterprise
disks design their block guard system to detect cases where a block
gets written to a different part of the disk than where the OS
requested it to be written, and that I've been told of at least one
commercial large-scale enterprise database which puts a logical block
number in the on-disk format of their tablespace files to detect this
problem --- should give you some pause about how much faith at least
some people who are paid a lot of money to worry about absolute data
integrity have in modern-day hard drives....

> But since fsck is so slow and disks are so big this whole thing
> is a ticking time bomb now. e.g. it is not uncommon to require tens
> of minutes or even hours of fsck time and some server that reboots
> only every few months will eat that when it happens to reboot.
> This means you get a quite long downtime.

What I actually recommend (and what I do myself) is to use
devicemapper to create a snapshot, and then run "e2fsck -p" on the
snapshot. If the snapshot checks out without *any* errors (i.e., exit
code of 0), then the script can run "tune2fs -C 0 -T now /dev/XXX",
discard the snapshot, and exit. If e2fsck returns any non-zero exit
code, indicating that it found problems, the output of e2fsck should be
e-mailed to the system administrator so they can schedule downtime and
fix the filesystem corruption.

This avoids the long downtime at reboot time. You can do the above in
a cron script that runs at some convenient time during low usage
(e.g., 3am localtime on a Saturday morning, or whatever).
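
The decision step of such a cron script can be sketched as follows. The
bit values are the ones documented in e2fsck(8); the function name and the
message wording are invented for this example, and the snapshot handling
itself is left out.

```shell
#!/bin/sh
# Sketch: decide what the cron job should do with e2fsck's exit status.
# The status is a bit mask (see e2fsck(8)); only 0 means "completely clean".
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean: reset counters and drop the snapshot"
        return 0
    fi
    msg="problems:"
    [ $((status & 1)) -ne 0 ] && msg="$msg errors-corrected"
    [ $((status & 2)) -ne 0 ] && msg="$msg reboot-needed"
    [ $((status & 4)) -ne 0 ] && msg="$msg errors-left-uncorrected"
    [ $((status & 8)) -ne 0 ] && msg="$msg operational-error"
    [ $((status & 16)) -ne 0 ] && msg="$msg usage-error"
    [ $((status & 32)) -ne 0 ] && msg="$msg cancelled"
    [ $((status & 128)) -ne 0 ] && msg="$msg shared-library-error"
    echo "$msg (mail the e2fsck output to the admin)"
    return 1
}

describe_e2fsck_status 0
describe_e2fsck_status 4 || true
```

Note that even status 1 (errors corrected) counts as a failure here: the
corrections were made on the snapshot, not on the real filesystem.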

- Ted

2008-01-08 21:58:20

by Pavel Machek

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Tue 2008-01-08 16:07:27, Tuomo Valkonen wrote:
>
> The ext3 journalling code can be summarised as:
>
> superblock->last_checked = random();
> sync(superblock)
>
> I hate it: every time Linux crashes, e.g. due to power failure, it takes
> almost an hour to boot, because the kernel has decided to corrupt the
> superblock to indicate that it's been years since last file system
> check. And obviously the crappy init system provides no simple way to
> stop the checking, to put it in the background, or whatever. The FOSS

rm /sbin/fsck?

And no, ext3 does not corrupt last_checked here.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-01-08 22:24:28

by Bodo Eggert

[permalink] [raw]
Subject: Re: The ext3 way of journalling

Tuomo Valkonen <[email protected]> wrote:
> On 2008-01-08, Jan Engelhardt <[email protected]> wrote:

>> "Power users" may still
>> use the index= option of sound card modules and wire it up in
>> /etc/modprobe.d if they prefer.
>
> Another very cryptic directory whose contents say nothing to me.
> Configuration files should be self-documenting and editable,
> instead of having to be created based on long documentation.
> The simple /etc/modules -- which at least Debian's stock kernels
> do not use -- qualifies, but few other files these days do.
>
>> You can guess my answer: udev will fix it.
>
> And break everything else, such as my symlinks, permissions, etc.
> I'm not going to learn its cryptic special-case config files for
> such trivial tasks as creating a fucking symlink or change the
> permissions of a file, for which exist general purpose methods:
> chmod, chown, ln -s.

Edit the start script, append your commands to create the links.
Or edit the correct file in /etc/udev/rules.d to include the links.

>> Well what do you expect of it? The kernel does not keep USB port <->
>> SCSI device mappings. Neither USB device <-> SCSI device mapping,
>> because not all USB ports or USB devices are mass-storage devices.
>> It just is not the kernel's job.
>
> Mapping everything to scsi nodes is brain damaged. The old hda, hdb,
> etc. mappings had somewhat clear correspondence to physical
> device addresses, and were easy to use without such complicated crap
> as udev. Of course, I'd prefer just device unique IDs being used,
> where possible... but I'm not going to suffer udev for that.

If you wire your devices to a named/numbered port, you can use one device
node for each port. Therefore it's sane to create hda to hdd, fd0 to fd3,
etc. But if you don't have a fixed mapping (USB), or if you'd have too
many possible (and mostly unused) ports for the available amount of devices
(SCSI), you can not create fixed numbers. Even for hde ..., you have no
native ordering, you can only tell the first controller from the rest.

When the SCSI naming was created, there were at most 256 SCSI devices of
each kind. Since each partition is a SCSI device, too, you had at most
16 disks of up to 15 partitions! This was later extended to 128 disks.
If you'd hardcode the controller, linux would have been unable to support
more than 8 SCSI controllers.
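
The arithmetic behind those limits, as a small sketch. It assumes the
classic sd layout of one block major (8) with 256 minors and 16 minors
reserved per disk; the function name is invented for this example.

```shell
#!/bin/sh
# Sketch of the classic sd numbering under major 8: 256 minors, 16 per disk.
MINORS=256
MINORS_PER_DISK=16

# sd_minor DISK_INDEX PARTITION  (disk 0 = sda; partition 0 = the whole disk)
sd_minor() {
    echo $(( $1 * MINORS_PER_DISK + $2 ))
}

echo "disks per major:     $(( MINORS / MINORS_PER_DISK ))"
echo "partitions per disk: $(( MINORS_PER_DISK - 1 ))"
echo "sda1 -> minor $(sd_minor 0 1)"
echo "sdb3 -> minor $(sd_minor 1 3)"
```

Because the disk index is simply the order in which disks were found, the
same physical disk can end up with a different minor, and therefore a
different name, after hardware changes.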

The result was the semi-random enumeration of the SCSI devices, and all
the hacks trying to work around that. (google for Joerg Schilling lkml)
Udev provides a clean way of naming the devices, using the static
information from the devices or the path.

I agree that the current udev documentation makes it hard to even find the
settings you'd like to change. That's because the documentation is meant
for developers, while the configuration is themed by the distribution.

The udev files themselves are as simple as possible. Imagine them to be
<bloat>XML</bloat>, like the pile of HAL, DBUS and KDE automatically
mounting my disks using the wrong settings into the wrong directory
and forcing me to su in order to umount them! I temporarily beat that
beast, but the kill is still on the TODO list. Maybe as soon as I find
more^W usable^W documentation ...

>> May I remind you that the kernel also "loses" all your network interface
>> configuration, routes, firewalling rules and all sysctl settings at
>> boot (sic: reboot & powerdown).
>
> But traditional /dev does not lose permissions and symlinks. udev
> tmpfs shadow brain damage does. You have to illogically and
> inconveniently edit udev's cryptic config files instead, and yet
> it in no way stops /dev from being modified.

You aren't stopped from directly poking the memory and crashing the systems
either - if you are root. Or from deleting all nodes on a classic /dev.
Don't do that then.

>> Nonsense. The kernel notifies udev about all available hardware and udev
>> will load modules. It has nothing to do with initrd, in fact, this very
>> step of loading a gazillion of modules is done after initrd has passed
>> control on to /sbin/init. At least, in opensuse.
>
> I've never seen a system that would do so. And I won't use udev.

No system can load modules that are not on initrd, and only needed-for-boot
modules are usually put on the initrd. The bulk of modules must be loaded
from the real root. This seems to work quite well (except not here because
I prefer to include all necessary drivers in the kernel.)

I've debugged hotplug, too, and I found it was a wrapper around a wrapper
around ... a script that was supposed to load the required firmware,
except it neither loaded it nor provided a way to find out why it didn't.
Each HOWTO mentioned a different directory to use, and a different filename.
I'd have settled for manually loading the firmware, but hacking the scripts
to pieces was finally easier than finding out how I was supposed to do that.

I did not need to install a firmware after this, therefore I don't know if
udev is better, but it CAN'T be worse.

2008-01-08 23:06:46

by Matthias Schniedermeyer

[permalink] [raw]
Subject: Re: The ext3 way of journalling

> > Don't use udev then. Good old static dev works fine if you have a fixed
> > set of devices.
>
> It doesn't, with the unpredictable SCSI mapping insanity.

That's what LABEL and UUID support in mount is for.

You label the filesystems (e2label for ext2 and ext3) and use that label to mount them:

- fstab -
LABEL=root / xfs defaults,noatime 0 1
LABEL=boot /boot ext2 defaults,noatime 0 2
...
- snip -
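
A sketch of the labelling step for an ext2/ext3 filesystem. The device
/dev/hde1 and the label "boot" are placeholders, and the run() helper and
DRY_RUN variable are invented for this example so that nothing is changed
unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: label an ext2/ext3 filesystem so fstab can refer to it by LABEL=.
# /dev/hde1 and the label "boot" are placeholders; run() and DRY_RUN are
# helpers invented for this example (commands are only printed by default).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

label_fs() {
    run e2label "$1" "$2"      # write the label into the superblock
    run findfs "LABEL=$2"      # print which device now carries that label
}

label_fs /dev/hde1 boot
```

Once the label is in the superblock, the fstab line keeps working no matter
which device name the disk is given at boot.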





See you

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

2008-01-09 02:06:09

by Bernd Petrovitsch

[permalink] [raw]
Subject: Re: The ext3 way of journalling

Sorry for feeding the troll:

On Tue, 2008-01-08 at 17:52 +0000, Tuomo Valkonen wrote:
> On 2008-01-08, Andre Noll <[email protected]> wrote:
> > Use tune2fs to deactivate checking.
>
> So, a workaround is the answer to a clear bug. Typical FOSS.

At least you get a simple solution for your problem: Configure your
system in a slightly different way (once!) and that's it.
There are far too many commercial vendors out there where you probably
wouldn't get an answer at all. And if you did, it would come days or weeks later.

And if a - in your opinion! - wrong default value (for whatever reason)
is a bug in your universe, then you should probably adopt the
interpretation of words of this universe.
BTW changing the default value in the source now doesn't fix the
sub-optimal default value on your system.


> > Modify the init scripts or use another distro.
>
> Another typical FOSS answer. "You have the source, you can fix it."
> With what time?

If you don't have the time to fix your problems, why should anyone else
have the time to fix your problems?

That is BTW the typical "i'm your customer and you have to obey me"
attitude. Perhaps you want to buy somewhere the time to fix your
problems.

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-01-09 03:22:52

by Kyle Moffett

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Jan 08, 2008, at 15:51:53, Andi Kleen wrote:
> Theodore Tso <[email protected]> writes:
>> Now, there are good reasons for doing periodic checks every N
>> mounts and after M months. And it has to do with PC class
>> hardware. (Ted's aphorism: "PC class hardware is cr*p").
>
> If these reasons are good ones (some skepticism here) then the
> correct way to really handle this would be to do regular background
> scrubbing during runtime; ideally with metadata checksums so that
> you can actually detect all corruption.

Poor man's background scrubbing:

(A) Use LVM like virtually all modern distros offer
(B) Leave some extra space in your LVM volume group (enough for 1
snapshot over the time it takes to do an FSCK).
(C) Periodically run the following scriptlet:

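set -e
# VG and VOLUME must be set by the caller. SNAP_SIZE (e.g. 1G) is an
# assumption added here: classic LVM snapshots need an explicit size.
START="$(date +'%Y%m%d%H%M%S')"
lvcreate -s -L "${SNAP_SIZE}" -n "${VOLUME}-snap" "${VG}/${VOLUME}"
# Check the snapshot at the lowest priority; fsck exits non-zero on any error.
if nice -n 19 fsck -fy "/dev/${VG}/${VOLUME}-snap"; then
    echo 'Background scrubbing succeeded!'
    tune2fs -T "${START}" "/dev/${VG}/${VOLUME}"
else
    echo 'Background scrubbing failed! Reboot to fsck soon!'
    # Push the mount count and last-check time so the next boot forces a check.
    tune2fs -C 16383 -T "19000101" "/dev/${VG}/${VOLUME}"
fi
lvremove -f "${VG}/${VOLUME}-snap"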

Basically you can fsck the offline snapshot in the background. If it
succeeds you can adjust the "last checked" date to the time when the
snapshot was taken and if it fails you can schedule an FSCK at next
reboot (and possibly remount the filesystem read-only or reboot
immediately).

You can do the same thing for your /boot volume, although you
probably have to manually use dmsetup since most bootloaders can't
interpret LVM volumes.

I've always been surprised that distros like RedHat which
automatically use LVM don't stuff this in their weekly or monthly
checks on desktop systems. User experience could also be
dramatically improved with automated smartd configuration and user-
interactive logging and warning messages.


> But since fsck is so slow and disks are so big this whole thing is
> a ticking time bomb now. e.g. it is not uncommon to require tens of
> minutes or even hours of fsck time and some server that reboots
> only every few months will eat that when it happens to reboot. This
> means you get a quite long downtime.

My servers all have an "interval-between-checks" of 2-6 weeks and are
configured to run nice +20 background "fsck" checks during off-hours
between once every few days and once every few weeks. I also have
the "max mount count" numbers set to primes between 7 and 37
(depending on the filesystem) so that troubled or frequently-rebooted
systems are more frequently verified. The end result is that I
almost never have the dreaded 4-hour-fsck-on-boot problem. A drive
has certainly been fscked within the last few weeks of operation, and
I will only ever have multiple large filesystems all fscked at the
same time very rarely (once per lcm of their max-mount-counts).
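
To put numbers on that: assuming both filesystems are mounted on every boot
and their counters started out in step, two filesystems reach their
max-mount-count on the same boot once every lcm of the two counts, which
for distinct primes is simply their product. A small sketch (the gcd and
lcm helper names are just for this example):

```shell
#!/bin/sh
# Sketch: how often two filesystems hit their mount-count check on the same boot.
gcd() {
    a=$1; b=$2
    while [ "$b" -ne 0 ]; do
        t=$(( a % b )); a=$b; b=$t
    done
    echo "$a"
}

lcm() {
    echo $(( $1 / $(gcd "$1" "$2") * $2 ))
}

echo "max-mount-counts 7 and 11 coincide every $(lcm 7 11) mounts"
echo "max-mount-counts 20 and 30 coincide every $(lcm 20 30) mounts"
```

The second line shows why round numbers are a poor choice: counts that
share a factor line up much sooner than their product would suggest.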

Cheers,
Kyle Moffett

2008-01-09 07:56:29

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Tue, 08 Jan 2008 22:21:02 EST, Kyle Moffett said:

> lvcreate -s -n "${VOLUME}-snap" "${VG}/${VOLUME}"


> Basically you can fsck the offline snapshot in the background.

Something the lvcreate manpage is specifically not clear about is:

Does this create a snapshot of the *disk* at that moment, or does it capture
"disk plus still-to-be-written blocks in the cache"? (Phrased differently, does
it Do The Right Thing regarding "blocks queued before lvcreate" and "blocks
queued for write after lvcreate")?

If the snapshot doesn't capture the blocks queued but still unwritten by
kjournald and similar, then you're still hitting the same old problems that
you always get when you fsck an "active disk".



2008-01-09 08:00:57

by Somchai Smythe

[permalink] [raw]
Subject: Re: The ext3 way of journalling

The help for CONFIG_DM_SNAPSHOT says it is EXPERIMENTAL (in
2.6.23.12). So this would mean that there is very high risk of
software failure using snapshots. Would you want to do that for your
fsck?

On 1/9/08, Kyle Moffett <[email protected]> wrote:
> On Jan 08, 2008, at 15:51:53, Andi Kleen wrote:
> > Theodore Tso <[email protected]> writes:
> >> Now, there are good reasons for doing periodic checks every N
> >> mounts and after M months. And it has to do with PC class
> >> hardware. (Ted's aphorism: "PC class hardware is cr*p").
> >
> > If these reasons are good ones (some skepticism here) then the
> > correct way to really handle this would be to do regular background
> > scrubbing during runtime; ideally with metadata checksums so that
> > you can actually detect all corruption.
>
> Poor man's background scrubbing:
>
> (A) Use LVM like virtually all modern distros offer
> (B) Leave some extra space in your LVM volume group (enough for 1
> snapshot over the time it takes to do an FSCK).
> (C) Periodically run the following scriptlet:
>
> set -e
> START="$(date +'%Y%m%d%H%M%S')"
> lvcreate -s -n "${VOLUME}-snap" "${VG}/${VOLUME}"
> if nice +20 fsck -fy "/dev/mapper/${VG}_${VOLUME}-snap"; then
> echo 'Background scrubbing succeeded!'
> tune2fs -T "${START}" "/dev/mapper/${VG}_${VOLUME}"
> else
> echo 'Background scrubbing failed! Reboot to fsck soon!'
> tune2fs -C 16383 -T "19000101" "/dev/mapper/${VG}_${VOLUME}"
> fi
> lvremove "${VG}/${VOLUME}-snap"
>
> Basically you can fsck the offline snapshot in the background. If it
> succeeds you can adjust the "last checked" date to the time when the
> snapshot was taken and if it fails you can schedule an FSCK at next
> reboot (and possibly remount the filesystem read-only or reboot
> immediately).
>
> You can do the same thing for your /boot volume, although you
> probably have to manually use dmsetup since most bootloaders can't
> interpret LVM volumes.
>
> I've always been surprised that distros like RedHat which
> automatically use LVM don't stuff this in their weekly or monthly
> checks on desktop systems. User experience could also be
> dramatically improved with automated smartd configuration and user-
> interactive logging and warning messages.
>
>
> > But since fsck is so slow and disks are so big this whole thing is
> > a ticking time bomb now. e.g. it is not uncommon to require tens of
> > minutes or even hours of fsck time and some server that reboots
> > only every few months will eat that when it happens to reboot. This
> > means you get a quite long downtime.
>
> My servers all have an "interval-between-checks" of 2-6 weeks and are
> configured to run nice +20 background "fsck" checks during off-hours
> between once every few days and once every few weeks. I also have
> the "max mount count" numbers set to primes between 7 and 37
> (depending on the filesystem) so that troubled or frequently-rebooted
> systems are more frequently verified. The end result is that I
> almost never have the dreaded 4-hour-fsck-on-boot problem. A drive
> has certainly been fscked within the last few weeks of operation, and
> I will only ever have multiple large filesystems all fscked at the
> same time very rarely (once per lcm of their max-mount-counts).
>
> Cheers,
> Kyle Moffett
>

2008-01-09 08:21:21

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: The ext3 way of journalling

On Wed, 09 Jan 2008 15:00:46 +0700, BuraphaLinux Server said:
> The help for CONFIG_DM_SNAPSHOT says it is EXPERIMENTAL (in
> 2.6.23.12). So this would mean that there is very high risk of
> software failure using snapshots. Would you want to do that for your
> fsck?

The overall current state of EXPERIMENTAL in the tree is, quite frankly,
somewhere between a complete crock and a wheelbarrow full of bovine fertilizer.
There's a lot of things labeled EXPERIMENTAL that shouldn't be, and a lot of
bleeding edge things that may eat your disks and cause your dog to turn green
that aren't labelled EXPERIMENTAL.

Having said that, I have *no* idea what state the snapshot code is in, but I
have approximately zero confidence that the EXPERIMENTAL tag tells me anything
actually useful about the snapshot code.



2008-01-09 09:45:17

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-09 00:06 +0100, Matthias Schniedermeyer wrote:
> That's what LABEL and UUID support in mount is for.

That's udev shit. I don't want it.

--
Tuomo

2008-01-09 09:54:23

by Martin Schwidefsky

Subject: Re: The ext3 way of journalling

On Jan 8, 2008 7:15 PM, Theodore Tso wrote:
> That will fix this issue. The problem you are facing is that you
> have your hardware clock set to ticking localtime, instead of GMT.
> Windows ticks localtime, which is a mistake carried over from the
> 1970's and MS-DOS. Ticking localtime has all sorts of problems, among
> which is that if you reboot around the transition between Summer Time (or
> Daylight Savings Time, depending on your country) and normal time, the
> OS has no idea whether the DST adjustment has been applied or not.

Actually you can force Windows to accept a hardware clock in UTC:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal

I'm using this on my dual-boot machine at home because that stupid
Daylight Savings Time change twice a year really annoyed me. So far
the only downside I found is that you have to remember that the time
you enter in the BIOS has to be UTC.

--
blue skies,
Martin

2008-01-09 10:21:55

by Matthias Schniedermeyer

Subject: Re: The ext3 way of journalling

On 09.01.2008 09:56, Tuomo Valkonen wrote:
> On 2008-01-09 00:06 +0100, Matthias Schniedermeyer wrote:
> > That's what LABEL and UUID support in mount is for.
>
> That's udev shit. I don't want it.

No.




See you

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

2008-01-09 10:28:37

by Matthias Schniedermeyer

Subject: Re: The ext3 way of journalling

On 09.01.2008 11:21, Matthias Schniedermeyer wrote:
> On 09.01.2008 09:56, Tuomo Valkonen wrote:
> > On 2008-01-09 00:06 +0100, Matthias Schniedermeyer wrote:
> > > That's what LABEL and UUID support in mount is for.
> >
> > That's udev shit. I don't want it.
>
> No.

To be more verbose:

'LABEL=' is native mount turf and is much older than udev.

That udev ALSO supports the same labels, by providing symlinks in
/dev/disk/by-label/<...>, is a separate matter.




See you


2008-01-09 12:26:20

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Wed, Jan 09, 2008 at 10:54:11AM +0100, Martin Schwidefsky wrote:
> On Jan 8, 2008 7:15 PM, Theodore Tso wrote:
> > That will fix this issue. The problem you are facing is that you
> > have your hardware clock set to ticking localtime, instead of GMT.
> > Windows ticks localtime, which is a mistake carried over from the
> > 1970's and MS-DOS. Ticking localtime has all sorts of problems, among
> > which is that if you reboot around the transition between Summer Time (or
> > Daylight Savings Time, depending on your country) and normal time, the
> > OS has no idea whether the DST adjustment has been applied or not.
>
> Actually you can force Windows to accept a hardware clock in UTC:
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal

Oh, so cool!!! Do you know offhand what version of Windows started
honoring that registry setting?

And what do you set that registry value to? Just a boolean "true"?

Now, how to convince Ubuntu to put this in their FAQ so that their,
ahhh, less than clueful dual-booting Windows users who happen to live
in Europe stop submitting bugs on this issue....

- Ted

2008-01-09 12:31:14

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Wed, Jan 09, 2008 at 11:28:21AM +0100, Matthias Schniedermeyer wrote:
> On 09.01.2008 11:21, Matthias Schniedermeyer wrote:
> > On 09.01.2008 09:56, Tuomo Valkonen wrote:
> > > On 2008-01-09 00:06 +0100, Matthias Schniedermeyer wrote:
> > > > That's what LABEL and UUID support in mount is for.
> > >
> > > That's udev shit. I don't want it.
> >
> > No.
>
> To be more verbose.
>
> The 'LABEL=' is native mount turf and is much older than udev.

Native fsck supports it too; "LABEL=" and "UUID=" support has been in
e2fsprogs since July 3rd, 1999. (Mount had it a little before then,
but you needed both mount and fsck support before the feature could be
used.)

And it has *nothing* to do with udev....

- Ted

2008-01-09 12:44:46

by Michal Schmidt

Subject: Re: The ext3 way of journalling

On Wed, 9 Jan 2008 07:25:56 -0500
Theodore Tso wrote:

> On Wed, Jan 09, 2008 at 10:54:11AM +0100, Martin Schwidefsky wrote:
> > On Jan 8, 2008 7:15 PM, Theodore Tso wrote:
> > > That will fix this issue. The problem you are facing is that
> > > you have your hardware clock set to ticking localtime, instead of
> > > GMT. Windows ticks localtime, which is a mistake carried over
> > > from the 1970's and MS-DOS. Ticking localtime has all sorts of
> > > problems, among which is that if you reboot around the transition
> > > between Summer Time (or Daylight Savings Time, depending on your
> > > country) and normal time, the OS has no idea whether the DST
> > > adjustment has been applied or not.
> >
> > Actually you can force Windows to accept a hardware clock in UTC:
> > HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal
>
> Oh, so cool!!! Do you know offhand what version of Windows started
> honoring that registry setting?
>
> And what do you set that registry value to? Just a boolean "true"?
>
> Now, how to convince Ubuntu to put this in their FAQ so that their,
> ahhh, less than clueful dual-booting Windows users who happen to live
> in Europe stop submitting bugs on this issue....

According to http://www.cl.cam.ac.uk/~mgk25/mswish/ut-rtc.html it's
been there since Windows NT, but it is more or less broken in all newer
versions.

Michal

2008-01-09 12:50:18

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Wed, Jan 09, 2008 at 02:55:53AM -0500, [email protected] wrote:
>
> Does this create a snapshot of the *disk* at that moment, or does it
> capture "disk plus still-to-be-written blocks in the cache"?
> (Phrased differently, does it Do The Right Thing regarding "blocks
> queued before lvcreate" and "blocks queued for write after
> lvcreate")?
>
> If the snapshot doesn't capture the blocks queued but still
> unwritten by kjournald and similar, then you're still hitting the
> same old problems that you always get when you fsck an "active
> disk".

Actually, it does better than that. For ext3 and xfs, it will take a
snapshot of the filesystem in a quiescent state; that is, it will force
the journal transaction to close, suspend all filesystem activity,
take a snapshot of the disk as if it had been unmounted, and then
allow filesystem activity to continue.

So if you look at an ext3 filesystem taken in this way, you will see
that the NEEDS_RECOVERY flag is not set, since the ext3 journal is
empty on the snapshot. So snapshots are also a great way of doing
stable backups. For the purposes of stable backups, you'll also want
to quiesce your application files, particularly databases.

For example, in the case of mysql, send the server the sql commands
"flush tables with read lock; flush logs", take the snapshot, and
then after the snapshot send the server the sql command "unlock tables".
For more information, see:

http://forums.mysql.com/read.php?26,185026,185302#msg-185302

If you do this, you will get a snapshot of your disk where *both* the
database and the filesystem are in a stable state, perfect for doing a
backup.
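
A minimal sketch of the snapshot approach described above, applied to
checking a mounted volume without downtime. The volume group and logical
volume names, the 1G snapshot size, and the step that resets the mount
count and last-checked time after a clean result are assumptions made for
illustration; none of them come from this message. The commands are only
printed unless RUN=1 is set, since running them needs root and a real LVM
setup.

```sh
#!/bin/sh
# Sketch: check a mounted ext3 volume through a temporary LVM snapshot.
# All names and sizes are placeholders (hypothetical), not from the thread.

# Print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# snapshot_fsck VG LV
snapshot_fsck() {
    vg=$1
    lv=$2
    snap="${lv}-fsck"

    # Take the snapshot of the origin volume.
    run lvcreate -s -L 1G -n "$snap" "/dev/$vg/$lv" || return 1

    # Forced, read-only check of the snapshot (-n answers "no" to every fix).
    if run e2fsck -f -n "/dev/$vg/$snap"; then
        # Assumption: a clean snapshot justifies resetting the counters
        # on the origin so the next boot does not force a check.
        run tune2fs -C 0 -T now "/dev/$vg/$lv"
        status=0
    else
        echo "errors found on /dev/$vg/$lv; schedule a real fsck" >&2
        status=1
    fi

    # Always drop the snapshot again.
    run lvremove -f "/dev/$vg/$snap"
    return $status
}
```

Calling `snapshot_fsck vg0 home` in the default dry-run mode prints the
four commands in order, which makes it easy to review them before setting
RUN=1.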

- Ted

2008-01-09 13:39:32

by Mathieu Segaud

Subject: Re: The ext3 way of journalling

You recently told me:

> On 2008-01-08, John Stoffel <[email protected]> wrote:
>> Look at your filesystems, using 'tune2fs' and see if the ext3 journal
>> is actually turned on and used. If it's not, then I can see why
>> you're having problems on reboots.
>
> Journalling is on, but it's no use because the superblock always has
> corrupted last-checked time at boot. "File system check forced: 31352
> days since last check" or so.

fix your hardware clock then

--
Mathieu

2008-01-09 13:54:17

by Martin Schwidefsky

Subject: Re: The ext3 way of journalling

On Jan 9, 2008 1:25 PM, Theodore Tso wrote:
> On Wed, Jan 09, 2008 at 10:54:11AM +0100, Martin Schwidefsky wrote:
> > Actually you can force Windows to accept a hardware clock in UTC:
> > HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal
>
> Oh, so cool!!! Do you know offhand what version of Windows started
> honoring that registry setting?

I have a Windows XP next to Linux on my home box. So I can say from
experience that Windows XP works. I have no idea if older versions had
that registry entry as well.

> And what do you set that registry value to? Just a boolean "true"?

I can check my dual-boot machine when I get home. I think it was just "1".

> Now, how to convince Ubuntu to put this in their FAQ so that their,
> ahhh, less than clueful dual-booting Windows users who happen to live
> in Europe stop submitting bugs on this issue....

An entry in some FAQ would indeed be helpful. It would have saved me
some hassle.

--
blue skies,
Martin

2008-01-09 14:17:19

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-09, Mathieu SEGAUD wrote:
> fix your hardware clock then

It displays just the right time. On boot anyway. (Linux has had some
serious problems keeping the time after the switch from 2.6.7 to 2.6.14,
advancing even 15 minutes a day -- that ntpd doesn't seem to be able
to keep up with -- requiring running adjtimexconfig every now and
then for new settings. But the cmos clock displays the right time.)
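
To put the quoted drift in perspective, it helps to convert it to parts
per million. ntpd is generally documented as slewing the clock by at most
500 ppm, so a rough calculation shows why it could not keep up. The helper
name below is made up for this sketch, and it uses integer arithmetic, so
results are truncated.

```sh
#!/bin/sh
# drift_ppm SECONDS_GAINED_PER_DAY
# Prints the clock drift in parts per million (integer, truncated).
drift_ppm() {
    echo $(( $1 * 1000000 / 86400 ))
}
```

For example, `drift_ppm 900` (15 minutes a day) gives a value of roughly
ten thousand ppm, about twenty times what a 500 ppm slew limit can absorb.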

--
Tuomo

2008-01-09 19:47:59

by Martin Schwidefsky

Subject: Re: The ext3 way of journalling

On Jan 9, 2008 2:53 PM, Martin Schwidefsky wrote:
> On Jan 9, 2008 1:25 PM, Theodore Tso wrote:
> > On Wed, Jan 09, 2008 at 10:54:11AM +0100, Martin Schwidefsky wrote:
> > > Actually you can force Windows to accept a hardware clock in UTC:
> > > HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal
> >
> > Oh, so cool!!! Do you know offhand what version of Windows started
> > honoring that registry setting?
>
> I have a Windows XP next to Linux on my home box. So I can say from
> experience that Windows XP works. I have no idea if older versions had
> that registry entry as well.
>
> > And what do you set that registry value to? Just a boolean "true"?
>
> I can check my dual-boot machine when I get home. I think it was just "1".

RealTimeIsUniversal is a REG_DWORD with content "1".
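
As a sketch, the setting could be written as a registry import file along
these lines. This assumes the standard .reg file format and the key path
quoted earlier in the thread; it has not been checked against any
particular Windows version.

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation]
"RealTimeIsUniversal"=dword:00000001
```

After importing it, the hardware clock has to be set to UTC, as noted
above for the BIOS time.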

--
blue skies,
Martin

2008-01-10 11:50:32

by Helge Hafting

Subject: Re: The ext3 way of journalling

Matthias Schniedermeyer wrote:
>>> Don't use udev then. Good old static dev works fine if you have a fixed
>>> set of devices.
>>>
>> It doesn't, with the unpredictable SCSI mapping insanity.
>>
>
> That's what LABEL and UUID support in mount is for.
>
> You label the filesystems (e2label for ext2 and ext3) and use that label to mount them
>
> - fstab -
> LABEL=root / xfs defaults,noatime 0 1
> LABEL=boot /boot ext2 defaults,noatime 0 2
>
Would've been nice if they worked, but they don't.

Disks should be so easy to identify uniquely, because they have
storage space that can be used for that label.

So I tried (debian linux, last year).

Mount by label was fine, of course.
Until the 33rd reboot, when it was decided that an
fsck was necessary "just to be safe". The problem was that fsck
failed to find the correct device when /etc/fstab specified a label
instead of a device. The boot failed, and I had to reboot with
init=/bin/sh and replace the dysfunctional labels with old-fashioned
device names.

I can live with this kind of problem on my desktop, but this machine
was going to be an internet router for a customer, so occasional
boot failure requiring intervention was not an option.

Helge Hafting





2008-01-10 13:17:40

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Wed, Jan 09, 2008 at 02:16:52PM +0000, Tuomo Valkonen wrote:
> On 2008-01-09, Mathieu SEGAUD wrote:
> > fix your hardware clock then
>
> It displays just the right time. On boot anyway. (Linux has had some
> serious problems keeping the time after the switch from 2.6.7 to 2.6.14,
> advancing even 15 minutes a day -- that ntpd doesn't seem to be able
> to keep up with -- requiring running adjtimexconfig every now and
> then for new settings. But the cmos clock displays the right time.)

What do you mean by "on boot"? Which boot message, precisely? Is the
time printed before or after e2fsck is run, and by which program?

- Ted

2008-01-10 13:42:44

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-10 08:16 -0500, Theodore Tso wrote:
> > It displays just the right time. On boot anyway. (Linux has had some
> > serious problems keeping the time after the switch from 2.6.7 to 2.6.14,
> > advancing even 15 minutes a day -- that ntpd doesn't seem to be able
> > to keep up with -- requiring running adjtimexconfig every now and
> > then for new settings. But the cmos clock displays the right time.)
>
> What do you mean by "on boot"? Which boot message, precisely? Is the
> time printed before or after e2fsck is run, and by which program?

The time is right as displayed by `date` after boot, i.e. after it has
been loaded from the CMOS clock that does keep the (local, IIRC) time
just all right. But then it often starts advancing very fast.

--
Tuomo

2008-01-10 14:02:22

by Lennart Sorensen

Subject: Re: The ext3 way of journalling

On Thu, Jan 10, 2008 at 12:30:31PM +0100, Helge Hafting wrote:
> >- fstab -
> >LABEL=root / xfs defaults,noatime 0 1
> >LABEL=boot /boot ext2 defaults,noatime 0 2
> >
> Would've been nice if they worked, but they don't.
>
> Disks should be so easy to identify uniquely, because they have
> storage space that can be used for that label.
>
> So I tried (debian linux, last year).
>
> Mount by label was fine, of course.
> Until the 33rd reboot, when it was decided that an
> fsck was necessary "just to be safe". The problem was that fsck
> failed to find the correct device when /etc/fstab specified a label
> instead of a device. The boot failed, and I had to reboot with
> init=/bin/sh and replace the dysfunctional labels with old-fashioned
> device names.
>
> I can live with this kind of problem on my desktop, but this machine
> was going to be an internet router for a customer, so occasional
> boot failure requiring intervention was not an option.

I use this:
UUID=35963d32-f15e-497e-859a-ed1cb366b0f3 / ext3 defaults 0 1

No problem with fsck or anything on my debian system. I had problems
trying to use LABELs but UUID always worked when I tried it. Back under
sarge I had problems getting it to work though.

Works much better than trying to use disk names, since I have 3
different IDE controllers and the modules seem to never quite initialize
in the same order every time from the initramfs.
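
A small sketch of how such an fstab entry might be produced. It assumes
the `blkid` utility is installed; the function names and the example
device are made up for illustration, and the default mount options are an
assumption.

```sh
#!/bin/sh
# uuid_of DEVICE
# Prints the filesystem UUID of DEVICE (needs blkid and read access).
uuid_of() {
    blkid -s UUID -o value "$1"
}

# fstab_uuid_line UUID MOUNTPOINT FSTYPE PASSNO
# Prints an fstab line that mounts by UUID with default options.
fstab_uuid_line() {
    printf 'UUID=%s %s %s defaults 0 %s\n' "$1" "$2" "$3" "$4"
}
```

Something like `fstab_uuid_line "$(uuid_of /dev/hda1)" / ext3 1` would
then print a line in the same shape as the one above, ready to be reviewed
and pasted into /etc/fstab.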

--
Len Sorensen

2008-01-10 14:41:26

by Matthias Schniedermeyer

Subject: Re: The ext3 way of journalling

On 10.01.2008 12:30, Helge Hafting wrote:
> Matthias Schniedermeyer wrote:
>>>> Don't use udev then. Good old static dev works fine if you have a fixed
>>>> set of devices.
>>>>
>>> It doesn't, with the unpredictable SCSI mapping insanity.
>>>
>>
>> That's what LABEL and UUID support in mount is for.
>>
>> You label the filesystems (e2label for ext2 and ext3) and use that label to mount them
>>
>> - fstab -
>> LABEL=root / xfs defaults,noatime 0 1
>> LABEL=boot /boot ext2 defaults,noatime 0 2
>>
> Would've been nice if they worked, but they don't.
>
> Disks should be so easy to identify uniquely, because they have
> storage space that can be used for that label.
>
> So I tried (debian linux, last year).
>
> Mount by label was fine, of course.
> Until the 33rd reboot, when it was decided that an
> fsck was necessary "just to be safe". The problem was that fsck
> failed to find the correct device when /etc/fstab specified a label
> instead of a device. The boot failed, and I had to reboot with
> init=/bin/sh and replace the dysfunctional labels with old-fashioned
> device names.
>
> I can live with this kind of problem on my desktop, but this machine
> was going to be an internet router for a customer, so occasional
> boot failure requiring intervention was not an option.

As Theodore wrote elsewhere in this thread, support for labels in fsck
came later, so maybe the fsck version on your problematic server was
too old.

Personally I have never had a problem with labels, and I have been using
them for about 4-5 years now.





See you


2008-01-11 16:22:59

by Bodo Eggert

Subject: Re: The ext3 way of journalling

Matthias Schniedermeyer wrote:

>> > Don't use udev then. Good old static dev works fine if you have a fixed
>> > set of devices.
>>
>> It doesn't, with the unpredictable SCSI mapping insanity.
>
> That's what LABEL and UUID support in mount is for.
>
> You label the filesystems (e2label for ext2 and ext3) and use that label to
> mount them
>
> - fstab -
> LABEL=root / xfs defaults,noatime 0 1
> LABEL=boot /boot ext2 defaults,noatime 0 2

What can happen if someone does tune2fs -Lroot /dev/usbstick
and puts that stick into this system?

2008-01-11 18:39:37

by Lennart Sorensen

Subject: Re: The ext3 way of journalling

On Fri, Jan 11, 2008 at 05:22:45PM +0100, Bodo Eggert wrote:
> What can happen if someone does tune2fs -Lroot /dev/usbstick
> and puts that stick into this system?

Don't know. I use UUIDs rather than LABELs. Having duplicated labels
just means being careless. Having duplicate UUIDs should require being
malicious.
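
For the careless case, a quick check for duplicated labels can be
sketched as a filter over a list of labels, one per line. The function
name is made up for this sketch; feeding it from `blkid` assumes that
utility is installed and run with enough privileges to read all devices.

```sh
#!/bin/sh
# duplicate_labels
# Reads one filesystem label per line on stdin and prints every label
# that occurs more than once. Empty lines are ignored.
duplicate_labels() {
    grep -v '^$' | sort | uniq -d
}
```

A possible use is `blkid -s LABEL -o value | duplicate_labels`; no output
means every label currently visible to the system is unique.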

--
Len Sorensen

2008-01-12 01:41:35

by Bodo Eggert

Subject: Re: The ext3 way of journalling

On Fri, 11 Jan 2008, Lennart Sorensen wrote:
> On Fri, Jan 11, 2008 at 05:22:45PM +0100, Bodo Eggert wrote:

> > What can happen if someone does tune2fs -Lroot /dev/usbstick
> > and puts that stick into this system?
>
> Don't know. I use UUIDs rather than LABELs. Having duplicated labels
> just means being careless. Having duplicate UUIDs should require being
> malicious.

That's exactly what you have to assume for your users. Otherwise, you could
remove any security feature from the system.
--
Fun things to slip into your budget
Not in a budget, but in an annual report:
An employee stole 500,000+. They accounted for it on the annual report as
'involuntary employee relations expense'

2008-01-12 07:15:52

by Tim Connors

Subject: Re: The ext3 way of journalling

Bodo Eggert said on Sat, 12 Jan 2008 02:41:17 +0100 (CET):
> On Fri, 11 Jan 2008, Lennart Sorensen wrote:
> > On Fri, Jan 11, 2008 at 05:22:45PM +0100, Bodo Eggert wrote:
>
> > > What can happen if someone does tune2fs -Lroot /dev/usbstick
> > > and puts that stick into this system?
> >
> > Don't know. I use UUIDs rather than LABELs. Having duplicated labels
> > just means being careless. Having duplicate UUIDs should require being
> > malicious.
>
> That's exactly what you have to assume for your users. Otherwise, you could
> remove any security feature from the system.

If they've got physical access to your machine, you've already lost.

--
TimC
A bug in the code is worth two in the documentation. --unknown

2008-01-12 10:08:53

by Matthias Schniedermeyer

Subject: Re: The ext3 way of journalling

On 12.01.2008 18:10, TimC wrote:
> Bodo Eggert said on Sat, 12 Jan 2008 02:41:17 +0100 (CET):
> > On Fri, 11 Jan 2008, Lennart Sorensen wrote:
> > > On Fri, Jan 11, 2008 at 05:22:45PM +0100, Bodo Eggert wrote:
> >
> > > > What can happen if someone does tune2fs -Lroot /dev/usbstick
> > > > and puts that stick into this system?
> > >
> > > Don't know. I use UUIDs rather than LABELs. Having duplicated labels
> > > just means being careless. Having duplicate UUIDs should require being
> > > malicious.
> >
> > That's exactly what you have to assume for your users. Otherwise, you could
> > remove any security feature from the system.
>
> If they've got physical access to your machine, you've already lost.

As a last resort there is always the option to encrypt everything.

Of course you lose the LABEL & UUID support with that.

But I circumvented that with a custom udev script and by marking the MBR
in the documented 4 bytes for an ID, which said script uses to create
an appropriate symlink.

Together with a matching autofs-conf I can still automatically mount all
of the >50 encrypted HDDs I have stacked on my shelf. :-)
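
One plausible reading of "the documented 4 bytes for an ID" is the 32-bit
disk signature stored at byte offset 440 of the MBR. That interpretation
is an assumption, as is the helper name below; the actual udev script is
not shown in the thread. The sketch only reads the four bytes and prints
them as hex in on-disk order.

```sh
#!/bin/sh
# mbr_disk_id DEVICE_OR_IMAGE
# Prints the 4 bytes at offset 440 of the first sector as 8 hex digits,
# in the order they are stored on disk.
mbr_disk_id() {
    od -A n -t x1 -j 440 -N 4 "$1" | tr -d ' \n'
    echo
}
```

A udev rule or script could compare this value against a table of known
IDs and create a symlink named after the matching disk.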






See you


2008-01-12 15:06:51

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Thu, Jan 10, 2008 at 03:41:11PM +0200, Tuomo Valkonen wrote:
> On 2008-01-10 08:16 -0500, Theodore Tso wrote:
> > > It displays just the right time. On boot anyway. (Linux has had some
> > > serious problems keeping the time after the switch from 2.6.7 to 2.6.14,
> > > advancing even 15 minutes a day -- that ntpd doesn't seem to be able
> > > to keep up with -- requiring running adjtimexconfig every now and
> > > then for new settings. But the cmos clock displays the right time.)
> >
> > What do you mean by "on boot"? Which boot message, precisely? Is the
> > time printed before or after e2fsck is run, and by which program?
>
> The time is right as displayed by `date` after boot, i.e. after it has
> been loaded from the CMOS clock that does keep the (local, IIRC) time
> just all right. But then it often starts advancing very fast.

So you are running the "date" command after the boot sequence is
completely finished. That doesn't mean that the system clock was correct
at the time when fsck is run.

See, here's the problem. You have the CMOS hardware clock, which
for people who are dual-booting with Windows, is unfortunately ticking
local time, instead of GMT time (or if you want to be pedantic, UTC
time; whatever). When the kernel is first loaded and starts
executing, it will set the Linux system clock from the CMOS hardware
clock. However, it has *no* idea whether the CMOS hardware clock is
ticking localtime or UTC time. The Linux system clock (i.e., what is
returned via the gettimeofday() or time() functions) is always UTC
time.

What happens later is that distribution init scripts will adjust the
system clock either forward or backwards if the system is set up so
that hardware is in Windows bug-compatibility mode where the CMOS
hwclock is ticking localtime. If it is 1400 GMT, then in the
US/Eastern timezone, the clock will be 9:00am, so the clock will be
pushed five hours later. If you are in the Central European Timezone,
then the local time will be 3pm, and the clock will be pushed
*backwards* by one hour.
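
The size of the adjustment can be checked with a small sketch. It assumes
GNU date (for the -d option) and uses fixed-offset POSIX TZ strings such
as EST5 and CET-1 as stand-ins for the winter offsets of US/Eastern and
Central European Time; the function name is made up.

```sh
#!/bin/sh
# local_reading ZONE "YYYY-MM-DD HH:MM"
# Prints the wall-clock time (HH:MM) in ZONE for the given UTC instant.
# This is what a CMOS clock ticking localtime in ZONE would show.
local_reading() {
    TZ=$1 date -d "$2 UTC" '+%H:%M'
}
```

Comparing `local_reading EST5 '2008-01-12 14:00'` and
`local_reading CET-1 '2008-01-12 14:00'` with the 14:00 UTC input shows
how far, and in which direction, the system clock is off until the init
scripts correct it.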

The question is when this happens. In some buggy distributions,
this happens *after* e2fsck is run. And it is in those distributions
that e2fsck can sometimes get confused about when the filesystem was
last checked --- especially if the system is getting
rebooted a lot (which tends to be the case with people who are
dual-booting). So the cases where this happens a lot are (a) people
who are using Windows and so the CMOS hwclock is ticking localtime,
and (b) distributions that don't adjust the Linux system clock before
e2fsck runs. Unfortunately Ubuntu users in Europe fit this
demographic hugely, and Ubuntu refuses to fix this problem[1], so it's
been personally very vexing, because the users complain to *me*, and I
can't fix the problem, because it's a distribution init script issue.

So what I tell people is to upgrade to the latest e2fsprogs, and then
set in /etc/e2fsck.conf:

[options]
buggy_init_scripts = 1

Maybe someday Ubuntu will get this right --- but I'm not counting on it.


[1] Something about installer CD's, and not wanting to ask the users
any questions, not even what time zone they are in, or some other
craziness. I never completely understood the argument and their
design constraints.

- Ted

P.S. If there are other scripts which are started, they can also get
confused because the time is getting warped backwards early on. I
haven't done an analysis to find out which sorts of programs might be
vulnerable to this, but this is not necessarily an e2fsck-specific
problem. After all, it *is* reasonable to expect that the time
returned by time(0) or gettimeofday() is correct, and many programs do
make that assumption....

2008-01-12 19:25:03

by Andrey Vul

Subject: Re: The ext3 way of journalling

On Jan 12, 2008 10:06 AM, Theodore Tso wrote:
[snip]
> Unfortunately Ubuntu users [snip] fit this demographic hugely, and
> Ubuntu refuses to fix this problem[1], so it's been personally very
> vexing, because the users complain to *me*, and I can't fix the problem,
> because it's a distribution init script issue.
Ubuntu refuses to be power user friendly. They've forgotten the True
Meaning (tm) of Linux and try to be Windows-friendly, i.e., No Choices
(tm).


> Maybe someday Ubuntu will get this right --- but I'm not counting on it.
The alternative CD installer still looks like a semi-dumbed-down
debian installer. Hell, even the command-line base install is severely
bloated - it's the exact opposite of LFS or gentoo.
Still, it's *usable* in comparison to the livecd.
>
> [1] Something about installer CD's, and not wanting to ask the users
> any questions, not even what time zone they are in, or some other
> craziness. I never completely understood the argument and their
> design constraints.
Idiot friendliness and no exceptions for power users (e.g., bloated
init scripts, UUID fstab). I switched to debian-unstable ages ago
*just* because apt is _really_ easy to use. I use it secondarily to
Gentoo, where things Just Work (tm), once you patch the package
ebuilds to process your .patch files anyway. And while the packages
have *lots* of patches, they don't bloat the code *and* you can
disable the patches with the "vanilla" USE flag.

--
Andrey Vul

2008-01-13 22:13:58

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-12 10:06 -0500, Theodore Tso wrote:
> So you are running the "date" command after the boot sequence is
> completely finished. That doesn't mean that the system clock was correct
> at the time when fsck is run.

Unless ntpd has managed to change it by that time, it was correct,
in the local time zone. But judging from the "too big difference,
refusing to correct" behaviour of ntpd while the system is up,
I doubt ntpd would correct the time at boot, if it were considerably
wrong. This rapidly advancing system clock is A COMPLETELY DIFFERENT
PROBLEM that started with the switch from 2.6.7 to 2.6.14. Before that
it worked just fine.

> Unfortunately Ubuntu users in Europe fit this
> demographic hugely, and Ubuntu refuses to fix this problem[1], so it's
> been personally very vexing, because the users complain to *me*, and I
> can't fix the problem, because it's a distribution init script issue.

FYI, I'm not running Ubuntu. This system once used to be Debian,
but since Etch is already obsolete wrt. many non-base software,
and I'm not going to suffer stable/unstable anymore, and megafrozen
megadistros with everything between the earth and skies suck anyway,
it largely consists of self-installed software by now. Maybe they broke
time initialisation for etch.

Also, I must say that e2fsck is brain-damaged, if it can be confused
by/do the stupid thing when the system clock has warped by just a few
hours, not the _days_ that a file system check interval typically is,
and users need to specifically kludge around such misbehaviour in
e2fsck.

--
Tuomo

2008-01-13 22:23:23

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-14 00:13 +0200, Tuomo Valkonen wrote:
> Also, I must say that e2fsck is brain-damaged, if it can be confused
> by/do the stupid thing when the system clock has warped by just a few
> hours, not the _days_ that a file system check interval typically is,
> and users need to specifically kludge around such misbehaviour in
> e2fsck.

Just to clarify, I had about 60 days of uptime, and hence at least
60 days since the last FS check/mount/etc., when Linux crashed those
few days ago, and wanted to start checking disks with "9192 days since
last file system check".

--
Tuomo

2008-01-13 23:12:07

by Theodore Ts'o

Subject: Re: The ext3 way of journalling

On Mon, Jan 14, 2008 at 12:23:10AM +0200, Tuomo Valkonen wrote:
> On 2008-01-14 00:13 +0200, Tuomo Valkonen wrote:
> > Also, I must say that e2fsck is brain-damaged, if it can be confused
> > by/do the stupid thing when the system clock has warped by just a few
> > hours, not the _days_ that a file system check interval typically is,
> > and users need to specifically kludge around such misbehaviour in
> > e2fsck.
>
> Just to clarify, I had about 60 days of uptime, and hence at least
> 60 days since the last FS check/mount/etc., when Linux crashed those
> few days ago, and wanted to start checking disks with "9192 days since
> last file system check".

Well, let's see. 9192 days is a little over 25 years, so that means
the filesystem was marked as having done an fsck in 2008-25 or roughly
1983. If you're not seeing any other corruption when e2fsck runs,
it's highly unlikely that the superblock is getting corrupted. It's
much more likely that this early in your boot cycle, your clock is
sometimes incorrect.
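
The arithmetic can be double-checked with a short sketch. It assumes GNU
date, and the reference date 2008-01-08 (the date of the first message in
this thread) is an assumption about when the forced check was reported;
the function name is made up.

```sh
#!/bin/sh
# date_days_before REF_DATE DAYS
# Prints the calendar date (UTC) that lies DAYS days before REF_DATE.
date_days_before() {
    ref_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$(( ref_epoch - $2 * 86400 ))" +%Y-%m-%d
}
```

Running `date_days_before 2008-01-08 9192` lands in late 1982, which is
the kind of date a clock that has lost its setting might report.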

My suggestion to you is to rig your init scripts to print out the
current time/date using "/bin/date" and to print out the superblock
information using "dumpe2fs -h /dev/hdXX" and record the information
someplace useful. A simple way to do this would be via the following
command inserted into /etc/init.d/checkroot.sh:

(date; /sbin/dumpe2fs -h /dev/XXX) | logsave -a /var/log/boot-debug -

where you've replaced /dev/XXX with the block device of the filesystem
which keeps on getting checked erroneously.

All I can say is that most people aren't seeing what you're seeing, so
there is something unique about your system which is causing this
problem to show up. 9192 days means it's not the time going backwards
scenario; somehow the last checked value is getting set to some very
bogus value. Normally the only way this could happen is for the time
to be set to a bogus value (i.e., 1982) when the filesystem check
takes place. Is the "9192" number roughly constant, or is it always
changing?
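
The scenario described above can be sketched in Python. This is a simplified model, not e2fsck's actual code: the helper name `needs_time_based_check` and the 180-day interval are assumptions made for illustration. It shows how a last-checked stamp written while the clock read 1982 produces a huge "days since last check" once the clock is correct again.

```python
from datetime import datetime

def needs_time_based_check(last_checked, now, interval_days):
    """Return (check_needed, days_elapsed) for a time-based check policy.

    Hypothetical helper for illustration only; not e2fsck's real logic.
    An interval of 0 disables the time-based check.
    """
    days_elapsed = (now - last_checked).days
    if interval_days == 0:
        return False, days_elapsed
    return days_elapsed >= interval_days, days_elapsed

# Assumed scenario: the clock read 1982 when a check last completed,
# but reads the correct date on a later boot.
bogus_clock_at_check = datetime(1982, 11, 13)
correct_clock_at_boot = datetime(2008, 1, 13)

forced, elapsed = needs_time_based_check(
    bogus_clock_at_check, correct_clock_at_boot, interval_days=180
)
print("check forced:", forced, "days since last check:", elapsed)
```

With the interval set to 0 the same function never forces a check, whatever the clock says.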

I wonder if the battery-backed hardware clock in your system is
busted, and so you're always starting the system with some completely
bogus time. If your machine is on the network, then the "ntpdate"
program could be setting your time so that it looks correct, but
that's after e2fsck is run. If you really, really, can't guarantee
that the time on your system is correct in early boot, about the only
thing you really *can* do is to use the command "tune2fs -i 0
/dev/XXX" and disable time-based checks altogether.

Regards and best of luck,

- Ted

2008-01-14 01:08:35

by Bernd Eckenfels

Subject: Re: The ext3 way of journalling

In article <[email protected]> you wrote:
> Just to clarify, I had about 60 days of uptime, and hence at least
> 60 days since the last FS check/mount/etc., when Linux crashed those
> few days ago, and wanted to start checking disks with "9192 days since
> last file system check".

This, however, sounds like a typical "RTC has forgotten time" problem, which
is common on some motherboards. 9192 days back is roughly 1983. I never
see any corruption like that, but I see broken BIOS clocks quite often.

Gruss
Bernd

2008-01-14 07:26:43

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-13 18:11 -0500, Theodore Tso wrote:
> It's much more likely that this early in your boot cycle, your clock is
> sometimes incorrect.

I doubt it. I get this nearly _always_ when the system crashes, which
accounts for the vast majority of the times I boot it. (I wish swsusp
didn't suck so much..)

> Is the "9192" number roughly constant, or is it always changing?

No. That's the number I got last time, but typically I've got
something in the 3xxxx range.

> If your machine is on the network, then the "ntpdate"
> program could be setting your time so that it looks correct, but
> that's after e2fsck is run.

ntpdate isn't run by any of the init scripts. ntpd is, but like I
already mentioned, I doubt it would correct vastly incorrect time,
not even being able to track and correct when it advances fast.

--
Tuomo

2008-01-14 09:42:50

by Bernd Petrovitsch

Subject: Re: The ext3 way of journalling

On Mon, 2008-01-14 at 09:15 +0200, Tuomo Valkonen wrote:
[...]
> ntpdate isn't run by any of the init scripts. ntpd is, but like I

Yes, that is a usual bug/problem in common distributions[0] as there is
no real guarantee that your clock is not far off.
Add your timeservers to /etc/ntp/step-tickers, or wherever your
distribution looks to decide whether ntpdate should run, or activate the
ntpdate init.d script.
And some distributions run `ntpdate` quite late BTW ....

> already mentioned, I doubt it would correct vastly incorrect time,
> not even being able to track and correct when it advances fast.

That's the reason to activate `ntpdate` unconditionally: it sets the
current time to a (somewhat) accurate value and `ntpd` handles the
rest.

Bernd

[0]: Perhaps there is some reason for this. However, I don't know of any,
and none has ever come to mind.
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-01-14 09:48:26

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-14, Bernd Petrovitsch <[email protected]> wrote:
> Yes, that is a usual bug/problem in common distributions[0] as there is
> no real guarantee that your clock is not far off.

It isn't, right after boot. But while the system is on, it sometimes
starts advancing very fast, 15 min a day or so. To my knowledge, the
CMOS clock is not used then; rather, the kernel tracks the
time based on scheduler interrupts, with ntpd occasionally correcting.
However, ntpd refuses to correct when the time has drifted too much,
causing even further drift.


> That the reason to activate `ntpdate` unconditionally: It sets the
> current time to an (somewhat) accurate value and `ntpd` handles the
> rest.

Nope, as explained above. ntpdate at boot wouldn't help much, because
the time is (approximately) correct after boot. It only drifts after it.

--
Tuomo

2008-01-14 09:57:27

by Bernd Petrovitsch

Subject: Re: The ext3 way of journalling

On Mon, 2008-01-14 at 09:48 +0000, Tuomo Valkonen wrote:
> On 2008-01-14, Bernd Petrovitsch <[email protected]> wrote:
> > Yes, that is a usual bug/problem in common distributions[0] as there is
> > no real guarantee that your clock is not far off.
>
> It isn't, right after boot. But while the system is on, it sometimes
> starts advancing very fast, 15min a day or so. To my knowledge, the
> time the CMOS clock is not used then, but rather the kernel tracks the

ACK.

> time based on scheduler interrupts, with ntpd occasionally correcting.
> However, ntpd refuses to correct when the time has drifted too much,
> causing even further drift.

That shouldn't happen.

> > That the reason to activate `ntpdate` unconditionally: It sets the
> > current time to an (somewhat) accurate value and `ntpd` handles the
> > rest.
>
> Nope, as explained above. ntpdate at boot wouldn't help much, because
> the time is (approximately) correct after boot. It only drifts after it.

Aha. That's also strange. `ntpd` is able to (and always does AFAIK)
modify the speed of the clock (to keep it synchronized) so that the
error is usually much smaller than 1 second, even if you are behind
high-jitter links and/or at a high stratum.
That leads to the question why the clock starts to run like crazy at
some time so that `ntpd` can't cope with it.
Playing with `ntpd` parameters (e.g. increasing ) doesn't help I assume.

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-01-14 10:06:39

by Krzysztof Halasa

Subject: Re: The ext3 way of journalling

Tuomo Valkonen <[email protected]> writes:

> It isn't, right after boot. But while the system is on, it sometimes
> starts advancing very fast, 15min a day or so.

So why don't you fix it first? Correct system time is essential.

I guess I would upgrade to some newer version, perhaps one which isn't
more than two years old...
--
Krzysztof Halasa

2008-01-14 10:46:57

by Christer Weinigel

Subject: Re: The ext3 way of journalling

On Mon, 14 Jan 2008 10:57:09 +0100
Bernd Petrovitsch <[email protected]> wrote:

> On Mon, 2008-01-14 at 09:48 +0000, Tuomo Valkonen wrote:
> > On 2008-01-14, Bernd Petrovitsch <[email protected]> wrote:
> > > Yes, that is a usual bug/problem in common distributions[0] as
> > > there is no real guarantee that your clock is not far off.
> >
> > It isn't, right after boot. But while the system is on, it sometimes
> > starts advancing very fast, 15min a day or so. To my knowledge, the
> > time the CMOS clock is not used then, but rather the kernel tracks
> > the
>
> > time based on scheduler interrupts, with ntpd occasionally
> > correcting. However, ntpd refuses to correct when the time has
> > drifted too much, causing even further drift.
>
> That shouldn't happen.

> > Nope, as explained above. ntpdate at boot wouldn't help much,
> > because the time is (approximately) correct after boot. It only
> > drifts after it.
>
> Aha. That's also strange. `ntpd` is able to (and always does AFAIK)
> modify the speed of the clock (to keep it synchronized) so that the
> error is usually much smaller than 1 second - also if you are behind
> high-jitter links and/or an a high stratum.
> That leads to the question why the clock starts to run like crazy at
> some time so that `ntpd` can't cope with it.
> Playing with `ntpd` parameters (e.g. increasing ) doesn't help I
> assume.

NTP can't correct errors that are too large. I had some similar problems
(one of many problems) with my HP Proliant desktop. Something totally
messed up the timekeeping, so the system would drift up to an hour or
so per day. I don't know what was wrong, I tried a lot of combinations
of timer options, but couldn't find any that fixed it. A kernel
upgrade a couple of weeks later fixed those problems and the system
time has been stable since then.

So upgrading to a recent kernel is probably a good idea.
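
To put numbers on "too large", here is a rough Python sketch that converts the drift rates reported in this thread into parts per million and compares them with ntpd's frequency-correction limit of 500 ppm. The helper name `drift_ppm` is made up for illustration, and the comparison is a simplification of how ntpd actually disciplines the clock.

```python
SECONDS_PER_DAY = 86_400
NTPD_MAX_SLEW_PPM = 500  # ntpd's frequency-correction limit

def drift_ppm(seconds_gained_per_day):
    """Convert a daily clock gain in seconds to parts per million."""
    return seconds_gained_per_day / SECONDS_PER_DAY * 1_000_000

# Drift rates reported in this thread (approximate figures).
reported = {"15 min/day": 15 * 60, "1 hour/day": 60 * 60}

for label, seconds in reported.items():
    ppm = drift_ppm(seconds)
    ratio = ppm / NTPD_MAX_SLEW_PPM
    print(f"{label}: {ppm:.0f} ppm, about {ratio:.0f}x the slew limit")
```

Both rates are well over an order of magnitude beyond what frequency slewing alone can absorb, which is consistent with ntpd failing to keep up.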

/Christer

2008-01-14 11:03:40

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-14 11:06 +0100, Krzysztof Halasa wrote:
> So why don't you fix it first? Correct system time is essential.

I've tried tuning it with adjtimex and everything, and sometimes it
works for days, but then just suddenly the clock starts advancing.

> I guess I would upgrade to some newer version, perhaps one which isn't
> more than two years old...

As I have stated earlier, upgrading Linux has become too painful by
compiling from source, and the distros provide even worse crap as
their stock kernels.

--
Tuomo

2008-01-14 11:12:17

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-14 10:57 +0100, Bernd Petrovitsch wrote:
> That leads to the question why the clock starts to run like crazy at
> some time so that `ntpd` can't cope with it.

I do wonder whether the PSU could've been causing it. Now that I think
about it, I got the PSU around two years ago, just when I compiled
2.6.14. This PSU coincidentally seems to have been the cause of the
crash that started this thread, and went completely silent during
the same day, on the third crash. But even if the PSU could cause
the timer interrupt to signal too frequently or so, that doesn't explain
why nearly always after a crash (when journal recovery would be the
normal course of action), fsck starts checking with absurd intervals
since last check, whereas there's no trouble booting after normal
shutdown.

--
Tuomo

2008-01-14 11:18:27

by Bernd Petrovitsch

Subject: Re: The ext3 way of journalling

On Mon, 2008-01-14 at 13:11 +0200, Tuomo Valkonen wrote:
> On 2008-01-14 10:57 +0100, Bernd Petrovitsch wrote:
> > That leads to the question why the clock starts to run like crazy at
> > some time so that `ntpd` can't cope with it.
>
> I do wonder whether the PSU could've been causing it. Now that think

We have some embedded systems where some strange problems[0] were caused
by bad/cheap/low-quality PSUs.

> about it, I got the PSU around two years ago, just like I compiled
> 2.6.14. This PSU coincidentally seems to have been the cause of the
> crash that started this thread, and went completely silent during
> the same day, on the third crash. But even if the PSU could cause
> the timer interrupt to signal too frequently or so, doesn't explain
> why nearly always after a crash (when journal recovery would be the
> normal course of action), fsck starts checking with absurd intervals
> since last check, whereas there's no trouble booting after normal
> shutdown.

But for normal PCs, I don't know how much the quality of a PSU is
relevant for the speed of the clock.
Can you test with a different PSU?

Bernd

[0]: I don't know more details off the top of my head.
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

2008-01-14 11:27:48

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-14, Bernd Petrovitsch <[email protected]> wrote:
> But for normal PCs, I don't know how much the quality of a PSU is
> relevant for the speed of the clock.
> Can you test with a different PSU?

I am testing right now. After all I had to get a new PSU, the old one
being as dead as a rock. But it will take time to see the results.

--
Tuomo

2008-01-14 12:47:09

by Krzysztof Halasa

Subject: Re: The ext3 way of journalling

Tuomo Valkonen <[email protected]> writes:

>> So why don't you fix it first? Correct system time is essential.
>
> I've tried tuning it with adjtimex and everything, and sometimes it
> works for days, but then just suddenly the clock starts advancing.

Nothing will make it work reliably if the system clock isn't stable.

> As I have stated earlier, upgrading Linux has become too painful by
> compiling from source,

It works for me.

> and the distros provide even worse crap as
> their stock kernels.

That works for me, too.


The remaining options are c) to live with present situation, and
d) to give up using Linux (computers etc).

Or maybe e) to get a motherboard which isn't broken (with the kernel
you want to use).
--
Krzysztof Halasa

2008-01-14 16:11:14

by Bob Copeland

Subject: Re: The ext3 way of journalling

On 1/14/08, Tuomo Valkonen <[email protected]> wrote:
> On 2008-01-13 18:11 -0500, Theodore Tso wrote:
> > It's much more likely that this early in your boot cycle, your clock is
> > sometimes incorrect.
>
> I doubt it. I get this nearly _always_ when the system crashes, which
> accounts for the vast majority of the times I boot it. (I wish swsusp
> didn't suck so much..)

It sounds like you have CONFIG_PM_TRACE turned on. From the Kconfig help:

This enables some cheesy code to save the last PM event point in the
RTC across reboots, so that you can debug a machine that just hangs
during suspend (or more commonly, during resume).

[...]

CAUTION: this option will cause your machine's real-time clock to be
set to an invalid time after a resume.

-Bob

2008-01-14 16:18:06

by Tuomo Valkonen

Subject: Re: The ext3 way of journalling

On 2008-01-14, [email protected] <[email protected]> wrote:
> It sounds like you have CONFIG_PM_TRACE turned on. From the Kconfig help:

It isn't listed in /proc/config.gz. No, I don't think I even have
swsusp stuff compiled in, if it's related to that.

--
Tuomo

2008-01-14 16:18:45

by Lennart Sorensen

Subject: Re: The ext3 way of journalling

On Mon, Jan 14, 2008 at 01:46:56PM +0100, Krzysztof Halasa wrote:
> Nothing will make it work reliably if the system clock isn't stable.

I remember my nforce2 board having totally insane clock behaviour back
around 2.6.14/2.6.15 or so. It has since been fixed in newer kernels.
I seem to recall some ATI chipsets were even more insane than the nvidia
at the time, with some running double speed for the system time.

--
Len Sorensen

2008-01-14 22:39:45

by John Hubbard

Subject: Re: The ext3 way of journalling

Tuomo Valkonen wrote:
> On 2008-01-13 18:11 -0500, Theodore Tso wrote:
>> It's much more likely that this early in your boot cycle, your clock is
>> sometimes incorrect.
>
> I doubt it. I get this nearly _always_ when the system crashes, which
> accounts for the vast majority of the times I boot it. (I wish swsusp
> didn't suck so much..)
>
>> Is the "9192" number roughly constant, or is it always changing?
>
> No. That's the number I got last time, but typically I've got
> something in the 3xxxx range.
>
>> If your machine is on the network, then the "ntpdate"
>> program could be setting your time so that it looks correct, but
>> that's after e2fsck is run.
>
> ntpdate isn't run by any of the init scripts. ntpd is, but like I
> already mentioned, I doubt it would correct vastly incorrect time,
> not even being able to track and correct when it advances fast.
>

ntpd will allow an initial correction that is arbitrarily large, if run
with the -g option. This is a commonly used option. I see it is running
on my stock Fedora Core 8, for example. So there is often no need to run
ntpdate.

Also, ntpd keeps track of how fast your local clock tends to drift, and
attempts to compensate. So, even if the local clock runs quite fast or
slow, you'll normally get good results. The exception would be if your
clock's drift rate jumps around; for example: fast today, slow tomorrow.

On most systems, ntpd will also copy the current time back to the CMOS,
periodically, and during an orderly shutdown.
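
The behaviour described above can be modelled roughly in Python. This is a deliberately simplified sketch, not ntpd's real algorithm: the function `ntpd_action` is hypothetical, and the two constants are ntpd's default step (0.128 s) and panic (1000 s) thresholds.

```python
STEP_THRESHOLD_S = 0.128    # ntpd default step threshold
PANIC_THRESHOLD_S = 1000.0  # ntpd default panic threshold

def ntpd_action(offset_s, first_adjustment=False, g_option=False):
    """Classify how a clock offset would be handled (simplified model).

    Hypothetical helper: real ntpd also applies timers and filtering
    before it acts on an offset.
    """
    magnitude = abs(offset_s)
    # Beyond the panic threshold ntpd gives up, unless -g permits one
    # arbitrarily large correction at startup.
    if magnitude > PANIC_THRESHOLD_S and not (g_option and first_adjustment):
        return "exit"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"

quarter_century = 25 * 365 * 86_400  # offset if the clock boots in 1983

print(ntpd_action(0.05))
print(ntpd_action(30))
print(ntpd_action(quarter_century))
print(ntpd_action(quarter_century, first_adjustment=True, g_option=True))
```

Under this model a clock that boots decades off is only corrected when -g is in effect, and only on the first adjustment.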

Hope that adds some clarity.

thanks,
John Hubbard

Subject: Re: The ext3 way of journalling

On Mon, 14 Jan 2008 11:18:28 -0500
[email protected] (Lennart Sorensen) wrote:

> On Mon, Jan 14, 2008 at 01:46:56PM +0100, Krzysztof Halasa wrote:
> > Nothing will make it work reliably if the system clock isn't stable.
>
> I remember my nforce2 board having totally insane clock behaviour back
> around 2.6.14/2.6.15 or so. It has since been fixed in newer kernels.

My experience too, with a ULi 1697 based motherboard: strange clock behaviour
with kernels around .15-.20, but fixed now (I suffered with ext3 fsck too; now
I use JFS).

Back in the day I blamed the new motherboard, but now that it runs fine I can
only blame the kernel or Ubuntu user space.

> I seem to recall some ATI chipsets were even more insane than the nvidia
> at the time, with some running double speed for the system time.
>
> --
> Len Sorensen
> --

2008-01-15 01:09:18

by Krzysztof Halasa

Subject: Re: The ext3 way of journalling

[email protected] (Lennart Sorensen) writes:

> I remember my nforce2 board having totally insane clock behaviour back
> around 2.6.14/2.6.15 or so. It has since been fixed in newer kernels.
> I seem to recall some ATI chipsets were even more insane than the nvidia
> at the time, with some running double speed for the system time.

Right, I remember some reports about that, probably IOAPIC or other
HPET issues. Personally never seen that. Thus the suggestion of kernel
upgrade.
--
Krzysztof Halasa

2008-01-15 16:31:18

by Lennart Sorensen

Subject: Re: The ext3 way of journalling

On Tue, Jan 15, 2008 at 12:13:51AM +0100, Alejandro Riveira Fernández wrote:
> My experience too with a Uli 1697 based mb. Estrange clock behaviour with
> kernel around .15-20 but fixed now (suffering too with ext3 fsck now i use jfs)
>
> Back in the day i blamed the new mb but now that it runs fine i can only blame
> the kernel or ubuntu user space.

A lot was due to bugs in the BIOS setup on many of the boards, and a few
even due to bugs in the chip design (like ATI's as far as I remember),
which the kernel then had to detect and work around.

--
Len Sorensen

2008-01-15 16:32:55

by Lennart Sorensen

Subject: Re: The ext3 way of journalling

On Tue, Jan 15, 2008 at 02:09:02AM +0100, Krzysztof Halasa wrote:
> Right, I remember some reports about that, probably IOAPIC or other
> HPET issues. Personally never seen that. Thus the suggestion of kernel
> upgrade.

I believe the ATI chipset managed to somehow get the timer interrupt to
arrive both on the legacy 8259 and the APIC, causing each timer tick to
count twice. Nice way to double your system time rate. The nforce2
didn't double, it just ran fast some of the time. I have no idea how that
happened, but it went away with newer kernels (and wasn't there with
earlier kernels either).

--
Len Sorensen

2008-02-08 04:16:17

by Rogelio Serrano

Subject: Re: The ext3 way of journalling

On Jan 9, 2008 12:07 AM, Tuomo Valkonen <[email protected]> wrote:
> One should always indicate the version of software when complaining. Well,
>
> $ uname -a
> Linux noi 2.6.14 #1 PREEMPT Sun Oct 30 20:18:48 EET 2005 i686 GNU/Linux
>
> I've tried upgrading, and failed: the megatonne monolith with a gazillion
> hidden options (and totally worthless make oldconfig) is impossible to
> compile these days, and the distros' stock kernel are utter and total crap
> that load drivers in wrong order etc., and are difficult to configure
> (demanding crap that demands udev to edit their initrds). Not to even
> speak of the udev-demanding scsi-mapping insanity of SATA etc. devices
> these days.
>

Somebody has been reading the Unix Haters Handbook...

I like that book too. In a way it is a very good guideline for Unix developers.


> I've had it with Linux. It's no longer for power users. It's so complex
> that it's only for idiot users that are content with the shoddy defaults,
> and (paid) developers.
>
> --
> Tuomo
>

I said that 2 weeks ago. And after trying out alternatives I'm back to Linux.

But I bought the Minix 3 book...

--
Lay low and nourish in obscurity