2007-01-13 20:59:21

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: 2.6 Partitioning bug with LILO

On Fri, Jan 12, 2007 at 06:18:00PM -0500, Kris Karas wrote:
> Hello Andries,
>
> I noticed you're listed as the maintainer for the disk
> geometry/partitioning logic in the 2.6 kernel, so I'm sending this to
> you, as I think this bug is most likely in that part of the code, ...
>
> I've been bug-hunting with John Coffman to solve an issue where running
> LILO trashes the ext2 metadata on my /boot partition. The consensus so
> far is that it's not LILO at all, but rather some subtle bug in the
> kernel that's the culprit. I can reproduce it easily enough on 2.6, but
> not on 2.4, which further suggests its kernel-related.
>
> If one does:
>
> umount /boot
> e2fsck -f -y /dev/hda1
> mount /dev/hda1 /boot
> lilo
> umount /boot
> e2fsck -f -y /dev/hda1
>
> the second run of e2fsck will report fixable block bitmap errors.

It is easy to see the cause of this.
There is an old problem with the Linux whole disk device /dev/hda
namely that there is aliasing with /dev/hdaN.
I don't know whether there exist kernels that fix this problem -
as far as I know this has been a problem since ancient times.

But, given the fact that this is a well-known problem of the kernel,
a utility should be careful to avoid problems and flush buffers
at appropriate times.

Now that lilo is one of the few utilities to write to /dev/hda,
it should be fixed.

[And yes, also the kernel should be fixed.]

Andries

[let me cc linux-kernel]