2002-08-01 13:43:46

by Ingo Molnar

Subject: [bug, 2.5.29, IDE] partition table corruption?


using 2.5.29 (vanilla or BK-curr) i cannot use /sbin/lilo anymore to
update the partition table.

if i do it then the partition table gets corrupted and the system does not
boot - it stops at 'LI'. (iirc meaning that the second-stage loader does
not load?) Using a recovery CD fixes the problem, so it's only the
partition info that got trashed, not the filesystem.

i use IDE disks.

this makes development under 2.5.29 quite inconvenient - i have to boot
back into another kernel whenever loading a new kernel.

Ingo



2002-08-01 13:50:26

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Ingo Molnar wrote:
> using 2.5.29 (vanilla or BK-curr) i cannot use /sbin/lilo anymore to
> update the partition table.
>
> if i do it then the partition table gets corrupted and the system does not
> boot - it stops at 'LI'. (iirc meaning that the second-stage loader does
> not load?) Using a recovery CD fixes the problem, so it's only the
> partition info that got trashed, not the filesystem.
>
> i use IDE disks.
>
> this makes development under 2.5.29 quite inconvenient - i have to boot
> back into another kernel whenever loading a new kernel.

And what leads you to the assumption that it is actually the
IDE code, which is to be blamed for this?

2002-08-01 13:51:46

by Alan

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

On Thu, 2002-08-01 at 14:45, Ingo Molnar wrote:
> if i do it then the partition table gets corrupted and the system does not
> boot - it stops at 'LI'. (iirc meaning that the second-stage loader does
> not load?) Using a recovery CD fixes the problem, so it's only the
> partition info that got trashed, not the filesystem.
>
> i use IDE disks.
>
> this makes development under 2.5.29 quite inconvenient - i have to boot
> back into another kernel whenever loading a new kernel.

Does telling lilo to use "linear" mode help ? Some of the geometry stuff
in 2.5 seems a bit broken.

2002-08-01 14:56:23

by Alan

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

On Thu, 2002-08-01 at 14:48, Marcin Dalecki wrote:
> And what leads you to the assumption that it is actually the
> IDE code, which is to be blamed for this?

Side question Martin - is the IDE flush cache on close stuff in the 2.5
tree yet. That might be a candidate for this

2002-08-01 15:04:03

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Alan Cox wrote:
> On Thu, 2002-08-01 at 14:48, Marcin Dalecki wrote:
>
>>And what leads you to the assumption that it is actually the
>>IDE code, which is to be blamed for this?
>
>
> Side question Martin - is the IDE flush cache on close stuff in the 2.5
> tree yet. That might be a candidate for this

main.c: printk(KERN_INFO "flushing ATA/ATAPI devices: ");


/*
 * Handle power handling related events the system informs us about.
 */
static int ata_sys_notify(struct notifier_block *this, unsigned long event,
			  void *x)
{
	int i;

Yes it is there.

2002-08-06 22:24:15

by Andries E. Brouwer

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Ingo Molnar writes:

> using 2.5.29 (vanilla or BK-curr) i cannot use /sbin/lilo anymore
> to update the partition table.

> if i do it then the partition table gets corrupted and the system
> does not boot - it stops at 'LI'.

The standard explanation is that LILO cannot find the second stage
loader, like you say, and that happens because it looks in the wrong
place. For example, because it stores CHS coordinates in the wrong
geometry. (But it can also happen because something changed in the
disk numbering.)

"Corruption" of the partition table is to be expected only if you
ask LILO to rewrite the (CHS part of) the partition table.

The funny thing is, I removed some stuff here in 2.5.30,
so I would understand things immediately if you reported this
about 2.5.30. But for 2.5.29 I do not immediately see why
you would see any changes.

Did you in the meantime find out what was wrong?

Are things OK in 2.5.28 and wrong in vanilla 2.5.29
with the same version of LILO? (which version?)

Do you use the linear or lba32 options? The fix-table option?

What corruption do you see in the partition table?

Do you use LVM?

What happens under 2.5.30?

Andries
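
To illustrate the point above about CHS coordinates stored "in the wrong geometry", here is a minimal Python sketch. It is not LILO or kernel code; the two geometries and the sector number are made-up values chosen only to show that one and the same linear sector gets different CHS coordinates depending on the geometry assumed.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a 0-based linear sector number to (cylinder, head, sector).

    Cylinders and heads count from 0; CHS sector numbers count from 1.
    """
    cylinder, rem = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rem, sectors_per_track)
    return cylinder, head, sector0 + 1


# Two hypothetical views of the same disk (illustrative values only).
TRANSLATED_GEOMETRY = (255, 63)  # heads, sectors per track
DRIVE_GEOMETRY = (16, 63)        # heads, sectors per track

lba = 100_000  # arbitrary sector, e.g. where a second-stage loader might sit

print(lba_to_chs(lba, *TRANSLATED_GEOMETRY))  # (6, 57, 20)
print(lba_to_chs(lba, *DRIVE_GEOMETRY))       # (99, 3, 20)
```

If the coordinates were computed with one geometry and later interpreted with the other, the read would land on a different sector, which would be consistent with the loader stopping at 'LI'.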

2002-08-07 17:48:49

by Ingo Molnar

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?


On Wed, 7 Aug 2002, Ingo Molnar wrote:

> > What happens under 2.5.30?
>
> the same 'LI' message.
>
> I'll try Alan's suggestion of adding the 'linear' option.

this actually did the trick - lilo no longer messes up the bootup. So Alan's
suspicion is right, there's something wrong about geometries in
2.5-current.

Ingo

2002-08-07 17:44:48

by Ingo Molnar

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?


On Wed, 7 Aug 2002 [email protected] wrote:

> > using 2.5.29 (vanilla or BK-curr) i cannot use /sbin/lilo anymore
> > to update the partition table.
>
> > if i do it then the partition table gets corrupted and the system
> > does not boot - it stops at 'LI'.
>
> The funny thing is, I removed some stuff here in 2.5.30,
> so I would understand things immediately if you reported this
> about 2.5.30. But for 2.5.29 I do not immediately see why
> you would see any changes.

2.5.30 breaks as well.

> Did you in the meantime find out what was wrong?

nope. I still keep working around it.

> Are things OK in 2.5.28 and wrong in vanilla 2.5.29
> with the same version of LILO? (which version?)

a fairly standard LILO from RH 7.3: linux-21.4.4-10.

> Do you use the linear or lba32 options? The fix-table option?

I use none of these options. I use a very simple setup, a proper /boot
partition, nothing complex or unexpected.

> What corruption do you see in the partition table?

nothing in the descriptors that i can tell from looking at fdisk output -
but it would be pretty hard to recover the system via a pure rescue CD
otherwise.

> Do you use LVM?

nope. Plain old IDE, ext3fs,

> What happens under 2.5.30?

the same 'LI' message.

I'll try Alan's suggestion of adding the 'linear' option.

Ingo

2002-08-07 18:39:53

by Andries E. Brouwer

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

> > The funny thing is, I removed some stuff here in 2.5.30,
> > so I would understand things immediately if you reported this
> > about 2.5.30. But for 2.5.29 I do not immediately see why
> > you would see any changes.
>
> 2.5.30 breaks as well.
>
> > Did you in the meantime find out what was wrong?
>
> nope. I still keep working around it.
>
> > Are things OK in 2.5.28 and wrong in vanilla 2.5.29
> > with the same version of LILO? (which version?)
>
> a fairly standard LILO from RH 7.3: linux-21.4.4-10.
>
> > Do you use the linear or lba32 options? The fix-table option?
>
> I use none of these options. I use a very simple setup, a proper /boot
> partition, nothing complex or unexpected.
>
> > What corruption do you see in the partition table?
>
> nothing in the descriptors that i can tell from looking at fdisk output -
> but it would be pretty hard to recover the system via a pure rescue CD
> otherwise.
>
> > Do you use LVM?
>
> nope. Plain old IDE, ext3fs,
>
> > What happens under 2.5.30?
>
> the same 'LI' message.
>
> I'll try Alan's suggestion of adding the 'linear' option.
> ...
> this actually did the trick - lilo no longer messes up the bootup.
> So Alan's suspicion is right, there's something wrong about geometries
> in 2.5-current.

I always like to understand all the details - forgive me if I come
with further questions.

LILO without "linear" or "lba32" is inherently broken:
it will talk CHS at boot time to the BIOS and hence needs a geometry
at install time, and nobody knows the geometry required. So, if
LILO doesn't break, this is pure coincidence.

Since 2.5.30 many people will have a different geometry, so many
people will have to find grub or a recent LILO, or add "linear"
to their old LILO. This is all well understood - I just repeat it
a few times in the hope that that will reduce the amount of email.

But now you talk about vanilla 2.5.29, and I am surprised.
Could you send the kernel boot messages concerning that disk
(dmesg | grep hd) for 2.5.28 and 2.5.29 and 2.5.30?

And you talk about corruption, and I am surprised again.
Have you verified that there really was a difference?
Or do you only suspect corruption because LILO has a problem?
(In that case I can assure you that there was no corruption.)

Andries
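
The install-time versus boot-time mismatch described above can be sketched as a round trip. This is only an illustration under assumed values: the two geometries are hypothetical, and the helper names are not taken from LILO. The address is encoded with the geometry known at install time and decoded with the geometry the BIOS uses at boot.

```python
def lba_to_chs(lba, heads, spt):
    """Linear sector number -> (cylinder, head, sector), sector is 1-based."""
    cylinder, rem = divmod(lba, heads * spt)
    head, sector0 = divmod(rem, spt)
    return cylinder, head, sector0 + 1


def chs_to_lba(cylinder, head, sector, heads, spt):
    """(cylinder, head, sector) -> linear sector number for a given geometry."""
    return (cylinder * heads + head) * spt + (sector - 1)


def sector_read_at_boot(lba, install_geometry, boot_geometry):
    """Sector actually reached when CHS is stored at install time
    and interpreted by the BIOS at boot time."""
    stored_chs = lba_to_chs(lba, *install_geometry)
    return chs_to_lba(*stored_chs, *boot_geometry)


INSTALL_GEOMETRY = (16, 63)  # hypothetical: what the installer was told
BOOT_GEOMETRY = (255, 63)    # hypothetical: what the BIOS uses

wanted = 100_000
print(sector_read_at_boot(wanted, INSTALL_GEOMETRY, INSTALL_GEOMETRY))  # 100000
print(sector_read_at_boot(wanted, INSTALL_GEOMETRY, BOOT_GEOMETRY))     # 1590643
```

When both sides agree, the round trip returns the wanted sector; when they disagree, a different sector is read. Storing a linear address and converting it only at boot time avoids depending on an install-time geometry at all.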

2002-08-07 21:09:36

by Ingo Molnar

Subject: Re: [bug, 2.5.29, (not IDE)] partition table (not) corruption?


On Wed, 7 Aug 2002 [email protected] wrote:

> LILO without "linear" or "lba32" is inherently broken: it will talk CHS
> at boot time to the BIOS and hence needs a geometry at install time,
> and nobody knows the geometry required. So, if LILO doesn't break, this
> is pure coincidence.

well, lilo without linear worked for like years on this box ...

> Since 2.5.30 many people will have a different geometry, so many people
> will have to find grub or a recent LILO, or add "linear" to their old
> LILO. This is all well understood - I just repeat it a few times in the
> hope that that will reduce the amount of email.
>
> But now you talk about vanilla 2.5.29, and I am surprised. Could you
> send the kernel boot messages concerning that disk (dmesg | grep hd) for
> 2.5.28 and 2.5.29 and 2.5.30?

will do - it might have started in 2.5.28. But since i use the BK tree, i
might have tested an 'almost 2.5.30' 2.5.29 BK tree.

> And you talk about corruption, and I am surprised again. Have you
> verified that there really was a difference? Or do you only suspect
> corruption because LILO has a problem? (In that case I can assure you that
> there was no corruption.)

you are right, there was no corruption most likely. And the IDE subsystem
is most definitely innocent.

Ingo

2002-08-08 07:48:17

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?


> Since 2.5.30 many people will have a different geometry, so many
> people will have to find grub or a recent LILO, or add "linear"
> to their old LILO. This is all well understood - I just repeat it
> a few times in the hope that that will reduce the amount of email.

I think you confuse two entirely unrelated issues a bit:

1. Remapping a single sector and thus making the behaviour of dd
if=/dev/hda of=/dev/hdb less than intuitive, namely: severely BROKEN. It
doesn't matter that this was broken for years. Now I can remember it did
bite me once when I tried to clone a system precisely in the dd way. (Of
course rerunning lilo on the clone wasn't impossible for me...) The only
thing which makes me worry here are the problems Petr was reporting about...

2. The xlate trick which was only supposed to be used by the MSDOS fs
driver and only on i386 and only if this thing was resident and
ide-disk was not compiled as a module and so on. This is actually the
*geometry* issue. If someone needs access to an MS-DOS partition, well
he can always resort to mtools. FAT16, which is likely the affected
variant of FAT filesystem, was broken before anyway and I have still to
recheck whether the removal of the geometry "translation" didn't even
maybe make my CF PSION system disk readable.

2002-08-08 09:04:45

by Adam J. Richter

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Ingo Molnar writes:
>using 2.5.29 (vanilla or BK-curr) i cannot use /sbin/lilo anymore to
>update the partition table.
>
>if i do it then the partition table gets corrupted and the system does not
>boot - it stops at 'LI'. (iirc meaning that the second-stage loader does
>not load?) Using a recovery CD fixes the problem, so it's only the
>partition info that got trashed, not the filesystem.
>
>i use IDE disks.
>
>this makes development under 2.5.29 quite inconvenient - i have to boot
>back into another kernel whenever loading a new kernel.

Hi Ingo,

It might clarify things if you could identify:

o the last version of 2.5 that worked for you,
o the version of 2.4 that works for you,
o the version of lilo that you are using for all of this.

Back in May, I experienced some similar problem and discussed
it with John Coffman, the lilo maintainer, whom I am cc'ing.

I'll just quote two parts of an email that he sent me during
our discussion. It's a little more relevant to your message if I
quote them out of order:

| The head/sector mismatch check (fn 8h/fn 48h) has actually been in LILO
| since last year (22.0), and the (kernel/bios) check since 22.2. It has
| only been seriously visible since the introduction of the 2.4.18 kernel.
| The IDE disk drivers are now reporting actual IDE disk geometry, rather
| than the mapped BIOS geometry, which was reported by all previous kernels.
| This change in the results returned by the IOCTL used to get the disk
| geometry has been extremely annoying. It also leads to complaints about
| the format of the partition table.

Earlier in that same email, he indicated that he was
thinking about giving precedence to the BIOS geometry in future
versions of lilo (this was 22.3, and I believe the current version is
now 22.3.1):

| Actually, on serious reflection on the issue, there is no choice: the
| value returned by (int 13h/fn 8h) should be used, if it is available. This
| is the value used by the conversion routine (linear/lba32 -> geometric) in
| the boot loader (read.S). Currently, the kernel value is given precedence;
| I am seriously reviewing this issue.

I just wonder if this is the problem that you are experiencing
rather than anything that was new in 2.5.29.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-08 09:29:01

by Ingo Molnar

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?


On Thu, 8 Aug 2002, Marcin Dalecki wrote:

> > | the boot loader (read.S). Currently, the kernel value is given precedence;
> > | I am seriously reviewing this issue.
> >
> > I just wonder if this is the problem that you are experiencing
> > rather than anything that was new in 2.5.29.
>
> Yes.

folks, please keep in mind that this is a system that i just dont
reconfigure at whim. It's a proven, known system i use for testing and
nothing else. Suddenly it stopped working somewhere between 2.5.20 and
2.5.30. No lilo upgrade, no nothing, 2 years old binaries:

[mingo@a mingo]$ ls -l /sbin/lilo
-rwxr-xr-x 1 root root 59324 Aug 23 2000 /sbin/lilo

Ingo

2002-08-08 09:26:17

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Adam J. Richter wrote:

>
> | The head/sector mismatch check (fn 8h/fn 48h) has actually been in LILO
> | since last year (22.0), and the (kernel/bios) check since 22.2. It has
> | only been seriously visible since the introduction of the 2.4.18 kernel.
> | The IDE disk drivers are now reporting actual IDE disk geometry, rather
> | than the mapped BIOS geometry, which was reported by all previous kernels.
> | This change in the results returned by the IOCTL used to get the disk
> | geometry has been extremely annoying. It also leads to complaints about
> | the format of the partition table.
>
> Earlier in that same email, he indicated that he was
> thinking about giving precedence to the BIOS geometry in future
> versions of lilo (this was 22.3, and I believe the current version is
> now 22.3.1):

How did he think DOS does disk access during dos fdisk time before?
As far as I can see lilo is relying on the BIOS during the "dot printing
phase" anyway so this should have been this way since day one of lilo.
*Not* the other way around.

> | Actually, on serious reflection on the issue, there is no choice: the
> | value returned by (int 13h/fn 8h) should be used, if it is available. This
> | is the value used by the conversion routine (linear/lba32 -> geometric) in
> | the boot loader (read.S). Currently, the kernel value is given precedence;
> | I am seriously reviewing this issue.
>
> I just wonder if this is the problem that you are experiencing
> rather than anything that was new in 2.5.29.

Yes.

2002-08-08 09:36:25

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Ingo Molnar wrote:
> On Thu, 8 Aug 2002, Marcin Dalecki wrote:
>
>
>>>| the boot loader (read.S). Currently, the kernel value is given precedence;
>>>| I am seriously reviewing this issue.
>>>
>>> I just wonder if this is the problem that you are experiencing
>>>rather than anything that was new in 2.5.29.
>>
>>Yes.
>
>
> folks, please keep in mind that this is a system that i just dont
> reconfigure at whim. It's a proven, known system i use for testing and
> nothing else. Suddenly it stopped working somewhere between 2.5.20 and
> 2.5.30. No lilo upgrade, no nothing, 2 years old binaries:
>
> [mingo@a mingo]$ ls -l /sbin/lilo
> -rwxr-xr-x 1 root root 59324 Aug 23 2000 /sbin/lilo

Yes sure. It is simply a very old bug in lilo, which the kernel worked
around and did fight against in a dialectic way.

2002-08-08 09:59:33

by Ingo Molnar

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?


On Thu, 8 Aug 2002, Marcin Dalecki wrote:

> > folks, please keep in mind that this is a system that i just dont
> > reconfigure at whim. It's a proven, known system i use for testing and
> > nothing else. Suddenly it stopped working somewhere between 2.5.20 and
> > 2.5.30. No lilo upgrade, no nothing, 2 years old binaries:
> >
> > [mingo@a mingo]$ ls -l /sbin/lilo
> > -rwxr-xr-x 1 root root 59324 Aug 23 2000 /sbin/lilo
>
> Yes sure. It is simply a very old bug in lilo, which the kernel worked
> around and did fight against in a dialectic way.

just tested 2.5.29-vanilla, it works, without and with linear.
2.5.30-vanilla is broken without linear, works with linear.

i dont mind who's at fault, but generally we just dont break working
systems, no matter how good the reason. Somehow communicate with the lilo
folks to handle this in a smooth way. Being able to boot trumps purity, no
matter what.

Ingo


2002-08-08 11:07:26

by Alan

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

On Thu, 2002-08-08 at 10:34, Marcin Dalecki wrote:
> > [mingo@a mingo]$ ls -l /sbin/lilo
> > -rwxr-xr-x 1 root root 59324 Aug 23 2000 /sbin/lilo
>
> Yes sure. It is simply a very old bug in lilo, which the kernel worked
> around and did fight against in a dialectic way.

It's not a bug in lilo. It's a bug in the new kernel. Breaking backward
compatibility arbitrarily is bad. The kernel needs to know geometry
anyway for the folks who have force ide translation

2002-08-08 11:25:03

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Alan Cox wrote:
> On Thu, 2002-08-08 at 10:34, Marcin Dalecki wrote:
>
>>> [mingo@a mingo]$ ls -l /sbin/lilo
>>> -rwxr-xr-x 1 root root 59324 Aug 23 2000 /sbin/lilo
>>
>>Yes sure. It is simply a very old bug in lilo, which the kernel worked
>>around and did fight against in a dialectic way.
>
>
> It's not a bug in lilo. It's a bug in the new kernel. Breaking backward
> compatibility arbitrarily is bad. The kernel needs to know geometry
> anyway for the folks who have force ide translation

1. Requiring the kernel to read the partition table information is a
BUG.

2. Falling back on the values which are used by the application
afterwards is a BUG. (BIOS IRQ after all)

3. Not detecting that LBA disk access is required by checking the Cylinder
value to emulate BIOS behaviour is a BUG.

4. Asking the kernel to kindly avoid 100% partition table scanning and
*guessing* some *heuristic* values which fail frequently enough is a
BUG. (Take a look at the jumps and hops in the function in question if
you don't think it is guessing. I recommend the switch in esp.)

5. Relying on the kernel for the translation "trick" itself (if
anything) is a BUG.

6. It's after all no more inconvenient than renaming, for example,
the USB host controller module.

7. It is *not* breaking backward compatibility. After the lilo
configuration fix the old kernel boots fine as well.

Not reading confusing lilo docs which should better say what to do is a
BUG.

BTW.> Silly RH beta fdisk did tell me bogus things about the
geometry of disks I did install under plain RH 7.3...

BTW.> And finally what about dd if=/dev/hda of=/dev/hdb?

2002-08-08 11:42:05

by Andries Brouwer

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

On Thu, Aug 08, 2002 at 12:01:27PM +0200, Ingo Molnar wrote:

> just tested 2.5.29-vanilla, it works, without and with linear.
> 2.5.30-vanilla is broken without linear, works with linear.

Excellent, precisely as expected.


[Yes, I don't know why an entire thread blossomed up around
something very well understood. Many wrong statements were
made, but I am too lazy to go and refute them all.

Concerning what happened: Long ago, the kernel did some more or
less silly things to get a geometry - for example, asking the
BIOS and the disk; but that did not produce the desired results
and someone added (in 1.3.61) some code to override the geometry
found this way by something guessed from the partition table,
in the good old tradition: is there a bug in a user space program?
fix it in the kernel. That is the [PTBL] you see in kernel boot
messages. Now that this kernel fix has been removed in a small
cleanup operation, lilo will have to be fixed, or people will
have to invoke lilo with linear, or so.

The full truth is more complicated, but this is the full
truth in your case.]

Andries
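
As an illustration of "guessing a geometry from the partition table", here is a simplified Python sketch. It is not the removed kernel code. It only decodes the standard 3-byte CHS field of an MBR partition entry and assumes (as a heuristic) that the partition ends on a cylinder boundary, so that the end head and end sector reveal the geometry the partitioning tool used. The sample entry is made up.

```python
import struct


def decode_chs(field):
    """Decode the 3-byte CHS field of an MBR partition entry."""
    head = field[0]
    sector = field[1] & 0x3F                       # low 6 bits
    cylinder = ((field[1] & 0xC0) << 2) | field[2]  # 2 high bits + 8 low bits
    return cylinder, head, sector


def guess_geometry(entry):
    """Guess (heads, sectors_per_track) from one 16-byte partition entry.

    Assumption: the partition ends on a cylinder boundary, so its last
    sector carries the highest head and sector numbers in use.
    """
    _, end_head, end_sector = decode_chs(entry[5:8])
    return end_head + 1, end_sector


# Hypothetical entry: bootable, start CHS 0/1/1, type 0x83, end CHS 1023/254/63,
# followed by the 32-bit start sector and sector count (little-endian).
entry = bytes([0x80, 1, 1, 0, 0x83, 254, 0xFF, 0xFF]) + struct.pack("<II", 63, 16450497)

print(decode_chs(entry[5:8]))  # (1023, 254, 63)
print(guess_geometry(entry))   # (255, 63)
```

A guess of this kind fails whenever the table was written by a tool with different assumptions, which is one reason to treat it as a heuristic rather than a source of truth.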

2002-08-08 12:00:34

by Alan

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

On Thu, 2002-08-08 at 12:22, Marcin Dalecki wrote:
> 1. Requiring the kernel to read the partition table information is a
> BUG.

We read it anyway to read the partitions for the block layer.

> 2. Falling back on the values which are used by the application
> afterwards is a BUG. (BIOS IRQ after all)

Breaking code is not a bug

> 3. Not detecting that LBA disk access is required by checking the Cylinder
> value to emulate BIOS behaviour is a BUG.

So why did you take it out ?


Thank you for reminding me why I've given up on the 2.5 kernel ever
working. Fortunately free software is self correcting so the tree can be
forked into something workable

2002-08-08 12:26:07

by Andries E. Brouwer

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

> It's not a bug in lilo.

We disagree here.

> It's a bug in the new kernel.

We disagree again.

> Breaking backward compatibility arbitrarily is bad.

Of course.

> The kernel needs to know geometry anyway

Let me repeat: Geometry does not exist.
It is impossible to know something that does not exist.
I can boot seven different kernels on my present machine
and get seven different geometries for /dev/hda.
You see that even "backward compatibility" is a
dubious concept here. Compatibility with what?

Even the BIOS is not consistent, and different BIOS functions
report different geometries.

We have had layer upon layer of bandaids.
I think the time is long overdue to get rid of it all.

> for the folks who have force ide translation

Can you elaborate on what you mean by "force ide translation" ?

Andries

2002-08-08 12:20:56

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Alan Cox wrote:
>
> So why did you take it out ?

I say it for the n-th time already: dd if=/dev/hda of=/dev/hdb should give
a perfect sector by sector disk clone.

The bugs should be fixed in lilo and not worked around in the kernel.
We have the goal (and in fact obligation for scalability issues) to move
partition scanning out of the kernel space.

Did you ever bother looking at the function in question?

Did you ever look at the misordered code in lilo I cited here?

Did you ever think about what to do about the recurring complaints
from people about disk order differences between BIOS and Linux?

Or the whole GET_GEO_BIG confusion ioctl()?

Please compare it with the proper formulas provided at http://www.phoenix.com
in excellent white papers. (Which can't be linked to directly, since
they check the referrer.)

It did contain "heuristics" which stopped annoying people just because
many have developed the immediate reflex of always adding the linear
parameter during lilo configuration already a very very long time ago
because anything else doesn't make much sense and disks have passed the
512MB or even 4G barrier quite a time ago.
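
For readers unfamiliar with the barriers mentioned above, the following Python sketch works through the arithmetic for two commonly cited CHS limits. It assumes 512-byte sectors, the 1024-cylinder and 63-sector limits of the int 13h CHS interface, and head limits of 16 and 255; these head limits are the usual textbook values, not something stated in this thread. The 4G figure is not derived here.

```python
SECTOR_BYTES = 512
MAX_CYLINDERS = 1024          # int 13h CHS limit
MAX_SECTORS_PER_TRACK = 63    # int 13h CHS limit


def chs_limit_bytes(max_heads):
    """Largest capacity addressable through CHS for a given head limit."""
    return MAX_CYLINDERS * max_heads * MAX_SECTORS_PER_TRACK * SECTOR_BYTES


print(chs_limit_bytes(16))   # 528482304  (about 504 MiB)
print(chs_limit_bytes(255))  # 8422686720 (about 7.8 GiB)
```

Disks below the first limit need no translation, so every party computes the same CHS values for them.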

2002-08-08 18:11:01

by Alan

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?


On Thu, 2002-08-08 at 13:18, Marcin Dalecki wrote:
> I say it for the n-th time already: dd if=/dev/hda of=/dev/hdb should give
> a perfect sector by sector disk clone.

The specific case of the hack for EZdisk I have no issue with; the
general problem I do.

> The bugs should be fixed in lilo and not worked around in the kernel.
> We have the goal (and in fact obligation for scalability issues) to move
> partition scanning out of the kernel space.

Partition handling doesn't show up in any scalability benchmarks. Show
me an IBM 8 way with partition reading showing up in lockmeter

> Did you ever bother looking at the function in question?
Yes

> Did you ever look at the misordered code in lilo I cited here?

Did you ever think that it

> It did contain "heuristics" which stopped annoying people just because
> many have developed the immediate reflex of always adding the linear
> parameter during lilo configuration already a very very long time ago
> because anything else doesn't make much sense and disks have passed the
> 512MB or even 4G barrier quite a time ago.

So fix the heuristics to the official algorithm. That's a good thing to
do and test in a development kernel tree.

And btw lots of people still use < 512Mb and < 4G disks. No doubt you'd
prefer your kernel only ran on a pentiumII or higher

Alan

2002-08-08 18:18:23

by Andries E. Brouwer

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Oh please, stop this discussion.

Alan, you seem to think Marcin made this change. I did.
If something is wrong, tell me.

Marcin, you say many true things, many semi-true things,
and many false things. Life is easier if you let me
talk for myself.

Andries

2002-08-08 22:33:07

by John Coffman

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

As the LILO maintainer, I thank you for copying me on this topic. So here
are a few comments of my own:

CHS == (Cylinder:Head:Sector)


Neither the LILO boot installer (/sbin/lilo) nor the Linux protected mode
kernel, hence disk drivers, has access to the machine BIOS. The BIOS
return values for int 13h, fn 8 (Get drive parameters), which returns the
disk CHS geometry subject to the 1024 cylinder limit, is only available in
Real Mode.

A comment appeared in this thread referring to DOS, and partitioning with
it. DOS uses the BIOS for its disk I/O, hence, the CHS geometry which it
sees is that reported by the BIOS. There are many disks out there that
were originally partitioned under DOS.

The lilo boot loader must use the BIOS as its disk driver to access the
disk at boot time. It may be directed to use one of three addressing modes:

geometric - use int 13h, fn 2 (CHS read)
linear - convert 24-bit disk address to CHS, use int 13h, fn 2 as above.
lba32 - 32-bit disk addresses; IF AVAILABLE, use int 13h, fn 42h (EDD
packet call) to read disk; else convert disk address to CHS, use int 13h,
fn 2 as above.

All CHS addressing is subject to the 1024 cylinder limit. If a 'linear' or
'lba32' address is converted to CHS, the head/sector information needed for
the conversion is obtained with int 13h, fn 8 (Get drive parameters).

The LILO boot installer really wants a look at the BIOS CHS geometric
information, especially if 'geometric' addressing is to be used. A real
mode V86 interface was tried on an experimental basis with late version
21.X.X boot installers. Unfortunately, it was found to have the
characteristic of hanging on too many systems. The approach was abandoned.

The present LILO distribution takes a peek at BIOS parameters using a Real
Mode kludge: just before starting a loaded kernel, the responses to a
bunch of BIOS calls are recorded in the first page of memory (<4k) where a
subsequent invocation of the boot installer (/sbin/lilo) can find the LILO
signature and checksum. This is a strange way to allow a protected mode
program access to the BIOS. This accounts for the "BIOS data check"
message of the present LILO distribution. There are two methods of
bypassing this code, in case of a buggy BIOS which hangs up, and two such
systems have been encountered.

This kludgy code does, however, work on most systems. The BIOS calls used
are only those specified by Microsoft as being among the ones which Windows
requires to function correctly. And care is exercised to detect any
returned error code, and to record this fact for the protected mode code
which uses the return values.

Ugly -- I would heartily agree. BUT IT WORKS. /sbin/lilo now has access
to the BIOS return values if the system was booted by a compatible version
of LILO. Sort of a chicken-and-egg problem.

This code is also included in one of the LILO diagnostic disks which may be
created from the source distribution. Occasionally, I have to refer
correspondents who are having particularly bad LILO problems to these
diagnostic disks. They are also a mechanism for me to learn about BIOS
problems on a much wider variety of systems than I have physical access to.

One thought I have is that it would be nice if this BIOS data check
(collection) code were part of the real mode kernel (setup.S). However, as
we are all painfully aware, there are many BIOS incompatibilities which
exist in the real world, so there must be a mechanism to bypass this code,
which I still consider dangerous. As I stated, LILO has two such bypass
mechanisms, one of which is automatic bypass after a failed boot, and one
of which is an obscure, but documented, 'lilo.conf' flag.

On today's IDE disks, as on all SCSI disks, the topic of CHS geometry is
really irrelevant. However, it is a legacy of the evolution of the IBM PC
which we must live with.

--John Coffman






PGP encrypted e-mail preferred (http://www.pgpi.com)
My KeyID: E97AE783 (good until 31-Dec-2002)
Keyserver at http://web.mit.edu/network/pgp.html
LILO links at http://freshmeat.net
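
The three addressing modes described in the message above can be sketched as a small decision function. This is an illustrative Python model, not LILO's real-mode loader code: the function name, the tuple return value and the default head/sector numbers are assumptions made for the example. Only the mode names, the BIOS function numbers and the 1024-cylinder limit come from the message.

```python
MAX_CYLINDERS = 1024  # limit on all CHS addressing


def lba_to_chs(lba, heads, spt):
    """Linear sector number -> (cylinder, head, sector), sector is 1-based."""
    cylinder, rem = divmod(lba, heads * spt)
    head, sector0 = divmod(rem, spt)
    return cylinder, head, sector0 + 1


def plan_read(mode, address, edd_available=False, bios_heads=255, bios_spt=63):
    """Return (bios_call, address_passed) for one sector read.

    For "geometric", address is a CHS tuple fixed at install time.
    For "linear" and "lba32", address is a linear sector number that is
    converted at boot using the head/sector counts the BIOS reports
    (int 13h, fn 8), modelled here by bios_heads and bios_spt.
    """
    if mode == "geometric":
        return ("int 13h fn 02h", address)
    if mode == "lba32" and edd_available:
        return ("int 13h fn 42h", address)
    if mode in ("linear", "lba32"):
        chs = lba_to_chs(address, bios_heads, bios_spt)
        if chs[0] >= MAX_CYLINDERS:
            raise ValueError("sector lies beyond the 1024-cylinder limit")
        return ("int 13h fn 02h", chs)
    raise ValueError(f"unknown addressing mode: {mode}")


print(plan_read("geometric", (6, 57, 20)))                # CHS used as stored
print(plan_read("linear", 100_000))                       # converted at boot
print(plan_read("lba32", 20_000_000, edd_available=True)) # EDD packet call
```

In this model only "geometric" depends on a geometry chosen before boot; the other two modes either bypass CHS or compute it from what the BIOS itself reports.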

2002-08-09 00:08:34

by Thunder from the hill

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Hi,

On Thu, 2002-08-08 at 13:18, Marcin Dalecki wrote:
> because anything else doesn't make much sense and disks have passed the
> 512MB or even 4G barrier quite a time ago.

I have lots of pseudo-diskless hosts. Wherever I didn't want to spend
money for a boot PROM, I've grabbed out disks of ~30M, and things worked
fine. If things don't, I also have a Busybox on those disks which can be
booted instead of the network. They're clients, routers, blah...

Thunder
--
.-../../-./..-/-..- .-./..-/.-.././.../.-.-.-

2002-08-09 06:03:26

by H. Peter Anvin

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Followup to: <[email protected]>
By author: John Coffman <[email protected]>
In newsgroup: linux.dev.kernel
>
> The lilo boot loader must use the BIOS as its disk driver to access the
> disk at boot time. It may be directed to use one of three addressing modes:
>
> geometric - use int 13h, fn 2 (CHS read)
> linear - convert 24-bit disk address to CHS, use int 13h, fn 2 as above.
> lba32 - 32-bit disk addresses; IF AVAILABLE, use int 13h, fn 42h (EDD
> packet call) to read disk; else convert disk address to CHS, use int 13h,
> fn 2 as above.
>
> All CHS addressing is subject to the 1024 cylinder limit. If a 'linear' or
> 'lba32' address is converted to CHS, the head/sector information needed for
> the conversion is obtained with int 13h, fn 8 (Get drive parameters).
>

Why support geometric at all? Either "linear" or "lba32" should work
on all systems (otherwise (Win)DOS won't work either.)

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-08-09 06:28:19

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Alan Cox wrote:

> And btw lots of people still use < 512Mb and < 4G disks. No doubt you'd
> prefer your kernel only ran on a pentiumII or higher

They still work.


2002-08-09 06:30:26

by Marcin Dalecki

Subject: Re: [bug, 2.5.29, IDE] partition table corruption?

Thunder from the hill wrote:
> Hi,
>
> On Thu, 2002-08-08 at 13:18, Marcin Dalecki wrote:
>
>>because anything else doesn't make much sense and disks have passed the
>>512MB or even 4G barrier quite a time ago.
>
>
> I have lots of pseudo-diskless hosts. Wherever I didn't want to spend
> money for a boot PROM, I've grabbed out disks of ~30M, and things worked
> fine. If things don't, I also have a Busybox on those disks which can be
> booted instead of the network. They're clients, routers, blah...

They still work because then lilo doesn't get confused.