2006-02-10 13:06:13

by Seewer Philippe

Subject: RFC: disk geometry via sysfs

Hello all!

I don't want to start another geometry war, but with the introduction of
the general getgeo function by Christoph Hellwig for all disks this
simply would become a matter of extending the basic gendisk block driver.
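For reference, userspace reaches a driver's getgeo hook today through the HDIO_GETGEO ioctl from <linux/hdreg.h>. Below is a minimal sketch of that read path. The helper name read_kernel_geometry is made up for illustration. The values it returns are whatever the driver chose to report.

```c
#include <fcntl.h>
#include <linux/hdreg.h>   /* struct hd_geometry, HDIO_GETGEO */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: ask the block driver (HDIO_GETGEO is served by the
 * driver's getgeo hook) for the geometry it reports.
 * Returns 0 on success, -1 if the device cannot be opened or the ioctl is
 * not supported for it. geo->cylinders is an unsigned short, so it cannot
 * hold the cylinder count of large disks. */
int read_kernel_geometry(const char *dev, struct hd_geometry *geo)
{
    int fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, HDIO_GETGEO, geo);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

On a partition device node, geo->start holds the partition's starting sector. On a whole disk it is 0.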

There are people out there (like me) who need to know about disk
geometry. But since this is clearly post 2.6.16 I prefer to ask here
before writing a patch...

Q1: Yes or No?
If no, the other questions do not apply

Q2: Where under sysfs?
Either do /sys/block/hdx/heads, /sys/block/hdx/sectors, etc. or should
there be a new sub-object like /sys/block/hdx/geometry/heads?

Q3: Writable?
Under some (weird) circumstances it would actually be quite nice to
overwrite the kernel's idea of a disk's geometry. This would require a
general function like setgeo. Acceptable?

Regards
Philippe Seewer


Subject: Re: RFC: disk geometry via sysfs

On 2/10/06, Seewer Philippe <[email protected]> wrote:
> Hello all!

Hi!

> I don't want to start another geometry war, but with the introduction of
> the general getgeo function by Christoph Hellwig for all disks this
> simply would become a matter of extending the basic gendisk block driver.
>
> There are people out there (like me) who need to know about disk
> geometry. But since this is clearly post 2.6.16 I prefer to ask here
> before writing a patch...
>
> Q1: Yes or No?
> If no, the other questions do not apply

Yes?

> Q2: Where under sysfs?
> Either do /sys/block/hdx/heads, /sys/block/hdx/sectors, etc. or should
> there be a new sub-object like /sys/block/hdx/geometry/heads?

IMO /sys/block/hdx/sectors could be misleading,
therefore /sys/block/hdx/geometry/ would be better.
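If the geometry/ sub-directory layout were adopted, a userspace reader could be as simple as the sketch below. The directory and attribute names (heads, sectors, cylinders) are the ones proposed in this thread, not an existing kernel interface. The function names are hypothetical.

```c
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style attribute file
 * (one value per file, usually followed by "\n").
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_attr_ulong(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%lu", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Read the proposed (hypothetical) /sys/block/<dev>/geometry/ attributes.
 * `dir` is e.g. "/sys/block/hda/geometry". */
int read_geometry_dir(const char *dir, unsigned long *heads,
                      unsigned long *sectors, unsigned long *cylinders)
{
    char path[512];

    snprintf(path, sizeof path, "%s/heads", dir);
    if (read_attr_ulong(path, heads))
        return -1;
    snprintf(path, sizeof path, "%s/sectors", dir);
    if (read_attr_ulong(path, sectors))
        return -1;
    snprintf(path, sizeof path, "%s/cylinders", dir);
    return read_attr_ulong(path, cylinders);
}
```

Because each attribute is a separate file, shell scripts could also read the values with a plain cat.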

> Q3: Writable?
> Under some (weird) circumstances it would actually be quite nice to
> overwrite the kernel's idea of a disk's geometry. This would require a
> general function like setgeo. Acceptable?

Don't know. Maybe you should make it into separate patch
(incremental to basic functionality) so it can be decided later.

Cheers,
Bartlomiej

2006-02-13 16:33:08

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Seewer Philippe wrote:
> Hello all!
>
> I don't want to start another geometry war, but with the introduction of
> the general getgeo function by Christoph Hellwig for all disks this
> simply would become a matter of extending the basic gendisk block driver.
>
> There are people out there (like me) who need to know about disk
> geometry. But since this is clearly post 2.6.16 I prefer to ask here
> before writing a patch...
>

Why do you need to know about geometry? Geometry is a useless fiction
that only still exists in PC system BIOS for the sake of backward
compatibility with software that was originally designed to operate with
MFM and RLL disks that actually used geometric addressing. These days
there is no such thing; it's just made up by the bios.

> Q1: Yes or No?
> If no, the other questions do not apply
>
> Q2: Where under sysfs?
> Either do /sys/block/hdx/heads, /sys/block/hdx/sectors, etc. or should
> there be a new sub-object like /sys/block/hdx/geometry/heads?
>

This is not suitable because block devices may not be bios accessible,
and thus, nowhere to get any bogus geometry information from. Even if
it is, do we really want to be calling the bios to get this information
and keep it around?

> Q3: Writable?
> Under some (weird) circumstances it would actually be quite nice to
> overwrite the kernel's idea of a disk's geometry. This would require a
> general function like setgeo. Acceptable?
>
>

What for? The only purpose of geometry is bios compatibility. Changing
the kernel's copy of the values won't do any good because the bios won't
be changed.


2006-02-13 19:03:37

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Phillip Susi wrote:
> Seewer Philippe wrote:
>
>> Hello all!
>>
>> I don't want to start another geometry war, but with the introduction of
>> the general getgeo function by Christoph Hellwig for all disks this
>> simply would become a matter of extending the basic gendisk block driver.
>>
>> There are people out there (like me) who need to know about disk
>> geometry. But since this is clearly post 2.6.16 I prefer to ask here
>> before writing a patch...
>>
>
>
> Why do you need to know about geometry? Geometry is a useless fiction
> that only still exists in PC system BIOS for the sake of backward
> compatibility with software that was originally designed to operate with
> MFM and RLL disks that actually used geometric addressing. These days
> there is no such thing; it's just made up by the bios.

...That's why I said I didn't want to start another geometry war. But
then again, I did write RFC too, yes?

Yes, geometry is a fiction. And a bad one at that. To be honest I'd
rather get rid of it completely. But you said it: The geometry still
exists for the sake of backward compatibility. If it is still there, why
not export it? That's what sysfs is for...

Additionally, have a look at libata-scsi.c, which is part of the SATA
implementation. There's CHS code in there...

Personally, I want the geometry information in sysfs because it makes
debugging partition tables not written by Linux tools just that tad
easier...

>
>> Q1: Yes or No?
>> If no, the other questions do not apply
>>
>> Q2: Where under sysfs?
>> Either do /sys/block/hdx/heads, /sys/block/hdx/sectors, etc. or should
>> there be a new sub-object like /sys/block/hdx/geometry/heads?
>>
>
>
> This is not suitable because block devices may not be bios accessible,
> and thus, nowhere to get any bogus geometry information from. Even if
> it is, do we really want to be calling the bios to get this information
> and keep it around?

I did not say I'd implement it for _all_ devices. In fact I intend to
make geometry available only for devices whose drivers provide the
getgeo function.

>
>> Q3: Writable?
>> Under some (weird) circumstances it would actually be quite nice to
>> overwrite the kernel's idea of a disk's geometry. This would require a
>> general function like setgeo. Acceptable?
>>
>>
>
>
> What for? The only purpose of geometry is bios compatibility. Changing
> the kernel's copy of the values won't do any good because the bios won't
> be changed.


Exactly. I don't want the kernel to fix BIOS problems. But I want to
give userland the opportunity to overwrite what the kernel thinks (as in
/proc/ide/hdx/settings).
One example where this might be useful is connecting a PATA drive to
SATA through an adapter. PATA returns the drive's geometry; SATA defaults
to x/255/63...
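The PATA/SATA difference can be made concrete. The cylinder count is the sector total divided by heads × sectors-per-track, so the same disk gets very different C values under a 16-head view and under the x/255/63 default. The 78,165,360-sector figure in the example is an illustrative 40 GB disk, not a value from this thread.

```c
#include <stdint.h>

/* Cylinder count implied by a chosen heads / sectors-per-track pair:
 * C = total_sectors / (H * S), rounded down. */
uint64_t cylinders_for(uint64_t total_sectors, unsigned heads, unsigned spt)
{
    return total_sectors / ((uint64_t)heads * spt);
}

/* Sectors past the last whole cylinder, which a CHS-only tool cannot
 * address under that geometry. */
uint64_t leftover_sectors(uint64_t total_sectors, unsigned heads, unsigned spt)
{
    return total_sectors % ((uint64_t)heads * spt);
}
```

For that disk, a 16/63 view gives 77545 cylinders with no remainder. The x/255/63 view gives 4865 cylinders with 9135 sectors left over. 77545 is also too large for the 16-bit cylinders field of HDIO_GETGEO's struct hd_geometry.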

2006-02-13 19:22:41

by linux-os (Dick Johnson)

Subject: Re: RFC: disk geometry via sysfs


On Mon, 13 Feb 2006, Seewer Philippe wrote:

>
>
> Phillip Susi wrote:
>> Seewer Philippe wrote:
>>
>>> Hello all!
>>>
>>> I don't want to start another geometry war, but with the introduction of
>>> the general getgeo function by Christoph Hellwig for all disks this
>>> simply would become a matter of extending the basic gendisk block driver.
>>>
>>> There are people out there (like me) who need to know about disk
>>> geometry. But since this is clearly post 2.6.16 I prefer to ask here
>>> before writing a patch...
>>>
>>
>>
>> Why do you need to know about geometry? Geometry is a useless fiction
>> that only still exists in PC system BIOS for the sake of backward
>> compatibility with software that was originally designed to operate with
>> MFM and RLL disks that actually used geometric addressing. These days
>> there is no such thing; it's just made up by the bios.
>
> ...That's why I said I didn't want to start another geometry war. But
> then again, I did write RFC too, yes?
>
> Yes, geometry is a fiction. And a bad one at that. To be honest I'd
> rather get rid of it completely. But you said it: The geometry still
> exists for the sake of backward compatibility. If it is still there, why
> not export it? That's what sysfs is for...
>
> Additionally, have a look at libata-scsi.c, which is part of the SATA
> implementation. There's CHS code in there...
>
> Personally, I want the geometry information in sysfs because it makes
> debugging partition tables not written by Linux tools just that tad
> easier...
>

You can make your own:

Pretend a sector is 512 bytes.
Use the maximum number of cylinders of either 65535 or 1024
Use the maximum number of sectors up to 255
Use the maximum number of heads up to 255


Try the above with 1024 cylinders first. If it doesn't fit, use
65535. That's all the BIOS does. It's just used to fit the
stuff into registers for 16-bit BIOS calls (see int 0x13).
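The register squeeze mentioned above is easy to show. The legacy INT 13h CHS calls (for example AH=02h, read sectors) pass the address in CH, CL and DH. The cylinder gets 10 bits, the sector number 6 bits (1-63) and the head 8 bits. Because only 6 bits carry the sector, this sketch caps it at 63 rather than the 255 in the recipe. The struct and function names are illustrative.

```c
#include <stdint.h>

/* Legacy INT 13h CHS register layout. */
struct int13_chs_regs {
    uint8_t ch;  /* low 8 bits of the cylinder number */
    uint8_t cl;  /* bits 0-5: sector (1-63); bits 6-7: cylinder bits 8-9 */
    uint8_t dh;  /* head number (0-255) */
};

/* Pack a CHS address into the registers. Returns -1 if it cannot be
 * expressed: cylinder > 1023, head > 255, or sector outside 1..63. */
int int13_pack(unsigned cyl, unsigned head, unsigned sector,
               struct int13_chs_regs *r)
{
    if (cyl > 1023 || head > 255 || sector < 1 || sector > 63)
        return -1;
    r->ch = (uint8_t)(cyl & 0xFF);
    r->cl = (uint8_t)((sector & 0x3F) | ((cyl >> 2) & 0xC0));
    r->dh = (uint8_t)head;
    return 0;
}
```

With 512-byte sectors, 1024 × 255 × 63 sectors comes to 8,422,686,720 bytes. That is the familiar ~8.4 GB ceiling of CHS-only BIOS access.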

>>
>>> Q1: Yes or No?
>>> If no, the other questions do not apply
>>>
>>> Q2: Where under sysfs?
>>> Either do /sys/block/hdx/heads, /sys/block/hdx/sectors, etc. or should
>>> there be a new sub-object like /sys/block/hdx/geometry/heads?
>>>
>>
>>
>> This is not suitable because block devices may not be bios accessible,
>> and thus, nowhere to get any bogus geometry information from. Even if
>> it is, do we really want to be calling the bios to get this information
>> and keep it around?
> I did not say I'd implement it for _all_ devices. In fact I intend to
> make geometry available only for devices whose drivers provide the
> getgeo function.
>
>>
>>> Q3: Writable?
>>> Under some (weird) circumstances it would actually be quite nice to
>>> overwrite the kernel's idea of a disk's geometry. This would require a
>>> general function like setgeo. Acceptable?
>>>
>>>
>>
>>
>> What for? The only purpose of geometry is bios compatibility. Changing
>> the kernel's copy of the values won't do any good because the bios won't
>> be changed.
>
>
> Exactly. I don't want the kernel to fix BIOS problems. But I want to
> give userland the opportunity to overwrite what the kernel thinks (as in
> /proc/ide/hdx/settings).
> One example where this might be useful is connecting a PATA drive to
> SATA through an adapter. PATA returns the drive's geometry; SATA defaults
> to x/255/63...

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.

2006-02-13 19:35:13

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Seewer Philippe wrote:
> ...That's why I said I didn't want to start another geometry war. But
> then again, I did write RFC too, yes?
>
> Yes, geometry is a fiction. And a bad one at that. To be honest I'd
> rather get rid of it completely. But you said it: The geometry still
> exists for the sake of backward compatibility. If it is still there, why
> not export it? That's what sysfs is for...
>

Because AFAIK, the kernel does not know the geometry. To find out you
have to ask the bios, and calling the bios is a no-no. That's assuming
the disk is even visible to the bios.

> Additionally, have a look at libata-scsi.c, which is part of the SATA
> implementation. There's CHS code in there...
>
> Personally, I want the geometry information in sysfs because it makes
> debugging partition tables not written by Linux tools just that tad
> easier...
>
>
The geometry expressed in the partition table by whatever utility
created it will be the geometry that the bios reported the disk had at
the time the tool created the MBR, which can change if you plug the disk
into another machine with different bios. If there is already geometry
in the MBR, then you should use those values and take them at face value.
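Reading the geometry already recorded in the MBR is straightforward. Each of the four 16-byte partition entries at offset 446 stores a packed starting and ending C/H/S. The first byte is the head. The second byte holds the sector in bits 0-5 and cylinder bits 8-9 in bits 6-7. The third byte is the low cylinder byte. Alongside these sit the 32-bit little-endian start LBA and sector count. Below is a minimal decoder with illustrative names.

```c
#include <stdint.h>

struct chs { unsigned cyl, head, sector; };

struct mbr_part {
    uint8_t    status;       /* 0x80 = active/bootable */
    struct chs first;        /* CHS of first sector */
    uint8_t    type;         /* partition type id */
    struct chs last;         /* CHS of last sector */
    uint32_t   lba_start;    /* stored little-endian on disk */
    uint32_t   num_sectors;
};

/* Unpack the 3-byte CHS field used in partition entries. */
static struct chs decode_chs(const uint8_t *b)
{
    struct chs c;
    c.head = b[0];
    c.sector = b[1] & 0x3F;
    c.cyl = ((unsigned)(b[1] & 0xC0) << 2) | b[2];
    return c;
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode partition entry idx (0-3) of a 512-byte MBR. Returns -1 if the
 * 0x55AA boot signature is missing or idx is out of range. */
int mbr_read_entry(const uint8_t mbr[512], int idx, struct mbr_part *p)
{
    if (idx < 0 || idx > 3 || mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    const uint8_t *e = mbr + 446 + 16 * idx;
    p->status = e[0];
    p->first = decode_chs(e + 1);
    p->type = e[4];
    p->last = decode_chs(e + 5);
    p->lba_start = le32(e + 8);
    p->num_sectors = le32(e + 12);
    return 0;
}
```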

> I did not say I'd implement it for _all_ devices. In fact I intend to
> make geometry available only for devices whose drivers provide the
> getgeo function.
>
>
AFAIK, most drivers do provide a getgeo function... but do they get the
information from the bios at boot time, or do they make it up? If it is
just made up anyhow, then why bother? I am not sure how it could even
be possible to get the information from the bios at boot time, and
figure out the correct mapping between bios devices and the real
hardware, which the driver is talking to. Since the geometry is
entirely made up anyhow, why bother asking the driver to make it up when
that can be done in user space just as easily?

> Exactly. I don't want the kernel to fix BIOS problems. But I want to
> give userland the opportunity to overwrite what the kernel thinks (as in
> /proc/ide/hdx/settings).
> One example where this might be useful is connecting a PATA drive to
> SATA through an adapter. PATA returns the drive's geometry; SATA defaults
> to x/255/63...

But neither one is any more correct than the other. Both are bogus
values, so what does it matter which is used? What good would having
this value stored in the kernel do? The kernel itself does not use it.
User mode tools that for some reason need to talk about it can just use
the values stored in the MBR.
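One way a tool can use the MBR values is the old cylinder-boundary heuristic. This is roughly the idea behind drivers that "guess the geometry from the MBR". If a partition ends on a cylinder boundary, its last sector has head = H−1 and sector = S. The sketch below is simplified and is not any particular driver's code. It accepts the guess only when the ending CHS maps back to the partition's ending LBA. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: guess heads (H) and sectors per track (S) from a
 * partition's ending CHS, assuming the partition ends on a cylinder
 * boundary (head = H-1, sector = S). The guess is accepted only if the
 * ending CHS maps back to end_lba under that geometry. A cylinder of 1023
 * may be a clamped placeholder, so that check is skipped there. */
int guess_geometry(unsigned end_cyl, unsigned end_head, unsigned end_sector,
                   uint64_t end_lba, unsigned *heads, unsigned *spt)
{
    if (end_sector < 1 || end_sector > 63)
        return -1;
    unsigned h = end_head + 1, s = end_sector;
    if (end_cyl < 1023) {
        uint64_t lba = ((uint64_t)end_cyl * h + end_head) * s + (end_sector - 1);
        if (lba != end_lba)
            return -1;
    }
    *heads = h;
    *spt = s;
    return 0;
}
```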



2006-02-13 19:37:46

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

linux-os (Dick Johnson) wrote:
> You can make your own:
>
> Pretend a sector is 512 bytes.
> Use the maximum number of cylinders of either 65535 or 1024
> Use the maximum number of sectors up to 255
> Use the maximum number of heads up to 255
>
>
> Try the above with 1024 cylinders first. If it doesn't fit, use
> 65535. That's all the BIOS does. It's just used to fit the
> stuff into registers for 16-bit BIOS calls (see int 0x13).
>
>

Actually, different bioses do it in different ways; that is just one way
(and possibly the most popular). The same bios can even do it
differently depending on what options are selected in the bios setup.
Of course, this only affects Microsoft operating systems because
everyone else is sane and supports LBA.

2006-02-14 16:36:05

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs


Phillip Susi wrote:
> linux-os (Dick Johnson) wrote:
>
>> You can make your own:
>>
>> Pretend a sector is 512 bytes.
>> Use the maximum number of cylinders of either 65535 or 1024
>> Use the maximum number of sectors up to 255
>> Use the maximum number of heads up to 255
>>
>>
>> Try the above with 1024 cylinders first. If it doesn't fit, use
>> 65535. That's all the BIOS does. It's just used to fit the
>> stuff into registers for 16-bit BIOS calls (see int 0x13).
>>
>>
>
>
> Actually, different bioses do it in different ways; that is just one way
> (and possibly the most popular). The same bios can even do it
> differently depending on what options are selected in the bios setup.
> Of course, this only affects Microsoft operating systems because
> everyone else is sane and supports LBA.

With emphasis on _sane_

2006-02-14 18:20:18

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Seewer Philippe wrote:
>
> IDE tries to return the actual hardware geometry. Most other drivers
> implement a "fake". Or try to guess the geometry from the MBR...
>

But there is no "actual hardware geometry". IDE disks can report a
geometry, but that is no more real than any other made up geometry. If
you take the geometry that the disk itself reports and write that to the
MBR, then software that actually uses the geometry (i.e. non-LBA boot
loaders) will fail because it is not the geometry that the bios uses.
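The failure mode can be reproduced with the standard translation LBA = (C × H + h) × S + (s − 1). Here H and S are heads and sectors per track, and s counts from 1. The sketch converts one LBA under a drive-style 16/63 geometry and under a BIOS-style 255/63 geometry. It then shows where a loader using the BIOS translation would land if handed the drive-style tuple. The geometries and the LBA are illustrative values.

```c
#include <stdint.h>

struct chs { uint32_t cyl, head, sector; };

/* LBA -> CHS for a geometry with `heads` heads and `spt` sectors per track. */
struct chs lba_to_chs(uint64_t lba, uint32_t heads, uint32_t spt)
{
    struct chs c;
    c.cyl    = (uint32_t)(lba / ((uint64_t)heads * spt));
    c.head   = (uint32_t)((lba / spt) % heads);
    c.sector = (uint32_t)(lba % spt) + 1;   /* sectors are 1-based */
    return c;
}

/* CHS -> LBA: (C * heads + H) * spt + (S - 1). */
uint64_t chs_to_lba(struct chs c, uint32_t heads, uint32_t spt)
{
    return ((uint64_t)c.cyl * heads + c.head) * spt + (c.sector - 1);
}
```

LBA 1,000,000 is (992, 1, 2) under 16/63 and (62, 63, 2) under 255/63. The drive-style tuple is valid CHS under both geometries, so nothing flags the problem. A BIOS using 255/63 simply reads sector 15,936,544 instead.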

The only remaining purpose to geometry values that I see is to store in
the MBR for non LBA boot loaders to use. Since they must have the
values the bios uses, then you need to get the values from the bios when
creating such an MBR.

> My personal answer is here: Because there are so many tools around which
> use the kernel values, that it is easier to overwrite the kernel than
> patch all other software... (i know, i know...)

The only tools that I am aware of are boot loaders and disk
partitioners, and these tools do not need the geometry, they just try to
get it to maintain compatibility with ancient systems. As such, it is
long past time for them to stop requiring this information.

>
> And additionally: when partitioning, it's sometimes necessary or safer to
> write a whole new MBR (dd if=... of=... ; parted mklabel msdos). When
> dd'ing, the MBR goes away, and some drivers return geometry based on the
> MBR... So overwriting these values might come in handy.
>

But what would you overwrite them with? The only values that have any
actual use are the ones from the bios. If you get the values from the
bios, it makes no sense to change them later.

2006-02-15 07:57:19

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Bartlomiej Zolnierkiewicz wrote:
>>
>>Q1: Yes or No?
>>If no, the other questions do not apply
>
>
> Yes?
>
>
>>Q2: Where under sysfs?
>>Either do /sys/block/hdx/heads, /sys/block/hdx/sectors, etc. or should
>>there be a new sub-object like /sys/block/hdx/geometry/heads?
>
>
> IMO /sys/block/hdx/sectors could be misleading,
> therefore /sys/block/hdx/geometry/ would be better.
>
>
>>Q3: Writable?
>>Under some (weird) circumstances it would actually be quite nice to
>>overwrite the kernel's idea of a disk's geometry. This would require a
>>general function like setgeo. Acceptable?
>
>
> Don't know. Maybe you should make it into separate patch
> (incremental to basic functionality) so it can be decided later.
>
> Cheers,
> Bartlomiej

Hi Bartlomiej

Thanks for your feedback. I'm currently testing the read export and it
seems to work fine.

If possible, I'd like your opinion on how to implement write support.
I see 3 possibilities:
- Extend the gendisk struct with geometry information. If the user
overwrites the geometry, values from there are returned instead of
calling getgeo. This is the easiest way, because nothing has to be done
in subsystem drivers. On the other hand, if by chance a driver really
uses the geometry values, it will never know...
- Introduce a setgeo function as a companion to getgeo. Values under
sysfs will only be writable if the underlying driver supplies this, and
all writes will be delegated there. Drawback: driver maintainers need to
think about this.
- The third way would be to combine both: store the geometry in gendisk
if no setgeo is provided...
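The third option could look roughly like the sketch below. Everything in it is a hypothetical userspace model (struct disk, the getgeo/setgeo pointers, the override fields). It is not the kernel's gendisk or block_device_operations. It only shows the dispatch: writes go to setgeo when the driver has one, and are otherwise stored and returned by later reads.

```c
#include <stdbool.h>

struct geo { unsigned heads, sectors, cylinders; };

/* Hypothetical stand-in for a gendisk plus its driver hooks
 * (NOT the kernel's real structures). */
struct disk {
    int (*getgeo)(const struct disk *d, struct geo *g);  /* driver report */
    int (*setgeo)(struct disk *d, const struct geo *g);  /* optional */
    bool has_override;                                   /* option-1 storage */
    struct geo override;
};

/* Write side: delegate to the driver's setgeo if present, otherwise
 * remember the user's values in the disk structure. */
int disk_set_geometry(struct disk *d, const struct geo *g)
{
    if (d->setgeo)
        return d->setgeo(d, g);
    d->override = *g;
    d->has_override = true;
    return 0;
}

/* Read side: a stored override wins, otherwise ask the driver. */
int disk_get_geometry(const struct disk *d, struct geo *g)
{
    if (d->has_override) {
        *g = d->override;
        return 0;
    }
    return d->getgeo ? d->getgeo(d, g) : -1;
}

/* Example driver hook reporting a made-up x/255/63 geometry. */
int example_getgeo(const struct disk *d, struct geo *g)
{
    (void)d;
    g->heads = 255;
    g->sectors = 63;
    g->cylinders = 4865;
    return 0;
}
```

The drawback of the first option shows up here as well. Once an override is stored, example_getgeo is never consulted again, so a driver that really used the geometry would not see the new values.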

What do you think?

Thank you
Philippe Seewer

2006-02-15 08:39:40

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs


Phillip Susi wrote:
> Seewer Philippe wrote:
>
>>
>> IDE tries to return the actual hardware geometry. Most other drivers
>> implement a "fake". Or try to guess the geometry from the MBR...
>>
>
> But there is no "actual hardware geometry". IDE disks can report a
> geometry, but that is no more real than any other made up geometry. If
> you take the geometry that the disk itself reports and write that to the
> MBR, then software that actually uses the geometry (i.e. non-LBA boot
> loaders) will fail because it is not the geometry that the bios uses.
>
> The only remaining purpose to geometry values that I see is to store in
> the MBR for non LBA boot loaders to use. Since they must have the
> values the bios uses, then you need to get the values from the bios when
> creating such an MBR.
>
>> My personal answer is here: Because there are so many tools around which
>> use the kernel values, that it is easier to overwrite the kernel than
>> patch all other software... (i know, i know...)
>
>
> The only tools that I am aware of are boot loaders and disk
> partitioners, and these tools do not need the geometry, they just try to
> get it to maintain compatibility with ancient systems. As such, it is
> long past time for them to stop requiring this information.
>
>>
>> And additionally: when partitioning, it's sometimes necessary or safer to
>> write a whole new MBR (dd if=... of=... ; parted mklabel msdos). When
>> dd'ing, the MBR goes away, and some drivers return geometry based on the
>> MBR... So overwriting these values might come in handy.
>>
>
> But what would you overwrite them with? The only values that have any
> actual use are the ones from the bios. If you get the values from the
> bios, it makes no sense to change them later.
>
Hi Phillip

I'd like to close this discussion if possible.

I think we both know that disk geometry is a fiction and except for a
few "older" devices which still need support, Linux couldn't care less
about it (and in an ideal world this would include myself).

On the other hand, at least in the x86 world, we must live with the fact
that there are other OSes around which, as you so aptly put it, aren't sane.
In order to work with them and if necessary to fix things, geometry
information is necessary. One part is the bios geometry, available
through edd or other means. The other part is the geometry the kernel
exports (whatever sane values it contains or where they come from).
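On the BIOS side, the kernel's EDD support (CONFIG_EDD) exposes per-disk attributes under /sys/firmware/edd/, for example int13_dev80/. The attribute names used below (legacy_max_cylinder, legacy_max_head, legacy_sectors_per_track) are as I understand the EDD driver; check them against your kernel. The legacy values come from INT 13h AH=08h. The cylinder and head values are maximum 0-based numbers, so the counts are one larger. The helper names are illustrative.

```c
#include <stdio.h>

/* Read one unsigned integer from the file <prefix><name>. */
static int read_uint(const char *prefix, const char *name, unsigned *out)
{
    char path[256];
    snprintf(path, sizeof path, "%s%s", prefix, name);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%u", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Legacy BIOS geometry as (assumed to be) exported by the EDD driver.
 * `prefix` is a directory path with a trailing slash, e.g.
 * "/sys/firmware/edd/int13_dev80/". legacy_max_cylinder and
 * legacy_max_head are the highest 0-based numbers, hence the +1. */
int edd_legacy_geometry(const char *prefix, unsigned *cyls,
                        unsigned *heads, unsigned *spt)
{
    unsigned max_cyl, max_head;
    if (read_uint(prefix, "legacy_max_cylinder", &max_cyl) ||
        read_uint(prefix, "legacy_max_head", &max_head) ||
        read_uint(prefix, "legacy_sectors_per_track", spt))
        return -1;
    *cyls = max_cyl + 1;
    *heads = max_head + 1;
    return 0;
}
```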

Both are necessary for debugging and fixing. And sometimes it actually
makes sense to overwrite the kernel with values that are "compatible".
Whether gleaned from the bios via edd or computed by hand does not
matter as long as the user has to do it himself. I've given a few
examples for this, others can be found by googling (For example the ide
disk geometry rewrite for http://unattended.sourceforge.net).

I completely agree with all that the kernel should never try to report
bios geometry for a disk unless absolutely necessary and should not
attempt to fix things automagically.

But, as long as the Linux kernel does something with disk geometry, and
this could mean just returning some bogus values, it makes sense to
export these values read/write in sysfs. As we all know, sysfs is
much easier to handle than, say, ioctls.

Regards
Philippe Seewer

Subject: Re: RFC: disk geometry via sysfs

On 2/15/06, Seewer Philippe <[email protected]> wrote:
>
> Phillip Susi wrote:
> > Seewer Philippe wrote:
> >
> >>
> >> IDE tries to return the actual hardware geometry. Most other drivers
> >> implement a "fake". Or try to guess the geometry from the MBR...
> >>
> >
> > But there is no "actual hardware geometry". IDE disks can report a
> > geometry, but that is no more real than any other made up geometry. If
> > you take the geometry that the disk itself reports and write that to the
> > MBR, then software that actually uses the geometry (i.e. non-LBA boot
> > loaders) will fail because it is not the geometry that the bios uses.
> >
> > The only remaining purpose to geometry values that I see is to store in
> > the MBR for non LBA boot loaders to use. Since they must have the
> > values the bios uses, then you need to get the values from the bios when
> > creating such an MBR.
> >
> >> My personal answer is here: Because there are so many tools around which
> >> use the kernel values, that it is easier to overwrite the kernel than
> >> patch all other software... (i know, i know...)
> >
> >
> > The only tools that I am aware of are boot loaders and disk
> > partitioners, and these tools do not need the geometry, they just try to
> > get it to maintain compatibility with ancient systems. As such, it is
> > long past time for them to stop requiring this information.
> >
> >>
> >> And additionally: when partitioning, it's sometimes necessary or safer to
> >> write a whole new MBR (dd if=... of=... ; parted mklabel msdos). When
> >> dd'ing, the MBR goes away, and some drivers return geometry based on the
> >> MBR... So overwriting these values might come in handy.
> >>
> >
> > But what would you overwrite them with? The only values that have any
> > actual use are the ones from the bios. If you get the values from the
> > bios, it makes no sense to change them later.
> >
> Hi Phillip
>
> I'd like to close this discussion if possible.
>
> I think we both know that disk geometry is a fiction and except for a
> few "older" devices which still need support, Linux couldn't care less
> about it (and in an ideal world this would include myself).
>
> On the other hand, at least in the x86 world, we must live with the fact
> that there are other OSes around which, as you so aptly put it, aren't sane.
> In order to work with them and if necessary to fix things, geometry
> information is necessary. One part is the bios geometry, available
> through edd or other means. The other part is the geometry the kernel
> exports (whatever sane values it contains or where they come from).
>
> Both are necessary for debugging and fixing. And sometimes it actually
> makes sense to overwrite the kernel with values that are "compatible".
> Whether gleaned from the bios via edd or computed by hand does not
> matter as long as the user has to do it himself. I've given a few
> examples for this, others can be found by googling (For example the ide
> disk geometry rewrite for http://unattended.sourceforge.net).
>
> I completely agree with all that the kernel should never try to report
> bios geometry for a disk unless absolutely necessary and should not
> attempt to fix things automagically.
>
> But, as long as the Linux kernel does something with disk geometry, and
> this could mean just returning some bogus values, it makes sense to
> export these values read/write in sysfs. As we all know, sysfs is
> much easier to handle than, say, ioctls.

This got me thinking: if all the kernel does is return some bogus
values and we need to fix applications to use the sysfs interface, why not
instead just fix applications to not use the ioctl interface?

Bartlomiej

2006-02-15 09:02:06

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Bartlomiej Zolnierkiewicz wrote:
> On 2/15/06, Seewer Philippe <[email protected]> wrote:
>>
>>Hi Phillip
>>
>>I'd like to close this discussion if possible.
>>
>>I think we both know that disk geometry is a fiction and except for a
>>few "older" devices which still need support, Linux couldn't care less
>>about it (and in an ideal world this would include myself).
>>
>>On the other hand, at least in the x86 world, we must live with the fact
>>that there are other OSes around which, as you so aptly put it, aren't sane.
>>In order to work with them and if necessary to fix things, geometry
>>information is necessary. One part is the bios geometry, available
>>through edd or other means. The other part is the geometry the kernel
>>exports (whatever sane values it contains or where they come from).
>>
>>Both are necessary for debugging and fixing. And sometimes it actually
>>makes sense to overwrite the kernel with values that are "compatible".
>>Whether gleaned from the bios via edd or computed by hand does not
>>matter as long as the user has to do it himself. I've given a few
>>examples for this, others can be found by googling (For example the ide
>>disk geometry rewrite for http://unattended.sourceforge.net).
>>
>>I completely agree with all that the kernel should never try to report
>>bios geometry for a disk unless absolutely necessary and should not
>>attempt to fix things automagically.
>>
>>But, as long as the Linux kernel does something with disk geometry, and
>>this could mean just returning some bogus values, it makes sense to
>>export these values read/write in sysfs. As we all know, sysfs is
>>much easier to handle than, say, ioctls.
>
>
> This got me thinking: if all the kernel does is return some bogus
> values and we need to fix applications to use the sysfs interface, why not
> instead just fix applications to not use the ioctl interface?
>
> Bartlomiej

Good point (and the one I was afraid would come up).

This would mean dropping the HDIO_GETGEO ioctl completely and forcing
applications such as fdisk/sfdisk and even dosemu to determine disk
geometry for themselves. Which I think actually would be the most
correct approach.

But this would lead to a situation similar to the early days of 2.6,
when we had partitioning problems because bios geometry support was
dropped.

That's something I don't have the guts to decide (and luckily can't), so
I'd rather go with sysfs and provide a means to be as compatible as
possible without doing anything automagically.


2006-02-15 14:03:42

by Alan

Subject: Re: RFC: disk geometry via sysfs

On Wed, 2006-02-15 at 10:01 +0100, Seewer Philippe wrote:
> This would mean dropping the HDIO_GETGEO ioctl completely and forcing
> applications such as fdisk/sfdisk and even dosemu to determine disk
> geometry for themselves. Which I think actually would be the most
> correct approach.

In the IDE case the drive geometry has meaning in certain cases,
specifically the C/H/S drive addressing case with old old drives.


2006-02-15 14:16:27

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Alan Cox wrote:
> On Wed, 2006-02-15 at 10:01 +0100, Seewer Philippe wrote:
>
>>This would mean dropping the HDIO_GETGEO ioctl completely and forcing
>>applications such as fdisk/sfdisk and even dosemu to determine disk
>>geometry for themselves. Which I think actually would be the most
>>correct approach.
>
>
> In the IDE case the drive geometry has meaning in certain cases,
> specifically the C/H/S drive addressing case with old old drives.
>
>
Yes. But the addressing is abstracted by the kernel, and we were talking
about dropping the GETGEO ioctl, not geometry itself.

2006-02-15 15:12:24

by Alan

Subject: Re: RFC: disk geometry via sysfs

On Wed, 2006-02-15 at 15:11 +0100, Seewer Philippe wrote:
> > In the IDE case the drive geometry has meaning in certain cases,
> > specifically the C/H/S drive addressing case with old old drives.
> >
> >
> Yes. But the addressing is abstracted by the kernel, and we were talking
> about dropping the GETGEO ioctl, not geometry itself.

The tools need to know the C/H/S drive addressing data for old drives
because it is used to determine partition tables. That doesn't have to
be GETGEO but it does need to exist somewhere.

2006-02-15 15:22:06

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Alan Cox wrote:
> On Wed, 2006-02-15 at 10:01 +0100, Seewer Philippe wrote:
>
>> This would mean dropping the HDIO_GETGEO ioctl completely and forcing
>> applications such as fdisk/sfdisk and even dosemu to determine disk
>> geometry for themselves. Which I think actually would be the most
>> correct approach.
>>
>
> In the IDE case the drive geometry has meaning in certain cases,
> specifically the C/H/S drive addressing case with old old drives.

I thought that C/H/S addressing was purely a function of int 13, not the
hardware interface? If it is a function of some older hardware
interfaces, then we are still talking about two different, and likely
incompatible geometries: the one the disk reports, and the one the bios
reports. The values in the MBR must be the values the bios reports.


2006-02-15 15:30:11

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Alan Cox wrote:
> The tools need to know the C/H/S drive addressing data for old drives
> because it is used to determine partition tables. That doesn't have to
> be GETGEO but it does need to exist somewhere.

Currently GETGEO very often does not report the same values as the bios,
doesn't it? For some disks it's completely made up, and for others it
is the value returned by the drive itself, which often differs from the
bios values. If this is the case, and it is the bios values that must
be stored in the MBR, then it makes little sense to have GETGEO seeing
as how it often provides incorrect information.

Wouldn't it be better then, to clean up GETGEO everywhere so that unless
it has correct values from the bios, it should just fail? And leave it
up to fdisk and friends to inform the user of that failure, choose
default values, and allow the user to override those defaults should
they need to?

The only time they would even have to worry about it is if they are
installing linux on a blank disk, and then want to install windows to
dual boot with it. In that case they might have to correct the CHS
values in the MBR to match the values the bios provides.


2006-02-15 16:03:43

by Alan

Subject: Re: RFC: disk geometry via sysfs

On Wed, 2006-02-15 at 10:20 -0500, Phillip Susi wrote:
> I thought that C/H/S addressing was purely a function of int 13, not the
> hardware interface? If it is a function of some older hardware
> interfaces, then we are still talking about two different, and likely
> incompatible geometries: the one the disk reports, and the one the bios
> reports. The values in the MBR must be the values the bios reports.

We have at least four

Disk reported C/H/S
BIOS reported C/H/S (hda/hdb only)
Actual C/H/S (if it exists)
Partition table C/H/S

A partitioning tool needs to know
Disk reported C/H/S
Partition table C/H/S
Preferably BIOS reported C/H/S if there is one

The partition table C/H/S is on disk, so that is trivial.
The disk-reported values are in the IDENTIFY block, so they could be
exported via /proc and sysfs.
The BIOS one requires PC-specific poking around in low memory.

I agree entirely that HDIO_GETGEO itself shouldn't matter.
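The partition-table C/H/S that Alan calls trivial to obtain lives in the MBR's four 16-byte entries starting at byte 446 (0x1BE). The C sketch below shows how to decode it. It assumes the classic MBR layout: start CHS at entry offset 1, end CHS at offset 5, and a little-endian start LBA and sector count at offsets 8 and 12. The struct and function names are hypothetical, chosen for illustration.

```c
#include <stdint.h>

#define MBR_PART_TABLE_OFFSET 446   /* 0x1BE */
#define MBR_PART_ENTRY_SIZE   16

struct chs {
    unsigned cylinder;  /* 0..1023 */
    unsigned head;      /* 0..255  */
    unsigned sector;    /* 1..63   */
};

struct mbr_part {
    struct chs start, end;
    uint32_t lba_start, num_sectors;
};

/* Decode the packed 3-byte CHS field of a partition entry:
 * byte 0 = head, byte 1 = sector (bits 0-5) | cylinder bits 8-9 (bits 6-7),
 * byte 2 = cylinder bits 0-7. */
static struct chs mbr_decode_chs(const uint8_t b[3])
{
    struct chs c;
    c.head = b[0];
    c.sector = b[1] & 0x3f;
    c.cylinder = ((unsigned)(b[1] & 0xc0) << 2) | b[2];
    return c;
}

/* Little-endian 32-bit read. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read partition entry i (0..3) from a 512-byte MBR sector. */
static struct mbr_part mbr_read_entry(const uint8_t sector[512], int i)
{
    const uint8_t *e = sector + MBR_PART_TABLE_OFFSET + i * MBR_PART_ENTRY_SIZE;
    struct mbr_part p;
    p.start = mbr_decode_chs(e + 1);
    p.end = mbr_decode_chs(e + 5);
    p.lba_start = le32(e + 8);
    p.num_sectors = le32(e + 12);
    return p;
}
```

The bytes FE FF FF decode to head 254, sector 63, cylinder 1023. That is the largest value the field can hold, and it is why the LBA fields are authoritative on large disks.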

2006-02-15 16:22:00

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Alan Cox wrote:
>
> We have at least four
>
> Disk reported C/H/S
> BIOS reported C/H/S (hda/hdb only)
> Actual C/H/S (if it exists)
> Partition table C/H/S
>
> A partitioning tool needs to know
> Disk reported C/H/S
> Partition table C/H/S
> Preferably BIOS reported C/H/S if there is one
>

Why do you say the partitioning tool needs to know the disk reported
C/H/S? The value stored in the MBR must match the bios reported values,
not the disk reported ones, so why does the partitioner care about what
the disk reports?

> The partition table C/H/S is on disk, so that is trivial.
> The disk-reported values are in the IDENTIFY block, so they could be
> exported via /proc and sysfs.
> The BIOS one requires PC-specific poking around in low memory.
>
> I agree entirely that HDIO_GETGEO itself shouldn't matter.
>

2006-02-15 17:29:36

by Alan

Subject: Re: RFC: disk geometry via sysfs

On Wed, 2006-02-15 at 11:20 -0500, Phillip Susi wrote:
> Why do you say the partitioning tool needs to know the disk reported
> C/H/S? The value stored in the MBR must match the bios reported values,
> not the disk reported ones, so why does the partitioner care about what
> the disk reports?

You answered that in asking the question. "The value stored in the MBR
must match the ...". What if the MBR has not yet been written?

(Also, btw, it's *should*...) Most modern OSes will take a sane MBR
geometry and trust it over BIOS defaults.

Alan

2006-02-15 18:44:33

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Alan Cox wrote:
> On Wed, 2006-02-15 at 11:20 -0500, Phillip Susi wrote:
>> Why do you say the partitioning tool needs to know the disk reported
>> C/H/S? The value stored in the MBR must match the bios reported values,
>> not the disk reported ones, so why does the partitioner care about what
>> the disk reports?
>
> You answered that in asking the question. "The value stored in the MBR
> must match the ...". What if the MBR has not yet been written?
>

The value in the MBR must match the _bios_ value, not the value that the
disk reports in its inquiry command (which will often be different).
When creating a new MBR you need to get the geometry from the bios, not
the drive itself.

> (Also, btw, it's *should*...) Most modern OSes will take a sane MBR
> geometry and trust it over BIOS defaults.
>

If the value in the MBR differs from the geometry that the bios reports,
then boot code using int 13 in chs mode will fail because it won't be
using the geometry that the bios expects.


2006-02-15 19:23:44

by linux-os (Dick Johnson)

Subject: Re: RFC: disk geometry via sysfs


On Wed, 15 Feb 2006, Phillip Susi wrote:

> Alan Cox wrote:
>> On Wed, 2006-02-15 at 11:20 -0500, Phillip Susi wrote:
>>> Why do you say the partitioning tool needs to know the disk reported
>>> C/H/S? The value stored in the MBR must match the bios reported values,
>>> not the disk reported ones, so why does the partitioner care about what
>>> the disk reports?
>>
>> You answered that in asking the question. "The value stored in the MBR
>> must match the ...". What if the MBR has not yet been written?
>>
>
> The value in the MBR must match the _bios_ value, not the value that the
> disk reports in its inquiry command (which will often be different).
> When creating a new MBR you need to get the geometry from the bios, not
> the drive itself.
>
>> (Also, btw, it's *should*...) Most modern OSes will take a sane MBR
>> geometry and trust it over BIOS defaults.
>>
>
> If the value in the MBR differs from the geometry that the bios reports,
> then boot code using int 13 in chs mode will fail because it won't be
> using the geometry that the bios expects.
>

If the disc is a modern disk, and the BIOS is modern as well,
it won't care. For instance, if we attempt to seek to cylinder
10, sector 10, and there are only 9 sectors, then the supplied
head number is incremented, the sector to be read becomes 1
(dumb, one-based), and everything is fine. If the head number
can't be incremented, it wraps to 0. Problems occur if the BIOS
has been set to "physical" mode for access. In this mode, the
CHS are absolute and "you can't get there from here." In the
physical mode, you can't have more than 1024 cylinders because
they need to fit into 10 bits.
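The 10-bit limit Dick mentions comes from the way INT 13h, AH=02h (read sectors) takes its address. The cylinder number is split across CH and the top two bits of CL, the 1-based sector occupies the low six bits of CL, and the head goes in DH. The sketch below packs a CHS triple into those registers. The helper names are hypothetical, and the capacity figure assumes the commonly used 255-head maximum and 512-byte sectors.

```c
#include <stdint.h>

/* Register values used to address a sector for INT 13h, AH=02h. */
struct int13_regs {
    uint8_t ch;  /* cylinder bits 0-7                             */
    uint8_t cl;  /* bits 0-5: sector, bits 6-7: cylinder bits 8-9 */
    uint8_t dh;  /* head                                          */
};

/* Pack a CHS triple; returns -1 if a field does not fit its bit width. */
static int int13_pack_chs(unsigned cyl, unsigned head, unsigned sector,
                          struct int13_regs *r)
{
    if (cyl > 1023 || head > 255 || sector < 1 || sector > 63)
        return -1;  /* 10-bit cylinder, 8-bit head, 6-bit 1-based sector */
    r->ch = (uint8_t)(cyl & 0xff);
    r->cl = (uint8_t)((sector & 0x3f) | ((cyl >> 2) & 0xc0));
    r->dh = (uint8_t)head;
    return 0;
}

/* Capacity reachable this way: 1024 cylinders, 255 heads, 63 sectors. */
static unsigned long long int13_chs_max_bytes(void)
{
    return 1024ULL * 255 * 63 * 512;
}
```

Cylinder 1024 cannot be encoded at all. That is the ceiling Dick describes, and it works out to roughly 8.4 GB. Anything beyond it is reachable only through the INT 13h extensions, which take an LBA.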

As long as the BIOS is set for LBA, the boot sequence should not
care.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2006-02-15 20:55:27

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

linux-os (Dick Johnson) wrote:
> If the disc is a modern disk, and the BIOS is modern as well,
> it won't care. For instance, if we attempt to seek to cylinder
> 10, sector 10, and there are only 9 sectors, then the supplied
> head number is incremented, the sector to be read becomes 1
> (dumb, one-based), and everything is fine. If the head number
> can't be incremented, it wraps to 0. Problems occur if the BIOS
> has been set to "physical" mode for access. In this mode, the
> CHS are absolute and "you can't get there from here." In the
> physical mode, you can't have more than 1024 cylinders because
> they need to fit into 10 bits.
>
> As long as the BIOS is set for LBA, the boot sequence should not
> care.
>

Are you sure? Do all bioses perform this auto-correction? I would have
thought that they would simply fail the request because you asked for a
sector or head that is outside the valid range. Even if some bioses
will accept illegal values and auto translate, I doubt that they all do.

And what if you err in the other direction? If the MBR lists a LOWER
number of heads than the bios thinks there is? In that case you're
going to ask the bios for a larger cylinder number, and it will happily
read a sector from the disk that is further from the start than you
intended.


2006-02-15 21:41:56

by linux-os (Dick Johnson)

Subject: Re: RFC: disk geometry via sysfs


On Wed, 15 Feb 2006, Phillip Susi wrote:

> linux-os (Dick Johnson) wrote:
>> If the disc is a modern disk, and the BIOS is modern as well,
>> it won't care. For instance, if we attempt to seek to cylinder
>> 10, sector 10, and there are only 9 sectors, then the supplied
>> head number is incremented, the sector to be read becomes 1
>> (dumb, one-based), and everything is fine. If the head number
>> can't be incremented, it wraps to 0. Problems occur if the BIOS
>> has been set to "physical" mode for access. In this mode, the
>> CHS are absolute and "you can't get there from here." In the
>> physical mode, you can't have more than 1024 cylinders because
>> they need to fit into 10 bits.
>>
>> As long as the BIOS is set for LBA, the boot sequence should not
>> care.
>>
>
> Are you sure? Do all bioses perform this auto-correction? I would have
> thought that they would simply fail the request because you asked for a
> sector or head that is outside the valid range. Even if some bioses
> will accept illegal values and auto translate, I doubt that they all do.
>
> And what if you err in the other direction? If the MBR lists a LOWER
> number of heads than the bios thinks there is? In that case you're
> going to ask the bios for a larger cylinder number, and it will happily
> read a sector from the disk that is further from the start than you
> intended.

Heads start at 0. Sectors start at 1. Cylinders start at 0.
A "lower head" than allowed would be 0xff so the BIOS wouldn't
know it was "lower". The BIOS doesn't look at the MBR for
normal read/write access! Only while booting does it
read the first sector of the master boot record (MBR) into
the appropriate physical place (0x7c00). Then it checks to see
if there is an 0xaa55 as the last word in the sector. If so,
it executes code starting at offset zero. Modern BIOS don't
even check the "boot flag" because it may be wrong, preventing
a boot.
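The signature check Dick describes looks at the last two bytes of the 512-byte sector the BIOS loaded at 0x7C00. x86 is little-endian, so the word 0xAA55 is stored as byte 0x55 at offset 510 followed by 0xAA at offset 511. The helper below is a minimal, hypothetically named sketch of that check.

```c
#include <stdint.h>

#define BOOT_SECTOR_SIZE 512

/* Nonzero if the sector ends in the 0xAA55 boot signature
 * (stored little-endian: 0x55 at offset 510, 0xAA at offset 511). */
static int has_boot_signature(const uint8_t sector[BOOT_SECTOR_SIZE])
{
    uint16_t sig = (uint16_t)(sector[510] | (sector[511] << 8));
    return sig == 0xAA55;
}
```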

Now, during the boot sequence, the BIOS via INT 0x13 or 0x40
will be called upon to read data into memory from various
offsets on the media. If the offsets are calculated in the
same way that they were calculated when the disk was initialized
as a boot disk, then everything is okay. The calculations of
offsets do not require the same C/H/S phony variables! One
only has to follow the correct rules. The rules are that
heads increment from 0, as do cylinders, and sectors start
at 1. Also "sectors" must be 512-byte intervals even though
the physical media may have 16 kilobyte sectors. Given
these rules, there are zillions of ways for one to arrive
at the correct offset. The interpretation will be correct
IFF the number of cylinders is extracted first, then the
number of heads (tracks), then the number of sectors, always
using the largest number that will fit into the BIOS
registers used to make that access.

In the case of "large media" access, the cylinders are
set to 0xffff. This triggers additional logic that invents
a new virtual sector length to accommodate.

The following is the __entire__ boot code for an IBM/PC
compatible BIOS! Constant "DISKS" is 0x13.

;
        ALIGN   4
INT_19H:
        STI
        PUSH    DS
        PUSH    ES
        XOR     DX,DX           ; Get a zero
        MOV     DS,DX           ; Set segments
        MOV     ES,DX           ; DS = ES = 0
        MOV     CX,256          ; The IBM/AT bios clears 256 WORDS
        MOV     DI,7C00H        ; Boot location
        XOR     AX,AX           ; Get a zero
        REP     STOSW           ; Clear that area.
        XOR     DX,DX           ; Reset any floppy disk (DL = drive 0)
        XOR     AX,AX           ; Reset Disk subsystem (AH = 0)
        INT     DISKS           ; Ignore any error
        MOV     AX,0201H        ; AH = 02h read, AL = one sector
        MOV     BX,7C00H        ; ES:BX points to buffer
        MOV     CX,1            ; CH = cylinder 0, CL = sector 1
        XOR     DX,DX           ; DH = head 0, DL = first floppy disk
        INT     DISKS           ; Disk vector
        JC      SHORT FAIL      ; Can't read it
IPL:    CMP     WORD PTR [BX+1FEH],0AA55H   ; Signature at 7DFEh
        JNZ     SHORT FAIL      ; IPL bad
        DB      0EAH            ; Jmp FAR
        DW      7C00H           ; Offset
        DW      0               ; Segment
FAIL:   POP     ES              ; Restore segments
        POP     DS
        IRET


Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
_



2006-02-15 22:44:58

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

linux-os (Dick Johnson) wrote:
> Heads start at 0. Sectors start at 1. Cylinders start at 0.
> A "lower head" than allowed would be 0xff so the BIOS wouldn't
> know it was "lower". The BIOS doesn't look at the MBR for
> normal read/write access! Only while booting does it
> read the first sector of the master boot record (MBR) into
> the appropriate physical place (0x7c00). Then it checks to see
> if there is an 0xaa55 as the last word in the sector. If so,
> it executes code starting at offset zero. Modern BIOS don't
> even check the "boot flag" because it may be wrong, preventing
> a boot.
>

I'm talking about the geometry of the disk. If the disk has 16 sectors
and 8 heads, then the maximum value allowed for any valid address is 16
in the sector field and 7 in the heads field. This influences the
translation to/from LBA. A sector with LBA of 1234 would have a CHS
address using this geometry of 9/5/3. If the disk reports a geometry of
x/8/16 but the bios is using a geometry of x/255/63, then when you pass
9/5/3 to int 13 it will fetch LBA 144902 which is clearly not going to
give you what you wanted.

This is why you must use the same geometry that the bios exposes, NOT
what the disk reports in its inquiry command. It is quite typical for a
disk to report that it uses 8 heads and has a number of cylinders that
is > 1024. The bios will typically present a view of the disk with 255
heads ( though it very well may use a smaller value ). If you generate
CHS addresses when you write the MBR using 8 heads, they will be wrong
when you try to pass them to the bios.
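One widely documented scheme that produces this kind of 255-head view is "LBA-assisted" translation. It derives the geometry from capacity alone and ignores the head count the drive reports. The sketch below assumes that scheme: 63 sectors per track, with heads stepped through 16/32/64/128/255. Individual BIOSes may translate differently, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct geometry {
    unsigned cylinders, heads, sectors;
};

/*
 * LBA-assisted translation sketch: 63 sectors per track and the smallest
 * head count from {16, 32, 64, 128, 255} that keeps cylinders <= 1024.
 * Larger disks get 255 heads and a cylinder count clamped at 1024.
 */
static struct geometry lba_assisted_geometry(uint64_t total_sectors)
{
    static const unsigned head_choices[] = {16, 32, 64, 128, 255};
    struct geometry g = {0, 255, 63};

    for (size_t i = 0; i < sizeof head_choices / sizeof head_choices[0]; i++) {
        if (total_sectors <= 1024ULL * head_choices[i] * g.sectors) {
            g.heads = head_choices[i];
            break;
        }
    }
    uint64_t cyl = total_sectors / ((uint64_t)g.heads * g.sectors);
    g.cylinders = cyl > 1024 ? 1024 : (unsigned)cyl;
    return g;
}
```

A 1,000,000-sector disk comes out as 992/16/63 under this scheme. The same disk reporting 8 heads in its identify data would still be presented with 16 heads, so CHS values computed from the drive's own geometry would not match.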

> Now, during the boot sequence, the BIOS via INT 0x13 or 0x40
> will be called upon to read data into memory from various
> offsets on the media. If the offsets are calculated in the
> same way that they were calculated when the disk was initialized
> as a boot disk, then everything is okay. The calculations of
> offsets do not require the same C/H/S phony variables! One

Yes, they do require the same geometry, see above.

> only has to follow the correct rules. The rules are that
> heads increment from 0, as do cylinders, and sectors start
> at 1. Also "sectors" must be 512-byte intervals even though
> the physical media may have 16 kilobyte sectors. Given
> these rules, there are zillions of ways for one to arrive
> at the correct offset. The interpretation will be correct
> IFF the number of cylinders is extracted first, then the
> number of heads (tracks), then the number of sectors, always
> using the largest number that will fit into the BIOS
> registers used to make that access.
>

The bios will not accept values that are larger than its idea of the
disk's geometry. If the bios thinks the disk only has 8 heads, you
can't ask it to fetch a sector on head 17.

> In the case of "large media" access, the cylinders are
> set to 0xffff. This triggers additional logic that invents
> a new virtual sector length to accommodate.
>
> The following is the __entire__ boot code for an IBM/PC
> compatible BIOS! Constant "DISKS" is 0x13.
>

<snip>

Not sure why you pasted the example bios code, maybe you could explain?


2006-02-16 08:13:12

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Phillip Susi wrote:
> Alan Cox wrote:
>
>> The tools need to know the C/H/S drive addressing data for old drives
>> because it is used to determine partition tables. That doesn't have to
>> be GETGEO but it does need to exist somewhere.
>
> Currently GETGEO very often does not report the same values as the bios,
> doesn't it? For some disks it's completely made up, and for others it
> is the value returned by the drive itself, which often differs from the
> bios values. If this is the case, and it is the bios values that must
> be stored in the MBR, then it makes little sense to have GETGEO seeing
> as how it often provides incorrect information.
As stated earlier GETGEO reports the drivers/subsystem's idea of disk
geometry.
> Wouldn't it be better then, to clean up GETGEO everywhere so that unless
> it has correct values from the bios, it should just fail? And leave it
> up to fdisk and friends to inform the user of that failure, choose
> default values, and allow the user to override those defaults should
> they need to?
That's the problem here. As of 2.6 the kernel no longer knows
anything about bios geometry. The exception might be older
drives which do not support lba, where the physical geometry is the one
the bios reports (unless configured differently).

This is, as we all know, intentional, because it's practically impossible
to always match bios disk information accurately to the drives reported
by drivers.

>
> The only time they would even have to worry about it is if they are
> installing linux on a blank disk, and then want to install windows to
> dual boot with it. In that case they might have to correct the CHS
> values in the MBR to match the values the bios provides.
>
>
Not only Windows, but other OSes as well.

The problem here is a general interface problem. Tools want one
interface (be it ioctl or sysfs). If they can depend on a kernel
interface only partially and have to determine values themselves
otherwise, that interface should be dropped. Again, I'm talking about the
interface, not actual code which might still depend on c/h/s.

On the other hand, if we keep that interface (or perhaps ioctl for
compatibility and sysfs for newer things) and introduce a means to tell
the driver via userspace what we want, many things can be solved. For
example for older drives which need chs, userspace can tell the driver
what the bios uses if values differ. For other implementations which
return defaults which are correct in 80% of all cases, the other 20% can
be overridden.
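A userspace tool could read and override such values through the sysfs layout proposed at the start of this thread, for example /sys/block/hda/geometry/heads. That layout is only a proposal and does not exist in the kernel, so the directory, attribute and helper names below are all hypothetical. The helpers simply read and write one decimal value per file, following the usual sysfs convention.

```c
#include <stdio.h>

/* Read one unsigned decimal value from dir/name
 * (e.g. the proposed, hypothetical /sys/block/hda/geometry/heads). */
static int read_geometry_attr(const char *dir, const char *name, unsigned *out)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path too long */
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;                      /* attribute missing or unreadable */
    int ok = fscanf(f, "%u", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one unsigned decimal value to dir/name (the "override" case). */
static int write_geometry_attr(const char *dir, const char *name, unsigned value)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* e.g. a read-only attribute */
    int rc = fprintf(f, "%u\n", value) < 0 ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Keeping one value per file means a shell script can apply an override with a plain `echo 255 > .../heads`, without needing a dedicated tool.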

It's of course not really the kernel's responsibility to fix things (or
rather, to allow the user to fix things) that are not important to Linux,
but I think it is necessary for the sake of compatibility.



2006-02-16 08:18:40

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Alan Cox wrote:
> On Wed, 2006-02-15 at 11:20 -0500, Phillip Susi wrote:
>
>>Why do you say the partitioning tool needs to know the disk reported
>>C/H/S? The value stored in the MBR must match the bios reported values,
>>not the disk reported ones, so why does the partitioner care about what
>>the disk reports?
>
>
> You answered that in asking the question. "The value stored in the MBR
> must match the ...". What if the MBR has not yet been written?
>
> (Also, btw, it's *should*...) Most modern OSes will take a sane MBR
> geometry and trust it over BIOS defaults.
Not always. The DOS-based winnt.exe installer for Windows XP trusts the
bios, not the MBR.
>
> Alan
>

2006-02-16 12:33:49

by linux-os (Dick Johnson)

Subject: Re: RFC: disk geometry via sysfs


On Wed, 15 Feb 2006, Phillip Susi wrote:

> linux-os (Dick Johnson) wrote:
>> Heads start at 0. Sectors start at 1. Cylinders start at 0.
>> A "lower head" than allowed would be 0xff so the BIOS wouldn't
>> know it was "lower". The BIOS doesn't look at the MBR for
>> normal read/write access! Only while booting does it
>> read the first sector of the master boot record (MBR) into
>> the appropriate physical place (0x7c00). Then it checks to see
>> if there is an 0xaa55 as the last word in the sector. If so,
>> it executes code starting at offset zero. Modern BIOS don't
>> even check the "boot flag" because it may be wrong, preventing
>> a boot.
>>
>
> I'm talking about the geometry of the disk. If the disk has 16 sectors
> and 8 heads, then the maximum value allowed for any valid address is 16
> in the sector field and 7 in the heads field. This influences the
> translation to/from LBA. A sector with LBA of 1234 would have a CHS
> address using this geometry of 9/5/3. If the disk reports a geometry of
> x/8/16 but the bios is using a geometry of x/255/63, then when you pass
> 9/5/3 to int 13 it will fetch LBA 144902 which is clearly not going to
> give you what you wanted.
>

Wrong! The disk gets an OFFSET! It doesn't care how that OFFSET
is obtained. That OFFSET is the sum of some variables. Some start
at 0 and some start at 1. The BIOS takes these PHONY things, without
checking to see if they "fit" in some pre-conceived notion of
"geometry" and sums them all up to make an OFFSET. The C/H/S
stuff started and ENDED with the ST-506 interface. PERIOD.

[Snipped rest]


Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
_



2006-02-16 15:27:38

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

linux-os (Dick Johnson) wrote:
>> I'm talking about the geometry of the disk. If the disk has 16 sectors
>> and 8 heads, then the maximum value allowed for any valid address is 16
>> in the sector field and 7 in the heads field. This influences the
>> translation to/from LBA. A sector with LBA of 1234 would have a CHS
>> address using this geometry of 9/5/3. If the disk reports a geometry of
>> x/8/16 but the bios is using a geometry of x/255/63, then when you pass
>> 9/5/3 to int 13 it will fetch LBA 144902 which is clearly not going to
>> give you what you wanted.
>>
>
> Wrong! The disk gets an OFFSET! It doesn't care how that OFFSET
> is obtained. That OFFSET is the sum of some variables. Some start
> at 0 and some start at 1. The BIOS takes these PHONY things, without
> checking to see if they "fit" in some pre-conceived notion of
> "geometry" and sums them all up to make an OFFSET. The C/H/S
> stuff started and ENDED with the ST-506 interface. PERIOD.
>

Please reread my explanation above. The bios has to compute the
absolute offset based on the geometry and the values you pass it. It
does so by multiplying the head (track) number you pass by the number of
sectors per track, multiplying the cylinder number by the number of
sectors per track times the number of heads, and adding both products to
the sector number you pass minus one (sectors are numbered from 1) to
arrive at the LBA to read. If it performs the CHS->LBA translation using
a different geometry than you used to go from LBA->CHS, then it will get
the wrong sector.
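Phillip's arithmetic can be checked directly. The C sketch below implements both directions of the translation. It assumes 0-based cylinders and heads and 1-based sectors, and the type and function names are hypothetical.

```c
#include <stdint.h>

struct chs_geom { unsigned heads, sectors; };   /* sectors = per track */
struct chs_addr { unsigned c, h, s; };          /* s is 1-based        */

/* LBA -> CHS for a given geometry. */
static struct chs_addr lba_to_chs(uint32_t lba, struct chs_geom g)
{
    struct chs_addr a;
    a.c = lba / (g.heads * g.sectors);
    a.h = (lba / g.sectors) % g.heads;
    a.s = lba % g.sectors + 1;
    return a;
}

/* CHS -> LBA: (c * heads + h) * sectors + (s - 1). */
static uint32_t chs_to_lba(struct chs_addr a, struct chs_geom g)
{
    return ((uint32_t)a.c * g.heads + a.h) * g.sectors + (a.s - 1);
}
```

With an x/8/16 geometry, LBA 1234 becomes 9/5/3, and converting back with the same geometry returns 1234. Interpreting 9/5/3 under an x/255/63 geometry instead gives LBA 144902, the figure quoted earlier in the thread.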


2006-02-16 15:37:25

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Seewer Philippe wrote:
> That's the problem here. As of 2.6 the kernel no longer knows
> anything about bios geometry. The exception might be older
> drives which do not support lba, where the physical geometry is the one
> the bios reports (unless configured differently).
>
> This is, as we all know, intentional, because it's practically impossible
> to always match bios disk information accurately to the drives reported
> by drivers.
>

If it is intentional that the kernel not keep track of the bios
geometry, then it should not track geometry at all. The only reason for
the existence of GETGEO is so partitioning tools can figure out what to
put in the MBR for the disk geometry. If they do not get the values
that the bios reports, then they are getting useless information.

Why give the illusion that they got the right information when you are
just lying to them? Wouldn't it be better to fail the request so the
tool knows it can't get the right info from the system?

> Not only Windows, but other OSes as well.
>
> The problem here is a general interface problem. Tools want one
> interface (be it ioctl or sysfs). If they can depend on a kernel
> interface only partially and have to determine values themselves
> otherwise, that interface should be dropped. Again, I'm talking about the
> interface, not actual code which might still depend on c/h/s.
>

Exactly, the interface should be completely dropped since it really is
useless to the tools anyhow without accurate information from the bios.

> On the other hand, if we keep that interface (or perhaps ioctl for
> compatibility and sysfs for newer things) and introduce a means to tell
> the driver via userspace what we want, many things can be solved. For
> example for older drives which need chs, userspace can tell the driver
> what the bios uses if values differ. For other implementations which
> return defaults which are correct in 80% of all cases, the other 20% can
> be overridden.
>

That is true, but since the kernel doesn't use this information, it
amounts to holding onto a user space configuration parameter. Since
it's just a user space configuration parameter, shouldn't that go in a
conf file in /etc or something, rather than burdening the kernel with
that information? And since the kernel won't remember the settings
across boots, then you're going to end up with them stored in a conf
file anyhow with a boot time script that copies it to the kernel, so
that fdisk can read it back from the kernel later. Since you likely
will only partition a drive when installing, is there even a need to
store it at all, let alone in the kernel? Just let fdisk ask the user
or choose defaults.

> It's of course not really the kernel's responsibility to fix things (or
> rather, to allow the user to fix things) that are not important to Linux,
> but I think it is necessary for the sake of compatibility.
>

2006-02-16 15:41:19

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Phillip Susi wrote:
> Seewer Philippe wrote:
>
>> That's the problem here. As of 2.6 the kernel no longer knows
>> anything about bios geometry. The exception might be older
>> drives which do not support lba, where the physical geometry is the one
>> the bios reports (unless configured differently).
>>
>> This is, as we all know, intentional, because it's practically impossible
>> to always match bios disk information accurately to the drives reported
>> by drivers.
>>
>
> If it is intentional that the kernel not keep track of the bios
> geometry, then it should not track geometry at all. The only reason for
> the existence of GETGEO is so partitioning tools can figure out what to
> put in the MBR for the disk geometry. If they do not get the values
> that the bios reports, then they are getting useless information.
>
> Why give the illusion that they got the right information when you are
> just lying to them? Wouldn't it be better to fail the request so the
> tool knows it can't get the right info from the system?
>
>> Not only Windows, but other OSes as well.
>>
>> The problem here is a general interface problem. Tools want one
>> interface (be it ioctl or sysfs). If they can depend on a kernel
>> interface only partially and have to determine values themselves
>> otherwise, that interface should be dropped. Again, I'm talking about the
>> interface, not actual code which might still depend on c/h/s.
>>
>
> Exactly, the interface should be completely dropped since it really is
> useless to the tools anyhow without accurate information from the bios.
>
>> On the other hand, if we keep that interface (or perhaps ioctl for
>> compatibility and sysfs for newer things) and introduce a means to tell
>> the driver via userspace what we want, many things can be solved. For
>> example for older drives which need chs, userspace can tell the driver
>> what the bios uses if values differ. For other implementations which
>> return defaults which are correct in 80% of all cases, the other 20% can
>> be overridden.
>>
>
> That is true, but since the kernel doesn't use this information, it
> amounts to holding onto a user space configuration parameter. Since
> it's just a user space configuration parameter, shouldn't that go in a
> conf file in /etc or something, rather than burdening the kernel with
> that information? And since the kernel won't remember the settings
> across boots, then you're going to end up with them stored in a conf
> file anyhow with a boot time script that copies it to the kernel, so
> that fdisk can read it back from the kernel later. Since you likely
> will only partition a drive when installing, is there even a need to
> store it at all, let alone in the kernel? Just let fdisk ask the user
> or choose defaults.
>
>> It's of course not really the kernel's responsibility to fix things (or
>> rather, to allow the user to fix things) that are not important to Linux,
>> but I think it is necessary for the sake of compatibility.
>>
>
The problem does not end with fdisk. There are tons of tools (sfdisk,
parted, dosemu, ...) which would be affected.

2006-02-16 16:15:34

by Seewer Philippe

Subject: Re: RFC: disk geometry via sysfs



Phillip Susi wrote:
> linux-os (Dick Johnson) wrote:
>
>>> I'm talking about the geometry of the disk. If the disk has 16 sectors
>>> and 8 heads, then the maximum value allowed for any valid address is 16
>>> in the sector field and 7 in the heads field. This influences the
>>> translation to/from LBA. A sector with LBA of 1234 would have a CHS
>>> address using this geometry of 9/5/3. If the disk reports a geometry of
>>> x/8/16 but the bios is using a geometry of x/255/63, then when you pass
>>> 9/5/3 to int 13 it will fetch LBA 144902 which is clearly not going to
>>> give you what you wanted.
>>>
>>
>> Wrong! The disk gets an OFFSET! It doesn't care how that OFFSET
>> is obtained. That OFFSET is the sum of some variables. Some start
>> at 0 and some start at 1. The BIOS takes these PHONY things, without
>> checking to see if they "fit" in some pre-conceived notion of
>> "geometry" and sums them all up to make an OFFSET. The C/H/S
>> stuff started and ENDED with the ST-506 interface. PERIOD.
>>
>
> Please reread my explanation above. The bios has to compute the
> absolute offset based on the geometry and the values you pass it. It
> does so by multiplying the head (track) number you pass by the number of
> sectors per track, multiplying the cylinder number by the number of
> sectors per track times the number of heads, and adding both products to
> the sector number you pass minus one (sectors are numbered from 1) to
> arrive at the LBA to read. If it performs the CHS->LBA translation using
> a different geometry than you used to go from LBA->CHS, then it will get
> the wrong sector.
>
>

Guys, let's not forget ourselves... OK?

For an in depth explanation of the whole c/h/s lba thing go here:
http://www.mossywell.com/boot-sequence/

This is the best reference I've found thus far.

I think what we should be talking about here is what is necessary to
write a mbr and partition a disk, not how the whole c/h/s shebang works,
because that is no longer of any real interest.

The important fact here is that Linux does not really depend on an MBR
that matches the BIOS. Only other OSes do...

The current behaviour of partitioning tools under Linux is (most of the
time) quite simple: if an MBR exists, derive the geometry for new
partitions from it.
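As an illustration of that case, here is a sketch of a heuristic tools are commonly believed to use: if an existing partition ends on a cylinder boundary, its ending head is heads - 1 and its ending sector equals the sectors per track. The heuristic and the function names are assumptions for illustration; the MBR layout itself (partition table at byte 446, four 16-byte entries, packed C/H/S at entry offsets 1 and 5, type byte at offset 4, signature 0x55 0xAA at byte 510) is the standard one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the packed 3-byte C/H/S field in an MBR partition entry. */
struct mbr_chs {
    unsigned cylinder, head, sector;
};

static struct mbr_chs decode_mbr_chs(const uint8_t b[3])
{
    struct mbr_chs r;
    r.head = b[0];
    r.sector = b[1] & 0x3f;                              /* bits 0-5 */
    r.cylinder = ((unsigned)(b[1] & 0xc0) << 2) | b[2];  /* b[1] bits 6-7 = cylinder bits 8-9 */
    return r;
}

/*
 * Hypothetical heuristic: guess heads/sectors from the first used entry,
 * assuming it ends on a cylinder boundary. Returns 0 on success, -1 if
 * no guess can be made.
 */
static int guess_geometry_from_mbr(const uint8_t mbr[512],
                                   unsigned *heads, unsigned *sectors)
{
    assert(mbr != NULL && heads != NULL && sectors != NULL);
    if (mbr[510] != 0x55 || mbr[511] != 0xaa)
        return -1;                    /* no valid MBR signature */
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = mbr + 446 + 16 * i;
        if (e[4] == 0)                /* type 0 marks an unused entry */
            continue;
        struct mbr_chs end = decode_mbr_chs(e + 5);
        if (end.sector == 0)          /* sector numbers start at 1 */
            continue;
        *heads = end.head + 1;
        *sectors = end.sector;
        return 0;
    }
    return -1;
}
```

If no MBR exists, the function returns -1 and the tool is back to the problem described next: it has to get a geometry from somewhere else.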

The problem starts when creating a new MBR. In this case we need a
geometry, and most utilities (probably for historical reasons) get it
from HDIO_GETGEO. By now we know that these values do not necessarily
correspond to the BIOS values. They don't have to, because they can be
as bogus as we like. Why? Because all partitions will be created with
these values as a base. The only real question is whether the user
wants BIOS-compatible values or not.
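For reference, a minimal sketch of how a tool queries HDIO_GETGEO. `struct hd_geometry` and `HDIO_GETGEO` come from the kernel's <linux/hdreg.h>; the wrapper name `query_kernel_geometry` is hypothetical. The values returned are the driver's idea of the geometry, which, as discussed in this thread, need not match what the BIOS uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/hdreg.h>   /* struct hd_geometry, HDIO_GETGEO */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel for its idea of a block device's geometry.
 * Returns 0 and fills *geo on success, or -1 if the open or the ioctl
 * fails (e.g. the device does not implement HDIO_GETGEO).
 */
static int query_kernel_geometry(const char *path, struct hd_geometry *geo)
{
    assert(path != NULL && geo != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, HDIO_GETGEO, geo);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

On success, `geo->heads`, `geo->sectors` and `geo->cylinders` hold the reported values and `geo->start` the starting sector of the device (nonzero for a partition). A caller that treats -1 as "geometry unknown" and falls back to its own policy is one way a partitioner could cope if the kernel declines to answer.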

It gets worse with tools such as dosemu, which need to emulate a BIOS.
If we do things like deploying Windows with dosemu (please remember,
this is just an example), the geometry values presented by dosemu must
be exactly the same as those the BIOS returns.

It gets worse still with bootloaders, because they need at least some
basic geometry information. See the thread "Support HDIO_GETGEO on
device-mapper volumes" on this mailing list for an example (that thread
is actually among the reasons why I started this one).

So the whole thing comes down to whether we drop all interfaces
reporting geometry, making userspace tools responsible, or provide a
common interface that userspace can modify if necessary.
(There are no other workable options I can see.)

I vote for keeping it in the kernel, because otherwise tons of
user-space tools would need to be modified, and it might actually be the
case that a driver knows what it's returning...

2006-02-16 16:16:32

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Seewer Philippe wrote:
> The problem does not end with fdisk. There are tons of tools (sfdisk,
> parted, dosemu, ...) which would be affected.

I think that the important thing to remember is that these tools are
already broken; they just don't know it. It is better to tell the tools
you don't know the geometry than to make something up which won't work
for its intended purpose.


In some cases (like dosemu) the tools still require this information but
don't really need it at all, so they really need to be fixed. It is
silly to require a lie that you don't even use.

2006-02-16 16:39:15

by linux-os (Dick Johnson)

Subject: Re: RFC: disk geometry via sysfs


On Thu, 16 Feb 2006, Phillip Susi wrote:

> linux-os (Dick Johnson) wrote:
>>> I'm talking about the geometry of the disk. If the disk has 16 sectors
>>> and 8 heads, then the maximum value allowed for any valid address is 16
>>> in the sector field and 7 in the heads field. This influences the
>>> translation to/from LBA. A sector with LBA of 1234 would have a CHS
>>> address using this geometry of 9/5/3. If the disk reports a geometry of
>>> x/8/16 but the bios is using a geometry of x/255/63, then when you pass
>>> 9/5/3 to int 13 it will fetch LBA 144902 which is clearly not going to
>>> give you what you wanted.
>>>
>>
>> Wrong! The disk gets an OFFSET! It doesn't care how that OFFSET
>> is obtained. That OFFSET is the sum of some variables. Some start
>> at 0 and some start at 1. The BIOS takes these PHONY things, without
>> checking to see if they "fit" in some pre-conceived notion of
>> "geometry" and sums them all up to make an OFFSET. The C/H/S
>> stuff started and ENDED with the ST-506 interface. PERIOD.
>>
>
> Please reread my explanation above. The bios has to compute the
> absolute offset based on the geometry and the values you pass it. It
> does so by multiplying the track number you pass by the number of
> sectors per track, multiplies the cylinder number by the number of
> sectors per track and the number of tracks, and adds those two values to
> the sector number you pass to arrive at the LBA to read. If it performs
> the CHS->LBA translation using a different geometry than you used to go
> from LBA->CHS, then it will get the wrong sector.
>

I read it, and it's wrong. You don't bother to learn. I will
take one last hack at this and then drop it.

When a disk is first accessed, the BIOS reads the disk capacity.
That's all. This capacity is counted in 512-byte units called "sectors".

From that information, ** AND THAT INFORMATION ALONE **, the
BIOS builds a BIOS parameter block (BPB) for subsequent INT 0x13
translation. This will be used ONLY BY THIS BIOS to extract the
required drive access parameters from the BIOS INT 0x13 interface.

The BIOS knows that the maximum number of cylinders is 1024.
It also knows that the maximum head number is 255, and the
maximum sector number is 255 because that's what's available
in the registers.

It will find the number of sectors to access in AL, the starting
cylinder in CH (low 8 bits) and CL (high 2 bits); CL also contains 6
bits of the starting sector. DH will contain the head number. Note
that the BIOS interface only has 6 bits available for the sector!
It doesn't care, even though there may be 255 in your phony table;
it will just adjust the head number during the access to obtain
the offset exactly.

With that information, the BIOS will break an offset apart into the
highest available cylinder, the highest available starting sector,
and the highest head number.

Since there are only 6 bits available for sectors, it will be
limited to 63 (sector numbers start at 1), in spite of the fact
that there may be thousands of sectors on a real track.

You see that these are simply "chunks" of sectors. You can make
them up from many different combinations of heads and cylinders.

If the disk has no equivalent "read capacity" command, like
floppy disks and old ST-506 interface disks, then the BIOS needs to
use its internal C/H/S tables, which were either supplied by IBM
(for the first AT) or, later when Phoenix started making BIOSes,
selected by the user setting a particular drive type. If the
BIOS is set to "auto", the C/H/S values are calculated from the
"read capacity" command as described.
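To make the "calculated from capacity" step concrete, here is a sketch of one widely described scheme, often called LBA-assisted translation: fix 63 sectors per track and pick a head count from 16, 32, 64, 128 or 255 so that the cylinder count stays within 1024. Individual BIOSes may differ; this is an illustration under that assumption, not any vendor's actual code, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct bios_geometry {
    uint32_t cylinders, heads, sectors;
};

/*
 * Invent a geometry from the capacity alone (LBA-assisted style):
 * 63 sectors per track, smallest head count from the list that keeps
 * cylinders <= 1024, clamped to 1024 cylinders for larger disks.
 */
static struct bios_geometry lba_assist_geometry(uint64_t total_sectors)
{
    static const uint32_t head_choices[] = {16, 32, 64, 128, 255};
    struct bios_geometry g = {0, 255, 63};

    assert(total_sectors > 0);
    for (unsigned i = 0; i < sizeof head_choices / sizeof head_choices[0]; i++) {
        if (total_sectors <= 1024ULL * head_choices[i] * 63) {
            g.heads = head_choices[i];
            break;
        }
    }
    uint64_t cyl = total_sectors / ((uint64_t)g.heads * g.sectors);
    g.cylinders = cyl > 1024 ? 1024 : (uint32_t)cyl;  /* INT 0x13 CHS tops out at 1024 */
    return g;
}
```

With 1024 cylinders, 255 heads and 63 sectors, the largest CHS-addressable disk is 16,450,560 sectors, or 8,422,686,720 bytes (about 8.4 GB); anything larger has to be clamped or reached through LBA calls.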

In all cases, the C/H/S is never read from the media. DOS `fdisk`
puts a BPB on the media after it has been partitioned. It is a
throw-back to floppy disks. It is possible for some donkey's boot
code to fail to boot if it doesn't see a BPB at the correct location;
however, that was a Windows 95 problem.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5590.48 BogoMips).
Warning : 98.36% of all statistics are fiction.
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2006-02-16 17:02:58

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

Seewer Philippe wrote:
> I think what we should be talking about here is what is necessary to
> write an MBR and partition a disk, not how the whole C/H/S shebang works,
> because that is no longer of any real interest.
>
> The important fact here is that Linux does not really depend on an MBR
> which matches the BIOS. Only other OSes do...

True.

>
> The current behaviour of partitioning tools under Linux is (most of the
> time) quite simple: If an MBR exists, determine the geometry to use to
> create new partitions from the MBR.
>
> The problem starts when creating a new MBR. In this case we need a
> geometry. There, most utilities depend (probably historically) on
> HDIO_GETGEO. By now we know that these values do not necessarily
> correspond to bios values. They don't have to, because they can contain
> as much bogus as we want. Why? Because all partitions will be created
> with these values as a base. The question here is actually only if the
> user wants compatible values or not.
>

That is the sticking point. The MBR can NOT contain whatever values we
want. It must contain the values that the bios expects, otherwise boot
loaders that use those values will fail to operate correctly.

> The problem increases when we use tools such as dosemu, which need to
> emulate a bios. If we do things there like deploy windows with dosemu
> (please remember, this is just an example), the geometry values
> represented by dosemu need to be exactly the same as the bios returns.
>

dosemu emulates the bios and talks to the disk via the kernel, so it
does not care what the real bios geometry is, or even if there IS a real
bios geometry. You can run dosemu on any block device, including a loop
device, which clearly has no geometry. It can choose whatever geometry
it wants to emulate, though it should probably use whatever existing
geometry is in the MBR of the disk (physical or virtual) that you are
having it use.

> The problem increases further with the use of bootloaders. Because they
> need at least some basic geometry information. See the thread "Support
> HDIO_GETGEO on device-mapper volumes" in this mailing list for an example
> (Actually this thread is among the reasons why I started this).
>

The boot loaders get it from the MBR if they are not operating in LBA
mode. The partitioners put the geometry in the MBR. The geometry that
they place there must match the values expected by the bios. If the
kernel does not know those values, then it should not lie to the
partitioning tools about it, it should fail the request, and let the
partitioning tool decide what to do.

> So the whole thing comes to the question whether we drop any interfaces
> reporting geometry, making userspace tools responsible or if we provide
> a common interface which can be modified by userspace if necessary.
> (There are no other workable options I can see)
>

Right, the kernel can keep the old interface and rely on yet another
user space tool to tell the kernel what it should report, or it can drop
it and rely on the partitioners to deal with it.

> I vote for keeping it in the kernel, because otherwise tons of
> user-space tools would need to be modified and it actually might be the
> case that a driver knows what it's returning...
>

You said already that as of 2.6 the kernel no longer knows the bios
values, so the driver NEVER knows the right value to return. Since that
is the case, it should not pretend that it does know, it should let the
user space partitioning tools know it does not know. Yes, they may need
to be modified to handle that case, but that is something they should have
been doing a long time ago, and why complicate the kernel when this is
really done in user space anyhow?


2006-02-16 17:10:33

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

linux-os (Dick Johnson) wrote:
> I read it, and it's wrong. You don't bother to learn. I will
> take one last hack at this and then drop it.
>
> When a disk is first accessed, the BIOS reads the disk capacity.
> That's all. This disk capacity is in 512-byte things called "sectors".
>

You don't bother to mention HOW it is wrong, so it appears it is you who
fail to learn. I will attempt once more to explain. When you call int
13 and ask it for C = 3, H = 4, S = 5, exactly which sector you get
depends very much on what the bios thinks the geometry of the disk is,
because the bios will translate 3/4/5 into a completely different value
before sending it to the drive. That translation is dependent entirely
on which fake geometry the bios chooses to report the disk has.

I illustrated this translation and you simply say it is wrong. If that
is the case then show how.



2006-02-16 18:14:14

by Matt Domsch

Subject: Re: RFC: disk geometry via sysfs

On Wed, Feb 15, 2006 at 04:06:55PM +0000, Alan Cox wrote:
> On Mer, 2006-02-15 at 10:20 -0500, Phillip Susi wrote:
> > I thought that C/H/S addressing was purely a function of int 13, not the
> > hardware interface? If it is a function of some older hardware
> > interfaces, then we are still talking about two different, and likely
> > incompatible geometries: the one the disk reports, and the one the bios
> > reports. The values in the MBR must be the values the bios reports.
>
> We have at least three
>
> Disk reported C/H/S
> BIOS reported C/H/S (hda/hdb only)
> Actual C/H/S (if it exists)
> Partition table C/H/S
>
> A partitioning tool needs to know
> Disk reported C/H/S
> Partition table C/H/S
> Preferably BIOS reported C/H/S if there is one
>
> The partition table C/H/S is on disk so trivial
> The disk reported ones are in the identify block so could be pulled via
> /proc and sysfs
> The BIOS one is PC specific low memory poking around

On i386 and x86_64, the edd module reports the 2 types of C/H/S values
as BIOS knows them, in /sys/firmware/edd/int13_dev*/

legacy_max_cylinder, legacy_max_head, and legacy_max_sectors_per_track
are int13 AH=08h values.

default_cylinders, default_heads, and default_sectors_per_track are
int13 AH=48h values.

If a file is missing from that directory, the value reported by the BIOS was zero.
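A sketch of reading one of those attributes from user space. The directory name int13_dev80 (BIOS drive 0x80) in the usage note is an assumption, as is the number format inside the files (the parser accepts decimal or 0x-prefixed hex); check the attribute names listed above against an actual system. The function name `read_edd_value` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one numeric attribute, e.g. dir = "/sys/firmware/edd/int13_dev80",
 * name = "legacy_max_head". Returns 0 and stores the value in *out, or -1
 * if the file is missing (per the description above, BIOS reported zero)
 * or cannot be parsed.
 */
static int read_edd_value(const char *dir, const char *name, unsigned long *out)
{
    char path[512], buf[64];
    assert(dir != NULL && name != NULL && out != NULL);

    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || n >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL)
        return -1;

    char *end;
    errno = 0;
    unsigned long v = strtoul(buf, &end, 0);  /* base 0: decimal or 0x hex */
    if (end == buf || errno != 0)
        return -1;
    *out = v;
    return 0;
}
```

If, like the AH=08h registers themselves, legacy_max_head holds a zero-based maximum, the head count is that value plus one; treat this as an assumption to verify rather than a documented guarantee.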

--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

2006-02-16 19:01:33

by linux-os (Dick Johnson)

Subject: Re: RFC: disk geometry via sysfs


On Thu, 16 Feb 2006, Phillip Susi wrote:

> linux-os (Dick Johnson) wrote:
>> I read it, and it's wrong. You don't bother to learn. I will
>> take one last hack at this and then drop it.
>>
>> When a disk is first accessed, the BIOS reads the disk capacity.
>> That's all. This disk capacity is in 512-byte things called "sectors".
>>
> You don't bother to mention HOW it is wrong, so it appears it is you who
> fail to learn. I will attempt once more to explain. When you call int
> 13 and ask it for C = 3, H = 4, S = 5, exactly which sector you get
> depends very much on what the bios thinks the geometry of the disk is,
> because the bios will translate 3/4/5 into a completely different value
> before sending it to the drive. That translation is dependent entirely
> on which fake geometry the bios chooses to report the disk has.
>
> I illustrated this translation and you simply say it is wrong. If that
> is the case then show how.

You sure are interested in arguing. The translation cannot be wrong
because the BIOS invented the translation which was created when
the BIOS did a "read capacity." That translation is stored in the
BIOS as a BPB, not on the disk, and it is accessed by any
file-systems that use the 16-bit Int 0x13 interface. If the
file-systems are not broken, they will NOT use the wrong translation
because they will read the current interpretation by reading
the BPB from the vector represented by int 0x64, or by executing
Int 0x13, function code 8 (read drive parameters). These parameters
are INVENTED upon startup as previously explained.

As previously explained, the fake geometry is not geometry at
all, but rather a translation key that was decided upon
startup after the capacity was determined. Its sole purpose
is to get a sector-offset through the limited register-set
in the 0x13 interface.

[FS offset]--->[encode KEY]--->[INT 0x13]--->[decode KEY]--->[drive offset]
                    |                             |
                    |-- anything that will fit ---|

This encode/decode key should have never been let out of its cage.
Unfortunately some DOS tools put it on the disks in a table
called the BPB.

DOS creates two software interrupt vectors, int 0x25, and
int 0x26, (absolute read and write), which perform this
translation using the stuff in the BPB. This means that
the caller (the file-system) doesn't have to worry about
these things.

Since the offsets are directly available when the BIOS is not
used, this BPB is useless.

Even when using dosemu, where a virtual 0x13 is available, the
key used to access this resource is obtained by reading the
capacity of the DOS file-system(s) and building a BPB for
each (virtual) disk.

Cheers,
Dick Johnson

2006-02-16 19:56:30

by Phillip Susi

Subject: Re: RFC: disk geometry via sysfs

linux-os (Dick Johnson) wrote:
>
> You sure are interested in arguing. The translation cannot be wrong
> because the BIOS invented the translation which was created when
> the BIOS did a "read capacity." That translation is stored in the
> BIOS as a BPB, not on the disk, and it is accessed by any
> file-systems that use the 16-bit Int 0x13 interface. If the
> file-systems are not broken, they will NOT use the wrong translation
> because they will read the current interpretation by reading
> the BPB from the vector represented by int 0x64, or by executing
> Int 0x13, function code 8 (read drive parameters). These parameters
> are INVENTED upon startup as previously explained.
>

The BPB is stored in the MBR by fdisk. The MBR also contains CHS sector
addresses for the location of the partitions on the disk. fdisk
computes those addresses by translating the LBA into CHS using a
geometry. If the bios is using a different geometry then those CHS
addresses that the boot loader will request the bios to load will have a
completely different meaning, so the loader won't get the intended sector.
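The encoding step described here (fdisk turning an LBA into the packed C/H/S bytes of a partition entry) can be sketched like this. The clamping rule for addresses past cylinder 1023 is an assumption (a commonly used convention), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode an LBA as the packed 3-byte C/H/S field of an MBR partition
 * entry, using the geometry (heads, sectors per track) the partitioning
 * tool believes in. Byte 0 = head, byte 1 = sector | cylinder bits 8-9
 * in bits 6-7, byte 2 = cylinder bits 0-7. Past cylinder 1023 the
 * address cannot be represented, so this sketch stores 1023/heads-1/sectors.
 */
static void encode_mbr_chs(uint32_t lba, uint32_t heads, uint32_t sectors,
                           uint8_t out[3])
{
    assert(heads >= 1 && heads <= 256 && sectors >= 1 && sectors <= 63);
    uint32_t c = lba / (heads * sectors);
    uint32_t h = (lba / sectors) % heads;
    uint32_t s = lba % sectors + 1;
    if (c > 1023) {
        c = 1023;
        h = heads - 1;
        s = sectors;
    }
    out[0] = (uint8_t)h;
    out[1] = (uint8_t)(((c >> 2) & 0xc0) | s);
    out[2] = (uint8_t)(c & 0xff);
}
```

With this sketch, LBA 2048 is stored as 0/32/33 under an x/255/63 key but as 2/0/33 under x/16/63; a BIOS using x/255/63 would decode the latter as LBA 32162, not 2048.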

> As previously explained, the fake geometry is not geometry at
> all, but rather a translation key that was decided upon
> startup after the capacity was determined. Its sole purpose
> is to get a sector-offset through the limited register-set
> in the 0x13 interface.
>

Yes, I've been saying it is a translation key the whole time. The bios
uses it to figure out what sector you are referencing in int 13, and
fdisk uses it to figure out what CHS values correspond to a given LBA so
it can store them in the MBR. If they don't both use the same
translation key, they aren't both speaking the same language, and so the
boot loader will break.

> [FS offset]--->[encode KEY]--->[INT 0x13]--->[decode KEY]--->[drive offset]
>                     |                             |
>                     |-- anything that will fit ---|
>
> This encode/decode key should have never been let out of its cage.
> Unfortunately some DOS tools put it on the disks in a table
> called the BPB.
>

And fdisk must also perform the encode so the partition table can
correctly indicate where the partition begins. The boot loader passes
those values to the bios which decodes them. The encode and decode must
both be done using the same key.

> DOS creates two software interrupt vectors, int 0x25, and
> int 0x26, (absolute read and write), which perform this
> translation using the stuff in the BPB. This means that
> the caller (the file-system) doesn't have to worry about
> these things.
>

We aren't talking about DOS's filesystem here, we're talking about the
Windows (or DOS) boot loader which directly uses int 13 and passes the
CHS values from the MBR. The meaning of those values is entirely
dependent on the geometry, so when fdisk writes those values to the MBR
it has to be using the same geometry that the bios will use to decode
it, otherwise when the sector address of the partition is encoded using
one geometry, and decoded using another, it will come out all wrong.

> Since the offsets are directly available when the BIOS is not
> used, this BPB is useless.
>

The boot loader is directly using the bios.

> Even when using dosemu, where a virtual 0x13 is available, the
> key used to access this resource is obtained by reading the
> capacity of the DOS file-system(s) and building a BPB for
> each (virtual) disk.
>