Hi!
This patch replaces the current AMD IDE driver (by Andre Hedrick) by
mine. Myself I think my implementation is much cleaner, but I'll leave
upon others to judge that. My driver also additionally supports the
AMD-8111 IDE.
It's well tested, and I'd like to have this in the kernel instead of
what's there now.
--
Vojtech Pavlik
SuSE Labs
Vojtech Pavlik wrote:
> Hi!
>
> This patch replaces the current AMD IDE driver (by Andre Hedrick) by
> mine. Myself I think my implementation is much cleaner, but I'll leave
> upon others to judge that. My driver also additionally supports the
> AMD-8111 IDE.
>
> It's well tested, and I'd like to have this in the kernel instead of
> what's there now.
Agreed. Let's give it a try. (This is trivial to back up.)
On Mon, 11 Mar 2002, Martin Dalecki wrote:
> Vojtech Pavlik wrote:
> > This patch replaces the current AMD IDE driver (by Andre Hedrick) by
> > mine.
> Agreed. Let's give it a try. (This is trivial to back up.)
I think experience has tought us by now that large changes in
code are impossible to completely back out or debug.
Then again, after reading your discussion with Alan I guess
we don't have much too lose anyway.
regards,
Rik
--
<insert bitkeeper endorsement here>
http://www.surriel.com/ http://distro.conectiva.com/
> > > This patch replaces the current AMD IDE driver (by Andre Hedrick) by
> > > mine.
>
> > Agreed. Let's give it a try. (This is trivial to back up.)
>
> I think experience has tought us by now that large changes in
> code are impossible to completely back out or debug.
The AMD stuff is a nice seperate single driver update. It cleans up the
code a hell of a lot and adds new stuff. I'll run it into -ac later for
testing.
Its quite different to other goings on
On Mon, 11 Mar 2002, Alan Cox wrote:
>
> Its quite different to other goings on
Alan, did you actually look at the diffs that Martin sent out, or are you
just reacting to the description?
I think you read more into the description than was actually in the patch
itself.
Rule #1: always read the patch.
Right now, that patch definitely needs to learn to use "yield()" instead
of "schedule()" etc details, but I really don't understand why all the
brouhaha over Martins patches.
Am I really the only one who actually reads the actual _changes_ instead
of arguing over personal issues?
Now, I've long had this theory that IDE coding is bad for your mental
health (you won't ever see _me_ going close to the dang thing - I'll use
it, but I won't start writing code for it), but that theory used to be a
_joke_, for chrissake! Don't make it appear a truism.
Linus
> On Mon, 11 Mar 2002, Alan Cox wrote:
> >
> > Its quite different to other goings on
>
> Alan, did you actually look at the diffs that Martin sent out, or are you
> just reacting to the description?
I wasnt actually trying to be rude there. Just make it clear that the AMD
stuff is seperate and quite reasonable 2.4 stuff while the big changes
are 2.5 stuff
On Mon, Mar 11, 2002 at 02:39:14PM -0800, Linus Torvalds wrote:
> On Mon, 11 Mar 2002, Alan Cox wrote:
> >
> > Its quite different to other goings on
>
> Alan, did you actually look at the diffs that Martin sent out, or are you
> just reacting to the description?
>
> I think you read more into the description than was actually in the patch
> itself.
>
> Rule #1: always read the patch.
>
> Right now, that patch definitely needs to learn to use "yield()" instead
> of "schedule()" etc details, but I really don't understand why all the
> brouhaha over Martins patches.
>
> Am I really the only one who actually reads the actual _changes_ instead
> of arguing over personal issues?
>
> Now, I've long had this theory that IDE coding is bad for your mental
> health (you won't ever see _me_ going close to the dang thing - I'll use
> it, but I won't start writing code for it), but that theory used to be a
> _joke_, for chrissake! Don't make it appear a truism.
Are you, by any chance, confusing my AMD IDE changes with Pavel Machek's
first attempt at IDE wake/suspend code?
Gee, I think we all need some calming down and sleep ...
--
Vojtech Pavlik
SuSE Labs
On Mon, 11 Mar 2002, Vojtech Pavlik wrote:
>
> Are you, by any chance, confusing my AMD IDE changes with Pavel Machek's
> first attempt at IDE wake/suspend code?
No, I'm just reacting to alan's email where he mentions the "other goings
on" - as opposed to your patch which even Alan agreed with.
Note that I've actually tried to read all patches, and right now they are
all in my tree (of course, the "all" is the Linus kind of "all", which
only includes the ones I actually _noticed_.
That includes your AMD IDE driver change _and_ the oh-so-much-discussed
patches by Martin (which in turn included Pavel's change, which - together
with his changelog entry - seems to be the one that triggered the "lively
discussions").
Linus
On Mon, 11 Mar 2002, Linus Torvalds wrote:
> Note that I've actually tried to read all patches, and right now they are
> all in my tree (of course, the "all" is the Linus kind of "all", which
> only includes the ones I actually _noticed_.
>
> That includes your AMD IDE driver change _and_ the oh-so-much-discussed
> patches by Martin (which in turn included Pavel's change, which - together
> with his changelog entry - seems to be the one that triggered the "lively
> discussions").
I think you might want to offer an opinion (or edict, mandate, whatever)
WRT taskfile. While I think that some protective editing of commands is
desirable, useful, etc, I'm damn sure that some form of raw access is
required long term to allow diagmostics. I wouldn't argue if that
interface was required for low level format and other really drastic
operations as well.
I think "future new commands" is total FUD, the idea that some new command
would come along and be so instantly popular, useful and incompatible that
all Linux boxes would require it before the next kernel or driver update
is silly at best, and I'm working hard to keep this on a civil plane.
Linux moves a hell of a lot faster than Windows, and I can't see anything
coming out which requires "instant attention" because it is totally
incompatible with existing drivers. Most changes will be only added
functionality, like 48 bit addressing, and will be a superset of current
drivers. In other words, not everyone will wnat or need the capability,
and no one is going to feel that a month or so wait from driver TESTING
would cause them to give up computing and become a stripper in a gay bar.
In any case, taskfile is here, and unless there is a good reason to drop
lightly edited commands, or raw commands, or go to some totally different
approach, I wish you would express a clear opinion, at least WRT 2.4,
before developers become nasty and stop limiting themselves to accusations
of stupidity, incompetence, inreliability and prevarication.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Bill Davidsen wrote:
>On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
>>Note that I've actually tried to read all patches, and right now they are
>>all in my tree (of course, the "all" is the Linus kind of "all", which
>>only includes the ones I actually _noticed_.
>>
>>That includes your AMD IDE driver change _and_ the oh-so-much-discussed
>>patches by Martin (which in turn included Pavel's change, which - together
>>with his changelog entry - seems to be the one that triggered the "lively
>>discussions").
>>
>
>I think you might want to offer an opinion (or edict, mandate, whatever)
>WRT taskfile. While I think that some protective editing of commands is
>desirable, useful, etc, I'm damn sure that some form of raw access is
>required long term to allow diagmostics. I wouldn't argue if that
>interface was required for low level format and other really drastic
>operations as well.
>
One sticking point seems to be the question of whether there needs to be
some amount of validation performed on the commands passed through the
taskfile interface. I don't think Andre buys into the argument that
most other kernel hackers understand, which is, "if userspace is allowed
to bang away at random i/o ports, why validate taskfile requests in the
kernel?"
And IMO, we should have some basic validation of taskfile requests in
the kernel...
Reason 1: Standard kernel convention. In other ioctls, we check basic
arguments and return EINVAL when they are wrong, even for privieleged
ioctls.
Reason 2: If you have multiple programs issuing ATA commands, you would
want a decent amount of synchronization, provided by the kernel, for the
multiple user processes and multiple kernel processes issuing requests.
Having the userspace commands come down a single spot in the kernel
code makes this job a lot easier, if not making the impossible possible :)
Regards,
Jeff
On Mon Mar 11, 2002 at 07:34:26PM -0500, Jeff Garzik wrote:
> Reason 1: Standard kernel convention. In other ioctls, we check basic
> arguments and return EINVAL when they are wrong, even for privieleged
> ioctls.
I have no argument with basic command validation. But take a
look at ide_cmd_type_parser(), for example. Do we really need a
giant switch statement listing all the allowed commands, just so
we can throw back a IDE_DRIVE_TASK_INVALID to user-space if they
decide to send down some undocumeted firmware wiping commands?
Especially since that giant struct of allowed commands is
duplicated in ide_pre_handler_parser() and ide_handler_parser()
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
On Mon, 11 Mar 2002, Bill Davidsen wrote:
>
> I think you might want to offer an opinion (or edict, mandate, whatever)
> WRT taskfile. While I think that some protective editing of commands is
> desirable, useful, etc, I'm damn sure that some form of raw access is
> required long term to allow diagmostics. I wouldn't argue if that
> interface was required for low level format and other really drastic
> operations as well.
Quite frankly, my personal opinion is that there's no point in trying to
parse the commands too much at all. The _only_ thing the kernel should try
to care about is that the buffers passed to the command are valid (ie the
actual data in/out area pointer should be a kernel buffer as far as
possible, not under user control).
Basically, there are three "modes" of operation here:
- regular kernel access (ie the normal read/write stuff)
parsing: none. The kernel originated the request.
- common ioctl's that have nothing to do with IDE per se (ie all the
stuff to open/close/lock removable media, unlocking DVD's, reading
sound sectors etc - stuff that really exists elsewhere too, and where
the kernel should do the translation from the generic problem space to
the specific commands for the bus in quesion)
Parsing: common ioctl turning stuff into common SCSI commands in the
"struct request".
- totally IDE-only stuff, where the kernel simply doesn't add any value,
and only has a way of passing it down to the drive, and passing back
the requests.
Parsing: none. The kernel simply doesn't have anything to do with the
request, except to synchronize it with all other requests.
The only common factor here is the "synchronize with other requests" - I
feel strongly (much more strongly than any parsing notion) that the raw
requests have to be passed down the "struct request" and NOT be done the
way they are traditionally done (ie completely outside the request stream,
with no synchronization at all with any IO currently in progress).
> I think "future new commands" is total FUD, the idea that some new command
> would come along and be so instantly popular, useful and incompatible that
> all Linux boxes would require it before the next kernel or driver update
> is silly at best, and I'm working hard to keep this on a civil plane.
It has nothing to do with "new" commands, and everything to do with
"random vendor-specific commands and the vendor-specific tools". Commands
that simply should _never_ be parsed in the kernel, because we do not want
to care about 10 different vendors 10 different revisions of their
firmware having 10 different small random special commands for that
particular drive.
In particular, a user that upgrades his hardware should never _ever_ have
to upgrade his kernel just because some random disk diagnostic tool needs
support for a disk that is new and has new diagnostics.
Linus
Erik Andersen wrote:
>On Mon Mar 11, 2002 at 07:34:26PM -0500, Jeff Garzik wrote:
>
>>Reason 1: Standard kernel convention. In other ioctls, we check basic
>>arguments and return EINVAL when they are wrong, even for privieleged
>>ioctls.
>>
>
>I have no argument with basic command validation. But take a
>look at ide_cmd_type_parser(), for example. Do we really need a
>giant switch statement listing all the allowed commands, just so
>we can throw back a IDE_DRIVE_TASK_INVALID to user-space if they
>decide to send down some undocumeted firmware wiping commands?
>Especially since that giant struct of allowed commands is
>duplicated in ide_pre_handler_parser() and ide_handler_parser()
>
I agree the implementation could be improved.
Your first question is really philosophical. I think that people should
-not- be able to send undocumented commands through the interface...
and in this area IMO it pays to be paranoid.
If we wanted to be ultra-super-paranoid, drop the ioctl and taskfile
parser, and implement the taskfile checks via SMM mode callbacks from
activity on the IDE ports ;-) That way we know the NSA is not doing
something sneaky, as well as supporting unlimited SMP bit-banging from
userland. Can you say ug and non-portable even to a lot of ia32
platforms. :)
So, the implementation may need improvement, but we do (a) want the
taskfile ioctl [and one for scsi too], and (b) want to implement some
amount of mininal sanity checks on the requests.
Jeff
Linus Torvalds wrote:
>The only common factor here is the "synchronize with other requests" - I
>feel strongly (much more strongly than any parsing notion) that the raw
>requests have to be passed down the "struct request" and NOT be done the
>way they are traditionally done (ie completely outside the request stream,
>with no synchronization at all with any IO currently in progress).
>
agreed
>>I think "future new commands" is total FUD, the idea that some new command
>>would come along and be so instantly popular, useful and incompatible that
>>all Linux boxes would require it before the next kernel or driver update
>>is silly at best, and I'm working hard to keep this on a civil plane.
>>
>
>It has nothing to do with "new" commands, and everything to do with
>"random vendor-specific commands and the vendor-specific tools". Commands
>that simply should _never_ be parsed in the kernel, because we do not want
>to care about 10 different vendors 10 different revisions of their
>firmware having 10 different small random special commands for that
>particular drive.
>
>In particular, a user that upgrades his hardware should never _ever_ have
>to upgrade his kernel just because some random disk diagnostic tool needs
>support for a disk that is new and has new diagnostics.
>
Are such random vendor-specific commands really that common?
Linus, would it be acceptable to you to include an -optional- filter for
ATA commands? There is definitely a segment of users that would like to
firewall their devices, and I think (as crazy as it may sound) that
notion is a valid one.
Jeff
Linus Torvalds wrote:
>
>On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
>>Your first question is really philosophical. I think that people should
>>-not- be able to send undocumented commands through the interface...
>> and in this area IMO it pays to be paranoid.
>>
>
>What if the command is perfectly documented, but only for a certain class
>of IBM disks?
>
>Are you going to create a table of every disk out there, along with every
>command it can do?
>
>Remember: the kernel driver is a driver for the host controller, yet the
>command is for the _disk_. It makes no sense to check for disk commands in
>a host controller driver - they are two different things.
>
>It's like checking for icmp messages in a network driver. Do you seriously
>propose having network drivers check icmp messages for command validity?
>
See my other message, and thanks for making this analogy :)
I -do- know the distrinction between hosts and devices. I think there
should be -some- way, I don't care how, to filter out those unknown
commands (which may be perfectly valid for a small subset of special IBM
drives). The net stack lets me do filtering, I want to sell you on the
idea of letting the ATA stack do the same thing.
You have convinced me that unconditional filtering is bad. But I still
think people should be provided the option to filter if they so desire.
Jeff
On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
> Your first question is really philosophical. I think that people should
> -not- be able to send undocumented commands through the interface...
> and in this area IMO it pays to be paranoid.
What if the command is perfectly documented, but only for a certain class
of IBM disks?
Are you going to create a table of every disk out there, along with every
command it can do?
Remember: the kernel driver is a driver for the host controller, yet the
command is for the _disk_. It makes no sense to check for disk commands in
a host controller driver - they are two different things.
It's like checking for icmp messages in a network driver. Do you seriously
propose having network drivers check icmp messages for command validity?
Linus
On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
> Linus, would it be acceptable to you to include an -optional- filter for
> ATA commands? There is definitely a segment of users that would like to
> firewall their devices, and I think (as crazy as it may sound) that
> notion is a valid one.
I'd rather have the rule that you can turn off this feature altogther: if
the command is so esoteric that it's not part of the regular commands we
can generate on our own, 99% of all people probably aren't even interested
in it.
If the command is useful on its own, and has a generic meaning (ie across
a large subset of users), then I think the kernel should support it
through a _real_ interface, not through some "pass user data though
inscathed" thing. Because that generic meaning probably exists even
outside ATA drives.
Linus
> I -do- know the distrinction between hosts and devices. I think there
> should be -some- way, I don't care how, to filter out those unknown
> commands (which may be perfectly valid for a small subset of special IBM
> drives). The net stack lets me do filtering, I want to sell you on the
> idea of letting the ATA stack do the same thing.
>
> You have convinced me that unconditional filtering is bad. But I still
> think people should be provided the option to filter if they so desire.
>
> Jeff
How hard would this be to farm off to an external utility?
Gerhard
--
Gerhard Mack
[email protected]
<>< As a computer I find your faith in technology amusing.
Linus Torvalds wrote:
>
>On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
>>Linus, would it be acceptable to you to include an -optional- filter for
>>ATA commands? There is definitely a segment of users that would like to
>>firewall their devices, and I think (as crazy as it may sound) that
>>notion is a valid one.
>>
>
>I'd rather have the rule that you can turn off this feature altogther: if
>the command is so esoteric that it's not part of the regular commands we
>can generate on our own, 99% of all people probably aren't even interested
>in it.
>
Are we talking about the same thing or different things?
If filtering is done, I agree the filter feature is disable-able if the
kernel builder / sysadmin desires such. Disable the filter by default,
if that's your fancy. But let us filter. :)
>If the command is useful on its own, and has a generic meaning (ie across
>a large subset of users), then I think the kernel should support it
>through a _real_ interface, not through some "pass user data though
>inscathed" thing. Because that generic meaning probably exists even
>outside ATA drives.
>
Let me first agree on a point, to get that out of the way. I agree
there is generic meaning outside of ATA drives,for most of the ATA[PI]
commands. Further, I hope you know by now I agree with your taste in
kernel interfaces :)
First, for verification of the driver, it's darned useful to be able to
issue ATA commands from userspace. But since we're being generic, let's
just call them device commands, and imagine SCSI perhaps playing in this
playground too.
Second, if you are issuing device commands from userspace, you need to
deal with synchronization with the commands being issued from kernel space.
Third, if you have synchronization, that's a good and easy point to add
-optional- filtering.
Now... the overall point I am trying to sell is... IFF certain
lesser-used device commands are primarily issued from userspace, and IFF
it is not reasonable to create a new kernel interface, is it not
reasonable to be able to issue raw device commands from userspace? Just
the verification case alone seems like a strong argument.
Call it device_command not ata_command if you want, and add SCSI
support. Passing in raw device_commands from userspace is useful, IMO.
Throw in filtering of those commands, and life is golden for everybody
AFAICS. There is no need to be ATA specific with these goals...
Jeff
On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
> You have convinced me that unconditional filtering is bad. But I still
> think people should be provided the option to filter if they so desire.
Hey, choice is always good, except if it adds complexity.
The problem with conditional filtering is that either it is a boot (or
compile time) option, or it is a dynamic filter.
If its a dynamic filter, and you don't trust root, what _are_ you going to
trust? The root program you don't trust might as well be turning the
filtering off because it wants to be "convenient". And since the only
programs you really want to filter are _exactly_ the kinds of programs
that want to avoid filtering, you're just hosed.
That's my real beef with this whole idiotic parsing thing. Either it is
fixed (bad, if you don't know what the commands are for all disks) or it
is trivially overcome in the name of "convenience" (equally bad, since it
makes the whole thing pointless).
None of these issues really exist in your networking example.
Trust me, I'm not trying to be difficult, I'm just pointing out the
fundamental problems of your approach.
Yeah yeah. You can add additional levels of protection, and we have
capabilities.
Add a special password-protected capability, so that only YOU can enable
certain hardware access stuff. Where does it end? Is one such capability
enough? How do you initialize the default values for the system if you
need to be there to type in the password at bootup every time? We're
talking about some rather fundamental things here, and these are issues
that go _far_ beyond some silly ATA stack layer.
Linus
Linus Torvalds wrote:
>
>On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
>>You have convinced me that unconditional filtering is bad. But I still
>>think people should be provided the option to filter if they so desire.
>>
>
>Hey, choice is always good, except if it adds complexity.
>
>The problem with conditional filtering is that either it is a boot (or
>compile time) option, or it is a dynamic filter.
>
heh, I couldn't have given a better argument against a dynamic filter.
I was actually assuming the filter would be a compile-time option, for
security's sake. Boot-time option works too.
So, it sounds like you could be sold on an fixed-at-compile-time filter
that can be disabled at boot :) I know you don't like
fixed-at-compile-time as you mentioned, but it's my argument there is a
class of users that definitely do. MandrakeSoft would likely enable the
filter in the "secure" kernel and disable it in the "normal" kernel, for
example.
Jeff
Linus Torvalds wrote:
>
>On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
>>If filtering is done, I agree the filter feature is disable-able if the
>>kernel builder / sysadmin desires such. Disable the filter by default,
>>if that's your fancy. But let us filter. :)
>>
>
>BUT WHAT THE HELL IS THE POINT?
>
>Don't you get that? If the sysadmin can turn the filtering off, so can any
>root program. And your worry seems to be the CRM kind of disk locking
>utility which most _definitely_ would do exactly that if it is as evil as
>you think it is.
>
dynamic filtering sucks, I agree, and I am not arguing for that.
>And if you hardcode the filtering at compile-time in the kernel, that
>means that you've now painted yourself into the corner that you already
>seemed to admit was not a good idea - the same way it's not a good idea
>for network filtering.
>
No, I think hardcoding at compile time is useful here.
It serves to encourage openness, nobody is forced to use it, and it
provides an additional layer of protection for those that choose to use
it. That is the point. It's a choice, and you don't have to enable it
in your kernel. But there seems to be enough demand that it should be
at least an option.
Jeff
> Are such random vendor-specific commands really that common?
Not very
> Linus, would it be acceptable to you to include an -optional- filter for
> ATA commands? There is definitely a segment of users that would like to
> firewall their devices, and I think (as crazy as it may sound) that
> notion is a valid one.
Jeff -I like the idea of the filters - but if the ATA command raw stuff
is CAP_SYS_RAWIO then its the same right set as inb/outb. Beyond that
its a job for the NSA and RSBAC stuff ?
On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
> If filtering is done, I agree the filter feature is disable-able if the
> kernel builder / sysadmin desires such. Disable the filter by default,
> if that's your fancy. But let us filter. :)
BUT WHAT THE HELL IS THE POINT?
Don't you get that? If the sysadmin can turn the filtering off, so can any
root program. And your worry seems to be the CRM kind of disk locking
utility which most _definitely_ would do exactly that if it is as evil as
you think it is.
And if you hardcode the filtering at compile-time in the kernel, that
means that you've now painted yourself into the corner that you already
seemed to admit was not a good idea - the same way it's not a good idea
for network filtering.
Linus
Alan Cox wrote:
>>Linus, would it be acceptable to you to include an -optional- filter for
>>ATA commands? There is definitely a segment of users that would like to
>>firewall their devices, and I think (as crazy as it may sound) that
>>notion is a valid one.
>>
>
>Jeff -I like the idea of the filters - but if the ATA command raw stuff
>is CAP_SYS_RAWIO then its the same right set as inb/outb. Beyond that
>its a job for the NSA and RSBAC stuff ?
>
Yeah, you can still bit-bang with the current implementation, on that
capability. Couldn't that be cured with s/CAP_SYS_RAWIO/some new
CAP_DEVICE_CMD/ for the raw device command interface?
The current implementation needs to be changed anyway :) From "ATA raw
command" to "device raw command" at the very least.
Jeff
From: "Jeff Garzik" <[email protected]>
> Linus Torvalds wrote:
>
> >
> >On Mon, 11 Mar 2002, Jeff Garzik wrote:
...
> First, for verification of the driver, it's darned useful to be able to
> issue ATA commands from userspace. But since we're being generic, let's
> just call them device commands, and imagine SCSI perhaps playing in this
> playground too.
No fooling! (By way of history I was one of the SCSI device driver mavens
for the Amiga. We covered this generic command ground, recovered this ground,
and generally ground this ground into the ground.)
> Second, if you are issuing device commands from userspace, you need to
> deal with synchronization with the commands being issued from kernel space.
Generally speaking, yes.
> Third, if you have synchronization, that's a good and easy point to add
> -optional- filtering.
And while I see your point I still ask, "But why?" To make this work the
way it should work you will need a special table of normally disallowed
commands that are allowed for each specific device on the IDE buses
within a machine. You also need this to be generic enough to allow some
commands to go through except for some specific sets of parameters. Now,
I realize that SCSI is more generic than IDE. However, contemplating this
sort of filtering within SCSI device drivers fell into such a point of
silliness that it became the general concensus within the community that
the filtering is inappropriate. In point of fact this proved useful when
technology marched on. The facility within SCSI device drivers for directly
issueing low level commands became instantly important with the advent of
drives that were larger than 2^24 bytes. The normal read/write interface
on the device drivers was byte offset and byte count oriented, even though
it required the byte offsets to be on block boundaries. The direct SCSI
command interface allowed for workaround patches that broke the disk into
several virtual disks. These patches would open the low level driver and
insert themselves as the "real" device driver for the set of devices on
each SCSI bus. This died away fairly quickly as filesystem adaptations
became more popular. These adaptations, in the case finally adopted within
the latest OS revisions, involved an institutionalized driver patch that
presented a proper 64 bit read/write interface to the filesystems. The
fact that the device drivers that are no longer under any sort of
development are patched using direct SCSI commands is neatly hidden
from the users. But it is there and allowed the OS to survive where there
was no further maintenance effort taking place.
I note there seem to be more than one device driver for Linux that is no
longer maintained or poorly maintained. While I doubt this will be the
case for generic IDE, I do suspect that specific IDE interfaces that have
"peculiarities" will fall by the wayside as fewer people use them. It
might be nice to have the ability to patch around these drivers rather
than try to patch then with no test cases handy. It saved a significant
amount of grief on the Amiga. There's no reason I can see that it would
not save grief on Linux.
> Now... the overall point I am trying to sell is... IFF certain
> lesser-used device commands are primarily issued from userspace, and IFF
> it is not reasonable to create a new kernel interface, is it not
> reasonable to be able to issue raw device commands from userspace? Just
> the verification case alone seems like a strong argument.
>
> Call it device_command not ata_command if you want, and add SCSI
> support. Passing in raw device_commands from userspace is useful, IMO.
> Throw in filtering of those commands, and life is golden for everybody
> AFAICS. There is no need to be ATA specific with these goals...
It might be a good exercise, Jeff, to define the filters in such a way
that potentially harmful commands can be adequately filters while other
potentially "saving" commands can be allowed even if they differ only in
parameters. It is also interesting to note that direct userland ATA and
ATAPI reads and writes are fully as dangerous as any exotic level commands.
Linux seems to allow this danger. What makes the other commands so dangerous?
I'm not as familiar with the Linux SCSI interfaces as with those of the
Amiga. (I've sort of moved on to video work.) However, I wonder if the Linux
SCSI world filters "mode select" and manufacturer defined commands or not.
I suspect they have a longer history with this sort of interface and its
abuse than IDE. It might pay to investigate and adopt consonant postures.
{^_^} Joanne Dow, [email protected], did Amiga SCSI device drivers for
a decade or so as a consultant.
From: "Linus Torvalds" <[email protected]>
> Yeah yeah. You can add additional levels of protection, and we have
> capabilities.
>
> Add a special password-protected capability, so that only YOU can enable
> certain hardware access stuff. Where does it end? Is one such capability
> enough? How do you initialize the default values for the system if you
> need to be there to type in the password at bootup every time? We're
> talking about some rather fundamental things here, and these are issues
> that go _far_ beyond some silly ATA stack layer.
Linus, this discussion hung on long enough I decided to start reading it.
So I may have missed something. However, I notice you speak of a networking
example. I believe it is a strawman. The real comparison is closer to that
between IDE and SCSI. Do the SCSI device drivers filter? What and how do
they filter if they do at all? Aren't they a better model to adopt for
a system consistent interface philosophy?
{^_^} Joanne Dow, [email protected]
J. Dow wrote:
>From: "Jeff Garzik" <[email protected]>
>
>>Second, if you are issuing device commands from userspace, you need to
>>deal with synchronization with the commands being issued from kernel space.
>>
>Generally speaking, yes.
>
cool
>>Third, if you have synchronization, that's a good and easy point to add
>>-optional- filtering.
>>
>And while I see your point I still ask, "But why?" To make this work the
>way it should work you will need a special table of normally disallowed
>commands that are allowed for each specific device on the IDE buses
>within a machine.
>
[...]
>It might be a good exercise, Jeff, to define the filters in such a way
>that potentially harmful commands can be adequately filters while other
>potentially "saving" commands can be allowed even if they differ only in
>parameters. It is also interesting to note that direct userland ATA and
>
I'm afraid I may have been confusing to the casual reader, because the
current thread of discussion mutated from talking about (among other
things) implementation details of the existing ATA raw command "filter"
and the existing interface for issuing raw ATA commands.
You can, certainly, make a filter that addresses the issues you list.
The existing filter is mainly a correctness filter -- it ensures that
only known and standard ATA commands are passed down to the actual
devices. It filters out vendor-specific commands as well as potentially
nefarious future commands like COPYPROTECT or somesuch.
So, to summarize my points on the thread so far (as modified by Linus's
responses to me):
1) There should be a raw device command interface (not ATA or SCSI specific)
2) There should be kernel interfaces for the standard cases, so that the
raw device command interface is often not needed
3) There should be capability to optionally install filter the raw
device command interface. The filter is built into the kernel at
compile-time, but can also be disabled at boot time.
Jeff
On Mon, Mar 11, 2002 at 09:37:23PM -0500, Jeff Garzik wrote:
> It serves to encourage openness, nobody is forced to use it, and it
> provides an additional layer of protection for those that choose to use
> it. That is the point.
It doesn't provide any meaningful protection, that's the point.
If you're root/have CAP_SYS_RAWIO, you can bit-bang the interface, you
can patch out the filter from the kernel binary, you can do whatever
pleases you. Don't run evil programs as root in the first place. And
if you want to have finer-grained capabilities for specific
drive-level actions, create an higher-level interface for them which
will guarantee that only safe commands are used because they will be
generated by the kernel in the first place.
Broken security is actually worse than no security. With no security
you at least know what to expect.
The exact same discussion happened with Andre, please refer to it.
OG.
On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
> 1) There should be a raw device command interface (not ATA or SCSI specific)
Well, since the raw commands themselves will be bus-specific, you can't
really avoid THAT part.
The thing I want to be generic is the fact that everybody uses the
"request->cmd[]" array, along with the flags in "request->flags" to
specify what kind of command it is.
We actually have this infrastructure already, it's just not fully used.
> 2) There should be kernel interfaces for the standard cases, so that the
> raw device command interface is often not needed
Absolutely.
> 3) There should be capability to optionally install filter the raw
> device command interface. The filter is built into the kernel at
> compile-time, but can also be disabled at boot time.
This is the part I really don't like.
Thinking like a sysadmin, I want to be able to run programs that I would
not allow my users to run or want to be run accidentally. And I do _not_
want to reboot my kernel just because one of my mirrored disks died, I
hot-replaced it, and I notice that I need to upgrade the firmware on the
thing to make it play nice with the other disks in the array.
See? A setup that either allows everything or nothing is fundamentally
flawed in this kind of situation - suddenly I as a sysadmin cannot do
something without bringing the machine down. Which makes all the hotplug
interfaces useless - or then I as a sysadmin just have to leave a kernel
in place that allows the kinds of raw command accesses that I am so scared
of.
One solution may be to have the whole raw cmd thing as a loadable module,
and then I can make sure that it's not even available on the system so
that I have to do some work to find it, and somebody elses program won't
just know what to do.
But in that case is should be far removed from the IDE driver - it would
just be a module that inserts a raw request on the request queue, and NOT
inside some subsystem driver that I obviously want to have available all
the time.
See?
Linus
Linus Torvalds wrote:
>On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
>>1) There should be a raw device command interface (not ATA or SCSI specific)
>>
>
>Well, since the raw commands themselves will be bus-specific, you can't
>really avoid THAT part.
>
>The thing I want to be generic is the fact that everybody uses the
>"request->cmd[]" array, along with the flags in "request->flags" to
>specify what kind of command it is.
>
>We actually have this infrastructure already, it's just not fully used.
>
I omitted "for userspace" :)
I was referring to having a userspace conduit for submitting requests, a
la the taskfile ioctl now.
>>3) There should be capability to optionally install filter the raw
>>device command interface. The filter is built into the kernel at
>>compile-time, but can also be disabled at boot time.
>>
>
>This is the part I really don't like.
>
>Thinking like a sysadmin, I want to be able to run programs that I would
>not allow my users to run or want to be run accidentally. And I do _not_
>want to reboot my kernel just because one of my mirrored disks died, I
>hot-replaced it, and I notice that I need to upgrade the firmware on the
>thing to make it play nice with the other disks in the array.
>
>See? A setup that either allows everything or nothing is fundamentally
>flawed in this kind of situation - suddenly I as a sysadmin cannot do
>something without bringing the machine down. Which makes all the hotplug
>interfaces useless - or then I as a sysadmin just have to leave a kernel
>in place that allows the kinds of raw command accesses that I am so scared
>of.
>
>One solution may be to have the whole raw cmd thing as a loadable module,
>and then I can make sure that it's not even available on the system so
>that I have to do some work to find it, and somebody elses program won't
>just know what to do.
>
>But in that case is should be far removed from the IDE driver - it would
>just be a module that inserts a raw request on the request queue, and NOT
>inside some subsystem driver that I obviously want to have available all
>the time.
>
I like this solution, it was the one I was thinking of :)
The entire userspace raw cmd ioctl should be a separate module for
precisely the issues you outlined. If they choose, people can compile
that module into the static kernel image, including filter. Or they can
use the module without the filter. Or they can not use the module at
all. Etc.
I do understand the problems stemming from the inflexibility of the
filter that's being discussed, but I disagree that it's a fundamental
flaw. It's an option that some people -shouldn't- choose, for various
reasons. But that's an issue for documentation and system integrators.
Jeff
On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
> One solution may be to have the whole raw cmd thing as a loadable module,
> and then I can make sure that it's not even available on the system so
> that I have to do some work to find it, and somebody elses program won't
> just know what to do.
>
> But in that case is should be far removed from the IDE driver - it would
> just be a module that inserts a raw request on the request queue, and NOT
> inside some subsystem driver that I obviously want to have available all
> the time.
Let me put this proposal in more specific terms and see who hollers..
First, the actual assumptions:
- we should use the request queue, simply because that is the only thing
that serializes access to all controllers - if we do not have any
"sideband", there is no way to create the kind of confusion that we can
create right now.
- we want the approach to be generic, even if the details will end up
being IDE/SCSI/xxx-specific.
- I personally believe that we want to be able to do filtering
independently of the controller driver, ie the filtering is not part of
the driver infrastructure at all, but at a higher level (ie the same
way network filtering has _nothing_ to do with any actual network
drivers, regardless of what bus those drivers are on)
Thus I would suggest against a filter inside the IDE driver, and instead
suggest a loadable module that does
- attach to one or more request queue(s). Notice that you should not have
_one_ module that handles all request queues, because the filter module
obviously has to be different for an ATA disk than for a SCSI disk, and
in fact it might be different for an IBM ATA disk than for a Maxtor ATA
disk, for example.
- the module basically acts the way the SCSI generic driver does right
now, except it acts on a higher level: instead of generating SCSI
requests and feeding them directly to the driver with a scsi_do_req(),
it would generate the command requests and feed it to the request
queue.
In fact, don't think of it as the ATA thing at all: I'm more thinking
along the lines of splitting up "sg.c" into the highlevel command
generator (the biggest part) and a very _small_ part in the scsi
request loop that understands about the generic command interface.
If we can really do what sg.c does now, while at the _same_ time also have
a "ide-generic" module that uses the exact same infrastructure, then I
think I'm happy. Yes, the filtering is bus-specific (because the commands
are bus-specific), but the general approach is common.
Does anybody find any real downsides to this approach or basically trying
to abstract sg.c "upwards" a bit?
Linus
Olivier Galibert wrote:
>On Mon, Mar 11, 2002 at 09:37:23PM -0500, Jeff Garzik wrote:
>
>>It serves to encourage openness, nobody is forced to use it, and it
>>provides an additional layer of protection for those that choose to use
>>it. That is the point.
>>
>
>It doesn't provide any meaningful protection, that's the point.
>
>If you're root/have CAP_SYS_RAWIO, you can bit-bang the interface, you
>can patch out the filter from the kernel binary, you can do whatever
>pleases you. Don't run evil programs as root in the first place. And
>if you want to have finer-grained capabilities for specific
>drive-level actions, create an higher-level interface for them which
>will guarantee that only safe commands are used because they will be
>
Under more restricted domains, root cannot bit-bang the interface.
s/CAP_SYS_RAWIO/CAP_DEVICE_CMD/ for the raw cmd ioctl interface. Have
the OS trap I/O port accesses using SMM mode if you would like, and that
applies to your particular security situation.
The filter is useful for other reasons like correctness, as well.
Jeff
Your proposal sounds 100% ok to me...
For the details of the userspace interface (for both ATA and SCSI), my
idea was to use standard read(2) and write(2).
Any number of programs can open /dev/ata/hda/control or
/dev/scsi/sdc/control. write(2) submits requests, read(2) consumes
command responses, perhaps buffering a bit so that multiple responses
are not lost if userspace is slow.
Maybe it's a cheesy way to avoid ioctl(2), maybe not...
Jeff
On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
> - attach to one or more request queue(s). Notice that you should not have
> _one_ module that handles all request queues, because the filter module
> obviously has to be different for an ATA disk than for a SCSI disk, and
> in fact it might be different for an IBM ATA disk than for a Maxtor ATA
> disk, for example.
Btw, to tie this back to the other IDE thread, namely the suspend/resume
thing, I think things like that should also just push commands down the
request list. In particular, instead of waiting until the handler is NULL,
it should do something like
- create a "sync" request
- do the equivalent of
DECLARE_COMPLETION(wait);
rq->waiting = &wait;
q->elevator.elevator_add_req_fn(q, rq, queue_head);
wait_for_completion(&wait);
which automatically synchronizes with any outstanding requests (simply
by virtue of the elevator knowing not to re-order/merge special requests,
so when the sync command in finished, we know all other commands have
finished too).
Note that this should be the same code as for a shutdown as well - we
should finish with a flush buffers request and wait for it to have
completed.
I'd like a _lot_ of stuff to stop using "do_xxx_command()", and move toa
higher layer so that more code can be shared between different subsystems
(all of these "sync" issues are really completely generic, and should
_not_ be duplicated across drivers or subsystems).
Linus
On Mon Mar 11, 2002 at 10:10:36PM -0500, Jeff Garzik wrote:
> 1) There should be a raw device command interface (not ATA or SCSI specific)
Hmm. If such a generic low-level raw device layer were to be
implemented (presumably as the foundation for the block layer), I
expect the interface would be somthing like the cdrom layer, and
would abstract out all the normal things that raw mass-storage
devices can do.
But the minute such a layer is in place, people will begin going
straight to the sub-low-level raw device layers so they can use
all the exciting new extended features of their XP370000 quantum
storage array which needs the special frob-electrons command to
make it work...
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
On Mon Mar 11, 2002 at 07:58:32PM -0800, Linus Torvalds wrote:
> If we can really do what sg.c does now, while at the _same_ time also have
> a "ide-generic" module that uses the exact same infrastructure, then I
> think I'm happy. Yes, the filtering is bus-specific (because the commands
> are bus-specific), but the general approach is common.
>
> Does anybody find any real downsides to this approach or basically trying
> to abstract sg.c "upwards" a bit?
Essentially, if I understand what you are saying, you are
looking for a uniform low-level mass-storage layer that does all
the normal low-level drive access stuff. Presumably, if done
correctly, this would act as a bus-abstracting foundation upon
which the block layer could be built...
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
Erik Andersen wrote:
>But the minute such a layer is in place, people will begin going
>straight to the sub-low-level raw device layers so they can use
>all the exciting new extended features of their XP370000 quantum
>storage array which needs the special frob-electrons command to
>make it work...
>
SCSI generic has existed for a while now :)
So this is really just catching up. And WRT the filtering stuff, people
are free to use the raw cmd without any filtering at all. Your choice.
If you mean bit-banging, see my reply to Oliver (Olivier?).
Anyway, Linus's current proposal seems both sane and flexible enough.
And the basic infrastructure already exists to shove raw commands onto
the request queue.
Jeff
On Mon, 11 Mar 2002, Jeff Garzik wrote:
>
> For the details of the userspace interface (for both ATA and SCSI), my
> idea was to use standard read(2) and write(2).
I like it, but you have to realize that most, if not all, of the commands
really are not a matter of a read or a write, but a "transaction", which
is a pair of both (and at least in theory you could have transactions that
are more complex than just a "command + result", ie more than just a
write("cmd + outdata")+read("status + indata")).
I agree 100% with the notion of using read/write to do this, but I think
you need to make it very explicit that we _are_ talking about
transactions, and that the file descriptor would act as a "transaction
descriptor" when you do this. The file descriptor is the one that matches
up the write that started the transaction with the read that gets the
status of it - thus allowing multiple concurrent transactions in flight.
And the command has to have some structure, ie it needs to not just be the
low-level command, but also have the information about the "format" of the
transaction (so that the generic layer can build up the correct BIO data
structures for the result, before it actually sees the read itself).
Linus
Linus Torvalds wrote:
>On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
>> - attach to one or more request queue(s). Notice that you should not have
>> _one_ module that handles all request queues, because the filter module
>> obviously has to be different for an ATA disk than for a SCSI disk, and
>> in fact it might be different for an IBM ATA disk than for a Maxtor ATA
>> disk, for example.
>>
>
>Btw, to tie this back to the other IDE thread, namely the suspend/resume
>thing, I think things like that should also just push commands down the
>request list. In particular, instead of waiting until the handler is NULL,
>it should do something like
>
> - create a "sync" request
> - do the equivalent of
>
> DECLARE_COMPLETION(wait);
> rq->waiting = &wait;
> q->elevator.elevator_add_req_fn(q, rq, queue_head);
> wait_for_completion(&wait);
>
>which automatically synchronizes with any outstanding requests (simply
>by virtue of the elevator knowing not to re-order/merge special requests,
>so when the sync command in finished, we know all other commands have
>finished too).
>
Dumb question, why create a separate request?
Why not just have some way to wait for request X (and flag it for
no-merge/barrier treatment, etc.)? bios have end_io callbacks...
Jeff
On Mon, 11 Mar 2002, Erik Andersen wrote:
>
> Essentially, if I understand what you are saying, you are
> looking for a uniform low-level mass-storage layer that does all
> the normal low-level drive access stuff.
Well, yes and no.
It's not really low-level - it's at the same level as the current
"ll_rw_block()" in that it doesn't per se know any real details of any
drivers - it is nothing but a "build this request" layer (with a way of
getting the data back, of course).
So from a driver perspective (or even driver subsystem like IDE or SCSI)
it's actually a high-level request generator.
Now, let me say outright that I have a very specific reason why I'd like
this thing to be done at this level - which really has nothing to do with
Jeff's wishes to create a IDE command parser.
The reason I want to see stuff done _above_ the level of the request queue
is that I really count on the "struct request" as being the abstraction
layer that the current block driver interfaces so sadly lack.
Right now, block drivers are not very well abstracted at all: and this is
a large part of the reason for why we even _have_ things like the SCSI
midlevel layer or the IDE layer in the first place. But both the SCSI
layer and the IDE layer are just fairly messy - with no good way to share
code (no common interfaces) and with very inflexible setups (just look at
how hard it is to do a high-performance SCSI driver, because if you do a
SCSI driver at all you have to do all the interface crap that comes with
the job).
I'm really hoping that some day, if the _only_ interface to the block
driver is "struct request", you can write truly interchangable block
drivers - the same way you can write truly independent network drivers
today. None of this "IDE vs SCSI midlevel layer" crap.
Yes, "struct request" is slightly more complex than just being handed a
packet to send out, but I think that is unavoidable - block devices just
_are_ more complex than sending out a packet on a network. So while it's
not a really simple interface, at least it is a clear border between
drivers. And a SCSI driver would be nothing more than a driver that
accepts at least a subset of the kinds of requests that some scsi module
can push to it.
None of this complex function call interface with common complex
structures that are imposed by hundreds of different drivers that each
have slightly different issues. Just a queue.
"Everything is a stream of bytes" - except in this case the stream just
happens to have enough structure that you can re-order the packets and try
to sort them while they haven't been committed yet.
Linus
On Tue, 12 Mar 2002, Jeff Garzik wrote:
>
> Dumb question, why create a separate request?
Because you need to anyway. Things like shutdown/suspend need to sync the
caches, and that's a command that needs to go down the pipe to the disk.
> Why not just have some way to wait for request X (and flag it for
> no-merge/barrier treatment, etc.)? bios have end_io callbacks...
The bio's are just fragment descriptors, they don't really stand on their
own. A bio needs a request in order to move down to the driver.
The request is the place where you find the actual command - the bio just
contains the fragment data of the command.
Of course, the "just" is a big simplification. Since a block command can
be a chain of hundreds of blocks which each actually have a lifetime of
their own, unlike the network later, the fragments are a lot more
complicated than a "skb_frag_t".
So bio's are complex entities in themselves, and they have a life of their
own. It's just that you cannot send a raw bio to a device - the device
wouldn't know what to do with it.
Linus
From: "Linus Torvalds" <[email protected]>
> > 3) There should be capability to optionally install filter the raw
> > device command interface. The filter is built into the kernel at
> > compile-time, but can also be disabled at boot time.
>
> This is the part I really don't like.
>
> Thinking like a sysadmin, I want to be able to run programs that I would
> not allow my users to run or want to be run accidentally. And I do _not_
> want to reboot my kernel just because one of my mirrored disks died, I
> hot-replaced it, and I notice that I need to upgrade the firmware on the
> thing to make it play nice with the other disks in the array.
>
> See? A setup that either allows everything or nothing is fundamentally
> flawed in this kind of situation - suddenly I as a sysadmin cannot do
> something without bringing the machine down. Which makes all the hotplug
> interfaces useless - or then I as a sysadmin just have to leave a kernel
> in place that allows the kinds of raw command accesses that I am so scared
> of.
Good example. (I even implemented something of that sort for downloading
new firmware for Maxtor SCSI drives back when the 1G Panther was a brandy
spanky new thing. I never distributed it and have no plans of doing so.
Amigans as a rule are not good candidates for having such code on hand. The
average Linux SysAdmin should be a few dB smarter and more careful. I don't
think it would be a good thing to force multiple reboots in the situation
you cite. And it does happen.)
{^_^} Joanne Dow
From: "Jeff Garzik" <[email protected]>
> Linus Torvalds wrote:
...
> >One solution may be to have the whole raw cmd thing as a loadable module,
> >and then I can make sure that it's not even available on the system so
> >that I have to do some work to find it, and somebody elses program won't
> >just know what to do.
> >
> >But in that case is should be far removed from the IDE driver - it would
> >just be a module that inserts a raw request on the request queue, and NOT
> >inside some subsystem driver that I obviously want to have available all
> >the time.
> >
> I like this solution, it was the one I was thinking of :)
It leaves me bemused. You speak of a module to install a raw userspace
IO capability. If that module exists the module interface exists. Would
it have to be the module compiled into the kernel that gets run on that
interface? It looks as wide open as ever, to me.
> The entire userspace raw cmd ioctl should be a separate module for
> precisely the issues you outlined. If they choose, people can compile
> that module into the static kernel image, including filter. Or they can
> use the module without the filter. Or they can not use the module at
> all. Etc.
Of course, this seems to be one of those compromises between high
availability and high security. It should be made clear that this is
the compromise involved when the raw io filter is compiled in or not.
I'm not sure you can build an unfiltered raw IO capability into the
kernel SCSI, IDE, or anything else and maintain system security.
{^_^} Joanne
From: "Jeff Garzik" <[email protected]>
> Your proposal sounds 100% ok to me...
>
> For the details of the userspace interface (for both ATA and SCSI), my
> idea was to use standard read(2) and write(2).
>
> Any number of programs can open /dev/ata/hda/control or
> /dev/scsi/sdc/control. write(2) submits requests, read(2) consumes
> command responses, perhaps buffering a bit so that multiple responses
> are not lost if userspace is slow.
>
> Maybe it's a cheesy way to avoid ioctl(2), maybe not...
Jeff, from a security aspect would it perhaps be better to have the
filter always in place and load rule sets through a rigidly controlled
interface? This gives a control hook for non-Unixoid security model
control over the interface filtering. The filter module would have the
lower level interfaces all opened exclusively so there would be no
paths around the filter. I propose that the rule sets, for each
device's instance of the filter interface, could be changed to include
anything from a null set to forbidding anything past fully controlled
read and write with no raw IO. One specific entity could be allowed
to make the changes and no others. This gives a single interface for
verifying signatures on filter data sets, as well.
{^_^}
On Mon, Mar 11, 2002 at 08:33:42PM -0500, Jeff Garzik wrote:
> Erik Andersen wrote:
>
> >On Mon Mar 11, 2002 at 07:34:26PM -0500, Jeff Garzik wrote:
> >
> >>Reason 1: Standard kernel convention. In other ioctls, we check basic
> >>arguments and return EINVAL when they are wrong, even for privieleged
> >>ioctls.
> >>
> >
> >I have no argument with basic command validation. But take a
> >look at ide_cmd_type_parser(), for example. Do we really need a
> >giant switch statement listing all the allowed commands, just so
> >we can throw back a IDE_DRIVE_TASK_INVALID to user-space if they
> >decide to send down some undocumeted firmware wiping commands?
> >Especially since that giant struct of allowed commands is
> >duplicated in ide_pre_handler_parser() and ide_handler_parser()
> >
> I agree the implementation could be improved.
>
> Your first question is really philosophical. I think that people should
> -not- be able to send undocumented commands through the interface...
> and in this area IMO it pays to be paranoid.
Why? What if I really want to update the firmware is say my CD-RW, and
it doesn't use any of the ATA-spec commands for that? Should it fail?
I think filtering bad requests is OK - validating stuff like
permissions, concurrent access, pointers, whatever, but the kernel
shouldn't have hardcoded tables of commands that are allowed.
The "right solution", if we really need command filtering is an ioctl
and an utility that can fill a 'filter table', which says which commands
are ok for whom and which are not. Like 'INQUIRY can be used by anyone',
'no, I won't send anything related to CPRM to the drive'. Easy to
implement, no huge case statements needed.
> If we wanted to be ultra-super-paranoid, drop the ioctl and taskfile
> parser, and implement the taskfile checks via SMM mode callbacks from
> activity on the IDE ports ;-) That way we know the NSA is not doing
> something sneaky, as well as supporting unlimited SMP bit-banging from
> userland. Can you say ug and non-portable even to a lot of ia32
> platforms. :)
>
> So, the implementation may need improvement, but we do (a) want the
> taskfile ioctl [and one for scsi too], and (b) want to implement some
> amount of mininal sanity checks on the requests.
--
Vojtech Pavlik
SuSE Labs
From: "Erik Andersen" <[email protected]>
> On Mon Mar 11, 2002 at 10:10:36PM -0500, Jeff Garzik wrote:
> > 1) There should be a raw device command interface (not ATA or SCSI specific)
>
> Hmm. If such a generic low-level raw device layer were to be
> implemented (presumably as the foundation for the block layer), I
> expect the interface would be somthing like the cdrom layer, and
> would abstract out all the normal things that raw mass-storage
> devices can do.
>
> But the minute such a layer is in place, people will begin going
> straight to the sub-low-level raw device layers so they can use
> all the exciting new extended features of their XP370000 quantum
> storage array which needs the special frob-electrons command to
> make it work...
Erik, that is precisely the point on which the Amiga discussions of
the Direct SCSI command capability in scsi device drivers fell down.
(Due to historical oddities even IDE device drivers feature the direct
SCSI interface via emulation. It seems like an ultimate horror. However
it has enabled some marvelously generic tools.)
{^_^}
From: "Jeff Garzik" <[email protected]>
> SCSI generic has existed for a while now :)
Now we get to the point. What is the SCSI experience with the
generic interface and filtering, if any?
{^_^}
On Mon, Mar 11, 2002 at 06:19:05PM -0800, Linus Torvalds wrote:
> On Mon, 11 Mar 2002, Jeff Garzik wrote:
> >
> > You have convinced me that unconditional filtering is bad. But I still
> > think people should be provided the option to filter if they so desire.
>
> Hey, choice is always good, except if it adds complexity.
>
> The problem with conditional filtering is that either it is a boot (or
> compile time) option, or it is a dynamic filter.
>
> If its a dynamic filter, and you don't trust root, what _are_ you going to
> trust? The root program you don't trust might as well be turning the
> filtering off because it wants to be "convenient". And since the only
> programs you really want to filter are _exactly_ the kinds of programs
> that want to avoid filtering, you're just hosed.
>
> That's my real beef with this whole idiotic parsing thing. Either it is
> fixed (bad, if you don't know what the commands are for all disks) or it
> is trivially overcome in the name of "convenience" (equally bad, since it
> makes the whole thing pointless).
Well, there are uses for the 'dynamic' filter, and it doesn't add too
much complexity. One could be allowing certain commands to be performed
on certain devices by normal users - eg. CD-burning or whatever without
root privileges (I know we're using ide-scsi for the command access
right now ...), and also protecting the oneself from ACPI and the like.
Because ACPI can do IDE commands and does that in a way interfaceable to
a 'taskfile' kernel ioctl. It'd be nice to know a broken ACPI
implementation can't screw up your drive easily through a kernel driver.
--
Vojtech Pavlik
SuSE Labs
On Mon Mar 11, 2002 at 05:58:41PM -0700, Erik wrote:
> I have no argument with basic command validation. But take a
> look at ide_cmd_type_parser(), for example. Do we really need a
> giant switch statement listing all the allowed commands, just so
> we can throw back a IDE_DRIVE_TASK_INVALID to user-space if they
In looking closer at the ide driver, it turns out that
ide_cmd_type_parser() is _not_ filtering HDIO_DRIVE_TASKFILE
ioctl commands at all. My mistake. There is actually no
filtering at all HDIO_DRIVE_TASKFILE ioctl.
Strangely enough, ide_cmd_type_parser() is filtering commands
issued by the ide driver itself.
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
Hello Vojtech.
I have noticed that the ide-timings.h and ide_modules.h are running
much in aprallel in the purpose they serve. Are the any
chances you could dare to care about propagating the
fairly nice ide-timings.h stuff in favour of
ide_modules.h more.
BTW.> I think some stuff from ide-timings.h just belongs
as generic functions intro ide.c, and right now there is
nobody who you need to work from behind ;-).
So please feel free to do the changes you apparently desired
to do a long time ago...
> Yeah, you can still bit-bang with the current implementation, on that
> capability. Couldn't that be cured with s/CAP_SYS_RAWIO/some new
> CAP_DEVICE_CMD/ for the raw device command interface?
By uploading new firmware you can make the drive return compromised pages
from swap which make an existing RAWIO application do what you want
- so CAP_DEVICE_CMD in the IDE case at least is CAP_SYS_RAWIO
> The current implementation needs to be changed anyway :) From "ATA raw
> command" to "device raw command" at the very least.
That assumes a totally generic interface is good - ATA is rather different
to ATAPI/SCSI so it may be best the device interface reflects that.
Jeff Garzik wrote:
> Linus Torvalds wrote:
>
>> The only common factor here is the "synchronize with other requests" - I
>> feel strongly (much more strongly than any parsing notion) that the raw
>> requests have to be passed down the "struct request" and NOT be done the
>> way they are traditionally done (ie completely outside the request
>> stream,
>> with no synchronization at all with any IO currently in progress).
>>
> agreed
>
>>> I think "future new commands" is total FUD, the idea that some new
>>> command
>>> would come along and be so instantly popular, useful and incompatible
>>> that
>>> all Linux boxes would require it before the next kernel or driver update
>>> is silly at best, and I'm working hard to keep this on a civil plane.
>>>
>>
>> It has nothing to do with "new" commands, and everything to do with
>> "random vendor-specific commands and the vendor-specific tools". Commands
>> that simply should _never_ be parsed in the kernel, because we do not
>> want
>> to care about 10 different vendors 10 different revisions of their
>> firmware having 10 different small random special commands for that
>> particular drive.
>>
>> In particular, a user that upgrades his hardware should never _ever_ have
>> to upgrade his kernel just because some random disk diagnostic tool needs
>> support for a disk that is new and has new diagnostics.
>>
> Are such random vendor-specific commands really that common?
>
> Linus, would it be acceptable to you to include an -optional- filter for
> ATA commands? There is definitely a segment of users that would like to
> firewall their devices, and I think (as crazy as it may sound) that
> notion is a valid one.
If you are *trully paranoid* and want to *fire wall* your device then
the proper way of doing this is to DISABLE those ioctl entierly.
It simple like that. They are not required for regular operation by
concept. Other then this I see no argument here.
Jeff Garzik wrote:
> Linus Torvalds wrote:
>
>>
>> On Mon, 11 Mar 2002, Jeff Garzik wrote:
>>
>>> You have convinced me that unconditional filtering is bad. But I still
>>> think people should be provided the option to filter if they so desire.
>>>
>>
>> Hey, choice is always good, except if it adds complexity.
>>
>> The problem with conditional filtering is that either it is a boot (or
>> compile time) option, or it is a dynamic filter.
>>
> heh, I couldn't have given a better argument against a dynamic filter.
>
> I was actually assuming the filter would be a compile-time option, for
> security's sake. Boot-time option works too.
>
> So, it sounds like you could be sold on an fixed-at-compile-time filter
> that can be disabled at boot :) I know you don't like
> fixed-at-compile-time as you mentioned, but it's my argument there is a
> class of users that definitely do. MandrakeSoft would likely enable the
> filter in the "secure" kernel and disable it in the "normal" kernel, for
> example.
Mandrake Soft should disable the whole sidestepping
ioctl - even the generic one - in a secure kernel.
Filtering them all to -EINVAL is the bast one can do for *security*.
The stuff which is not should be implemented with generic ioctl
with a proper sementics.
Rings a bell?
No matter how you turn it it turns out that the taskfile stuff as it
is is just useless - formal verification inside the kernel can
be done at coding time. Formal verification for vendor specific
stuff can be done in user land. Formal verification for security
stuff doens't make sense, becouse the whole interface doesn't
make sense for security concerns. Formal verification of the
commands for political reasons is just plain naive...
Jeff Garzik wrote:
> It serves to encourage openness, nobody is forced to use it, and it
> provides an additional layer of protection for those that choose to use
> it. That is the point. It's a choice, and you don't have to enable it
> in your kernel. But there seems to be enough demand that it should be
> at least an option.
No there is no real demand out there. There are just people
arguing over it again and agian on the abstract, without proper
actual usage examples. and most of the time without
proper understanding what setuid == 0 is about.
Linus Torvalds wrote:
> On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
>>One solution may be to have the whole raw cmd thing as a loadable module,
>>and then I can make sure that it's not even available on the system so
>>that I have to do some work to find it, and somebody elses program won't
>>just know what to do.
>>
>>But in that case is should be far removed from the IDE driver - it would
>>just be a module that inserts a raw request on the request queue, and NOT
>>inside some subsystem driver that I obviously want to have available all
>>the time.
>>
>
> Let me put this proposal in more specific terms and see who hollers..
>
> First, the actual assumptions:
> - we should use the request queue, simply because that is the only thing
> that serializes access to all controllers - if we do not have any
> "sideband", there is no way to create the kind of confusion that we can
> create right now.
Amen to this.
> - we want the approach to be generic, even if the details will end up
> being IDE/SCSI/xxx-specific.
Could be done but seems to be very optimistic about the actual
detail. However stuff like flush/power up/power down/operate silent/operate fast
should be indeed generic.
> - I personally believe that we want to be able to do filtering
> independently of the controller driver, ie the filtering is not part of
> the driver infrastructure at all, but at a higher level (ie the same
> way network filtering has _nothing_ to do with any actual network
> drivers, regardless of what bus those drivers are on)
>
> Thus I would suggest against a filter inside the IDE driver, and instead
> suggest a loadable module that does
>
> - attach to one or more request queue(s). Notice that you should not have
> _one_ module that handles all request queues, because the filter module
> obviously has to be different for an ATA disk than for a SCSI disk, and
> in fact it might be different for an IBM ATA disk than for a Maxtor ATA
> disk, for example.
>
> - the module basically acts the way the SCSI generic driver does right
> now, except it acts on a higher level: instead of generating SCSI
> requests and feeding them directly to the driver with a scsi_do_req(),
> it would generate the command requests and feed it to the request
> queue.
>
> In fact, don't think of it as the ATA thing at all: I'm more thinking
> along the lines of splitting up "sg.c" into the highlevel command
> generator (the biggest part) and a very _small_ part in the scsi
> request loop that understands about the generic command interface.
The last part could make the few above possible. But the layering problems
in ide are currently much preventig the most trivial kind of validation.
See for example problems in asynchornous commands, which turned out recently.
Linus Torvalds wrote:
> On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
>> - attach to one or more request queue(s). Notice that you should not have
>> _one_ module that handles all request queues, because the filter module
>> obviously has to be different for an ATA disk than for a SCSI disk, and
>> in fact it might be different for an IBM ATA disk than for a Maxtor ATA
>> disk, for example.
>>
>
> Btw, to tie this back to the other IDE thread, namely the suspend/resume
> thing, I think things like that should also just push commands down the
> request list. In particular, instead of waiting until the handler is NULL,
> it should do something like
>
> - create a "sync" request
> - do the equivalent of
>
> DECLARE_COMPLETION(wait);
> rq->waiting = &wait;
> q->elevator.elevator_add_req_fn(q, rq, queue_head);
> wait_for_completion(&wait);
>
> which automatically synchronizes with any outstanding requests (simply
> by virtue of the elevator knowing not to re-order/merge special requests,
> so when the sync command in finished, we know all other commands have
> finished too).
>
> Note that this should be the same code as for a shutdown as well - we
> should finish with a flush buffers request and wait for it to have
> completed.
>
> I'd like a _lot_ of stuff to stop using "do_xxx_command()", and move toa
> higher layer so that more code can be shared between different subsystems
> (all of these "sync" issues are really completely generic, and should
> _not_ be duplicated across drivers or subsystems).
>
"Amen Brother" to this. Getting device id strings and others are in the
same gang.
BTW.> Would you dare to care to splow out a 2.5.7-pre1 just to allow
synchronization without bitkeeper?
J. Dow wrote:
> From: "Jeff Garzik" <[email protected]>
>
>>Your proposal sounds 100% ok to me...
>>
>>For the details of the userspace interface (for both ATA and SCSI), my
>>idea was to use standard read(2) and write(2).
>>
>>Any number of programs can open /dev/ata/hda/control or
>>/dev/scsi/sdc/control. write(2) submits requests, read(2) consumes
>>command responses, perhaps buffering a bit so that multiple responses
>>are not lost if userspace is slow.
>>
>>Maybe it's a cheesy way to avoid ioctl(2), maybe not...
>>
>
> Jeff, from a security aspect would it perhaps be better to have the
> filter always in place and load rule sets through a rigidly controlled
> interface?
You are overdesigning by a broad margin. From a security
point of view (I mean the paranoid one) the whole raw interface whatever
filtered or not should *just not be there*.
>> Linus, would it be acceptable to you to include an -optional- filter for
>> ATA commands? There is definitely a segment of users that would like to
>> firewall their devices, and I think (as crazy as it may sound) that
>> notion is a valid one.
>
>If you are *trully paranoid* and want to *fire wall* your device then
>the proper way of doing this is to DISABLE those ioctl entierly.
>It simple like that. They are not required for regular operation by
>concept. Other then this I see no argument here.
Well; I don't agree. Firewall via a loadable module can make sense if the
actual module does a kind of "filter all but a specified known set of safe
commands" for example, like retreiving SMART infos, or such things.
Ben.
On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> Hello Vojtech.
>
> I have noticed that the ide-timings.h and ide_modules.h are running
> much in aprallel in the purpose they serve. Are the any
> chances you could dare to care about propagating the
> fairly nice ide-timings.h stuff in favour of
> ide_modules.h more.
>
> BTW.> I think some stuff from ide-timings.h just belongs
> as generic functions intro ide.c, and right now there is
> nobody who you need to work from behind ;-).
>
> So please feel free to do the changes you apparently desired
> to do a long time ago...
Hmm, ok. Try this. It shouldn't change any functionality, yet makes a
small step towards cleaning the chipset specific drivers.
Reading through them as I was doing the changes, I found out that most
of them compute the timings incorrectly. Because of that I also removed
the pio blacklist (which is going to come back in a more powerful form,
merged together with the DMA blacklist), because that one is based on
ancient experiments with the broken CMD640 chip and a driver which
doesn't get the timings correct either. The blacklist is plain invalid.
I plan to focus on the most important drivers first, to fix and clean
them, working with the authors where possible.
--
Vojtech Pavlik
SuSE Labs
Vojtech Pavlik wrote:
> On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
>
>
>>Hello Vojtech.
>>
>>I have noticed that the ide-timings.h and ide_modules.h are running
>>much in aprallel in the purpose they serve. Are the any
>>chances you could dare to care about propagating the
>>fairly nice ide-timings.h stuff in favour of
>>ide_modules.h more.
>>
>>BTW.> I think some stuff from ide-timings.h just belongs
>>as generic functions intro ide.c, and right now there is
>>nobody who you need to work from behind ;-).
>>
>>So please feel free to do the changes you apparently desired
>>to do a long time ago...
>>
>
> Hmm, ok. Try this. It shouldn't change any functionality, yet makes a
> small step towards cleaning the chipset specific drivers.
>
> Reading through them as I was doing the changes, I found out that most
> of them compute the timings incorrectly. Because of that I also removed
> the pio blacklist (which is going to come back in a more powerful form,
> merged together with the DMA blacklist), because that one is based on
> ancient experiments with the broken CMD640 chip and a driver which
> doesn't get the timings correct either. The blacklist is plain invalid.
Amen to this. May "the force" be with you! (I mean the force in you fingers!)
AS you may know I was once (an eon ago)
during the Marc Lord "era" involved in the initial developement of the cmd640
support. And well we got it working, but after that some friend got to the idea
of the black list and my disk went from georgious 5M/sec to only lame 2.8M/sec
rates (remember it was a conner 400MB drive then one of those "buggy" Quantums!)
for no good reason. I was long time patching every single kernel those time for
this. So if anything I very well know that the list found there is both:
obsolete and invalid. Further on my CMD640 code wasn't even trying to compute
the timing values in any dynamic ways. I was just using the original tables from
CMD directly, but unfortunately the maintainer enjoyed Z/ ring arithmetics too
much ;-)
> I plan to focus on the most important drivers first, to fix and clean
> them, working with the authors where possible.
PIIX na VIA comes to mind first ;-)...
Vojtech Pavlik wrote:
> On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
>
>
>>Hello Vojtech.
>>
>>I have noticed that the ide-timings.h and ide_modules.h are running
>>much in aprallel in the purpose they serve. Are the any
>>chances you could dare to care about propagating the
>>fairly nice ide-timings.h stuff in favour of
>>ide_modules.h more.
>>
>>BTW.> I think some stuff from ide-timings.h just belongs
>>as generic functions intro ide.c, and right now there is
>>nobody who you need to work from behind ;-).
>>
>>So please feel free to do the changes you apparently desired
>>to do a long time ago...
>>
>
> Hmm, ok. Try this. It shouldn't change any functionality, yet makes a
> small step towards cleaning the chipset specific drivers.
OK the patch looks fine. Taken. Still I have some notes:
1. Let's start calling stuff ATA and not IDE. (AT-Attachment is it
and not just Integrated Device Electornics.) OK?
2. I quite don't like the nested #include directives in ide-timing.h.
It's cleaner to include the needed headers in front of usage
of ide-timing.h. (Just s small note.... not really important...)
3. I wellcome that the MIN MAX macros there are gone. In fact
I have yerstoday just done basically the same ;-). (Will just have to
revert it now.
Patch swallowed.
On Tue, Mar 12, 2002 at 05:11:13PM +0100, Martin Dalecki wrote:
> > Reading through them as I was doing the changes, I found out that most
> > of them compute the timings incorrectly. Because of that I also removed
> > the pio blacklist (which is going to come back in a more powerful form,
> > merged together with the DMA blacklist), because that one is based on
> > ancient experiments with the broken CMD640 chip and a driver which
> > doesn't get the timings correct either. The blacklist is plain invalid.
>
> Amen to this. May "the force" be with you! (I mean the force in you fingers!)
>
> AS you may know I was once (an eon ago)
> during the Marc Lord "era" involved in the initial developement of the cmd640
> support. And well we got it working, but after that some friend got to the idea
> of the black list and my disk went from georgious 5M/sec to only lame 2.8M/sec
> rates (remember it was a conner 400MB drive then one of those "buggy" Quantums!)
> for no good reason. I was long time patching every single kernel those time for
> this. So if anything I very well know that the list found there is both:
> obsolete and invalid. Further on my CMD640 code wasn't even trying to compute
> the timing values in any dynamic ways. I was just using the original tables from
> CMD directly, but unfortunately the maintainer enjoyed Z/ ring arithmetics too
> much ;-)
Well, as much as I'd like to use safe pre-computed register values for
the chips, that ain't possible - even when we assumed the system bus
(PCI, VLB, whatever) was always 33 MHz, still the drives have various
ideas about what DMA and PIO modes should look like, see the tDMA and
tPIO entries in hdparm -t.
So, arithmetics has to stay. Hopefully just one instance in
ide-timing.c.
> > I plan to focus on the most important drivers first, to fix and clean
> > them, working with the authors where possible.
>
> PIIX na VIA comes to mind first ;-)...
VIA is already OK, well, it has my name in it. :) AMD is now also (well,
that one wasn't broken, just ugly), SiS is being revamped by Lionel
Bouton (whom I'm trying to help as much as I can), so yes, PIIX would be
next.
PIIX and ICH are pretty crazy hardware from the design perspective, very
legacy-bound back to the first Intel PIIX chip. And the driver for these
in the kernel has similarly evolved following the hardware. However, it
doesn't seem to be wrong at the first glance. Nevertheless, I'll take a
look at it. Unfortunately, I don't have any Intel hardware at hand to
test it with.
--
Vojtech Pavlik
SuSE Labs
Vojtech Pavlik wrote:
> Well, as much as I'd like to use safe pre-computed register values for
> the chips, that ain't possible - even when we assumed the system bus
> (PCI, VLB, whatever) was always 33 MHz, still the drives have various
> ideas about what DMA and PIO modes should look like, see the tDMA and
> tPIO entries in hdparm -t.
Yes yes yes of course some of the drivers are confused. And I don't
argue that precomputation is adequate right now. It just wasn't for
the CMD640 those times... I only wanted to reffer to history and
why my timings where different then the computed.
> So, arithmetics has to stay. Hopefully just one instance in
> ide-timing.c.
>
>
>>>I plan to focus on the most important drivers first, to fix and clean
>>>them, working with the authors where possible.
>>>
>>PIIX na VIA comes to mind first ;-)...
>>
>
> VIA is already OK, well, it has my name in it. :) AMD is now also (well,
Oh of course I was reffering to VIA as important.
> that one wasn't broken, just ugly), SiS is being revamped by Lionel
> Bouton (whom I'm trying to help as much as I can), so yes, PIIX would be
> next.
I swallowed his SiS stuff already, since my home main machine is
a SiS735 based board. (Awfoul cheap that thing and quite good price/performance
ratio ;-).
> PIIX and ICH are pretty crazy hardware from the design perspective, very
> legacy-bound back to the first Intel PIIX chip. And the driver for these
Yes I "love" the bound together DMA areas as well ;-).
> in the kernel has similarly evolved following the hardware. However, it
> doesn't seem to be wrong at the first glance. Nevertheless, I'll take a
> look at it. Unfortunately, I don't have any Intel hardware at hand to
> test it with.
Just another hint: dmaproc is silly nomenclature is should be
dma_strategy <- this would better reflect it's purpose.
On Tue, Mar 12, 2002 at 05:17:35PM +0100, Martin Dalecki wrote:
> Vojtech Pavlik wrote:
> > On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> >
> >
> >>Hello Vojtech.
> >>
> >>I have noticed that the ide-timings.h and ide_modules.h are running
> >>much in aprallel in the purpose they serve. Are the any
> >>chances you could dare to care about propagating the
> >>fairly nice ide-timings.h stuff in favour of
> >>ide_modules.h more.
> >>
> >>BTW.> I think some stuff from ide-timings.h just belongs
> >>as generic functions intro ide.c, and right now there is
> >>nobody who you need to work from behind ;-).
> >>
> >>So please feel free to do the changes you apparently desired
> >>to do a long time ago...
> >>
> >
> > Hmm, ok. Try this. It shouldn't change any functionality, yet makes a
> > small step towards cleaning the chipset specific drivers.
>
>
> OK the patch looks fine. Taken. Still I have some notes:
>
> 1. Let's start calling stuff ATA and not IDE. (AT-Attachment is it
> and not just Integrated Device Electornics.) OK?
Sure. Feel free to rename whatever file/struct/variable you want. In my
opinion, it's just not worth caring about whether we call the stuff ATA
or IDE, both are TLAs with a long history. (Hmm, ATA probably means
Advanced Technology Attachment, right?)
But for new stuff, I'll try to stick with the 'ata' name.
> 2. I quite don't like the nested #include directives in ide-timing.h.
> It's cleaner to include the needed headers in front of usage
> of ide-timing.h. (Just s small note.... not really important...)
I can change that, or you can. I think the hdreg.h one is reasonable,
while the ide.h can probably go without any significant trouble. I think
the only file that'll need adding #include <ide.h> will be ide-timing.c
> 3. I wellcome that the MIN MAX macros there are gone. In fact
> I have yerstoday just done basically the same ;-). (Will just have to
> revert it now.
>
> Patch swallowed.
--
Vojtech Pavlik
SuSE Labs
Vojtech Pavlik wrote:
>>OK the patch looks fine. Taken. Still I have some notes:
>>
>>1. Let's start calling stuff ATA and not IDE. (AT-Attachment is it
>>and not just Integrated Device Electornics.) OK?
>>
>
> Sure. Feel free to rename whatever file/struct/variable you want. In my
> opinion, it's just not worth caring about whether we call the stuff ATA
> or IDE, both are TLAs with a long history. (Hmm, ATA probably means
> Advanced Technology Attachment, right?)
Basically I would like to try to follow the FreeBSD nomenclature as much as
possible, becouse:
1. I started doing Linux, becouse I like UNIX ;-).
2. It helps (in some distant time) to look at both side by side.
3. It helps to see what has changed "recently".
4. "Learning from the Soviet Union is learning about victory"... just kidding
I'm. But you are from .cz so you already know ;-).
On Tue, Mar 12, 2002 at 05:26:20PM +0100, Martin Dalecki wrote:
> Vojtech Pavlik wrote:
>
>
> > Well, as much as I'd like to use safe pre-computed register values for
> > the chips, that ain't possible - even when we assumed the system bus
> > (PCI, VLB, whatever) was always 33 MHz, still the drives have various
> > ideas about what DMA and PIO modes should look like, see the tDMA and
> > tPIO entries in hdparm -t.
>
> Yes yes yes of course some of the drivers are confused. And I don't
> argue that precomputation is adequate right now. It just wasn't for
> the CMD640 those times... I only wanted to reffer to history and
> why my timings where different then the computed.
We may want to compare your original timings to what ide-timing.[ch]
will compute ...
> > So, arithmetics has to stay. Hopefully just one instance in
> > ide-timing.c.
> >
> >
> >>>I plan to focus on the most important drivers first, to fix and clean
> >>>them, working with the authors where possible.
> >>>
> >>PIIX na VIA comes to mind first ;-)...
> >>
> >
> > VIA is already OK, well, it has my name in it. :) AMD is now also (well,
>
> Oh of course I was reffering to VIA as important.
>
> > that one wasn't broken, just ugly), SiS is being revamped by Lionel
> > Bouton (whom I'm trying to help as much as I can), so yes, PIIX would be
> > next.
>
> I swallowed his SiS stuff already, since my home main machine is
> a SiS735 based board. (Awfoul cheap that thing and quite good price/performance
> ratio ;-).
As you can guess, I have mostly VIA machines around here. One SiS, too,
but that's an embedded board.
> > PIIX and ICH are pretty crazy hardware from the design perspective, very
> > legacy-bound back to the first Intel PIIX chip. And the driver for these
>
> Yes I "love" the bound together DMA areas as well ;-).
And the SIDETIM register is just a kyooot idea. :)
> > in the kernel has similarly evolved following the hardware. However, it
> > doesn't seem to be wrong at the first glance. Nevertheless, I'll take a
> > look at it. Unfortunately, I don't have any Intel hardware at hand to
> > test it with.
>
> Just another hint: dmaproc is silly nomenclature is should be
> dma_strategy <- this would better reflect it's purpose.
Dmaproc? Well that's a do-everything-possible-with-default-fallbacks
function. Quite an interesting approach, though I'd be much happier
without it.
Btw, do you know that the chipset and drives are tuned twice each boot?
First they're configured to PIO and half a second later to DMA (or PIO)
again ...
--
Vojtech Pavlik
SuSE Labs
On Mon, 11 Mar 2002, Linus Torvalds wrote:
>
>
> On Mon, 11 Mar 2002, Bill Davidsen wrote:
> >
> > I think you might want to offer an opinion (or edict, mandate, whatever)
> > WRT taskfile. While I think that some protective editing of commands is
> > desirable, useful, etc, I'm damn sure that some form of raw access is
> > required long term to allow diagmostics. I wouldn't argue if that
> > interface was required for low level format and other really drastic
> > operations as well.
>
> Quite frankly, my personal opinion is that there's no point in trying to
> parse the commands too much at all. The _only_ thing the kernel should try
> to care about is that the buffers passed to the command are valid (ie the
> actual data in/out area pointer should be a kernel buffer as far as
> possible, not under user control).
>
> Basically, there are three "modes" of operation here:
>
> - regular kernel access (ie the normal read/write stuff)
>
> parsing: none. The kernel originated the request.
>
> - common ioctl's that have nothing to do with IDE per se (ie all the
> stuff to open/close/lock removable media, unlocking DVD's, reading
> sound sectors etc - stuff that really exists elsewhere too, and where
> the kernel should do the translation from the generic problem space to
> the specific commands for the bus in quesion)
>
> Parsing: common ioctl turning stuff into common SCSI commands in the
> "struct request".
This next part is the one I believe should be a compile/boot option, since
it is the one which provides full access to the device.
> - totally IDE-only stuff, where the kernel simply doesn't add any value,
> and only has a way of passing it down to the drive, and passing back
> the requests.
>
> Parsing: none. The kernel simply doesn't have anything to do with the
> request, except to synchronize it with all other requests.
[observations on sync and queue issues understood and snipped]
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Mon, 11 Mar 2002, Erik Andersen wrote:
> On Mon Mar 11, 2002 at 10:10:36PM -0500, Jeff Garzik wrote:
> > 1) There should be a raw device command interface (not ATA or SCSI specific)
>
> Hmm. If such a generic low-level raw device layer were to be
> implemented (presumably as the foundation for the block layer), I
> expect the interface would be somthing like the cdrom layer, and
> would abstract out all the normal things that raw mass-storage
> devices can do.
>
> But the minute such a layer is in place, people will begin going
> straight to the sub-low-level raw device layers so they can use
> all the exciting new extended features of their XP370000 quantum
> storage array which needs the special frob-electrons command to
> make it work...
Given that this has not happened with sg, nor with people using the
sg+ide-scsi, I think you would have to provide a good reason why this
would happen. The most likely path would be a separate driver or enhanced
IDE driver patch. User programs generally do read, write and seek, not
ioctl level hacking.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Hi,
On Tue, 12 Mar 2002 17:21:34 +0100
Vojtech Pavlik <[email protected]> wrote:
> VIA is already OK, well, it has my name in it. :) AMD is now also (well,
> that one wasn't broken, just ugly), SiS is being revamped by Lionel
> Bouton (whom I'm trying to help as much as I can), so yes, PIIX would be
> next.
>
> PIIX and ICH are pretty crazy hardware from the design perspective, very
> legacy-bound back to the first Intel PIIX chip. And the driver for these
> in the kernel has similarly evolved following the hardware. However, it
> doesn't seem to be wrong at the first glance. Nevertheless, I'll take a
> look at it. Unfortunately, I don't have any Intel hardware at hand to
> test it with.
I have one Intel Corp. 82371AB PIIX4 IDE (rev 01) here and I can test it for you.
This machine isn't very important so it doesn't matter if some data gets shredded ;)
Just send me the patch when you've finished it.
Bye
On Mon, 11 Mar 2002, Jeff Garzik wrote:
> And IMO, we should have some basic validation of taskfile requests in
> the kernel...
> Reason 1: Standard kernel convention. In other ioctls, we check basic
> arguments and return EINVAL when they are wrong, even for privieleged
> ioctls.
If we assume that this path is for commands the kernel doesn't understand,
how do we validate them?
> Reason 2: If you have multiple programs issuing ATA commands, you would
> want a decent amount of synchronization, provided by the kernel, for the
> multiple user processes and multiple kernel processes issuing requests.
> Having the userspace commands come down a single spot in the kernel
> code makes this job a lot easier, if not making the impossible possible :)
Linus addressed this, I agree with his proposed three part implementation.
AFAIK he said the same thing in a different way.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Vojtech Pavlik wrote:
> On Tue, Mar 12, 2002 at 05:26:20PM +0100, Martin Dalecki wrote:
>
>>Vojtech Pavlik wrote:
>>
>>
>>
>>>Well, as much as I'd like to use safe pre-computed register values for
>>>the chips, that ain't possible - even when we assumed the system bus
>>>(PCI, VLB, whatever) was always 33 MHz, still the drives have various
>>>ideas about what DMA and PIO modes should look like, see the tDMA and
>>>tPIO entries in hdparm -t.
>>>
>>Yes yes yes of course some of the drivers are confused. And I don't
>>argue that precomputation is adequate right now. It just wasn't for
>>the CMD640 those times... I only wanted to reffer to history and
>>why my timings where different then the computed.
>>
>
> We may want to compare your original timings to what ide-timing.[ch]
> will compute ...
>
>
>>>So, arithmetics has to stay. Hopefully just one instance in
>>>ide-timing.c.
>>>
>>>
>>>
>>>>>I plan to focus on the most important drivers first, to fix and clean
>>>>>them, working with the authors where possible.
>>>>>
>>>>>
>>>>PIIX na VIA comes to mind first ;-)...
>>>>
>>>>
>>>VIA is already OK, well, it has my name in it. :) AMD is now also (well,
>>>
>>Oh of course I was reffering to VIA as important.
>>
>>
>>>that one wasn't broken, just ugly), SiS is being revamped by Lionel
>>>Bouton (whom I'm trying to help as much as I can), so yes, PIIX would be
>>>next.
>>>
>>I swallowed his SiS stuff already, since my home main machine is
>>a SiS735 based board. (Awfoul cheap that thing and quite good price/performance
>>ratio ;-).
>>
>
> As you can guess, I have mostly VIA machines around here. One SiS, too,
> but that's an embedded board.
>
>
>>>PIIX and ICH are pretty crazy hardware from the design perspective, very
>>>legacy-bound back to the first Intel PIIX chip. And the driver for these
>>>
>>Yes I "love" the bound together DMA areas as well ;-).
>>
>
> And the SIDETIM register is just a kyooot idea. :)
>
>
>>>in the kernel has similarly evolved following the hardware. However, it
>>>doesn't seem to be wrong at the first glance. Nevertheless, I'll take a
>>>look at it. Unfortunately, I don't have any Intel hardware at hand to
>>>test it with.
>>>
>>Just another hint: dmaproc is silly nomenclature is should be
>>dma_strategy <- this would better reflect it's purpose.
>>
>
> Dmaproc? Well that's a do-everything-possible-with-default-fallbacks
> function. Quite an interesting approach, though I'd be much happier
> without it.
Yeep. As an added bonus it's error prone, if some implementation of this
method forgets to cut and paste the fallback code.
You noticed the way I made at least the file_operations dispatcher
looking "opaque" recently? Most visible was the vanishment of
dome
if (ban->bang == NULL)
bang = default1
if (ban->bang == NULL)
bang = default1
if (ban->bang == NULL)
bang = default1
if (ban->bang == NULL)
bang = default1
if (ban->bang == NULL)
bang = default1
if (ban->bang == NULL)
bang = default1
....
orgy upon "disk" initialization...
> Btw, do you know that the chipset and drives are tuned twice each boot?
> First they're configured to PIO and half a second later to DMA (or PIO)
> again ...
Yes I know ;-). I have just removed some other twice callers for
other stuff. Do you know for example that I'm about 90% sure that
cleanup hooks for the device category drivers can be called twice.
Or that adding just a single drive reiterates every one on the system?
First I will try to change the way disks get iterated. This is
becouse disks are the main object in the whole driver and not
the host chip channels... Look for the plenty loops with
MAX_HWIFS as upper bound to see what I mean.
"Tune" is a silly name as well BTW. It has nothing to do with
attaching of carbon fiber to cars. It's host chip initialization, which
is going on.
Vojtech Pavlik wrote:
> On Tue, Mar 12, 2002 at 05:26:20PM +0100, Martin Dalecki wrote:
>
>>Vojtech Pavlik wrote:
>>
>>
>>
>>>Well, as much as I'd like to use safe pre-computed register values for
>>>the chips, that ain't possible - even when we assumed the system bus
>>>(PCI, VLB, whatever) was always 33 MHz, still the drives have various
>>>ideas about what DMA and PIO modes should look like, see the tDMA and
>>>tPIO entries in hdparm -t.
>>>
>>Yes yes yes of course some of the drivers are confused. And I don't
>>argue that precomputation is adequate right now. It just wasn't for
>>the CMD640 those times... I only wanted to reffer to history and
>>why my timings where different then the computed.
>>
>
> We may want to compare your original timings to what ide-timing.[ch]
> will compute ...
Unfortunately there is no chance. I have abondony this board quite
happy a long time ago... It was an 486 and I don't keep old
shread around. Sorry I just don't have it at hand anylonger.
On Tue, Mar 12, 2002 at 05:43:17PM +0100, Martin Dalecki wrote:
> Vojtech Pavlik wrote:
> > On Tue, Mar 12, 2002 at 05:26:20PM +0100, Martin Dalecki wrote:
> >
> >>Vojtech Pavlik wrote:
> >>
> >>
> >>
> >>>Well, as much as I'd like to use safe pre-computed register values for
> >>>the chips, that ain't possible - even when we assumed the system bus
> >>>(PCI, VLB, whatever) was always 33 MHz, still the drives have various
> >>>ideas about what DMA and PIO modes should look like, see the tDMA and
> >>>tPIO entries in hdparm -t.
> >>>
> >>Yes yes yes of course some of the drivers are confused. And I don't
> >>argue that precomputation is adequate right now. It just wasn't for
> >>the CMD640 those times... I only wanted to reffer to history and
> >>why my timings where different then the computed.
> >>
> >
> > We may want to compare your original timings to what ide-timing.[ch]
> > will compute ...
>
> Unfortunately there is no chance. I have abondony this board quite
> happy a long time ago... It was an 486 and I don't keep old
> shread around. Sorry I just don't have it at hand anylonger.
You may happen to have the numbers, though - that should be enough.
Btw, I have a CMD640B based PCI card lying around here, but never
managed to get it generate any interrupts, though the rest seems to be
working.
--
Vojtech Pavlik
SuSE Labs
Vojtech Pavlik wrote:
> On Tue, Mar 12, 2002 at 05:43:17PM +0100, Martin Dalecki wrote:
>
>>Vojtech Pavlik wrote:
>>
>>>On Tue, Mar 12, 2002 at 05:26:20PM +0100, Martin Dalecki wrote:
>>>
>>>
>>>>Vojtech Pavlik wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>Well, as much as I'd like to use safe pre-computed register values for
>>>>>the chips, that ain't possible - even when we assumed the system bus
>>>>>(PCI, VLB, whatever) was always 33 MHz, still the drives have various
>>>>>ideas about what DMA and PIO modes should look like, see the tDMA and
>>>>>tPIO entries in hdparm -t.
>>>>>
>>>>>
>>>>Yes yes yes of course some of the drivers are confused. And I don't
>>>>argue that precomputation is adequate right now. It just wasn't for
>>>>the CMD640 those times... I only wanted to reffer to history and
>>>>why my timings where different then the computed.
>>>>
>>>>
>>>We may want to compare your original timings to what ide-timing.[ch]
>>>will compute ...
>>>
>>Unfortunately there is no chance. I have abondony this board quite
>>happy a long time ago... It was an 486 and I don't keep old
>>shread around. Sorry I just don't have it at hand anylonger.
>>
>
> You may happen to have the numbers, though - that should be enough.
>
> Btw, I have a CMD640B based PCI card lying around here, but never
> managed to get it generate any interrupts, though the rest seems to be
> working.
I remember that I have found them in some readme attached to the
original dos drivers from CMD. Maybe it's still availabe on the net but
really I don't have it anylongeranywhere archieved. And BTW.> The CMD640B was
the variant which wasn't physically broken, as the CMD640 which had
improper bus termination of the secondary channel.
On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> Hello Vojtech.
>
> I have noticed that the ide-timings.h and ide_modules.h are running
> much in aprallel in the purpose they serve. Are the any
> chances you could dare to care about propagating the
> fairly nice ide-timings.h stuff in favour of
> ide_modules.h more.
>
> BTW.> I think some stuff from ide-timings.h just belongs
> as generic functions intro ide.c, and right now there is
> nobody who you need to work from behind ;-).
>
> So please feel free to do the changes you apparently desired
> to do a long time ago...
Oh, by the way, here goes the PIIX rewrite ... unlike the AMD one, this
is completely untested, and may not work at all - I only have the
datasheets at hand, no PIIX hardware.
Differences from the previous PIIX driver:
* 82451NX MIOC isn't supported anymore. It's not an ATA controller, anyway ;)
* 82371FB_0 PIIX ISA bridge isn't an ATA controller either.
* 82801CA ICH3 support added. Only ICH3-M is supported by the original driver.
* 82371MX MPIIX is not supported anymore. Too weird beast and doesn't do
DMA anyway, better handled by the generic PCI ATA routines.
* Cleaner, converted to ide-timing.[ch]
* May not work. ;)
--
Vojtech Pavlik
SuSE Labs
Jeff Garzik wrote:
> Linus, would it be acceptable to you to include an -optional- filter for
> ATA commands? There is definitely a segment of users that would like to
> firewall their devices, and I think (as crazy as it may sound) that
> notion is a valid one.
A perfect filter exists already: it's called "non-root user" :-)
On Tue, 12 Mar 2002 21:00:35 +0100
Vojtech Pavlik <[email protected]> wrote:
> On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> > Hello Vojtech.
> >
> > I have noticed that the ide-timings.h and ide_modules.h are running
> > much in aprallel in the purpose they serve. Are the any
> > chances you could dare to care about propagating the
> > fairly nice ide-timings.h stuff in favour of
> > ide_modules.h more.
> >
> > BTW.> I think some stuff from ide-timings.h just belongs
> > as generic functions intro ide.c, and right now there is
> > nobody who you need to work from behind ;-).
> >
> > So please feel free to do the changes you apparently desired
> > to do a long time ago...
>
> Oh, by the way, here goes the PIIX rewrite ... unlike the AMD one, this
> is completely untested, and may not work at all - I only have the
> datasheets at hand, no PIIX hardware.
>
> Differences from the previous PIIX driver:
>
> * 82451NX MIOC isn't supported anymore. It's not an ATA controller, anyway ;)
> * 82371FB_0 PIIX ISA bridge isn't an ATA controller either.
> * 82801CA ICH3 support added. Only ICH3-M is supported by the original driver.
> * 82371MX MPIIX is not supported anymore. Too weird beast and doesn't do
> DMA anyway, better handled by the generic PCI ATA routines.
>
> * Cleaner, converted to ide-timing.[ch]
>
> * May not work. ;)
But does work with an Intel Corp. 82371AB PIIX4 IDE (rev 01) IDE controller...
I'll do some more stress testing but it boots, works in DMA and the data transfer rates haven't decreased ;)
*playingwithhdparmanddbench* ;)
Bye
On Tue, Mar 12, 2002 at 09:35:05PM +0100, Sebastian Droege wrote:
> On Tue, 12 Mar 2002 21:00:35 +0100
> Vojtech Pavlik <[email protected]> wrote:
>
> > On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> > > Hello Vojtech.
> > >
> > > I have noticed that the ide-timings.h and ide_modules.h are running
> > > much in aprallel in the purpose they serve. Are the any
> > > chances you could dare to care about propagating the
> > > fairly nice ide-timings.h stuff in favour of
> > > ide_modules.h more.
> > >
> > > BTW.> I think some stuff from ide-timings.h just belongs
> > > as generic functions intro ide.c, and right now there is
> > > nobody who you need to work from behind ;-).
> > >
> > > So please feel free to do the changes you apparently desired
> > > to do a long time ago...
> >
> > Oh, by the way, here goes the PIIX rewrite ... unlike the AMD one, this
> > is completely untested, and may not work at all - I only have the
> > datasheets at hand, no PIIX hardware.
> >
> > Differences from the previous PIIX driver:
> >
> > * 82451NX MIOC isn't supported anymore. It's not an ATA controller, anyway ;)
> > * 82371FB_0 PIIX ISA bridge isn't an ATA controller either.
> > * 82801CA ICH3 support added. Only ICH3-M is supported by the original driver.
> > * 82371MX MPIIX is not supported anymore. Too weird beast and doesn't do
> > DMA anyway, better handled by the generic PCI ATA routines.
> >
> > * Cleaner, converted to ide-timing.[ch]
> >
> > * May not work. ;)
> But does work with an Intel Corp. 82371AB PIIX4 IDE (rev 01) IDE controller...
Thanks a lot for the testing!
> I'll do some more stress testing but it boots, works in DMA and the data transfer rates haven't decreased ;)
> *playingwithhdparmanddbench* ;)
If you can (in addition to the benchmark numbers) also send me the
output of 'cat /proc/ide/piix' and 'dmesg' and 'lspci -vvxxx'both my and
the original version ... that'd help a lot too.
Thanks again,
--
Vojtech Pavlik
SuSE Labs
On Tue, 12 Mar 2002 21:34:28 +0100
Vojtech Pavlik <[email protected]> wrote:
> On Tue, Mar 12, 2002 at 09:35:05PM +0100, Sebastian Droege wrote:
> > On Tue, 12 Mar 2002 21:00:35 +0100
> > Vojtech Pavlik <[email protected]> wrote:
> >
> > > On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> > > > Hello Vojtech.
> > > >
> > > > I have noticed that the ide-timings.h and ide_modules.h are running
> > > > much in aprallel in the purpose they serve. Are the any
> > > > chances you could dare to care about propagating the
> > > > fairly nice ide-timings.h stuff in favour of
> > > > ide_modules.h more.
> > > >
> > > > BTW.> I think some stuff from ide-timings.h just belongs
> > > > as generic functions intro ide.c, and right now there is
> > > > nobody who you need to work from behind ;-).
> > > >
> > > > So please feel free to do the changes you apparently desired
> > > > to do a long time ago...
> > >
> > > Oh, by the way, here goes the PIIX rewrite ... unlike the AMD one, this
> > > is completely untested, and may not work at all - I only have the
> > > datasheets at hand, no PIIX hardware.
> > >
> > > Differences from the previous PIIX driver:
> > >
> > > * 82451NX MIOC isn't supported anymore. It's not an ATA controller, anyway ;)
> > > * 82371FB_0 PIIX ISA bridge isn't an ATA controller either.
> > > * 82801CA ICH3 support added. Only ICH3-M is supported by the original driver.
> > > * 82371MX MPIIX is not supported anymore. Too weird beast and doesn't do
> > > DMA anyway, better handled by the generic PCI ATA routines.
> > >
> > > * Cleaner, converted to ide-timing.[ch]
> > >
> > > * May not work. ;)
> > But does work with an Intel Corp. 82371AB PIIX4 IDE (rev 01) IDE controller...
>
> Thanks a lot for the testing!
>
> > I'll do some more stress testing but it boots, works in DMA and the data transfer rates haven't decreased ;)
> > *playingwithhdparmanddbench* ;)
>
> If you can (in addition to the benchmark numbers) also send me the
> output of 'cat /proc/ide/piix' and 'dmesg' and 'lspci -vvxxx'both my and
> the original version ... that'd help a lot too.
OK here they are ;)
The only thing which confuses me is the line
PCI clock: 33333MHz
in /proc/ide/piix
It should be 33 MHz I think
I think the mistake is in line 199 of piix.c:
piix_print("PCI clock: %dMHz", piix_clock);
has to be
piix_print("PCI clock: %dMHz", piix_clock / 1000);
or
the whole piix_clock stuff after line 432 is wrong
BTW: in /proc/ide/piix are sometimes wrong values for transfer rate and cycle time... I think they can't get negative ;)
Benchmark results come tomorrow
Bye
Hi!
Thanks a lot, the /proc stuff indeed seems to be wrong. I'll fix that
tomorrow.
On Tue, Mar 12, 2002 at 10:07:01PM +0100, Sebastian Droege wrote:
> On Tue, 12 Mar 2002 21:34:28 +0100
> Vojtech Pavlik <[email protected]> wrote:
>
> > On Tue, Mar 12, 2002 at 09:35:05PM +0100, Sebastian Droege wrote:
> > > On Tue, 12 Mar 2002 21:00:35 +0100
> > > Vojtech Pavlik <[email protected]> wrote:
> > >
> > > > On Tue, Mar 12, 2002 at 12:00:24PM +0100, Martin Dalecki wrote:
> > > > > Hello Vojtech.
> > > > >
> > > > > I have noticed that the ide-timings.h and ide_modules.h are running
> > > > > much in aprallel in the purpose they serve. Are the any
> > > > > chances you could dare to care about propagating the
> > > > > fairly nice ide-timings.h stuff in favour of
> > > > > ide_modules.h more.
> > > > >
> > > > > BTW.> I think some stuff from ide-timings.h just belongs
> > > > > as generic functions intro ide.c, and right now there is
> > > > > nobody who you need to work from behind ;-).
> > > > >
> > > > > So please feel free to do the changes you apparently desired
> > > > > to do a long time ago...
> > > >
> > > > Oh, by the way, here goes the PIIX rewrite ... unlike the AMD one, this
> > > > is completely untested, and may not work at all - I only have the
> > > > datasheets at hand, no PIIX hardware.
> > > >
> > > > Differences from the previous PIIX driver:
> > > >
> > > > * 82451NX MIOC isn't supported anymore. It's not an ATA controller, anyway ;)
> > > > * 82371FB_0 PIIX ISA bridge isn't an ATA controller either.
> > > > * 82801CA ICH3 support added. Only ICH3-M is supported by the original driver.
> > > > * 82371MX MPIIX is not supported anymore. Too weird beast and doesn't do
> > > > DMA anyway, better handled by the generic PCI ATA routines.
> > > >
> > > > * Cleaner, converted to ide-timing.[ch]
> > > >
> > > > * May not work. ;)
> > > But does work with an Intel Corp. 82371AB PIIX4 IDE (rev 01) IDE controller...
> >
> > Thanks a lot for the testing!
> >
> > > I'll do some more stress testing but it boots, works in DMA and the data transfer rates haven't decreased ;)
> > > *playingwithhdparmanddbench* ;)
> >
> > If you can (in addition to the benchmark numbers) also send me the
> > output of 'cat /proc/ide/piix' and 'dmesg' and 'lspci -vvxxx'both my and
> > the original version ... that'd help a lot too.
> OK here they are ;)
> The only thing which confuses me is the line
> PCI clock: 33333MHz
> in /proc/ide/piix
> It should be 33 MHz I think
> I think the mistake is in line 199 of piix.c:
> piix_print("PCI clock: %dMHz", piix_clock);
> has to be
> piix_print("PCI clock: %dMHz", piix_clock / 1000);
>
> or
> the whole piix_clock stuff after line 432 is wrong
>
> BTW: in /proc/ide/piix are sometimes wrong values for transfer rate and cycle time... I think they can't get negative ;)
>
> Benchmark results come tomorrow
>
> Bye
--
Vojtech Pavlik
SuSE Labs
On Tue, Mar 12, 2002 at 05:41:29PM +0100, Martin Dalecki wrote:
> "Tune" is a silly name as well BTW. It has nothing to do with
> attaching of carbon fiber to cars. It's host chip initialization, which
> is going on.
Since when has carbon fiber got anything to do with Pianos and other
musical instruments that need to be "tuned" ?
tune vt.:
4. to adjust (an electronic circuit, a motor, etc.) to the proper or
desired performance.
tune up:
2. to put (an engine) into good working order.
I therefore think "tune" (to tune the interface) is a reasonable name for
the function that is being performed.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Tue, Mar 12, 2002 at 02:51:07AM +0000, Jeff Garzik wrote:
> The current implementation needs to be changed anyway :) From "ATA raw
> command" to "device raw command" at the very least.
Regarding proposed ATA filtering, this touches somewhat on something I've
been thinking about for a while: biodump.
Basically, biodump is to a block device what tcpdump is to a network
adaptor.
So we would be able to do this:
# biodump /dev/hda
09:09:33.023 READ block 12345 [10 blocks]
09:09:33.024 READ block 12355 [10 blocks]
09:09:33.025 READ block 12365 [10 blocks]
09:09:34.000 WRITE block 12345 [1 block]
Or somewhat more fancy, and only useful for non-growing files:
# biodump /var/db/bigdb/tablefile
file on /dev/hda, getting blockmap:
09:09:33.023 READ logical block 12345 [10 blocks]
09:09:33.024 READ logical block 12355 [10 blocks]
09:09:33.025 READ logical block 12365 [10 blocks]
09:09:34.000 WRITE logical block 12345 [1 block]
Because right now, we have no idea why that disk light is blinking. If
somebody implements a filter, it would be a natural place to also provide
a dumping hook.
biodump might in fact turn out to become atadump and scsidump (which is
rather sad actually).
regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
bert hubert wrote:
># biodump /dev/hda
>09:09:33.023 READ block 12345 [10 blocks]
>09:09:33.024 READ block 12355 [10 blocks]
>09:09:33.025 READ block 12365 [10 blocks]
>09:09:34.000 WRITE block 12345 [1 block]
>
Definitely an interesting idea... With this new stuff Linus talked
about in his proposal and what I'm thinking about, it shouldn't be too
hard to do.
Jeff
Jeff Garzik writes:
> bert hubert wrote:
>
> ># biodump /dev/hda
> >09:09:33.023 READ block 12345 [10 blocks]
> >09:09:33.024 READ block 12355 [10 blocks]
> >09:09:33.025 READ block 12365 [10 blocks]
> >09:09:34.000 WRITE block 12345 [1 block]
> >
>
>
> Definitely an interesting idea... With this new stuff Linus talked
> about in his proposal and what I'm thinking about, it shouldn't be too
> hard to do.
I wrote a basic version of that (reqlog) a couple of years ago (with
the writing of the rw/block/size info inline in the kernel rather than
via a nice separate filter). The use I put it to was for migrating
block devices live (i.e. while being read/written) without having to
quiesce the readers/writers except for a few seconds at the end. The
idea is that you set up (as an ordinary userland process) a bitmap
in memory for all the blocks on the device. You start with them all
marked dirty. You then do two loops concurrently (select() or threads):
while (1) {
read next block/size touched
mark corresponding bits dirty in bitmap
}
and
while (1) {
while (find_next_dirty_bit_in_bitmap()) {
mark bit clean
read corresponding block from device
send blocknumber, blockdata to remote peer
}
go back to start of bitmap
}
The remote peer is a daemon which just reads the (blocknumber,data)
pairs and writes the data to the snapshot-to-be device at its end.
Eventually, the bitmap gets mostly clean and the migrating process
has "caught up" and is just siphoning newly dirtied data off to the
remote end in real time. At that point, you quiesce the writing
applications nicely for a few seconds (database, serving daemons or
whatever is writing to the device) and let the migrator clean the
bitmap fully (the last few blocks) and at that point you have a
point-in-time safe snapshot at the remote end.
With filters available (presumably as modules) for the bio stuff,
this will become possible without having to patch the kernel (I think
I submitted the reqlog and bufflink patches for inclusion but
didn't care enough to keep trying).
See
http://www.clueful.co.uk/mbeattie/linux-kernel.html#reqlog
and bmigrate and bufflink on the same page for the "old" way. It's
basic stuff and new APIs mean a rewrite will be much easier but I
thought you may be interested in another application of such logging.
--Malcolm
--
Malcolm Beattie <[email protected]>
Linux Technical Consultant
IBM EMEA Enterprise Server Group...
...from home, speaking only for myself
On Wed, 13 Mar 2002, Jeff Garzik wrote:
> bert hubert wrote:
>
> ># biodump /dev/hda
> >09:09:33.023 READ block 12345 [10 blocks]
> >09:09:33.024 READ block 12355 [10 blocks]
> >09:09:33.025 READ block 12365 [10 blocks]
> >09:09:34.000 WRITE block 12345 [1 block]
>
> Definitely an interesting idea... With this new stuff Linus talked
> about in his proposal and what I'm thinking about, it shouldn't be too
> hard to do.
Note that I actually think we're talking about two different things.
There's the notion of _feeding_ special requests onto the request queue
through some interface that also can refuse to feed certain kinds of
requests. That is needed for the special commands, and is "above" the
requests queue layer.
That interface doesn't really support filtering of requests that other
people (notably the regular kernel itself) is also feeding to the request
queue.
Note that one of the big issues with the request queue is that it acts as
a funnel: it (very much by design) can, and does, take requests from
different places, and nobody needs to "own" the request queue. But the
kind of "feed this raw request down" module that has been talked about
would have absolutely _zero_ visibility into what others are feeding into
the request queue.
If you actually want to filter other peoples requests, then you have to do
something completely different, namely set up a request queue of your own,
then exporting _your_ request queue as the request queue for <major,minor>
and then taking the requests off that queue internally, and moving them to
the original queue after you've done filtering.
Basically a simplified "loopback" thing that doesn't even need to do any
remapping.
The two are totally independent, they work on requests queues at different
levels (one feeds some random request queue, the other changes how we look
up a request queue and inserts its own queue in between).
Linus
Jeff Garzik <[email protected]>
[...]
> I -do- know the distrinction between hosts and devices. I think there
> should be -some- way, I don't care how, to filter out those unknown
> commands (which may be perfectly valid for a small subset of special IBM
> drives). The net stack lets me do filtering, I want to sell you on the
> idea of letting the ATA stack do the same thing.
The net stack does filtering for handling traffic from _untrusted_ external
sources, either for local consumtion or as a service for dumb machines
downstream, and as a way of limiting outward access to _untrusted_
users. Here we are talking of the ultimate _trusted_ user (root,
CAP_SYS_RAWIO, whatever). It makes no sense for the _kernel_ to get in the
way. Create a userland proggie for prodding IDE drives, and give it ways to
check (as far as terminal paranoia demands, a little, or not at all) as
desired. Unix ultimate simplicity is all about giving root enough rope to
shoot at his own feet.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Wed, 13 Mar 2002, Horst von Brand wrote:
> Jeff Garzik <[email protected]>
>
> [...]
>
> > I -do- know the distrinction between hosts and devices. I think there
> > should be -some- way, I don't care how, to filter out those unknown
> > commands (which may be perfectly valid for a small subset of special IBM
> > drives). The net stack lets me do filtering, I want to sell you on the
> > idea of letting the ATA stack do the same thing.
>
> The net stack does filtering for handling traffic from _untrusted_ external
> sources, either for local consumtion or as a service for dumb machines
> downstream, and as a way of limiting outward access to _untrusted_
> users. Here we are talking of the ultimate _trusted_ user (root,
> CAP_SYS_RAWIO, whatever). It makes no sense for the _kernel_ to get in the
> way. Create a userland proggie for prodding IDE drives, and give it ways to
> check (as far as terminal paranoia demands, a little, or not at all) as
> desired. Unix ultimate simplicity is all about giving root enough rope to
> shoot at his own feet.
Greetings Dr. von Brand,
One of the issues that has been confused in the stack is the command
parser and/or sequencer. It is used internally to allow quick and rapid
command additions as needed. Most are not clear how the IOCTL works and
thinks it runs via the command parser, this is totally wrong. The sad
part is the lenght of the rope does not hit the feet, it hits the head.
Regards,
Andre Hedrick
On Tue, 12 Mar 2002, Vojtech Pavlik wrote:
> PIIX and ICH are pretty crazy hardware from the design perspective, very
> legacy-bound back to the first Intel PIIX chip. And the driver for these
> in the kernel has similarly evolved following the hardware. However, it
> doesn't seem to be wrong at the first glance. Nevertheless, I'll take a
> look at it. Unfortunately, I don't have any Intel hardware at hand to
> test it with.
I'm not sure "evolved" is the term you want, allow me to suggest
"mutated."
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Hi
> You may happen to have the numbers, though - that should be enough.
>
> Btw, I have a CMD640B based PCI card lying around here, but never
> managed to get it generate any interrupts, though the rest seems to be
> working.
Attach it to the timer interrupt -- that should do it for testing. Simplest
way is to make ide timeouts HZ/100 and killing "lost interrupt" msg ;-).
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Hi
> Under more restricted domains, root cannot bit-bang the interface.
> s/CAP_SYS_RAWIO/CAP_DEVICE_CMD/ for the raw cmd ioctl interface. Have
Nobody uses capabilities these days, right?
> The filter is useful for other reasons like correctness, as well.
If you want to test if it works, you just disallow that access altogether.
It is usually not needed , anyway.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Hi!
> Well, there are uses for the 'dynamic' filter, and it doesn't add too
> much complexity. One could be allowing certain commands to be performed
> on certain devices by normal users - eg. CD-burning or whatever without
That should be better done userspace: setuid on cdrecord + its security audit
seems like nice solution.
> root privileges (I know we're using ide-scsi for the command access
> right now ...), and also protecting the oneself from ACPI and the like.
> Because ACPI can do IDE commands and does that in a way interfaceable to
> a 'taskfile' kernel ioctl. It'd be nice to know a broken ACPI
> implementation can't screw up your drive easily through a kernel driver.
This is entirely different can of worms, and should be easy to do at ACPI
level.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
On Thu, Mar 14, 2002 at 02:02:11PM +0000, Pavel Machek wrote:
> > You may happen to have the numbers, though - that should be enough.
> >
> > Btw, I have a CMD640B based PCI card lying around here, but never
> > managed to get it generate any interrupts, though the rest seems to be
> > working.
>
> Attach it to the timer interrupt -- that should do it for testing. Simplest
> way is to make ide timeouts HZ/100 and killing "lost interrupt" msg ;-).
Well, it seems like we'll have to something like this anyway. Some chips
sometimes forget to assert the IRQ after a transfer due to HW bugs, and
some PIIX3s are reported to do it quite often.
--
Vojtech Pavlik
SuSE Labs
Pavel Machek wrote:
>Hi
>
>>Under more restricted domains, root cannot bit-bang the interface.
>> s/CAP_SYS_RAWIO/CAP_DEVICE_CMD/ for the raw cmd ioctl interface. Have
>>
>
>Nobody uses capabilities these days, right?
>
Actually, the NSA and HP secure linux products do, at the very least.
And there is some ELF capabilities project out there IIRC, but I dunno
if anybody's using it.
>>The filter is useful for other reasons like correctness, as well.
>>
>
>If you want to test if it works, you just disallow that access altogether.
>It is usually not needed , anyway.
>
The filter, or directly sending drive commands to userspace?
I agree the filter is of limited usefulness.
Sending drive commands directly from userspace is definitely -very-
useful. As we start doing more and more stuff in userspace, I predict
this facility will be used more and more. Plus, IBM particularly
already has their Drive Fitness Tests, or whatever, sending direct drive
commands. With the proper sequencing, you can even do power management
of the drives in userspace. You don't want to do system suspend/resume
that way, but you can certainly have a userspace policy daemon running,
that powers-down and powers-up the drives, etc.
Jeff
> > s/CAP_SYS_RAWIO/CAP_DEVICE_CMD/ for the raw cmd ioctl interface. Have
>
> Nobody uses capabilities these days, right?
Wrong. There are quite a few people using them, including large scale projects
and actual sold packages. For other reasons CAP_DEVICE_CMD doesn't work out
Hi!
> >>Under more restricted domains, root cannot bit-bang the interface.
> >>s/CAP_SYS_RAWIO/CAP_DEVICE_CMD/ for the raw cmd ioctl interface. Have
> >>
> >
> >Nobody uses capabilities these days, right?
>
> Actually, the NSA and HP secure linux products do, at the very least.
> And there is some ELF capabilities project out there IIRC, but I dunno
> if anybody's using it.
I did ELF capabilities ;-). And no, I do not think I had many users.
> commands. With the proper sequencing, you can even do power management
> of the drives in userspace. You don't want to do system suspend/resume
> that way, but you can certainly have a userspace policy daemon running,
> that powers-down and powers-up the drives, etc.
See noflushd, Hdparm is able to powersave disks well, already, and it
was in 2.2.X, too.
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
Hi!
> > > You may happen to have the numbers, though - that should be enough.
> > >
> > > Btw, I have a CMD640B based PCI card lying around here, but never
> > > managed to get it generate any interrupts, though the rest seems to be
> > > working.
> >
> > Attach it to the timer interrupt -- that should do it for testing. Simplest
> > way is to make ide timeouts HZ/100 and killing "lost interrupt" msg ;-).
>
> Well, it seems like we'll have to something like this anyway. Some chips
> sometimes forget to assert the IRQ after a transfer due to HW bugs, and
> some PIIX3s are reported to do it quite often.
What is "quite often"? Unless it is more than once in a hour, current
code is just okay... (It waits for timeout, which is about 30 sec?,
then recovers).
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
On Mon, Mar 18, 2002 at 08:20:05PM +0100, Pavel Machek wrote:
> > commands. With the proper sequencing, you can even do power management
> > of the drives in userspace. You don't want to do system suspend/resume
> > that way, but you can certainly have a userspace policy daemon running,
> > that powers-down and powers-up the drives, etc.
>
> See noflushd, Hdparm is able to powersave disks well, already, and it
> was in 2.2.X, too.
Not all of them safely, though. Many a drive will corrupt data if it
receives a command when not spinned up. You need to issue a wake command
first, which hdparm doesn't, it just leaves it to the kernel to issue a
read command or whatever to wake the drive ...
--
Vojtech Pavlik
SuSE Labs
Hi!
> > > commands. With the proper sequencing, you can even do power management
> > > of the drives in userspace. You don't want to do system suspend/resume
> > > that way, but you can certainly have a userspace policy daemon running,
> > > that powers-down and powers-up the drives, etc.
> >
> > See noflushd, Hdparm is able to powersave disks well, already, and it
> > was in 2.2.X, too.
>
> Not all of them safely, though. Many a drive will corrupt data if it
> receives a command when not spinned up. You need to issue a wake command
> first, which hdparm doesn't, it just leaves it to the kernel to issue a
> read command or whatever to wake the drive ...
Is this common disk bug, or are they permitted to behave like that?
Pavel
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
On Tue, Mar 19, 2002 at 10:21:30PM +0100, Pavel Machek wrote:
> Hi!
>
> > > > commands. With the proper sequencing, you can even do power management
> > > > of the drives in userspace. You don't want to do system suspend/resume
> > > > that way, but you can certainly have a userspace policy daemon running,
> > > > that powers-down and powers-up the drives, etc.
> > >
> > > See noflushd, Hdparm is able to powersave disks well, already, and it
> > > was in 2.2.X, too.
> >
> > Not all of them safely, though. Many a drive will corrupt data if it
> > receives a command when not spinned up. You need to issue a wake command
> > first, which hdparm doesn't, it just leaves it to the kernel to issue a
> > read command or whatever to wake the drive ...
>
> Is this common disk bug, or are they permitted to behave like that?
This behavior is permitted by the specification, as far as I know -
results of commands other than wakeup (and other pm commands) in sleep
or suspend mode are undefined ...
--
Vojtech Pavlik
SuSE Labs
Pavel,
No, it is called being a "BAD HOST". Things were added before the
infrastructure was complete to support feature. Thus the feature is now a
BUG. Sorry but that is the simple true, one of the reasons why I asked
you to delay but it is not important now.
Regards,
Andre Hedrick
Linux Disk Certification Project Linux ATA Development
On Tue, 19 Mar 2002, Pavel Machek wrote:
> Hi!
>
> > > > commands. With the proper sequencing, you can even do power management
> > > > of the drives in userspace. You don't want to do system suspend/resume
> > > > that way, but you can certainly have a userspace policy daemon running,
> > > > that powers-down and powers-up the drives, etc.
> > >
> > > See noflushd, Hdparm is able to powersave disks well, already, and it
> > > was in 2.2.X, too.
> >
> > Not all of them safely, though. Many a drive will corrupt data if it
> > receives a command when not spinned up. You need to issue a wake command
> > first, which hdparm doesn't, it just leaves it to the kernel to issue a
> > read command or whatever to wake the drive ...
>
> Is this common disk bug, or are they permitted to behave like that?
>
> Pavel
> --
> Casualities in World Trade Center: ~3k dead inside the building,
> cryptography in U.S.A. and free speech in Czech Republic.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> > receives a command when not spinned up. You need to issue a wake command
> > first, which hdparm doesn't, it just leaves it to the kernel to issue a
> > read command or whatever to wake the drive ...
>
> Is this common disk bug, or are they permitted to behave like that?
Its not that common, but these sort of things happen sometimes. IDE is
very much a defensive driver
On Tue, 19 Mar 2002 22:56:09 +0100, Vojtech Pavlik wrote:
>> > Not all of them safely, though. Many a drive will corrupt data if it
>> > receives a command when not spinned up. You need to issue a wake command
>> > first, which hdparm doesn't, it just leaves it to the kernel to issue a
>> > read command or whatever to wake the drive ...
>>
>> Is this common disk bug, or are they permitted to behave like that?
>
>This behavior is permitted by the specification, as far as I know -
Actually not. Have a look at page 36 of the current ATA6 specification.
>results of commands other than wakeup (and other pm commands) in sleep
>or suspend mode are undefined ...
If a disk is in power state PM1:idle or PM2:standby, each ATA command
which requires media access will result in a transition to power state
PM0:active as well. The driver should be prepared of a long command
execution time in this case (due to the spin up delay). This is why a
well implemented ATA driver should track the individual power states of
each attached unit to modulate its internal command timeout
accordingly.
The driver may issue a SET_FEATURES(spin up) command in anticipation of
a media access. A "forgiving" implementation might issue this command
at proper system state transitions as well .
If a disk has entered power state PM3:sleep, its interface is turned
off! You no longer can issue any command except for a DEVICE RESET to
this unit. A reset is required to initiate a state transition from
PM3:sleep to PM2:standby (there are no other state transitions).
Ciao,
Dani
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Daniela Engert, systems engineer at MEDAV GmbH
Gr?fenberger Str. 34, 91080 Uttenreuth, Germany
Phone ++49-9131-583-348, Fax ++49-9131-583-11
On Wed, 20 Mar 2002, Daniela Engert wrote:
> >This behavior is permitted by the specification, as far as I know -
>
> Actually not. Have a look at page 36 of the current ATA6 specification.
That's probably not the place to look, since drives will conform to the
specs in place when they were designed. If older specs did not require
automatic spinup, then ATA6 doesn't apply.
In the real world I have seen drives which definitely didn't like commands
in spindown, so unless there is some really large penalty for sending that
I would suggest having the logic support the existing hardware. We still
have 386 and FPE emulation, I would bet there are more older drives than
386s in use (at least by my clients ;-).
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Wed, 20 Mar 2002 13:11:03 -0500 (EST), Bill Davidsen wrote:
>On Wed, 20 Mar 2002, Daniela Engert wrote:
>
>> >This behavior is permitted by the specification, as far as I know -
>>
>> Actually not. Have a look at page 36 of the current ATA6 specification.
>
>That's probably not the place to look, since drives will conform to the
>specs in place when they were designed. If older specs did not require
>automatic spinup, then ATA6 doesn't apply.
Well, then have a look at page 31 of the ATA3 spec. It tells you
basically the same - just a little more terse ;-)
Ciao,
Dani
Hi!
> >> > Not all of them safely, though. Many a drive will corrupt data if it
> >> > receives a command when not spinned up. You need to issue a wake command
> >> > first, which hdparm doesn't, it just leaves it to the kernel to issue a
> >> > read command or whatever to wake the drive ...
> >>
> >> Is this common disk bug, or are they permitted to behave like that?
> >
> >This behavior is permitted by the specification, as far as I know -
>
> Actually not. Have a look at page 36 of the current ATA6
> specification.
Good. So noflushd is safe.
Pavel
> If a disk has entered power state PM3:sleep, its interface is turned
> off! You no longer can issue any command except for a DEVICE RESET to
> this unit. A reset is required to initiate a state transition from
> PM3:sleep to PM2:standby (there are no other state transitions).
I do not think we are using PM3:sleep in noflushd, but we even
could. AFAICT, ide layer resets interface if it does not respond.
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.
On Wed, Mar 20, 2002 at 11:15:14PM +0100, Pavel Machek wrote:
> I do not think we are using PM3:sleep in noflushd, but we even
> could. AFAICT, ide layer resets interface if it does not respond.
It does, but only after a rather longish timeout, and a couple of IDE
errors in the syslog. Nothing I'd like to scare users with, so noflushd
issues standby (PM2).
Regards,
Daniel.