2004-01-01 12:35:35

by Rob Landley

Subject: Re: udev and devfs - The final word

On Wednesday 31 December 2003 18:31, Rob Love wrote:
> On Wed, 2003-12-31 at 19:15, Andries Brouwer wrote:
> > My plan has been to essentially use a hashed disk serial number
> > for this "any old unique value". The problem is that "any old"
> > is easy enough, but "unique" is more difficult.
> > Naming devices is very difficult, but in some important cases,
> > like SCSI or IDE disks, that would work and give a stable name.
>
> Yup.
>
> > The kernel must not invent consecutive numbers - that does not
> > lead to stable names. Setting this up correctly is nontrivial.
>
> This is definitely an interesting problem space.
>
> I agree wrt just inventing consecutive numbers. If there was a nice way
> to trivially generate a random and unique number from some
> device-inherent information, that would be nice.
>
> Rob Love

Fundamental problem: "Unique" depends on the other devices in the system. You
can't guarantee unique by looking at one device, more or less by definition.

Combine that with hotplug and you have a world of pain. Generating a number
from a device is just a fancy hashing function, but as soon as you have two
devices that generate the same number independently (when in separate
systems) and you plug them both into the same system: boom.

Now if you don't care about hotplug, it gets a little easier. You can have a
collision handler that does some kind of hashing thing, figuring out which
device needs to get bumped and bumping it. (As long as it consistently picks
the same victim, you're okay, although that in and of itself could get
interesting. And if you remove the earlier device it conflicted with and
reboot, the device could get renumbered which is evil...)

Of course the EASY way to deal with collisions is to just fail the hash thingy
in a detectable way, and punt to some kind of udev override. So if you yank
a drive from system A, throw it in system B, try to re-export it NFS, and
it's not going to work, it TELLS you.
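
A rough sketch of the "fail detectably and punt" idea: hash each serial into
a fixed-width number and refuse, rather than silently renumber, when two
different devices land on the same value. The names `device_number`,
`Registry` and `CollisionError`, and the choice of SHA-1 truncated to 32
bits, are assumptions made for illustration; nothing here is what the kernel
or udev actually does.

```python
import hashlib


class CollisionError(Exception):
    """Two different serials hashed to the same number; a human must decide."""


def device_number(serial: str, bits: int = 32) -> int:
    # Hypothetical hash: SHA-1 of the serial, reduced to `bits` bits.
    digest = hashlib.sha1(serial.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


class Registry:
    def __init__(self, bits: int = 32):
        self.bits = bits
        self.by_number = {}  # number -> serial currently holding it

    def add(self, serial: str) -> int:
        number = device_number(serial, self.bits)
        owner = self.by_number.get(number)
        if owner is not None and owner != serial:
            # Fail loudly instead of bumping either device.
            raise CollisionError(
                f"{serial!r} collides with {owner!r} at {number:#x}"
            )
        self.by_number[number] = serial
        return number
```

Re-adding the same serial returns the same number; only a clash between two
different serials raises, which is the point where an override would have to
step in.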

Solve 90% of the problem space and have a human deal with the exceptions. How
big's the unique number being exported, anyway? (If it's 32 bits, the
exceptions are 1 in 4 billion. It may never be seen in the wild...)

Rob


2004-01-01 15:22:55

by Robert Love

Subject: Re: udev and devfs - The final word

On Thu, 2004-01-01 at 07:34, Rob Landley wrote:

> Fundamental problem: "Unique" depends on the other devices in the system. You
> can't guarantee unique by looking at one device, more or less by definition.

Of course.

> Combine that with hotplug and you have a world of pain. Generating a number
> from a device is just a fancy hashing function, but as soon as you have two
> devices that generate the same number independently (when in separate
> systems) and you plug them both into the same system: boom.

A solution would have to deal with collisions.

> Of course the EASY way to deal with collisions is to just fail the hash thingy
> in a detectable way, and punt to some kind of udev override. So if you yank
> a drive from system A, throw it in system B, try to re-export it NFS, and
> it's not going to work, it TELLS you.

No no no. Nothing this complicated. No punting to udev.

> Solve 90% of the problem space and have a human deal with the exceptions. How
> big's the unique number being exported, anyway? (If it's 32 bits, the
> exceptions are 1 in 4 billion. It may never be seen in the wild...)

Device numbers are 64-bit now.

Rob Love


2004-01-01 15:49:59

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Thu, Jan 01, 2004 at 10:22:53AM -0500, Rob Love wrote:

> Device numbers are 64-bit now.
>
> Rob Love

I am afraid I have to disappoint you. I made them 64-bit,
and I think they were 64-bit for a few months in the -mm tree,
forgot the details, but unfortunately Al went back to 32-bit again.

2004-01-01 15:54:03

by Robert Love

Subject: Re: udev and devfs - The final word

On Thu, 2004-01-01 at 10:48, Andries Brouwer wrote:

> I am afraid I have to disappoint you. I made them 64-bit,
> and I think they were 64-bit for a few months in the -mm tree,
> forgot the details, but unfortunately Al went back to 32-bit again.

You did disappoint me! My heart is crushed and my aspirations for the
future ruined.

But you are right, dunno what I was thinking.

Rob Love


2004-01-01 21:02:27

by kaih

Subject: Re: udev and devfs - The final word

[email protected] (Rob Landley) wrote on 01.01.04 in <[email protected]>:

> On Wednesday 31 December 2003 18:31, Rob Love wrote:
> > On Wed, 2003-12-31 at 19:15, Andries Brouwer wrote:
> > > My plan has been to essentially use a hashed disk serial number
> > > for this "any old unique value". The problem is that "any old"
> > > is easy enough, but "unique" is more difficult.
> > > Naming devices is very difficult, but in some important cases,
> > > like SCSI or IDE disks, that would work and give a stable name.
> >
> > Yup.
> >
> > > The kernel must not invent consecutive numbers - that does not
> > > lead to stable names. Setting this up correctly is nontrivial.
> >
> > This is definitely an interesting problem space.
> >
> > I agree wrt just inventing consecutive numbers. If there was a nice way
> > to trivially generate a random and unique number from some
> > device-inherent information, that would be nice.
> >
> > Rob Love
>
> Fundamental problem: "Unique" depends on the other devices in the system.
> You can't guarantee unique by looking at one device, more or less by
> definition.

This is actually not fundamental at all.

The best-known exception is probably the MAC address. But it is not the
only example of devices having true unique information.

It is certainly true, though, that there are devices without this kind of
info.

And remember that you can sometimes use secondary information. With any
kind of read-write storage device, it might be possible to create such a
piece of information and store it onto that device.

Moral: keep the identifier creation framework flexible enough so that you
can choose device-specific means to produce useful identifiers. (And, use
long identifiers, as they're less likely to be duplicated in general.)

MfG Kai

2004-01-02 00:18:09

by Maciej Żenczykowski

Subject: Re: udev and devfs - The final word

> Solve 90% of the problem space and have a human deal with the exceptions.
> How big's the unique number being exported, anyway? (If it's 32 bits, the
> exceptions are 1 in 4 billion. It may never be seen in the wild...)

Wouldn't this be a classical birthday problem with 50% collision chance
popping up in and around a few hundred devices? [20 for 8 bits, 23 for
365, 302 for 16 bits, 77163 for 32 bits], and that's only in a single
system - with hundreds of thousands of systems even a 0.1% collision rate
is deadly. [0.1% collision rate at 32 bits with 2932 devices] Even with
only 300 devices per system, you'll still get a collision (at 32 bits) on
more than 1 system in a hundred thousand.
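
These figures can be checked with a short calculation. This is a sketch that
assumes every device draws its number uniformly at random from the whole
space (a hash of real serial numbers may well do worse);
`collision_probability` and `devices_for` are names made up for this example.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Chance that at least two of n uniform draws from `space` values match."""
    if n > space:
        return 1.0
    # Sum logs of (1 - k/space) so tiny probabilities keep their precision.
    log_all_distinct = sum(math.log1p(-k / space) for k in range(n))
    return -math.expm1(log_all_distinct)


def devices_for(p: float, space: int) -> int:
    """Smallest n whose collision probability reaches p (0 <= p < 1)."""
    target = math.log1p(-p)  # log of the largest allowed "all distinct" chance
    log_all_distinct = 0.0
    n = 0
    while log_all_distinct > target:
        if n == space:
            return space + 1  # pigeonhole: one more device must collide
        log_all_distinct += math.log1p(-n / space)
        n += 1
    return n
```

For instance, `devices_for(0.5, 365)` gives the familiar 23, and
`collision_probability(300, 2**32)` comes out a little above 1 in 100,000,
in line with the estimate above.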

Cheers,
MaZe.


2004-01-02 07:27:24

by Rob Landley

Subject: Re: udev and devfs - The final word

On Thursday 01 January 2004 13:43, Kai Henningsen wrote:
> [email protected] (Rob Landley) wrote on 01.01.04 in <[email protected]>:
> > On Wednesday 31 December 2003 18:31, Rob Love wrote:
> > > On Wed, 2003-12-31 at 19:15, Andries Brouwer wrote:
> > > > My plan has been to essentially use a hashed disk serial number
> > > > for this "any old unique value". The problem is that "any old"
> > > > is easy enough, but "unique" is more difficult.
> > > > Naming devices is very difficult, but in some important cases,
> > > > like SCSI or IDE disks, that would work and give a stable name.
> > >
> > > Yup.
> > >
> > > > The kernel must not invent consecutive numbers - that does not
> > > > lead to stable names. Setting this up correctly is nontrivial.
> > >
> > > This is definitely an interesting problem space.
> > >
> > > I agree wrt just inventing consecutive numbers. If there was a nice
> > > way to trivially generate a random and unique number from some
> > > device-inherent information, that would be nice.
> > >
> > > Rob Love
> >
> > Fundamental problem: "Unique" depends on the other devices in the system.
> > You can't guarantee unique by looking at one device, more or less by
> > definition.
>
> This is actually not fundamental at all.
>
> The best-known exception is probably the MAC address. But it is not the
> only example of devices having true unique information.

I thought of mentioning this, but deleted it as a digression. But since you
brought it up:

A) There are ethernet cards that have the same mac address. (Over the years,
the cheap manufacturers have managed to screw this up. Ask Alan Cox.) They
show up randomly and cause real headaches for network administrators if you
don't think to look for it.

B) You can override the mac address the thing comes with. This is done all
the time. (Hot failover comes to mind, but it's not the only one.) I
remember how the cable modem company that serviced my mother's house snagged
the mac address of the cable modem as part of the initial setup, and refused
to work with a different mac address. (I asked their support guys: They
wanted to make sure you were still using the machine they'd installed their
special software on, which was a windows machine and I was installing a linux
firewall. And predicting THIS digression: yes I power cycled and hit the
reset button on the cable modem, it didn't help. The problem was at the
other end, their gateway dropped packets from the wrong mac address.)

So I changed the mac address of the other machine as part of its init scripts,
and it worked again...

> It is certainly true, though, that there are devices without this kind of
> info.
>
> And remember that you can sometimes use secondary information. With any
> kind of read-write storage device, it might be possible to create such a
> piece of information and store it onto that device.

I.E. a udev config entry?

> Moral: keep the identifier creation framework flexible enough so that you
> can choose device-specific means to produce useful identifiers. (And, use
> long identifiers, as they're less likely to be duplicated in general.)

Seems to be what udev is for. When we do go to random major and minor
numbers, maybe it would be useful to let udev request specific ones? (Just a
thought...)

> MfG Kai

Rob

2004-01-02 20:43:32

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Thu, 1 Jan 2004, Rob Love wrote:
>
> On Thu, 2004-01-01 at 10:48, Andries Brouwer wrote:
> > I am afraid I have to disappoint you. I made them 64-bit,
> > and I think they were 64-bit for a few months in the -mm tree,
> > forgot the details, but unfortunately Al went back to 32-bit again.
>
> You did disappoint me! My heart is crushed and my aspirations for the
> future ruined.
>
> But you are right, dunno what I was thinking.

Note that one reason I didn't much like the 64-bit versions is that not
only are they bigger, they also encourage insanity. Ie you'd find SCSI
people who want to try to encode device/controller/bus/target/lun info
into the device number.

We should resist any effort that makes the numbers "mean" something. They
are random cookies. Not "unique identifiers", and not "addresses".

The unique identifiers you get from things like udev, using contents of
the device itself or user preferences etc. That's outside the scope of the
kernel. The addresses you get from /sys.

Linus

2004-01-03 04:13:54

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Fri, Jan 02, 2004 at 12:42:41PM -0800, Linus Torvalds wrote:

Hi Linus - A happy 2004 !


> Note that one reason I didn't much like the 64-bit versions is that not
> only are they bigger, they also encourage insanity. Ie you'd find SCSI
> people who want to try to encode device/controller/bus/target/lun info
> into the device number.

Weak. "We don't want this power that has good uses because it also
can be used stupidly." That is not Unix-style.

> We should resist any effort that makes the numbers "mean" something. They
> are random cookies. Not "unique identifiers", and not "addresses".

Random cookies? I prefer "arbitrary" over "random". The value plays no role
at all, but it must be unique, preferably stable across reboots.

Andries



2004-01-03 04:46:48

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sat, 3 Jan 2004, Andries Brouwer wrote:
>
> > Note that one reason I didn't much like the 64-bit versions is that not
> > only are they bigger, they also encourage insanity. Ie you'd find SCSI
> > people who want to try to encode device/controller/bus/target/lun info
> > into the device number.
>
> Weak. "We don't want this power that has good uses because it also
> can be used stupidly." That is not Unix-style.

No.

That's not the argument: the argument is that the _only_ thing that 64-bit
stuff can be used for is stupid things.

For everything else, a 32-bit dev_t is sufficient.

And the UNIX way is definitely: "do one thing, and do it well" and "small
is beautiful". It has _never_ been "overdesign everything to accommodate
stupidity".

You may have confused UNIX with Multics. Where overdesign was the rule,
not the exception.

> > We should resist any effort that makes the numbers "mean" something. They
> > are random cookies. Not "unique identifiers", and not "addresses".
>
> Random cookies? I prefer "arbitrary" over "random". The value plays no role
> at all, but it must be unique, preferably stable across reboots.

Don't use "unique". It has way too many connotations of _true_ uniqueness
in computer science.

And the operative word in "preferably stable across reboots" is
"preferably". Because it basically cannot be in the general case (it
can't be unique for things that aren't enumerable, and clearly a lot of
things aren't), and thus nothing must ever _assume_ it is.

And the thing is, to break those wrong assumptions (that are true in many
common cases, but are _not_ true in the rare general case), we may have to
actively do things that are "silly" on purpose. For example, for
debugging, we start the "jiffies" counter not at zero, but at -300. That's
patently _silly_, but it was very useful in finding the cases where the
rare general case was not handled correctly.
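
To illustrate the kind of bug an early wrap flushes out, here is a small
simulation of a 32-bit tick counter. It is a sketch under assumptions: the
-300 is taken to be in seconds, `HZ = 1000` is picked arbitrarily, and
`time_after` below only mimics the signed-difference idea behind the kernel
macro of that name rather than reproducing kernel code.

```python
BITS = 32
MASK = (1 << BITS) - 1
HZ = 1000  # assumed tick rate, for illustration only

# Start 300 seconds' worth of ticks before the counter wraps to zero.
INITIAL_JIFFIES = (-300 * HZ) & MASK


def time_after(a: int, b: int) -> bool:
    """True if tick count a is later than b, even across a wrap.

    Treats the unsigned difference (b - a) as a signed BITS-bit value, which
    is valid while the two times are less than 2**(BITS-1) ticks apart.
    """
    diff = (b - a) & MASK
    return diff >= 1 << (BITS - 1)  # sign bit set means b - a is negative


def expired_naive(now: int, deadline: int) -> bool:
    return now > deadline  # breaks once the deadline has wrapped past zero


def expired_safe(now: int, deadline: int) -> bool:
    return time_after(now, deadline)
```

With `now = INITIAL_JIFFIES` and a deadline ten minutes later, the deadline
wraps to a small number: `expired_naive` claims the timer has already fired,
while `expired_safe` correctly says it has not. A counter starting at zero
would hide that difference for weeks of uptime.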

Similarly, I'll probably advocate at some point (when distributions are
using udev) that we purposefully try to make device numbers _unstable_
across reboots, to find cases that do the wrong thing and have things
hardcoded. Exactly to find and fix them, so that the distribution works
correctly even when things aren't enumerable.

(As to examples of non-enumerable devices, iSCSI comes to mind. As does pretty
much anything else that is connected over IP - you can't even enumerate
according to path or IP, since those may change too).

Linus

2004-01-03 13:10:36

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Fri, Jan 02, 2004 at 08:46:33PM -0800, Linus Torvalds wrote:

> > Random cookies? I prefer "arbitrary" over "random". The value plays no role
> > at all, but it must be unique, preferably stable across reboots.
>
> The operative word in "preferably stable across reboots" is
> "preferably". Because it basically cannot be in the general case,
> and thus nothing must ever _assume_ it is.

Sure. It is not "need". It is "quality of implementation".
Consider NFS.

Andries

2004-01-03 18:33:48

by Pavel Machek

Subject: Wrapping jiffies [was Re: udev and devfs - The final word]

Hi!

> actively do things that are "silly" on purpose. For example, for
> debugging, we start the "jiffies" counter not at zero, but at -300. That's
> patently _silly_, but it was very useful in finding the cases where the
> rare general case was not handled correctly.

BTW, as we are currently in stable series, it might be a good idea to
make jiffies start at zero... Hopefully jiffie wrap had enough testing
during stable...

Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-03 22:28:00

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sat, 3 Jan 2004, Andries Brouwer wrote:
>
> Sure. It is not "need". It is "quality of implementation".
> Consider NFS.

The problems occur when there are things we _cannot_ guarantee, and that
user space starts unnecessarily to depend on. And that ends up resulting
in bugs waiting to happen. Bugs that many "normal" developers may never
hit, simply because the quality of implementation ends up being so good
that it hides the problem cases in regular usage.

And then a high-quality implementation actually ends up being
_detrimental_. It's hiding problems that can still happen, they just
happen rarely enough that the bugs don't get found and fixed.

And then the painful thing of forcing "stupid", aka "bad QoI" behaviour,
actually ends up being the better thing in the long run.

Linus

2004-01-03 23:08:47

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Sat, Jan 03, 2004 at 02:27:47PM -0800, Linus Torvalds wrote:

> > Sure. It is not "need". It is "quality of implementation".
> > Consider NFS.

> And then a high-quality implementation actually ends up being
> _detrimental_. It's hiding problems that can still happen, they just
> happen rarely enough that the bugs don't get found and fixed.

Empty talk. This is not about finding and fixing bugs.
We know very precisely what properties the NFS protocol has.
Now one can have a system that works as well as possible with NFS.
And one can have a worse system.

Andries

2004-01-04 01:18:17

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 12:08:40AM +0100, Andries Brouwer wrote:
> On Sat, Jan 03, 2004 at 02:27:47PM -0800, Linus Torvalds wrote:
> > And then a high-quality implementation actually ends up being
> > _detrimental_. It's hiding problems that can still happen, they just
> > happen rarely enough that the bugs don't get found and fixed.
> Empty talk. This is not about finding and fixing bugs.
> We know very precisely what properties the NFS protocol has.
> Now one can have a system that works as well as possible with NFS.
> And one can have a worse system.

It seems to me that as long as /dev is always a local mount (tmpfs in
the case of an NFS-root installation), it doesn't really matter. Maintaining
system-specific information on a remote machine seems dirty, and something
that shouldn't be *expected* to work. You wouldn't expect /proc to work
over NFS, would you? :-)

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2004-01-04 01:55:17

by Valdis Klētnieks

Subject: Re: udev and devfs - The final word

On Sat, 03 Jan 2004 20:16:26 EST, Mark Mielke said:

> It seems to me that as long as /dev is always a local mount (tmpfs in
> the case of an NFS-root installation), it doesn't really matter. Maintaining
> system-specific information on a remote machine seems dirty, and something
> that shouldn't be *expected* to work. You wouldn't expect /proc to work
> over NFS, would you? :-)

ISTR that SunOS 4.0 handled an NFS-mounted /dev and swap just fine some 15
years ago? (in fact, due to performance differences between the disks on a
Sun3/2xx server and the shoebox disk on a 3/50, you could page faster over
the net than to a local /dev/swap).

So it's more a case of "we have decided to do it differently" than "that's so nuts
that it shouldn't be expected to work"....



2004-01-04 02:10:10

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sun, 4 Jan 2004, Andries Brouwer wrote:
>
> Empty talk. This is not about finding and fixing bugs.
> We know very precisely what properties the NFS protocol has.
> Now one can have a system that works as well as possible with NFS.
> And one can have a worse system.

Oh, things can be _much_ worse than /dev over NFS.

You don't seem to realize what I mean with "not enumerable".

With NFS, you could have some strange per-mount device number mapping etc,
and it wouldn't need to be all that complicated.

But if you start considering network-attached storage (as in "disks over
IP", not as in "samba"), the problem is that you fundamentally cannot
enumerate the things on a kernel level. EVER. There is no way to do
automatic discovery, because the bus fundamentally isn't enumerable. It
isn't even _repeatable_, ie if you do broadcast "tell me what disks
exist", the results won't be ordered in any particular way.

In other words, the device numbers that eventually get attached to these
disks (however the discovery ends up working - with the sysadmin
explicitly mentioning them, or with some kind of broadcast protocol)
simply WILL NOT NECESSARILY be the same across reboots.
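
A toy simulation of that situation, as a sketch only: discovery returns the
same set of disks on every boot but in no particular order, and numbers are
handed out in arrival order. `enumerate_devices`, `find_by_identity` and the
identifiers like "disk-3" are hypothetical; the shuffle merely stands in for
an unordered broadcast reply.

```python
import random


def enumerate_devices(identities, boot_seed):
    """Hand out consecutive numbers in whatever order discovery returns.

    `boot_seed` stands in for one boot: the discovery order is shuffled to
    mimic a bus that answers in no particular order.
    """
    order = list(identities)
    random.Random(boot_seed).shuffle(order)
    return {identity: number for number, identity in enumerate(order)}


def find_by_identity(table, identity):
    # User space looks the device up by identity, then uses today's number.
    return table[identity]


disks = [f"disk-{i}" for i in range(16)]
boots = [enumerate_devices(disks, seed) for seed in range(10)]
```

Every boot sees the same sixteen disks, but the number attached to a given
disk is free to change from one boot to the next, so anything that
remembered last boot's number can end up pointing at a different disk. Code
that goes through `find_by_identity` keeps working either way.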

And there just _isn't_ any way to make them the same or to "describe" the
storage in any integer of any finite length. It has nothing to do with
32-bit vs 64-bit vs 1024-bit.

Once you accept that fact, you should accept the fact that device numbers
not only have no meaning, they literally have no permanence across reboots
either.

Yes, the common case is permanent. What I'm saying is that the common case
_cannot_ be the generic case.

Linus

2004-01-04 02:51:32

by Norman Diamond

Subject: Re: Wrapping jiffies [was Re: udev and devfs - The final word]

Pavel Machek wrote:

> BTW, as we are currently in stable series, it might be a good idea to
> make jiffies start at zero...

I disagree. The importance of fixing bugs does not decrease in stable.
Hiding bugs is still the opposite of fixing bugs.

Perhaps I misunderstand the meaning of stable, but I expected stable to mean
that efforts tend more towards fixing things so they work properly, and
unstable meant that efforts tend more towards adding features even though
they're broken at first. Hiding a broken thing is still the opposite of
fixing a broken thing.

> Hopefully jiffie wrap had enough testing during stable...

I think you mean unstable, in which case I agree with this half of what I
think you meant. This still doesn't give any reason to switch back to
hiding things. In fact this doesn't give any reason to switch from a
technique that "hopefully [...] had enough testing" to a different
technique, even if logically the different technique doesn't need as much
testing.

2004-01-04 02:49:44

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Sat, Jan 03, 2004 at 06:09:47PM -0800, Linus Torvalds wrote:
> On Sun, 4 Jan 2004, Andries Brouwer wrote:

> > Empty talk. This is not about finding and fixing bugs.
> > We know very precisely what properties the NFS protocol has.
> > Now one can have a system that works as well as possible with NFS.
> > And one can have a worse system.
>
> Oh, things can be _much_ worse than /dev over NFS.

Yes, but why do you start saying that?

Our topic is the statement that it is good to have device numbers
stable across a reboot. Not absolutely necessary, but good.

For example, given an NFS mount, if the server reboots and
suddenly the client sees different stat data, that would be
less than optimal. A low quality NFS implementation.

You write long stories - but it really is desirable to have
stable device numbers.

> You don't seem to realize what I mean with "not enumerable".

One of your side avenues is the matter of enumeration.
I don't see why that would be relevant. One identifies
things by their UUID. Order is never important.

> And there just _isn't_ any way to make them the same or to "describe" the
> storage in any integer of any finite length. It has nothing to do with
> 32-bit vs 64-bit vs 1024-bit.

A UUID usually takes 128 bits.

Andries

2004-01-04 03:04:28

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sun, 4 Jan 2004, Andries Brouwer wrote:
>
> You write long stories - but it really is desirable to have
> stable device numbers.

And I write the long stories because you do not seem to _get_ the point.

The point is that we will most likely ON PURPOSE break those stable device
numbers, for debugging reasons. Because it is _not_ desirable to have
people _believe_ that they can depend on stable device numbers.

> I don't see why that would be relevant. One identifies
> things by their UUID. Order is never important.

And this is exactly how it should be. However, it requires that user code
actually does the right thing.

And to _verify_ that user code properly identifies devices by other things
than device numbers, we should during 2.7.x explicitly _break_ all
dependencies on stable device numbers.

And UUID's are _not_ "device numbers". They fundamentally _cannot_ be
that, because the kernel just doesn't have any information on how to
generate a unique identifier that is actually stable.

The kernel doesn't know what it can depend on - should it look at the UUID
in the boot sector of the disk, or should it look up the UUID using IP
number reverse lookup, or what?

The only thing that can generate a UUID is literally user mode. Which is
_exactly_ why things like udev exist.

So device numbers are _not_ UUID's. Device numbers are needed before the
UUID's have been identified.

And that has been my point all along: device numbers do not have any
meaning. They are neither unique nor stable across reboots. They have no
information AT ALL associated with them. Anybody who thinks that they are
is fundamentally _wrong_ about it.

I agree that for a stable kernel we should then go back to "best effort"
mode, where for simple politeness reasons we should try to keep device
numbers as stable as we can.

Linus

2004-01-04 04:36:20

by Ananda Bhattacharya

Subject: Pentium 4 HT SMP

Hi,
I was wondering if one compiles a kernel for a
Pentium 4 which has HyperThreading will we need to recompile
SMP support for a single physical CPU or will one need to
have SMP enabled to take advantage of hyperthreading.

thanks
-A

2004-01-04 05:56:07

by Martin J. Bligh

Subject: Re: Pentium 4 HT SMP

> I was wondering if one compiles a kernel for a
> Pentium 4 which has HyperThreading will we need to recompile
> SMP support for a single physical CPU or will one need to
> have SMP enabled to take advantage of hyperthreading.

You need SMP.

M.

2004-01-04 09:05:57

by Greg KH

Subject: Re: udev and devfs - The final word

On Fri, Jan 02, 2004 at 01:26:44AM -0600, Rob Landley wrote:
> > Moral: keep the identifier creation framework flexible enough so that you
> > can choose device-specific means to produce useful identifiers. (And, use
> > long identifiers, as they're less likely to be duplicated in general.)
>
> Seems to be what udev is for. When we do go to random major and minor
> numbers, maybe it would be useful to let udev request specific ones? (Just a
> thought...)

Let udev request specific what? Major/minor numbers? Huh? I think you
are very confused here...

thanks,

greg k-h

2004-01-04 09:44:52

by Rob Landley

Subject: Re: udev and devfs - The final word

On Sunday 04 January 2004 02:57, Greg KH wrote:
> On Fri, Jan 02, 2004 at 01:26:44AM -0600, Rob Landley wrote:
> > > Moral: keep the identifier creation framework flexible enough so that
> > > you can choose device-specific means to produce useful identifiers.
> > > (And, use long identifiers, as they're less likely to be duplicated in
> > > general.)
> >
> > Seems to be what udev is for. When we do go to random major and minor
> > numbers, maybe it would be useful to let udev request specific ones?
> > (Just a thought...)
>
> Let udev request specific what? Major/minor numbers? Huh? I think you
> are very confused here...

Currently, NFS exports are using device major/minor as part of the identifier
for an exported directory, and device numbers are going to be dynamically
allocated in 2.7 to support hotplug, so I was wondering if there was a need
to have some way for root to go "I know this device hotplugged in at major 3
minor 99, but if major 53 minor 12 is free, could you change it to that?" (A
bit like dup2, only for devices.)

The discussion has moved on since then, and now it seems pretty clear that NFS
is going to be expected to use something OTHER than device numbers, and Linus
wants a clean break with device nodes being cookies. Better solution all
around, really...

But the original question did make sense. (The answer was "no", but that's
often the sign of a good question. :)

> thanks,
>
> greg k-h

Rob

2004-01-04 13:23:49

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Sat, Jan 03, 2004 at 07:04:17PM -0800, Linus Torvalds wrote:

> I agree that for a stable kernel we should then go back to "best effort"
> mode, where for simple politeness reasons we should try to keep device
> numbers as stable as we can.

Good - you understand now.
So, the right setup - you call it politeness, I call it quality
of implementation - is to have both stable names and stable numbers,
in as many cases as possible.

Concerning the names, we are in reasonable shape. We have nameif
that binds a stable name to a MAC address. Much better than eth2.
Also udev is a good step in the right direction - it gives
stable names under certain circumstances.

(And since udev can use the kernel device number, it can give stable
names under more circumstances when the kernel device number is
more often stable.)

Concerning the numbers, numbers based on enumeration are less than
satisfactory - they must be the last fallback when nothing else
can be found. And the ordering then is the ordering in time.

Almost always something better can be found. It is the drivers' job
to invent the device number. For the important special case of
SCSI or IDE disk, the disk serial number can be used.

Our helper function takes a string and an integer and a range, and
produces a device number in the given range, distinct from already
existing numbers. If you prefer random device numbers you make this
function ignore the string argument. I prefer stable device numbers
so would do an md5sum-like thing.
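
One way such a helper might look, as a sketch only. The function name, the
linear probing on collision, and the way the integer argument is used are all
assumptions; the description above does not say what the integer is for, so
here it is simply mixed into the hash.

```python
import hashlib


def pick_device_number(name, hint, lo, hi, in_use):
    """Return a number in [lo, hi) that is not present in `in_use`.

    `name` is a device-inherent string (for example a disk serial number).
    `hint` is the integer argument; its intended meaning is not stated in
    the thread, so this sketch just feeds it into the hash.
    """
    size = hi - lo
    digest = hashlib.md5(f"{name}:{hint}".encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % size if size > 0 else 0
    for step in range(size):
        candidate = lo + (start + step) % size  # linear probe on collision
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no free device number in the given range")
```

The same string yields the same number as long as its first-choice slot is
free. Once probing kicks in, the result depends on which other devices are
already present, which is exactly the renumbering hazard raised earlier in
the thread. Ignoring `name` and picking `start` at random gives the "random
cookie" behaviour instead.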

And that brings us back to the start of this thread:
Life is simpler when there is more room.
So it is a pity that we chose less room.

Andries

2004-01-04 20:49:56

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Sat, Jan 03, 2004 at 08:54:36PM -0500, [email protected] wrote:
> ISTR that SunOS 4.0 handled an NFS-mounted /dev and swap just fine
> some 15 years ago? (in fact, due to performance differences between
> the disks on a Sun3/2xx server and the shoebox disk on a 3/50, you
> could page faster over the net than to a local /dev/swap).

Whether it did at some point, or whether it didn't, doesn't really matter.

It doesn't need to, and with the amount of memory that most computers come
with these days, remote access storage for tiny kernel data structures, like
that which would be required for tmpfs /dev that is only populated with the
devices that actually exist, just isn't worth it.

> So it's more a case of "we have decided to do it differently" than
> "that's so nuts that it shouldn't be expected to work"....

I was saying "why do you think this is a good model?" not "I can't imagine
why you would do it..." :-) Sorry it didn't come across as I intended.

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2004-01-04 21:05:41

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sun, 4 Jan 2004, Andries Brouwer wrote:
>
> On Sat, Jan 03, 2004 at 07:04:17PM -0800, Linus Torvalds wrote:
> >
> > I agree that for a stable kernel we should then go back to "best effort"
> > mode, where for simple politeness reasons we should try to keep device
> > numbers as stable as we can.
>
> Good - you understand now.

Oh, _I_ always understood. You were the one that was arguing for stable
numbers as somehow important. I'm just telling you that they aren't
stable, and that a user application that depends on their stability or
their uniqueness is BROKEN.

> So, the right setup - you call it politeness, I call it quality
> of implementation - is to have both stable names and stable numbers,
> in as many cases as possible.

And I still disagree. You seem to think that this is an "absolute
goodness", and call it a quality issue.

While I personally strongly believe that it is a bug in user space to
care, and that it is not a quality issue at all, but rather an "allow buggy
and/or nonconverted user space to work" issue.

In other words, it's not about "quality", as much as about compatibility
with applications that are old and/or braindead. Big difference.

Linus

2004-01-04 22:01:13

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 01:05:20PM -0800, Linus Torvalds wrote:

> Oh, _I_ always understood. You were the one that was arguing for
> stable numbers as somehow important.

Indeed. I said "preferably stable across reboots".

> I'm just telling you that they aren't stable, and that a
> user application that depends on their stability or
> their uniqueness is BROKEN.

Surprise! Are you leaving POSIX? Or ditching NFS?
Or demanding that NFS servers must never reboot?

A common Unix idiom is testing for the identity
of two files by comparing st_ino and st_dev.
A broken idiom?

No idea what part of our Unix heritage you now have decided to call broken.

Andries


2004-01-04 22:25:39

by Helge Hafting

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 11:01:04PM +0100, Andries Brouwer wrote:
> On Sun, Jan 04, 2004 at 01:05:20PM -0800, Linus Torvalds wrote:
>
> > Oh, _I_ always understood. You were the one that was arguing for
> > stable numbers as somehow important.
>
> Indeed. I said "preferably stable across reboots".
>
> > I'm just telling you that they aren't stable, and that a
> > user application that depends on their stability or
> > their uniqueness is BROKEN.
>
> Surprise! Are you leaving POSIX? Or ditching NFS?
> Or demanding that NFS servers must never reboot?
>
> A common Unix idiom is testing for the identity
> of two files by comparing st_ino and st_dev.
> A broken idiom?
>
> No idea what part of our Unix heritage you now have decided to call broken.
>

You worry about /dev over nfs, with the server booting in the middle of
such a comparison? This can work even with randomized device numbers,
just don't let that nfs server populate the exported /dev itself.

Let the client(s) run udev, and have one /dev for each on persistent
storage. If the nfs server reboots it simply keeps serving /dev's
in whatever shape the clients set them up with.

Helge Hafting

2004-01-04 22:37:15

by Al Viro

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 11:01:04PM +0100, Andries Brouwer wrote:
> A common Unix idiom is testing for the identity
> of two files by comparing st_ino and st_dev.
> A broken idiom?

No, just your usual highly selective reading. First of all, that
idiom relies only on different ->s_dev *among* *currently* *mounted*
*filesystems*. In the part that has anything to do with devices, it means
only one thing:

Any two different block devices that are both currently opened by
the kernel and are both alive must have different device numbers.

Note the "are alive" part - we can even allow reuse of device numbers
as long as we make sure that stat() will fail on filesystems mounted
from dead ones.

Now, care to explain how preserving aforementioned common Unix idiom
is related to your expostulations?

2004-01-04 23:35:56

by Valdis Klētnieks

Subject: Re: udev and devfs - The final word

On Sun, 04 Jan 2004 23:01:04 +0100, Andries Brouwer said:

> A common Unix idiom is testing for the identity
> of two files by comparing st_ino and st_dev.
> A broken idiom?

Comparing two of these obtained at the same time is *usually* a good
test, although racy even on current systems. (Consider the case of an
unlink()/creat() pair between the two stat() calls - there's been more than
one race condition resulting in a security hole based on THIS one). It's
only safe if you actually have an open reference to both files before you
fstat() either one. And yes, it has to be fstat(), as you can't guarantee
that the file referenced by path in stat() is the one you did an open() on.
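
A minimal sketch of the "open both, then fstat() both" ordering described above. The helper names `same_open_file` and `same_file` are made up for illustration; only open(), fstat() and close() are real interfaces.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if both open descriptors refer to the same file, 0 if not,
 * -1 if fstat() fails. Both files are already open here, so neither inode
 * can be freed and reused between the two fstat() calls.
 */
static int same_open_file(int fd1, int fd2)
{
	struct stat a, b;

	assert(fd1 >= 0 && fd2 >= 0);
	if (fstat(fd1, &a) != 0 || fstat(fd2, &b) != 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Open both paths first, and only then compare what the descriptors hold. */
static int same_file(const char *path1, const char *path2)
{
	int fd1, fd2, ret;

	fd1 = open(path1, O_RDONLY);
	if (fd1 < 0)
		return -1;
	fd2 = open(path2, O_RDONLY);
	if (fd2 < 0) {
		close(fd1);
		return -1;
	}
	ret = same_open_file(fd1, fd2);
	close(fd2);
	close(fd1);
	return ret;
}
```

The answer is only about the two descriptors as they were opened. It says nothing about what the path names point to a moment later.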

Comparing the st_ino/st_dev for a file today with one from last Friday has
NEVER been a good idea.



2004-01-05 01:04:29

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 10:37:10PM +0000, Al Viro wrote:
> On Sun, Jan 04, 2004 at 11:01:04PM +0100, Andries Brouwer wrote:
> > A common Unix idiom is testing for the identity
> > of two files by comparing st_ino and st_dev.
> > A broken idiom?
> No, just your usual highly selective reading. First of all, that
> idiom relies only on different ->s_dev *among* *currently* *mounted*
> *filesystems*.
> ...
> Now, care to explain how preserving aforementioned common Unix idiom
> is related to your expostulations?

I think he is defending bad design practices by pointing out common
bad design practices, and asking why these bad practices shouldn't be
allowed to continue, given that they are so common... :-)

Are there any real programs that assume st_dev/st_ino values are constant
across mount/unmount/mount? If so, Linus is saying we should break these
programs, so that the authors can become aware of the problem, rather than
leaving the problem as a subtle corner case.

I see no reason at all to keep these programs running. They are incorrect,
and that is that.

If and when this comes up in 2.7 development, I would like to see an
option of the sort: 1) Try to maintain major:minor numbers across
reboots (even at the expense of complexity and efficiency), 2) Try to
maintain a subset of the major:minor numbers across reboots
(compromise) 3) Provide the most efficient implementation, making no
guarantees regarding the numbering scheme, unless using a numbering
scheme turns out to be more efficient. Deprecate 1), and let 2) and 3)
evolve until we see who the victor is... :-) As long as the interface
that maps device to number is abstracted, the above should be pluggable.
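
A rough sketch of what "abstracted and pluggable" could look like, purely as an illustration. The struct, the two policies and `devnum_for` are hypothetical names, not anything that exists in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* One numbering policy: maps a device identity to a number. */
struct devnum_policy {
	const char *name;
	uint32_t (*assign)(const char *id, uint32_t seq);
};

/* Options 1/2 in spirit: derive the number from the identity, ignore order. */
static uint32_t assign_stable(const char *id, uint32_t seq)
{
	uint32_t h = 2166136261u;	/* FNV-1a */

	(void)seq;
	for (; *id; id++) {
		h ^= (unsigned char)*id;
		h *= 16777619u;
	}
	return h;
}

/* Option 3 in spirit: cheapest possible, just the discovery order. */
static uint32_t assign_sequential(const char *id, uint32_t seq)
{
	(void)id;
	return seq;
}

static const struct devnum_policy policy_stable = { "stable", assign_stable };
static const struct devnum_policy policy_cheap = { "sequential", assign_sequential };

/* Callers only ever go through the policy pointer. */
static uint32_t devnum_for(const struct devnum_policy *p,
			   const char *id, uint32_t seq)
{
	assert(p && p->assign);
	return p->assign(id, seq);
}
```

Swapping the policy pointer changes the numbering without touching any caller, which is all the abstraction has to provide.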

mark


2004-01-05 01:40:51

by Jeremy Maitin-Shepard

Subject: Re: udev and devfs - The final word

Valdis Klētnieks writes:

> On Sun, 04 Jan 2004 23:01:04 +0100, Andries Brouwer said:
>> A common Unix idiom is testing for the identity
>> of two files by comparing st_ino and st_dev.
>> A broken idiom?

> Comparing two of these obtained at the same time is *usually* a good
> test, although racy even on current systems. (Consider the case of an
> unlink()/creat() pair between the two stat() calls - there's been more than
> one race condition resulting in a security hole based on THIS one). It's
> only safe if you actually have an open reference to both files before you
> fstat() either one. And yes, it has to be fstat(), as you can't guarantee
> that the file referenced by path in stat() is the one you did an
> open() on.

Unfortunately, programs such as tar depend on inode numbers of distinct
files being distinct even when the file is not open over a period of
several minutes/seconds. This is needed to avoid dumping hard links
more than once. Furthermore, there is no efficient way to write
programs such as tar without depending on this capability. Thus, if
st_ino cannot be used reliably for this purpose, it would be useful for
there to be a system call for retrieving a true
unique-within-the-filesystem identifier for the file.

--
Jeremy Maitin-Shepard

2004-01-05 01:58:30

by Al Viro

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 08:43:27PM -0500, Jeremy Maitin-Shepard wrote:

> Unfortunately, programs such as tar depend on inode numbers of distinct
> files being distinct even when the file is not open over a period of
> several minutes/seconds. This is needed to avoid dumping hard links
> more than once. Furthermore, there is no efficient way to write
> programs such as tar without depending on this capability. Thus, if
> st_ino cannot be used reliably for this purpose, it would be useful for
> there to be a system call for retrieving a true
> unique-within-the-filesystem identifier for the file.

No such thing. It's not the matter of having a syscall to extract such
identifier - it's that on a lot of filesystems (including many common Unix
ones) there's nothing that would qualify.

Note that tar et al. do not behave well if used on an actively modified directory
tree, and ->st_ino reuse is the least of the problems in that area.

2004-01-05 01:59:29

by Jeremy Maitin-Shepard

Subject: Re: st_dev:st_ino

Mark Mielke writes:

> On Sun, Jan 04, 2004 at 08:43:27PM -0500, Jeremy Maitin-Shepard wrote:
>> Unfortunately, programs such as tar depend on inode numbers of distinct
>> files being distinct even when the file is not open over a period of
>> several minutes/seconds. This is needed to avoid dumping hard links
>> more than once. Furthermore, there is no efficient way to write
>> programs such as tar without depending on this capability. Thus, if
>> st_ino cannot be used reliably for this purpose, it would be useful for
>> there to be a system call for retrieving a true
>> unique-within-the-filesystem identifier for the file.

> We already have that: st_nlink

> I think you mean a system call that would allow you to be certain that
> two file descriptors refer to the same file. Then, any files with
> st_nlink >= 2 would have to use the system call to match them up.

In order to efficiently implement tar, it is necessary to store the
inode numbers for files with a link count greater than 1 in a hash
table. It would not be practical to keep open all of these files in
order to ensure that the inode numbers remain valid. Thus, a different
unique identifier is needed, which is unique even for files that are not
open.
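
For reference, a stripped-down sketch of the bookkeeping being described. It is not taken from tar's source; a linear array stands in for the hash table to keep it short, and `already_dumped`, `MAX_LINKS` and `seen[]` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_LINKS 1024	/* hypothetical capacity for this sketch */

struct link_key {
	dev_t dev;
	ino_t ino;
};

static struct link_key seen[MAX_LINKS];
static size_t nseen;

/*
 * Returns 1 if st names an inode recorded earlier (a hard link to data
 * that was already dumped), 0 if it is new or has a single link, and -1
 * if the table is full. Only non-directories with st_nlink >= 2 are
 * remembered, which keeps the table small.
 */
static int already_dumped(const struct stat *st)
{
	assert(st != NULL);
	if (S_ISDIR(st->st_mode) || st->st_nlink < 2)
		return 0;
	for (size_t i = 0; i < nseen; i++)
		if (seen[i].dev == st->st_dev && seen[i].ino == st->st_ino)
			return 1;
	if (nseen == MAX_LINKS)
		return -1;
	seen[nseen].dev = st->st_dev;
	seen[nseen].ino = st->st_ino;
	nseen++;
	return 0;
}
```

The table holds plain (st_dev, st_ino) pairs from files that are no longer open, so it is only as trustworthy as those numbers stay put for the duration of the run.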

--
Jeremy Maitin-Shepard

2004-01-05 02:09:45

by Jeremy Maitin-Shepard

Subject: Re: udev and devfs - The final word

Al Viro writes:

> On Sun, Jan 04, 2004 at 08:43:27PM -0500, Jeremy Maitin-Shepard wrote:
>> Unfortunately, programs such as tar depend on inode numbers of distinct
>> files being distinct even when the file is not open over a period of
>> several minutes/seconds. This is needed to avoid dumping hard links
>> more than once. Furthermore, there is no efficient way to write
>> programs such as tar without depending on this capability. Thus, if
>> st_ino cannot be used reliably for this purpose, it would be useful for
>> there to be a system call for retrieving a true
>> unique-within-the-filesystem identifier for the file.

> No such thing. It's not the matter of having a syscall to extract such
> identifier - it's that on a lot of filesystems (including many common Unix
> ones) there's nothing that would qualify.

Even if the files in question aren't being modified, created, deleted,
etc.? Even if nothing on the filesystem is being modified, created,
deleted, etc.?

> [snip]

--
Jeremy Maitin-Shepard

2004-01-05 02:29:09

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 10:37:10PM +0000, Al Viro wrote:

Hi Al - a happy 2004 to you too!

> Now, care to explain how preserving aforementioned common Unix idiom
> is related to your expostulations?

Hmm. You sound like you agree that random device numbers and NFS
are a bad combination, but don't see why my example might be
relevant.

There is a great variation here in what various servers and clients do,
but roughly speaking filehandles tend to contain a fsid, and this fsid
often (no fsid= given) involves (major,minor,ino). When device numbers
vary randomly, the fsid may vary randomly. Various bad things may happen:
maybe all file handles go stale (or, worse, refer to something else),
or maybe device numbers on the client vary randomly.
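
A toy model of the problem, for illustration only. The struct and the bit layout are invented for this sketch and are much simpler than any real NFS file handle; the only point is what happens to the handle when the device number moves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for what a file handle carries. */
struct fh_sketch {
	uint32_t fsid;
	uint64_t ino;
};

/* Default case (no fsid= given): fsid is derived from the device number. */
static uint32_t fsid_from_dev(uint32_t major, uint32_t minor)
{
	assert(major < 4096);	/* 12 bits of major in this toy layout */
	return (major << 20) | (minor & 0xfffffu);
}

/*
 * Build a handle for inode `ino` on the filesystem living on (major, minor).
 * A nonzero explicit_fsid overrides the device-derived value.
 */
static struct fh_sketch make_fh(uint32_t explicit_fsid,
				uint32_t major, uint32_t minor, uint64_t ino)
{
	struct fh_sketch fh;

	fh.fsid = explicit_fsid ? explicit_fsid : fsid_from_dev(major, minor);
	fh.ino = ino;
	return fh;
}

static int fh_equal(struct fh_sketch a, struct fh_sketch b)
{
	return a.fsid == b.fsid && a.ino == b.ino;
}
```

If the same filesystem shows up as 3:3 before a reboot and 22:3 after it, the device-derived handles no longer match, while handles built from an explicit fsid do.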

Andries

2004-01-05 02:24:32

by Valdis Klētnieks

Subject: Re: udev and devfs - The final word

On Sun, 04 Jan 2004 20:02:36 EST, Mark Mielke said:

> If and when this comes up in 2.7 development, I would like to see an
> option of the sort: 1) Try to maintain major:minor numbers across
> reboots (even at the expense of complexity and efficiency), 2) Try to
> maintain a subset of the major:minor numbers across reboots
> (compromise) 3) Provide the most efficient implementation, making no
> guarantees regarding the numbering scheme, unless using a numbering
> scheme turns out to be more efficient. Deprecate 1), and let 2) and 3)
> evolve until we see who the victor is... :-) As long as the interface
> that maps device to number is abstracted, the above should be pluggable.

I'd recommend (at least during 2.7) some code in the allocator:

	if (LINUX_VERSION_CODE % 3) {
		u32 r[2];

		/* get_random_bytes() fills a buffer rather than returning a value */
		get_random_bytes(r, sizeof(r));
		major ^= r[0];
		minor ^= r[1];
	}

Just to keep everybody honest. :)



2004-01-05 02:53:12

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sun, 4 Jan 2004, Andries Brouwer wrote:
>
> Surprise! Are you leaving POSIX? Or ditching NFS?
> Or demanding that NFS servers must never reboot?

Ok, Andries, time for you to take a deep breath, and calm down. Because
your arguments are getting ridiculous in the extreme.

A NFS server is sure as hell not going to export _its_ dynamic /dev to its
clients. That would be not just stupid, but crazy. Next you tell me that
you were using devfs and exporting that over NFS.

A NFS server is going to export something _totally_ different than its own
/dev directory - it needs to be _client_-specific anyway. That's true with
stable numbers too, btw - ever tried to mount a Solaris /dev on a Linux
client? No workee.

> A common Unix idiom is testing for the identity
> of two files by comparing st_ino and st_dev.
> A broken idiom?

No. It still works. Even if the device numbers change across reboots.

Why? Because that _program_ sure as hell isn't running across a reboot.

And again, this is not something we haven't seen before. Have you ever
looked at the "st_dev" values? Try once - look at what it returns for a
NFS-mounted filesystem. Ponder. Notice how it already is NOT stable across
reboots.

In other words, the stuff you're complaining about is all stuff that
nobody has _ever_ been able to rely on, and that has nothing to do with
udev or anything else. It all just shows how 100% right I am for saying
that you cannot rely on stable numbers.

So I repeat: calm down, and think it through.

Linus

2004-01-05 03:06:14

by David Lang

Subject: Re: udev and devfs - The final word

Linus, what Andries is saying is that if you export a directory (say
/home) the process of exporting it somehow uses the /dev device number so
if the server reboots and gets a different device number for the partition
that /home is on the clients won't see it as the same export, breaking the
NFS requirement that a server can be rebooted.

I don't agree with him because if the NFS server does include /dev info in
what it shows to the outside world it's already broken.

David Lang



--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-01-05 03:14:37

by Al Viro

Subject: Re: st_dev:st_ino

On Sun, Jan 04, 2004 at 09:02:02PM -0500, Jeremy Maitin-Shepard wrote:

> In order to efficiently implement tar, it is necessary to store the
> inode numbers for files with a link count greater than 1 in a hash
> table. It would not be practical to keep open all of these files in
> order to ensure that the inode numbers remain valid. Thus, a different
> unique identifier is needed, which is unique even for files that are not
> open.

Files that are not open could've been removed and replaced with something
completely different since your stat(2).

2004-01-05 03:08:29

by Daniel Jacobowitz

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 06:52:56PM -0800, Linus Torvalds wrote:
>
>
> On Sun, 4 Jan 2004, Andries Brouwer wrote:
> >
> > Surprise! Are you leaving POSIX? Or ditching NFS?
> > Or demanding that NFS servers must never reboot?
>
> Ok, Andries, time for you to take a deep breath, and calm down. Because
> your arguments are getting ridiculous in the extreme.
>
> A NFS server is sure as hell not going to export _its_ dynamic /dev to its
> clients. That would be not just stupid, but crazy. Next you tell me that
> you were using devfs and exporting that over NFS.
>
> A NFS server is going to export something _totally_ different than its own
> /dev directory - it needs to be _client_-specific anyway. That's true with
> stable numbers too, btw - ever tried to mount a Solaris /dev on a Linux
> client? No workee.

I think you two are talking straight past each other. Andries is
talking about the fsid, which is determined by the NFS server, based on
its idea of the device number of the filesystem underlying the exported
directory. Right now, I can reboot my host system, and when it comes
up then the NFS directories it exports to clients will have the same
fsid. With random device numbers it won't work; after rebooting the
NFS server all clients will be forced to explicitly unmount and
remount.

Now, it seems to me that this isn't much of an argument against random
device numbers. Have userspace set a UUID for the device if you want,
and use that in the fsid instead. But that's the argument; it has
nothing to do with the NFS server exporting its /dev.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2004-01-05 03:33:29

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Sun, 4 Jan 2004, Daniel Jacobowitz wrote:
>
> I think you two are talking straight past each other. Andries is
> talking about the fsid, which is determined by the NFS server, based on
> its idea of the device number of the filesystem underlying the exported
> directory. Right now, I can reboot my host system, and when it comes
> up then the NFS directories it exports to clients will have the same
> fsid. With random device numbers it won't work; after rebooting the
> NFS server all clients will be forced to explicitly unmount and
> remount.

Ahh. I'll buy into that, and yes, this is an example of something that
needs to be fixed.

It shouldn't be fixed by saying "device numbers have to be stable across
reboots", because the fact is, we're most likely going to have storage
that is really really hard to enumerate in a repeatable fashion.

So the _proper_ thing to do is to have the NFS server not use the device
number as part of fsid. It should use the _stable_ UUID of the filesystem
or some similar label.

And it should do that exactly because the device number isn't as stable as
NFS exporting would like it to be. Exactly because things like network-
attached disks etc. How would you otherwise export a disk that perhaps
gets its address from DHCP?

[ I incredulously asked a NetApp person why you'd ever want to expose the
_disk_ over ethernet, rather than just have the NAS device export a
filesystem of its own. It turns out that some people really want to just
see a block device, either because Windows sucks at network filesystems
or because they want to do things like databases onto them. The point
being that once you do that, you'll likely want to export the thing as
an SMB share from the thing that "owns" the disk.

So you would literally have a _disk_ whose IP address changed depending
on what other machines were booted on the same network. ]

Issues like this are also why Linux vendors have already started doing
things like "mount by label" - because disks have a nasty tendency to move
around, and specifying the fstab contents (or "root=xxx" on the kernel
command line) with physical location or similar just doesn't work all
that well. It happens today with things like USB2 or firewire disks. They
get moved around, and they get a new device number.

It's still not _common_, but it's slowly getting there.

> Now, it seems to me that this isn't much of an argument against random
> device numbers. Have userspace set a UUID for the device if you want,
> and use that in the fsid instead. But that's the argument; it has
> nothing to do with the NFS server exporting its /dev.

I buy into that, and I agree 100% with you that this is just a case where
you should use a UUID.

Linus

2004-01-05 03:42:45

by Al Viro

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 03:29:01AM +0100, Andries Brouwer wrote:
> On Sun, Jan 04, 2004 at 10:37:10PM +0000, Al Viro wrote:
>
> Hi Al - a happy 2004 to you too!
>
> > Now, care to explain how preserving aforementioned common Unix idiom
> > is related to your expostulations?
>
> Hmm. You sound like you agree that random device numbers and NFS
> are a bad combination, but don't see why my example might be
> relevant.

No. I don't see what the fuck it has to do with POSIX compliance, the ability
to determine whether two files are identical by st_ino/st_dev, and common
UNIX idioms.

> There is a great variation here in what various servers and clients do,
> but roughly speaking filehandles tend to contain a fsid, and this fsid
> often (no fsid= given) involves (major,minor,ino).

Now, _that_ is true. And yes, I agree that setups with unstable device
numbers do need explicit actions on part of admin. In particular, editing
/etc/exports to add fsid= in each relevant entry.

Which means that *in* *setups* *where* *numbers* *are* *currently* *stable*
we should not make them random without admin's knowledge. And /etc/exports
is not the only problem - RAID, journaling filesystems with device number of
log stored on-disk, etc.

*However*, if we are talking about new classes of devices, all bets are off
and proper fix is to stop using unsuitable interfaces for those devices.
For exports it means "use explicit fsid". For RAID we both agreed, IIRC,
that raidtools will need to switch to saner API, etc.

2004-01-05 03:49:10

by Rob Landley

Subject: Re: udev and devfs - The final word

On Sunday 04 January 2004 21:06, David Lang wrote:
> Linus, what Andries is saying is that if you export a directory (say
> /home) the process of exporting it somehow uses the /dev device number so
> if the server reboots and gets a different device number for the partition
> that /home is on the clients won't see it as the same export, breaking the
> NFS requirement that a server can be rebooted.

NFS always struck me as a perverse design. "The fileserver must be stateless
with regard to clients, even though maintaining state is what a filesystem
DOES, and the point of the thing is to export a filesystem." Okay... (If it
was exporting read-only filesystems with no locking of any kind, maybe they'd
have a point, but come on guys...)

So here's an example of where the fileserver _does_ expect to maintain
non-file state across reboots. "Ooh, the device node we're exporting is part
of the ID, gee, we missed one!"

So why, exactly, can the NFS server not maintain whatever extra state it needs
to remember between reboots in a filesystem? (Not even necessarily the one
it's exporting, just some rc file something under /var.) The device node it
was exporting USED to be in the filesystem, you know, ala mknod. Now that
the kernel's not keeping that stable, have the #*%(&# server generate a
number and make a note of it somewhere. (Is requiring an NFS server to have
access to persistent storage too much to ask?)

Personally, I could never figure out why Samba servers are in userspace but
NFS servers seem to want to live in the kernel. I can almost secure a samba
server for access to the outside world, but a NFS system that isn't behind a
firewall automatically says to me "this machine has already been compromised
eight ways from sunday within five minutes of being exposed to the internet".
Call me paranoid...

Rob

2004-01-05 03:50:46

by Al Viro

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 07:33:16PM -0800, Linus Torvalds wrote:
> Ahh. I'll buy into that, and yes, this is an example of something that
> needs to be fixed.
>
> It shouldn't be fixed by saying "device numbers have to be stable across
> reboots", because the fact is, we're most likely going to have storage
> that is really really hard to enumerate in a repeatable fashion.
>
> So the _proper_ thing to do is to have the NFS server not use the device
> number as part of fsid. It should use the _stable_ UUID of the filesystem
> or some similar label.

... and we already have a way to specify it explicitly. Which, BTW, allows
you to take the server down, copy the exported fs from a failing IDE disk to a
SCSI one and reexport. With clients remaining happy with you. Remember
discussions circa 2.5.50 or so about that stuff?

So we have tools for that. And it's 100% OK to say "if you are doing NFS
export of filesystem that lives on $new_weird_device, explicit fsid= is
not just a good idea, it's a must-have".

What is _not_ OK, though, is to have folks suddenly see /dev/hda3 changing
its device number - then we would break existing setups that worked all
along; even if admin can fix the breakage, it's not a good thing to do.

2004-01-05 04:02:37

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004, Al Viro wrote:
>
> What is _not_ OK, though, is to have folks suddenly see /dev/hda3 changing
> its device number - then we would break existing setups that worked all
> along; even if admin can fix the breakage, it's not a good thing to do.

Ehh, it will actually happen.

If nothing else, things like SATA will end up meaning that the device you
were used to seeing as /dev/hdc will suddenly show up as /dev/scd0
instead. Just because you changed the cabling while you upgraded to a
newer version of your CD-ROM drive.

And the thing is, with fs labels and udev, even "existing systems" really
shouldn't much care.

Now, we'd probably not want to force the switch, but I do suspect we'll
have exactly this as a switch in the "Kernel Debugging Config" section.
Where even _common_ things like disks could end up with per-bootup values.
Just to verify that every part of the system ends up having it right.

Think of it this way: RedHat not that long ago decided to break with a
_lot_ of tradition by switching over to UTF-8 as the common text encoding.
It broke some _major_ programs in how they dealt with "simple" things like
keyboard input that had worked for literally _decades_.

And you could switch it off if you really wanted to, but quite frankly, it
wasn't even a simple choice in the install. You had to know what you were
doing to switch it off.

And the thing is, that is _the_ single thing that cleaned up a lot of
remaining problems wrt UTF-8 on Linux. Yes, almost all of them had been
solved already, or RH wouldn't have dared do the switch. But to get there
all the way, you had to literally force the cut-over.

(Yeah, I'm a bad person, and I personally went back to the C locale,
because "pine" still doesn't get UTF-8 right, and nobody is apparently
ever going to fix it. Oh, well. But at least I know I'm doing something
_wrong_, which in itself is a good thing.).

Linus

2004-01-05 04:16:17

by Peter Chubb

Subject: Re: udev and devfs - The final word

>>>>> "Andries" == Andries Brouwer writes:

Andries> On Sun, Jan 04, 2004 at 01:05:20PM -0800, Linus Torvalds
Andries> wrote:

Andries> Surprise! Are you leaving POSIX? Or ditching NFS? Or
Andries> demanding that NFS servers must never reboot?

Andries> A common Unix idiom is testing for the identity of two files
Andries> by comparing st_ino and st_dev. A broken idiom?

It's worse than that. You can do
    mknod fred b maj minor
anywhere on any UNIX filesystem and expect it to a) work and b) refer
to the same device for all time until it is removed. However, this
doesn't appear to be guaranteed by SUS -- the only guarantees are that
the dev_t returned from the stat() family of calls is unique within a LAN.

I know that Linux already breaks this (the stupid /dev/sg[0-9] that
depend not on the SCSI bus and lun but on the order they're detected,
for example)
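
For reference, the st_dev/st_ino idiom Andries mentions can be sketched as below. This is a minimal Python illustration rather than code from the thread, and the function name same_file is made up for the example. The comparison only means something while both files exist on a running system, which is the point being argued about.

```python
import os

def same_file(path_a, path_b):
    """Return True if both paths name the same file (same device, same inode)."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    # Two names refer to one file only if both the device and the inode match.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

The standard library's os.path.samefile() performs the same comparison.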

2004-01-05 04:38:34

by Al Viro

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 08:02:20PM -0800, Linus Torvalds wrote:
>
>
> On Mon, 5 Jan 2004 [email protected] wrote:
> >
> > What is _not_ OK, though, is to have folks suddenly see /dev/hda3 changing
> > its device number - then we would break existing setups that worked all
> > along; even if admin can fix the breakage, it's not a good thing to do.
>
> Ehh, it will actually happen.
>
> If nothing else, things like SATA will end up meaning that the device you
> were used to seeing as /dev/hdc will suddenly show up as /dev/scd0
> instead. Just because you changed the cabling while you upgraded to a
> newer version of your CD-ROM drive.

If I open the damn box, I sure as hell can be bothered to edit stuff in
/etc...

> And the thing is, with fs labels and udev, even "existing systems" really
> shouldn't much care.
>
> Now, we'd probably not want to force the switch, but I do suspect we'll
> have exactly this as a switch in the "Kernel Debugging Config" section.
> Where even _common_ things like disks could end up with per-bootup values.
> Just to verify that every part of the system ends up having it right.

Then we'd better have a very good idea of the things that are going to
break. Note that right now even late-boot code in kernel itself will
break on that - there are explicit checks for ROOT_DEV==MKDEV(2,0),
all sorts of weird crap deep in the bowels of arch/ppc/*/*, etc.
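
To make the failure mode concrete, here is a small userspace sketch of a hard-coded check in the style of ROOT_DEV==MKDEV(2,0). It assumes the traditional Linux assignment of block major 2 to the floppy driver; the names FLOPPY_MAJOR, is_first_floppy and describe are invented for the illustration, and the "per-bootup" number used in the note below is arbitrary.

```python
import os

# Assumption for this sketch: block major 2 is the floppy driver,
# so (2, 0) is the first floppy drive.
FLOPPY_MAJOR = 2

def is_first_floppy(dev):
    """Hard-coded comparison against a fixed major/minor pair."""
    return dev == os.makedev(FLOPPY_MAJOR, 0)

def describe(dev):
    """Split a device number back into its major and minor parts."""
    return "major=%d minor=%d" % (os.major(dev), os.minor(dev))
```

With static numbering, is_first_floppy(os.makedev(2, 0)) holds. If the same drive were handed an arbitrary per-bootup number such as os.makedev(250, 3), the check would quietly stop matching, which is the kind of breakage described above.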

It won't be an easy transition - I know that Greg is very optimistic
about it, but there will be a *lot* of crap to take care of. In theory
getting bigger dev_t should've been very straightforward, but if you
check what really had been involved...

ObOtherStraightforwardThings: net_device refcounting. Take a look at
Jeff's queue someday - by now it's one big merge short of getting it
right for practically all drivers. 1.9Mb total + 247Kb pending patches
here. Several hundred changesets, practically all of them fixing
exploitable holes. And yes, most of them had been bugs all along -
since 2.2 if not earlier. Sure, that made things better, but if somebody
comes along and makes similar "fun" necessary for e.g. ALSA...

> because "pine" still doesn't get UTF-8 right, and nobody is apparently
> ever going to fix it. Oh, well. But at least I know I'm doing something
> _wrong_, which in itself is a good thing.).

Heh. Took you long enough - "using pine" should've been a dead giveaway
from the very beginning ;-)

2004-01-05 04:42:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004, Peter Chubb wrote:
>
> It's worse than that. You can do
> mknod fred b maj minor
> anywhere on any UNIX filesystem and expect it to a) work and b) refer
> to the same device for all time until it is removed.

Hmm.. I can see (a) (except for the fact that pretty much all unixes have
mount-flags to say "no device files") but I don't see why you'd _ever_
expect (b) to be true.

It's patently not true for such rather traditional unix devices as pty's,
for example. The "same device" ends up being true only for as long as the
master at the other end exists - and the same numbers get re-used in all
normal usage for different virtual devices.

> I know that Linux already breaks this (the stupid /dev/sg[0-9] that
> depend not on the SCSI bus and lun but on the order they're detected,
> for example)

That "stupid" thing is a hell of a lot less stupid than the alternatives,
and is very much equivalent to how pty's work.

In fact, the "number according to detection" is pretty much the best
device number allocation strategy. It's the _only_ one that doesn't have
some incorrect bias built-in.

Linus

2004-01-05 04:54:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004 [email protected] wrote:

> > If nothing else, things like SATA will end up meaning that the device you
> > were used to seeing as /dev/hdc will suddenly show up as /dev/scd0
> > instead. Just because you changed the cabling while you upgraded to a
> > newer version of your CD-ROM drive.
>
> If I open the damn box, I sure as hell can be bothered to edit stuff in
> /etc...

Actually, not necessarily.

The thing is, _the_ most common reason I have for opening the box is that
the effing thing started having problems.

At which point I want to just remove the disk, move it to another box, and
boot up the other box.

And THAT is exactly the kind of situation where I sure as hell don't want
to care exactly where the disk was. I can't "prepare" for it by editing
files in /etc, since I don't know that the CPU fan or whatever is going to
die on me.

And this is _exactly_ why we should try to get away from device numbering
having any meaning. Because if we do this right, something like the CPU
fan dying, and me moving a disk to a new machine that has SATA (with the
disk having both SATA and PATA connectors), I shouldn't need to even
_think_ about it.

That's where "mount by label" does part of the job. But if the system is
_always_ set up to do things like NFS exports according to some separate
UUID, that too would "just work".

There's a lot to be said for "just work". Even if sometimes it takes some
pain when you break old (and broken) assumptions.

> > because "pine" still doesn't get UTF-8 right, and nobody is apparently
> > ever going to fix it. Oh, well. But at least I know I'm doing something
> > _wrong_, which in itself is a good thing.).
>
> Heh. Took you long enough - "using pine" should've been a dead giveaway
> from the very beginning ;-)

Those are them fighting words.

But since you brought it up: do you actually have anything else that can
open a remote IMAP file with a few thousand messages without taking ages
for it, and that you don't have to mouse around with? I'd like a graphical
interface for configuring stuff etc, but I sure as hell don't want to find
some f*ing icon to save a few messages that I selected in-order to my
"doit" queue or go to the next one, or pipe the thing to a shell-script,
or any number of things that are my actual _job_.

And the "no mousing" means that I don't want to have some popup window
that asks me what file I want to save into or similar crap. I can type
fast enough if I stay on the keyboard and can focus on one part of the
screen, but if I have to switch my focus around, I'm a goner.

On a related matter, I'm probably a retard, but I've tried alternatives to
"trn" too, and there really aren't any. None of the graphical news readers
can show me one full page of threads, select the 3-4 threads from _that_
one page that I want (from the keyboard), and then kill _that_ one page.
Not the whole newsgroup: only the part that shows in the window at that
time.

In "trn", the magic command is capital-D, for "discard".

Linus

2004-01-05 04:52:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Sun, 04/01/2004 at 22:48, Rob Landley wrote:

> NFS always struck me as a perverse design. "The fileserver must be stateless
> with regard to clients, even though maintaining state is what a filesystem
> DOES, and the point of the thing is to export a filesystem." Okay... (If it
> was exporting read-only filesystems with no locking of any kind, maybe they'd
> have a point, but come on guys...)

Sigh... What has that got to do with anything?

Read the RFCs: NFS *was* entirely stateless until v4 was drafted.
Locking was never part of the NFS protocol, but was an external addition
that was documented by the Open Group. So, yes, there is a history and a
reason behind all the talk of statelessness.

As for the current thread about remembering device numbers: as far as
NFS is concerned, that is entirely an implementation issue. There is no
need for any extra NFS protocol support for this sort of crap.

> So why, exactly, can the NFS server not maintain whatever extra state it needs
> to remember between reboots in a filesystem? (Not even necessarily the one
> it's exporting, just some rc file something under /var.) The device node it
> was exporting USED to be in the filesystem, you know, ala mknod. Now that
> the kernel's not keeping that stable, have the #*%(&# server generate a
> number and make a note of it somewhere. (Is requiring an NFS server to have
> access to persistent storage too much to ask?)

It could be done (and probably entirely in userspace). I assume you are
volunteering to do the work?
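
A rough userspace sketch of what Rob is asking for: generate an identifier per export once and record it in a small state file, so the value survives reboots regardless of what device number the kernel hands out. Everything here is hypothetical (the ExportIdStore class, the JSON state file, the use of a random UUID); it is not how any existing NFS server does it.

```python
import json
import os
import uuid

class ExportIdStore:
    """Hypothetical helper: maps an export path to an id that survives restarts."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.ids = {}
        if os.path.exists(state_path):
            with open(state_path) as f:
                self.ids = json.load(f)

    def id_for(self, export_path):
        """Return the stored id for this export, generating one on first use."""
        if export_path not in self.ids:
            self.ids[export_path] = uuid.uuid4().hex
            self._save()
        return self.ids[export_path]

    def _save(self):
        # Write to a temporary file and rename it, so a crash never leaves
        # a half-written state file behind.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.ids, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)
```

A server using something like this would look up id_for("/home") at export time and put that value in the file handles it gives to clients, instead of a device number.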

> Personally, I could never figure out why Samba servers are in userspace but
> NFS servers seem to want to live in the kernel. I can almost secure a samba
> server for access to the outside world, but a NFS system that isn't behind a
> firewall automatically says to me "this machine has already been compromised
> eight ways from sunday within five minutes of being exposed to the internet".
> Call me paranoid...

Sun was doing Kerberos for NFS years before the Samba project was
started.

Security has bugger all to do with kernel or userland and everything to
do with the short-sighted "munitions" policies of certain governments at
the time around when the Sun RPC protocol was being drafted. The same
policies were still around to dictate our implementation much later when
we were doing RPC for Linux. Now the laws have changed, and so we've
finally been able to add strong authentication in 2.6.x.

Cheers,
Trond

2004-01-05 05:32:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: udev and devfs - The final word

[email protected] writes:

> On Sun, Jan 04, 2004 at 08:02:20PM -0800, Linus Torvalds wrote:
> > Now, we'd probably not want to force the switch, but I do suspect we'll
> > have exactly this as a switch in the "Kernel Debugging Config" section.
> > Where even _common_ things like disks could end up with per-bootup values.
> > Just to verify that every part of the system ends up having it right.
>
> Then we'd better have a very good idea of the things that are going to
> break. Note that right now even late-boot code in kernel itself will
> break on that - there are explicit checks for ROOT_DEV==MKDEV(2,0),
> all sorts of weird crap deep in the bowels of arch/ppc/*/*, etc.

/sbin/lilo and possibly some of the other bootloaders. Relationships
between devices are a challenge to work with. How do you go from a
partition to its actual block device, etc.? I don't remember how many
major numbers lilo has hard coded, I just remember looking at it once
and realizing I couldn't think of a better way to accomplish what it
was trying to do.

Eric

2004-01-05 06:11:30

by Al Viro

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 08:52:56PM -0800, Linus Torvalds wrote:

> That's where "mount by label" does part of the job. But if the system is
> _always_ set up to do things like NFS exports according to some separate
> UUID, that too would "just work".

mount by label does part of the job, until you decide to use dd(1) to copy
a disk. At which point you have, AFAICS, no way to tell which copy will get
mounted.
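
The ambiguity is easy to show with a toy lookup. The sketch below assumes some scanner has already produced a device-to-label map; the function name resolve_labels and the device names in the note are purely illustrative.

```python
from collections import defaultdict

def resolve_labels(labels_by_device):
    """Split a {device: label} map into unambiguous and ambiguous labels."""
    devices_by_label = defaultdict(list)
    for device, label in sorted(labels_by_device.items()):
        devices_by_label[label].append(device)
    # A label is usable for mounting only if exactly one device carries it.
    unique = {label: devs[0]
              for label, devs in devices_by_label.items() if len(devs) == 1}
    ambiguous = {label: devs
                 for label, devs in devices_by_label.items() if len(devs) > 1}
    return unique, ambiguous
```

After a dd copy of /dev/hda onto /dev/hdc, a label such as "root" would land in the ambiguous set with both /dev/hda1 and /dev/hdc1 listed, and nothing in the label itself says which one should win.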

> Those are them fighting words.
>
> But since you brought it up: do you actually have anything else that can
> open a remote IMAP file with a few thousand messages without taking ages
> for it, and that you don't have to mouse around with? I'd like a graphical
> interface for configuring stuff etc, but I sure as hell don't want to find
> some f*ing icon to save a few messages that I selected in-order to my
> "doit" queue or go to the next one, or pipe the thing to a shell-script,
> or any number of things that are my actual _job_.

I prefer to ssh to another box and use mutt. Seriously, I made the mistake
of reading the imapd source, and that was enough to decide that I'm _not_ touching
uw-<anything> and that protocol in general unless I really have no other
options. So far I've managed to avoid that...

> On a related matter, I'm probably a retard, but I've tried alternatives to
> "trn" too, and there really aren't any.

Same here. There are things about trn command set I'd prefer to see changed,
but it's better than other newsreaders I've seen...

2004-01-05 07:04:10

by Rob Landley

[permalink] [raw]
Subject: [offtopic] Re: udev and devfs - The final word

On Sunday 04 January 2004 22:52, Trond Myklebust wrote:
> On Sun, 04/01/2004 at 22:48, Rob Landley wrote:
> > NFS always struck me as a perverse design. "The fileserver must be
> > stateless with regard to clients, even though maintaining state is what
> > a filesystem DOES, and the point of the thing is to export a filesystem."
> > Okay... (If it was exporting read-only filesystems with no locking of
> > any kind, maybe they'd have a point, but come on guys...)
>
> Sigh... What has that got to do with anything?
>
> Read the RFCs: NFS *was* entirely stateless until v4 was drafted.
> Locking was never part of the NFS protocol, but was an external addition
> that was documented by the Open Group. So, yes, there is a history and a
> reason behind all the talk of statelessness.

I vaguely remember being pretty well up to speed on V2 (circa... 1995?) The
last one I even glanced at was V3, but I never had to support it. I haven't
even looked at V4. For exporting /home directories, everybody I deal with
seems to want samba servers these days instead for some reason. (A couple of
net boot systems care more about permissions than that, but RAM's so cheap
that it's easier to just "ssh user@bootserver -i key 'cat root_img.tgz'
| tar xz" into a ramfs or shmfs or some such. Heck, the last system I set
up like that mounted a zisofs image and ran from that...)

I'm sure it's still useful. I just haven't wanted to even attempt to secure
it. For home directories, samba is doing a simple tcp/ip connection per
session, reestablishing it automatically if it breaks (same server reboot
question). Since _both_ protocols seem to suck pretty badly under the hood,
it's been a question of choosing the lesser of two evils. It seems that more
people actually USE samba, so...

> > So why, exactly, can the NFS server not maintain whatever extra state it
> > needs to remember between reboots in a filesystem? (Not even necessarily
> > the one it's exporting, just some rc file something under /var.) The
> > device node it was exporting USED to be in the filesystem, you know, ala
> > mknod. Now that the kernel's not keeping that stable, have the #*%(&#
> > server generate a number and make a note of it somewhere. (Is requiring
> > an NFS server to have access to persistent storage too much to ask?)
>
> It could be done (and probably entirely in userspace). I assume you are
> volunteering to do the work?

I don't like nfs, I haven't bothered to actually use it for anything since
1999, so no.

> > Personally, I could never figure out why Samba servers are in userspace
> > but NFS servers seem to want to live in the kernel. I can almost secure
> > a samba server for access to the outside world, but a NFS system that
> > isn't behind a firewall automatically says to me "this machine has
> > already been compromised eight ways from sunday within five minutes of
> > being exposed to the internet". Call me paranoid...
>
> Sun was doing Kerberos for NFS years before the Samba project was
> started.
>
> Security has bugger all to do with kernel or userland and everything to
> do with the short-sighted "munitions" policies of certain governments at
> the time around when the Sun RPC protocol was being drafted. The same

I can transparently tunnel any tcp/ip session through ssh with some iptables
rules and a dozen-line python script. (Great fun for rolling your own vpn.)
Mixing UDP and encryption is just plain a bad idea: there's no level at which
it makes sense to store persistent connection state in a "fire and forget"
packet protocol...

I.E. this also works with samba, but didn't with (old) NFS.

> policies were still around to dictate our implementation much later when
> we were doing RPC for Linux. Now the laws have changed, and so we've
> finally been able to add strong authentication in 2.6.x.

Can you recommend a good link to the history of NFS? Computer history's a
hobby of mine. (I've got snippets on this topic, but not any kind of unified
story of NFS...)

http://www.landley.net/history/mirror/index.html
http://www.landley.net/history/scans/index.html

> Cheers,
> Trond

Rob

2004-01-05 07:47:24

by Greg KH

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 08:52:56PM -0800, Linus Torvalds wrote:
>
> But since you brought it up: do you actually have anything else that can
> open a remote IMAP file with a few thousand messages without taking ages
> for it, and that you don't have to mouse around with? I'd like a graphical
> interface for configuring stuff etc, but I sure as hell don't want to find
> some f*ing icon to save a few messages that I selected in-order to my
> "doit" queue or go to the next one, or pipe the thing to a shell-script,
> or any number of things that are my actual _job_.

mutt can provide a path for a recovering pine addict. I did that a
number of years ago and have been quite happy since. I can't vouch for
its IMAP speeds (seems to be fast enough for me, as long as I don't try
to do a filter on a large IMAP folder), but the other tasks you do
(selecting, piping, etc.) work very well.

I even think there's a mutt config file that duplicates all of the
default pine keystrokes just to make moving easier.

The message threading was reason enough for me to switch, although I've
heard rumors that pine can handle that now.

thanks,

greg k-h

2004-01-05 07:40:08

by Greg KH

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 04:38:30AM +0000, [email protected] wrote:
>
> Then we'd better have a very good idea of the things that are going to
> break. Note that right now even late-boot code in kernel itself will
> break on that - there are explicit checks for ROOT_DEV==MKDEV(2,0),
> all sorts of weird crap deep in the bowels of arch/ppc/*/*, etc.
>
> It won't be an easy transition - I know that Greg is very optimistic
> about it, but there will be a *lot* of crap to take care of.

Oh I know it's going to be tough, and there's going to be a lot of crap
to take care of, but in the end, I think it will be worth it...hopefully
if I'm still sane then...

> ObOtherStraightforwardThings: net_device refcounting. Take a look at
> Jeff's queue someday - by now it's one big merge short of getting it
> right for practically all drivers. 1.9Mb total + 247Kb pending patches
> here. Several hundred changesets, practically all of them fixing
> exploitable holes. And yes, most of them had been bugs all along -
> since 2.2 if not earlier. Sure, that made things better, but if somebody
> comes along and makes similar "fun" necessary for e.g. ALSA...

Yeah, ALSA scares me, along with the input layer code. I had dreams of
easily converting them to use proper refcounting, but now know there's
no way that would be an easy conversion and have pretty much given up on
it. For 2.6 at least.

That's why my "simple_class" patch will have to be a band-aid for now to
get sysfs representation for those types of devices.

thanks,

greg k-h

2004-01-05 07:45:15

by James Cloos

[permalink] [raw]
Subject: Re: udev and devfs - The final word

>>>>> "Linus" == Linus Torvalds <[email protected]> writes:

Linus> Why? Because that _program_ sure as hell isn't
Linus> running across a reboot.

Is that strictly true? With (software) suspend to disk,
will the old device enumeration data be recovered from
the suspend partition?

-JimC

2004-01-05 07:52:49

by Nigel Cunningham

[permalink] [raw]
Subject: Re: udev and devfs - The final word

Hi.

On Mon, 2004-01-05 at 20:44, James H. Cloos Jr. wrote:
> >>>>> "Linus" == Linus Torvalds <[email protected]> writes:
>
> Linus> Why? Because that _program_ sure as hell isn't
> Linus> running across a reboot.
>
> Is that strictly true? With (software) suspend to disk,
> will the old device enumeration data be recovered from
> the suspend partition?

Yes. You end up running the original kernel.

Regards,

Nigel

--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-01-05 08:13:26

by Mark Mielke

[permalink] [raw]
Subject: st_dev:st_ino (was: Re: udev and devfs - The final word)

On Sun, Jan 04, 2004 at 08:43:27PM -0500, Jeremy Maitin-Shepard wrote:
> Unfortunately, programs such as tar depend on inode numbers of distinct
> files being distinct even when the file is not open over a period of
> several minutes/seconds. This is needed to avoid dumping hard links
> more than once. Furthermore, there is no efficient way to write
> programs such as tar without depending on this capability. Thus, if
> st_ino cannot be used reliably for this purpose, it would be useful for
> there to be a system call for retrieving a true
> unique-within-the-filesystem identifier for the file.

We already have that: st_nlink

I think you mean a system call that would allow you to be certain that
two file descriptors refer to the same file. Then, any files with
st_nlink >= 2 would have to use the system call to match them up.
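
As a sketch of what an archiver has to do today: only files with st_nlink >= 2 need remembering, and they are matched up by (st_dev, st_ino). This is an illustration in Python, not tar's actual code, and find_hard_link_groups is a made-up name. It relies on exactly the assumption under discussion, namely that the pair stays stable and unique for the duration of the walk.

```python
import os
import stat

def find_hard_link_groups(root):
    """Group regular files under root that are hard links to the same file."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue  # a file with a single name cannot be a duplicate
            seen.setdefault((st.st_dev, st.st_ino), []).append(path)
    # Only groups with more than one path found under root are interesting.
    return [sorted(paths) for paths in seen.values() if len(paths) > 1]
```

The st_nlink test keeps the table small, since most files have only one name and never need to be recorded.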

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2004-01-05 09:06:28

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Mon, 05 Jan 2004 02:44:10 EST, "James H. Cloos Jr." said:
> >>>>> "Linus" == Linus Torvalds <[email protected]> writes:
>
> Linus> Why? Because that _program_ sure as hell isn't
> Linus> running across a reboot.
>
> Is that strictly true? With (software) suspend to disk,
> will the old device enumeration data be recovered from
> the suspend partition?

That would be a suspend, not a reboot, if we're speaking strictly....


Attachments:
(No filename) (226.00 B)

2004-01-05 11:02:04

by Robin Rosenberg

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Monday, 5 January 2004 at 08.45, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2004-01-05 at 20:44, James H. Cloos Jr. wrote:
> > >>>>> "Linus" == Linus Torvalds <[email protected]> writes:
> >
> > Linus> Why? Because that _program_ sure as hell isn't
> > Linus> running across a reboot.
> >
> > Is that strictly true? With (software) suspend to disk,
> > will the old device enumeration data be recovered from
> > the suspend partition?
>
> Yes. You end up running the original kernel.

But not necessarily the same devices.

> Regards,
>
> Nigel

-- robin

2004-01-05 11:16:20

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 11:47:17PM -0800, Greg KH wrote:

> > But since you brought it up: do you actually have anything else that can
> > open a remote IMAP file with a few thousand messages without taking ages
> > for it, and that you don't have to mouse around with? I'd like a graphical
> > interface for configuring stuff etc, but I sure as hell don't want to find
> > some f*ing icon to save a few messages that I selected in-order to my
> > "doit" queue or go to the next one, or pipe the thing to a shell-script,
> > or any number of things that are my actual _job_.
>
> mutt can provide a path for a recovering pine addict. I did that a
> number of years ago and have been quite happy since. I can't vouch for
> its IMAP speeds (seems to be fast enough for me, as long as I don't try
> to do a filter on a large IMAP folder), but the other tasks you do
> (selecting, piping, etc.) work very well.

Mutt with IMAP is rather bearable even on a GPRS connection (40kbps,
1sec latency). On a 100baseTX it's not distinguishable from local
operation.

One thing missing in mutt is a persistent message and message header
cache - opening a folder can take a lot of time over a slow connection.
But there is a patch at least for the message header cache persistence
floating on the 'net somewhere.

Another thing that bugs me often in mutt is its inability to service
keystrokes while doing something else (like checking for new mail over
IMAP with a slow link). It becomes unresponsive until that task is done.

> I even think there's a mutt config file that duplicates all of the
> default pine keystrokes just to make moving easier.
>
> The message threading was reason enough for me to switch, although I've
> heard rumors that pine can handle that now.
>
> thanks,
>
> greg k-h

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2004-01-05 12:07:17

by Trond Myklebust

[permalink] [raw]
Subject: Re: [offtopic] Re: udev and devfs - The final word

On Mon, 05/01/2004 at 02:03, Rob Landley wrote:

> I'm sure it's still useful. I just haven't wanted to even attempt to secure
> it. For home directories, samba is doing a simple tcp/ip connection per
> session, reestablishing it automatically if it breaks (same server reboot
> question).

...and so does NFS.

> Since _both_ protocols seem to suck pretty badly under the hood,
> it's been a question of choosing the lesser of two evils. It seems that more
> people actually USE samba, so...

...and 95% of all desktop machines are Windows based. So what's new?

> > It could be done (and probably entirely in userspace). I assume you are
> > volunteering to do the work?
>
> I don't like nfs, I haven't bothered to actually use it for anything since
> 1999, so no.

Then you're unlikely to get the feature until someone else finds it
worth their while to implement it.

> I can transparently tunnel any tcp/ip session through ssh with some iptables
> rules and a dozen line python script. (Great fun for rolling your own vpn.)
> Mixing UDP and encryption is just plain a bad idea: there's no level at which
> it makes sense to store persistent connection state in a "fire and forget"
> packet protocol...

So do the same thing with NFS now that we've finally gotten RPC over TCP
fully supported under Linux too: everybody else has had it for years.

In 2.6.x, we've also added native RPCSEC_GSS support for kerberos-based
authentication. Packet integrity checking and full privacy are in the
pipeline, as are other security mechanisms.

> Can you recommend a good link to the history of NFS? Computer history's a
> hobby of mine. (I've got snippets on this topic, but not any kind of unified
> story of NFS...)

Dunno if anybody has ever written a proper history of NFS, but I can ask
around. Here are a few sources I found on the fly though. They all tend
to relate to the history of the protocol, and not much about
implementation history (shame that).

NFSv2 transition to NFSv3
http://www.netapp.com/tech_library/evolution.html
RFC1813

transition to NFSv4
http://www.ietf.org/html.charters/nfsv4-charter.html
(in particular see http://www.ietf.org/rfc/rfc2624.txt which
runs through the earlier design considerations)
RFC3530 (the final version of the protocol)

I'd also recommend nosing around the Connectathon site on
http://www.connectathon.org/
That contains a record of talks going back to 95 (not really that long -
I know) and so should help out with the more recent history.

...of course if you google around, you'll also find loads of Powerpoint
presentations etc...

Cheers,
Trond

2004-01-05 12:44:03

by Nigel Cunningham

[permalink] [raw]
Subject: Re: udev and devfs - The final word

Hi.

The suspend to disk implementations all assume that devices are not
[dis]appearing under us while we're suspended. If you do go adding and
removing devices while the power is off, you can expect the same
problems you'd get if you removed them without suspending the machine.
It would be roughly equivalent to hot[un]plugging devices.

To return to the original point though, userspace may see a sudden big
jump in the time clock if it's looking, but it won't suddenly find major
& minor numbers are different.

Regards,

Nigel

On Tue, 2004-01-06 at 00:01, Robin Rosenberg wrote:
> > Yes. You end up running the original kernel.
>
> But not necessarily the same devices.

--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-01-05 14:41:44

by Mainak Mandal _00007001_

[permalink] [raw]
Subject: IRQ disabled on linux 2.6.1-rc1-mm1

Hi,
I am running 2.6.1-rc1-mm1 on a SuSE 9.0 system, but I am having problems
with the sound drivers (ALSA).
I have compiled the sound drivers into the kernel rather than as loadable
modules, but the sound system fails to start automatically after a reboot.
I have to go into YaST and go through the sound setup every time I boot. The
strange thing is that the sound card is already configured and is shown as
running; I just have to step through the setup and sound starts working.
However, sometimes after a little while the message
disabling IRQ #11
appears and the sound card stops working.

Also, every time I boot, the kernel disables IRQ #9.

The following lines are from the output of the hwinfo command.

17: PCI 0d.0: 0200 Ethernet controller
[Created at pci.65]
Unique ID: qnJ_.zq4WwPe20r5
Hardware Class: network
Model: "Davicom Ethernet 100/10 MBit"
Vendor: pci 0x1282 "Davicom Semiconductor, Inc."
Device: pci 0x9102 "Ethernet 100/10 MBit"
SubVendor: pci 0x0291
SubDevice: pci 0x8212
Revision: 0x31
I/O Ports: 0xdc00-0xdcff (rw)
Memory Range: 0xdfffff00-??? (rw,non-prefetchable)
Memory Range: 0xdff80000-??? (ro,disabled)
IRQ: 17 (1955378 events)
Driver Info #0:
Driver Status: dmfe is not active
Driver Activation Cmd: "insmod dmfe"
Driver Info #1:
Driver Status: tulip is not active
Driver Activation Cmd: "insmod tulip"
Config Status: cfg=yes, avail=yes, need=no, active=unknown

23: PCI 11.5: 0401 Multimedia audio controller
[Created at pci.65]
Unique ID: Ssy1.FSRBkG45L8D
Hardware Class: sound
Model: "Giga-byte GA-7VAX Onboard Audio (Realtek ALC650)"
Vendor: pci 0x1106 "VIA Technologies, Inc."
Device: pci 0x3059 "VT8233/A/8235 AC97 Audio Controller"
SubVendor: pci 0x1458 "Giga-byte Technology"
SubDevice: pci 0xa002 "GA-7VAX Onboard Audio (Realtek ALC650)"
Revision: 0x30
I/O Ports: 0xd800-0xd8ff (rw)
IRQ: 11 (2000000 events)
Driver Info #0:
Driver Info: snd-via686,snd-via8233
Config Status: cfg=yes, avail=yes, need=no, active=unknown

41: PCI 11.5: 0401 Multimedia audio controller
[Created at manual.260]
Unique ID: Ssy1.V2mM3CHvMND
Hardware Class: sound
Model: "VIA VT8233/A/8235 AC97 Audio Controller"
Vendor: pci 0x1106 "VIA Technologies, Inc."
Device: pci 0x3059 "VT8233/A/8235 AC97 Audio Controller"
Revision: 0x30
I/O Ports: 0xd800-0xd8ff (rw)
IRQ: 11 (10290 events)
Driver Info #0:
Driver Info: snd-via686,snd-via8233
Config Status: cfg=no, avail=no, need=no, active=unknown

44: PCI 0c.0: 0200 Ethernet controller
[Created at manual.260]
Unique ID: lgGW.zq4WwPe20r5
Hardware Class: network
Model: "Davicom Ethernet 100/10 MBit"
Vendor: pci 0x1282 "Davicom Semiconductor, Inc."
Device: pci 0x9102 "Ethernet 100/10 MBit"
SubVendor: pci 0x0291
SubDevice: pci 0x8212
Revision: 0x31
I/O Ports: 0xdc00-0xdcff (rw)
Memory Range: 0xdfffff00-??? (rw,non-prefetchable)
Memory Range: 0xdff80000-??? (ro,disabled)
IRQ: 11 (1282 events)
Driver Info #0:
Driver Status: dmfe is not active
Driver Activation Cmd: "insmod dmfe"
Driver Info #1:
Driver Status: tulip is not active
Driver Activation Cmd: "insmod tulip"
Config Status: cfg=yes, avail=no, need=no, active=unknown

55: PCI 11.6: 0780 Communication controller
[Created at manual.260]
Unique ID: KBSt.m2OuY_layX5
Hardware Class: unknown
Model: "VIA Intel 537 [AC97 Modem]"
Vendor: pci 0x1106 "VIA Technologies, Inc."
Device: pci 0x3068 "Intel 537 [AC97 Modem]"
Revision: 0x70
I/O Ports: 0xd800-0xd8ff (rw)
IRQ: 9 (no events)
Config Status: cfg=new, avail=no, need=no, active=unknown

If you need any more information, please follow up. Also, is there any
way of enabling an IRQ?

Although it shows IRQ #11 as disabled, the network continues to work.

I have also attached the .config file for my system.

Mainak Mandal

--
----------------------------------------------------------------------------------------------------
"Choose life. Choose a job. Choose a career. Choose a family. Choose a big fucking
television, choose washing machines, cars, compact disk players and electrical tin
openers...choose DIY and wondering who the fuck you are on a Sunday morning. Choose sitting
on the couch, watching mind-numbing, spirit-crushing game shows, stuffing junk food
into your mouth. Choose rotting away at the end of it all, pishing your last in a
miserable home, nothing more than an embarrassment to the selfish, fucked-up brats you
spawned to replace yourself. Choose your future. Choose life. But why would I want to do a
thing like that? I chose not to choose life. I chose something else. And the reasons?
There are no reasons. Who needs reasons when you've got heroin?"

--Irvine Welsh
TRAINSPOTTING (1996)

We are the people our parents warned us about...


Attachments:
.config (25.98 kB)

2004-01-05 14:55:50

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 07:33:16PM -0800, Linus Torvalds wrote:

[A mailbox full of messages, too many to reply to.
Yes, Daniel Jacobowitz understood that I referred to fsid in the NFS case:

There is a great variation here in what various servers and clients do,
but roughly speaking filehandles tend to contain a fsid, and this fsid
often (no fsid= given) involves (major,minor,ino).

No, I have not talked this year about exporting /dev. Also interesting.
Yes, as I said, one can avoid NFS problems by giving fsid=.
It is similar elsewhere. A thousand minor problems are caused by
unstable device numbers. All annoying, each can be solved easily
once one has figured out what goes wrong and why. That is why I say
"preferably stable across reboots".]

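To make the (major,minor,ino) triple concrete, here is a minimal C sketch
that reads those three values for a path with stat(2). It only illustrates
which numbers would change if a device were renumbered across a reboot. The
struct and function names are made up for this example, the use of
<sys/sysmacros.h> assumes Linux/glibc, and this is not the code any
particular NFS server uses to build a filehandle.

```c
#define _DEFAULT_SOURCE         /* for major()/minor() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>      /* major(), minor(): Linux/glibc assumption */
#include <sys/types.h>

/* Hypothetical container for the values discussed above. */
struct fs_identity {
    unsigned dev_major;
    unsigned dev_minor;
    unsigned long long inode;
};

/* Fill *out from stat(path). Returns 0 on success, -1 on failure. */
int fs_identity_of(const char *path, struct fs_identity *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->dev_major = major(st.st_dev);  /* changes if the disk is renumbered */
    out->dev_minor = minor(st.st_dev);  /* likewise */
    out->inode = (unsigned long long)st.st_ino; /* stays the same */
    return 0;
}
```

If a server derived its fsid from these values, a renumbered disk would give
the same inode but a different major/minor pair for the same exported
directory, which is the failure that an explicit fsid= avoids.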

What remains to be said?

Linus, let me try a bit more to address what I see as a misconception
in your posts.

> It shouldn't be fixed by saying "device numbers have to be stable across
> reboots", because the fact is, we're most likely going to have storage
> that is really really hard to enumerate in a repeatable fashion.

You have this strange hangup concerning "enumerate", and then keep
repeating to others that enumerating is impossible, and that therefore
stable device numbers are impossible, and that consequently, since we
cannot have stable device numbers expecting them to be stable is broken.

It is an old misconception - I recall you telling me how many billion years
an "ls /dev" would take with 64-bit device numbers.

No - I never advocated "find a device number by enumeration".
Quite the opposite, I advocated "use a hash of the serial number
as the device number of a disk". And more generally, "it is the
driver's job to assign a device number".

So it is not difficult at all to give this network attached storage
a stable device number.

And if one can, there is no reason not to do so.
It may even allow udev to give stable names as well.


Andries

2004-01-05 15:14:59

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Sun, Jan 04, 2004 at 09:48:24PM -0600, Rob Landley wrote:
> On Sunday 04 January 2004 21:06, David Lang wrote:
> > Linus, what Andries is saying is that if you export a directory (say
> > /home) the process of exporting it somehow uses the /dev device number so
> > if the server reboots and gets a different device number for the partition
> > that /home is on the clients won't see it as the same export, breaking the
> > NFS requirement that a server can be rebooted.
> NFS always struck me as a perverse design. "The fileserver must be
> stateless with regard to clients, even though maintaining state is
> what a filesystem DOES, and the point of the thing is to export a
> filesystem." Okay... (If it was exporting read-only filesystems
> with no locking of any kind, maybe they'd have a point, but come on
> guys...)

Statelessness translated to capacity back in the day when maintaining state
for hundreds or thousands of machines was expensive...

I don't buy NFS as an excuse, though. I refuse to believe that a
shared /dev is necessary or desirable for *any* environment. /dev/pts
is one example of where everybody seems to have already agreed on
this.

With udev, or with devfs, a shared /dev becomes unnecessary. /dev will
no longer need to be 7000+ entries. It could be a few hundred or less
for common configurations, and 0% persistence/remote storage for
tmpfs-udev or devfs.

There are a few cases that we might be forced to maintain regular
numbers: mkfifo() creates a named pipe, and bind() creates a named
socket. These might be accessed between reboots over NFS, or local
mounts by many existing programs. I think these must be guaranteed to
keep the same major:minor numbers across reboots (preferably, even
across kernel releases). These are exceptional cases, though, and
should be considered as such.

Cheers,
mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2004-01-05 16:13:46

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004, Andries Brouwer wrote:
>
> You have this strange hangup concerning "enumerate", and then keep
> repeating to others that enumerating is impossible, and that therefore
> stable device numbers are impossible, and that consequently, since we
> cannot have stable device numbers expecting them to be stable is broken.

Right.

> It is an old misconception - I recall you telling me how many billion years
> an "ls /dev" would take with 64-bit device numbers.

No. When I talk about "enumerate", I do not mean "give numbers starting at
1". In the mathematical sense it means that you _can_ number them with
integers, not that it is necessarily a sequence from 1...n.

For example, PCI device slots are "enumerable". That doesn't mean that we
give them numbers 1..n, it only means that we can encode their address in
a single number. So if everything was a PCI slot, we could enumerate the
whole address space, and "stable" device numbers would be possible (they'd
be stable by _slot_, not by actual device, but that's good enough for some
people).
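The "encode their address in a single number" idea can be sketched in C as
follows. The packing used here (8-bit bus, 5-bit device, 3-bit function) is
the conventional PCI bus/device/function layout; the function names are
invented for this example. Note that the resulting number identifies a slot,
not the card that happens to sit in it.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a PCI address into one 16-bit number: bbbbbbbb dddddfff. */
uint16_t pci_slot_number(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Recover bus, device and function from the packed number. */
void pci_slot_decode(uint16_t n, unsigned *bus, unsigned *dev, unsigned *fn)
{
    assert(bus != 0 && dev != 0 && fn != 0);   /* non-null pointers */
    *bus = (n >> 8) & 0xffu;
    *dev = (n >> 3) & 0x1fu;
    *fn  = n & 0x07u;
}
```

Because every (bus, device, function) triple maps to exactly one number and
back, the whole slot space can be walked by counting from 0 to 65535.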

But the thing is, some things you simply _cannot_ number. For example, a
two-dimensional space is innumerable - you need more than one integer
number to look things up. So is the set of real numbers (but not the set
of fractions), etc etc.

It boils down to not how many devices there can be, but to whether there
is any way to "walk the space of devices".

And there fundamentally isn't. And _that_ is the basic issue: if you
_cannot_ number a space, you cannot have a stable device number.

> No - I never advocated "find a device number by enumeration".
> Quite the opposite, I advocated "use a hash of the serial number
> as the device number of a disk". And more generally, "it is the
> driver's job to assign a device number".

There _is_ no such number as you are talking about. You are talking pure
theory that has nothing to do with reality. There are no "serial numbers".

Don't you see? This is what "enumeration" is all about. You are assuming a
model that simply DOES NOT EXIST. Your "serial numbers" are exactly what
I'm talking about when I say "enumerate". Whenever you claim that a device
has a "serial number", you literally claim that the device space is
enumerable, and that is what I have been telling you from day one IS NOT
TRUE!

Whether you then hash the serial number or not is totally irrelevant: an
enumeration of hashes is still an enumeration.

Devices do not _have_ serial numbers. They are not enumerable. In other
words, they do not have some kind of explicit identity that we can use to
give them numbers. That is what "innumerable" MEANS, and that is why I
have been harping on the issue.

Please. Where do you think those numbers would come from?

So I claim as an axiom for device numbering that devices are not
enumerable, and that this _fundamentally_ leads to the corollary that you
cannot give them stable numbers. Not with hashes, not with _anything_. The
best you can do is to _literally_ just give them some per-session unique
integer that is simply the discovery ordering, nothing more.

> So it is not difficult at all to give this network attached storage
> a stable device number.

It is not only difficult, it is fundamentally _impossible_.

> And if one can, there is no reason not to do so.
> It may even allow udev to give stable names as well.

My point is that for the subset of devices that _do_ have serial numbers
(and it is a subset, nothing more), udev can then use those serial numbers
to have a stable pathname to the device. But it's a _pathname_, not a
number.

And for devices that don't have serial numbers, udev can try to use other
heuristics instead to give those stable names. Sometimes those other
heuristics would be looking at the actual _content_ of the thing.

For example, if you wanted to, you could make udev do a cddb lookup on the
CD-ROM, and use that as the pathname, so that when you insert your
favorite audio disk, it will always show up in the same place, regardless
of whether you put it in the DVD slot or the CD-RW drive.

[ Yeah, that sounds like a singularly silly thing to do, but it's a good
example of something where there is no actual serial number, but you can
"identify" it automatically through its contents, and name it stably
according to that. ]

That is indeed the point of udev. Doing things that the kernel
_obviously_ should not do.
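The naming policy described above (use a serial number when there is one,
otherwise fall back to something derived from the content, otherwise accept
a per-session number) could be sketched in user space roughly like this. It
is a hypothetical illustration, not udev's code or rule syntax: the function
name and the "by-serial/", "by-content/" and "by-session/" prefixes are all
invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a name for a device into out (outsz bytes).
 * serial and content_id may be NULL or empty when unknown.
 * Returns 0 on success, -1 if the name did not fit.
 */
int stable_name(const char *serial, const char *content_id,
                unsigned session_index, char *out, size_t outsz)
{
    int n;

    assert(out != NULL && outsz > 0);
    if (serial != NULL && serial[0] != '\0')
        n = snprintf(out, outsz, "by-serial/%s", serial);       /* stable */
    else if (content_id != NULL && content_id[0] != '\0')
        n = snprintf(out, outsz, "by-content/%s", content_id);  /* heuristic */
    else
        n = snprintf(out, outsz, "by-session/%u", session_index); /* not stable */
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Only the last branch depends on discovery order, which matches the claim
that the kernel's number is a per-session cookie while the pathname carries
the policy.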

Linus

2004-01-05 16:36:14

by Andreas Schwab

Subject: Re: udev and devfs - The final word

Mark Mielke <[email protected]> writes:

> There are a few cases that we might be forced to maintain regular
> numbers: mkfifo() creates a named pipe, and bind() creates a named
> socket.

Neither fifos nor sockets are devices.
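This is easy to check from user space. The sketch below assumes a POSIX
system; the function names are made up for the example. It creates a FIFO,
asks lstat(2) what kind of node it is, and removes it again. A FIFO reports
S_ISFIFO, not S_ISCHR or S_ISBLK, so no major:minor pair is involved in
finding it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if path is a character or block device node, 0 if it is
 * some other kind of file, -1 if it cannot be examined. */
int is_device_node(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return -1;
    return (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) ? 1 : 0;
}

/* Creates a FIFO at path, classifies it, and removes it again.
 * Returns the result of is_device_node(), or -1 if mkfifo() failed. */
int fifo_is_device_node(const char *path)
{
    int result;

    assert(path != NULL);
    if (mkfifo(path, 0600) != 0)
        return -1;
    result = is_device_node(path);
    unlink(path);
    return result;
}
```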

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2004-01-05 17:33:39

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004, Vojtech Pavlik wrote:
>
> Two dimensional discrete space (*) is enumerable.

Yeah, I'm sorry - you're the second person to point it out, and I really
knew that but had all the wrong associations (I was thinking of the
complex plane, not a discrete thing).

> (**) Assuming the coordinates can be negative. For non-negative
> it's even easier.

It ends up being exactly the same pattern as for fractions (ignoring 0,
which just shifts it), which I explicitly listed as being enumerable, so I
was just being stupid.

I can only say that it's been some time since I actually did my early
math..

Linus

2004-01-05 17:28:42

by Vojtech Pavlik

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 08:13:26AM -0800, Linus Torvalds wrote:

> But the thing is, some things you simply _cannot_ number. For example, a
> two-dimensional space is innumerable - you need more than one integer
> number to look things up. So is the set of real numbers (but not the set
> of fractions), etc etc.

Two dimensional discrete space (*) is enumerable. Just start at [0,0]
and assign numbers going around the center in a growing spiral (**).
That way you assign a number to every point in that space. This is very
similar to the trick used to demonstrate fractions are enumerable.

(*) The one where you can use two integers to look things up.
(**) Assuming the coordinates can be negative. For non-negative
it's even easier.

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2004-01-05 17:52:49

by Davide Libenzi

Subject: Re: udev and devfs - The final word

On Mon, 5 Jan 2004, Vojtech Pavlik wrote:

> On Mon, Jan 05, 2004 at 08:13:26AM -0800, Linus Torvalds wrote:
>
> > But the thing is, some things you simply _cannot_ number. For example, a
> > two-dimensional space is innumerable - you need more than one integer
> > number to look things up. So is the set of real numbers (but not the set
> > of fractions), etc etc.
>
> Two dimensional discrete space (*) is enumerable. Just start at [0,0]
> and assign numbers going around the center in a growing spiral (**).
> That way you assign a number to every point in that space. This is very
> similar to the trick used to demonstrate fractions are enumerable.

Vojtech, a spiral (in the math sense) won't work because whatever
continuous function you choose for the radius, you are going to skip
integers when the radius grows (and duplicate them when it's small). Also,
IIRC, fractions are enumerable because they're a mapping from two
enumerable spaces (integers): F = F(I1, I2) = I1 / I2.



- Davide



2004-01-05 18:09:47

by Hugo Mills

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 09:52:45AM -0800, Davide Libenzi wrote:
> On Mon, 5 Jan 2004, Vojtech Pavlik wrote:
>
> > On Mon, Jan 05, 2004 at 08:13:26AM -0800, Linus Torvalds wrote:
> >
> > > But the thing is, some things you simply _cannot_ number. For example, a
> > > two-dimensional space is innumerable - you need more than one integer
> > > number to look things up. So is the set of real numbers (but not the set
> > > of fractions), etc etc.
> >
> > Two dimensional discrete space (*) is enumerable. Just start at [0,0]
> > and assign numbers going around the center in a growing spiral (**).
> > That way you assign a number to every point in that space. This is very
> > similar to the trick used to demonstrate fractions are enumerable.
>
> Vojtech, a spiral (in the math sense) won't work because whatever
> continuous function you choose for the radius, you are going to skip
> integers when the radius grows (and duplicate them when it's small). Also,
> IIRC, fractions are enumerable because they're a mapping from two
> enumerable spaces (integers): F = F(I1, I2) = I1 / I2.

I think he meant something like this:

( 0, 0)
( 1, 0)
( 0, 1)
(-1, 0)
( 0, -1)
( 2, 0)
( 1, 1)
( 0, 2)
(-1, 1)
(-2, 0)
(-1, -1)
etc.
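The ordering listed above walks diamond-shaped rings |x| + |y| = r
counterclockwise, starting each ring at (r, 0). A minimal C sketch that
computes the n-th point of that walk (the function name is invented here):

```c
#include <assert.h>

/* n-th point (n = 0, 1, 2, ...) of the walk over rings |x| + |y| = r.
 * Ring r >= 1 holds 4*r points, visited counterclockwise from (r, 0). */
void diamond_point(unsigned n, int *x, int *y)
{
    unsigned r = 1;      /* current ring */
    unsigned first = 1;  /* index of the first point on ring r */
    unsigned k, side;
    int ri, off;

    assert(x != 0 && y != 0);
    if (n == 0) {
        *x = 0;
        *y = 0;
        return;
    }
    while (n >= first + 4 * r) {   /* skip whole rings */
        first += 4 * r;
        r++;
    }
    k = n - first;                 /* position along the ring, 0 .. 4r-1 */
    side = k / r;                  /* which of the four edges */
    off = (int)(k % r);            /* steps taken along that edge */
    ri = (int)r;
    switch (side) {
    case 0:  *x = ri - off;  *y = off;        break; /* (r,0)  -> (0,r)  */
    case 1:  *x = -off;      *y = ri - off;   break; /* (0,r)  -> (-r,0) */
    case 2:  *x = -ri + off; *y = -off;       break; /* (-r,0) -> (0,-r) */
    default: *x = off;       *y = -ri + off;  break; /* (0,-r) -> (r,0)  */
    }
}
```

Every integer pair lies on exactly one ring, so every pair receives exactly
one index, which is all that "enumerable" requires.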

Rationals are countable since they're the product of the integers
(numerator) and the natural numbers without zero (denominator). You
can count them in a similar way to the above "spiral", making sure
that you don't count 1/2 and 2/4 as two different numbers. :)
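The same counting idea for rationals, as a small C sketch. To keep it short
it covers only the positive rationals and walks diagonals p + q = 2, 3, 4,
... instead of a spiral; pairs that are not in lowest terms are skipped so
that 1/2 and 2/4 are counted once. The function names are made up for this
example.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* n-th positive rational p/q in lowest terms (n = 0, 1, 2, ...).
 * Order: 1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1, ... */
void nth_positive_rational(unsigned n, unsigned *p, unsigned *q)
{
    unsigned seen = 0;
    unsigned sum, num;

    assert(p != 0 && q != 0);
    for (sum = 2; ; sum++) {               /* diagonal p + q = sum */
        for (num = 1; num < sum; num++) {
            unsigned den = sum - num;
            if (gcd_u(num, den) != 1)
                continue;                  /* e.g. 2/4: already counted as 1/2 */
            if (seen == n) {
                *p = num;
                *q = den;
                return;
            }
            seen++;
        }
    }
}
```

Zero and the negative rationals can be folded in by alternating signs, in
the same way the integers are folded onto the natural numbers.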

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Try everything once, except incest and folk-dancing. ---


Attachments:
(No filename) (1.70 kB)
signature.asc (189.00 B)
Digital signature

2004-01-05 18:03:55

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004, Davide Libenzi wrote:
>
> Vojtech, a spiral (in the math sense) won't work

It's not a spiral in that sense - it's just that the pattern you get when
walking the "dots" looks like a spiral.

> Also, IIRC, fractions are enumerable because they're a mapping from two
> enumerable spaces (integers): F = F(I1, I2) = I1 / I2.

Which is exactly the thing that Vojtech is really talking about: the
enumerable space of a _discrete_ two-dimensional shape, ie folding two
enumerable spaces onto one.

The negative values don't matter, since you can effectively enumerate both
ways starting from zero (ie the full set of integers is not any less
enumerable than the positive numbers are):

0, 1, -1, 2, -2, 3, -3, ...

so it doesn't really matter if you only enumerate one quadrant (which is
effectively the same thing as enumerating fractions) or all four
quadrants.
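The 0, 1, -1, 2, -2, ... ordering is a one-to-one map between the natural
numbers and all integers. As a C sketch (function names invented for the
example, and assuming values small enough to fit in a long):

```c
#include <assert.h>
#include <limits.h>

/* Index n = 0, 1, 2, 3, 4, ...  ->  0, 1, -1, 2, -2, ... */
long zigzag_to_int(unsigned long n)
{
    assert(n / 2 < (unsigned long)LONG_MAX);
    if (n % 2 == 1)
        return (long)(n / 2) + 1;   /* odd indices: 1, 2, 3, ... */
    return -(long)(n / 2);          /* even indices: 0, -1, -2, ... */
}

/* Inverse map: integer z -> its index in the sequence above. */
unsigned long int_to_zigzag(long z)
{
    assert(z > LONG_MIN / 2 && z < LONG_MAX / 2);
    if (z > 0)
        return 2UL * (unsigned long)z - 1;
    return 2UL * (unsigned long)(-z);
}
```

Applying this map to each coordinate reduces the four-quadrant case to the
one-quadrant case.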

The "spiral" pattern for a two-dimensional enumeration ends up being
something like

(0,0) -> (1,0) -> (0,1) -> (-1,0) -> (0, -1) -> (1,-1) -> (2,0) -> ...

(do it on a graph paper and it's obvious, the above is probably wrong
since I'm trying to visualize it)


Linus

2004-01-05 19:10:36

by Paul Rolland

Subject: Re: udev and devfs - The final word

Hello,

> > Two dimensional discrete space (*) is enumerable. Just start at [0,0]
> > and assign numbers going around the center in a growing spiral (**).
> > That way you assign a number to every point in that space. This is very
> > similar to the trick used to demonstrate fractions are enumerable.
>
> Vojtech, a spiral (in the math sense) won't work because whatever
> continuous function you choose for the radius, you are going to skip
> integers when the radius grows (and duplicate them when it's small). Also,
> IIRC, fractions are enumerable because they're a mapping from two
> enumerable spaces (integers): F = F(I1, I2) = I1 / I2.
>
No, I think Vojtech meant this kind of spiral and
enumeration:

...16 15 14 13
    5  4  3 12
    6  1  2 11
    7  8  9 10

and so on... The spiral is not to be taken in the math sense...
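The square spiral drawn above can be generated by walking runs of length
1, 1, 2, 2, 3, 3, ... and turning left after each run. A C sketch, with 1 at
the origin, 2 to its right and y increasing upwards (the function name and
the coordinate convention are assumptions made for this example):

```c
#include <assert.h>

/* Coordinates of label n (n >= 1) on the counterclockwise square spiral:
 * 1 -> (0,0), 2 -> (1,0), 3 -> (1,1), 4 -> (0,1), 5 -> (-1,1), ... */
void square_spiral_point(unsigned n, int *x, int *y)
{
    static const int dx[4] = { 1, 0, -1, 0 };   /* right, up, left, down */
    static const int dy[4] = { 0, 1, 0, -1 };
    int px = 0, py = 0;
    unsigned dir = 0;     /* index into dx/dy */
    unsigned run = 1;     /* current run length */
    unsigned label = 1;   /* label of the point (px, py) */
    unsigned turn, step;

    assert(n >= 1 && x != 0 && y != 0);
    while (label < n) {
        /* each run length is used twice before it grows by one */
        for (turn = 0; turn < 2 && label < n; turn++) {
            for (step = 0; step < run && label < n; step++) {
                px += dx[dir];
                py += dy[dir];
                label++;
            }
            dir = (dir + 1) % 4;
        }
        run++;
    }
    *x = px;
    *y = py;
}
```

Read against the grid: 13 is the top-right corner at (2, 2) and 16 sits
directly above 5 at (-1, 2).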

Regards,
Paul

2004-01-05 20:17:12

by Theodore Ts'o

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 12:15:56PM +0100, Vojtech Pavlik wrote:
>
> Mutt with IMAP is rather bearable even on a GPRS connection (40kbps,
> 1sec latency). On a 100baseTX it's not distinguishable from local
> operation.

Hmm... I've tried using mutt/IMAP over GPRS connection, and I find it
extremely unpleasant, myself. My solution is to use isync to provide
a local cached copy of the IMAP server on my laptop, and then run mutt
against the local cached copy.

I have a patch to isync which allows it to issue multiple IMAP
commands in parallel (instead of operating in lockstep fashion):

http://bugs.debian.org/cgi-bin/bugreport.cgi//tmp/async-imap-patch?bug=226222&msg=3&att=1

With this patch, isync works very well, even over high latency, slow
speed links.

- Ted

2004-01-05 20:20:36

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 08:13:26AM -0800, Linus Torvalds wrote:

> > You keep repeating that enumerating is impossible, and that therefore
> > stable device numbers are impossible, and that consequently, since we
> > cannot have stable device numbers expecting them to be stable is broken.
>
> Right.

> When I talk about "enumerate", I do not mean "give numbers starting at 1".
> It boils down to not how many devices there can be, but to whether there
> is any way to "walk the space of devices".

Yes, that is what one commonly calls to enumerate. Let us say,
an effective way, given some integer, to find the associated device.

[You can leave the mathematics out - this enumerable is not the same as
denumerable or countable. The set of devices on earth is finite.]

> And there fundamentally isn't. And _that_ is the basic issue: if you
> _cannot_ number a space, you cannot have a stable device number.

If there is no effective way to find a disk given some number,
there may very well be an effective way to find a number given some disk.
And indeed, there usually is.

> There are no "serial numbers".
> Please. Where do you think those numbers would come from?

Most of my devices do have them...

> My point is that for the subset of devices that _do_ have serial numbers

Ah, wait! You also have heard about devices with serial numbers! Good!
It is those devices I was talking about. Remember? ["important special case"]

> udev can then use those serial numbers to have a stable pathname

True. Provided that it knows how to get them.
The kernel driver knew all about the device.
Must udev also know all about all possible devices? Do I/O to these devices?
Or must sysfs export all data that could possibly be used?

Andries

2004-01-05 20:44:19

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Mon, 5 Jan 2004, Andries Brouwer wrote:
>
> > udev can then use those serial numbers to have a stable pathname
>
> True. Provided that it knows how to get them.

And that is the _only_ thing that the "device number" actually is. It is a
cookie that the kernel has allocated for the device that the kernel knows
about. Nothing more.

Go back and read my emails. Device numbers cannot have any meaning, they
literally are _only_ useful as cookies.

> The kernel driver knew all about the device.

No. The kernel driver knows _of_ the device, it does not know "all about"
the device. And that's a big difference.

Quite often the kernel only knows that it found "a device". It has very
limited knowledge about what the device is, and what it can do. That's why
we have tools like "smartd" etc, that know a lot more about devices than
the kernel often does.

In particular, the kernel driver knows _nothing_ about potential serial
numbers or how to read them for different classes of devices.

> Must udev also know all about all possible devices? Do I/O to these devices?
> Or must sysfs export all data that could possibly be used?

There is nothing to export. You seem to imply that the kernel somehow
knows more than user space, but the reverse is generally true.

In particular, the kernel should never have policy encoded in it, and
naming of a device is about pretty much nothing _but_ policy. Stuff that
the kernel literally has _zero_ knowledge about.

Yes, the kernel knows the physical location, but that doesn't actually
help the kernel itself. It's exported through sysfs, yes, and udev,
together with the hotplug stuff, can be used to make up the "stable name".

Have you even _tried_ udev? Udev can do exactly things like find UUID's
off disks - something the kernel doesn't have a _clue_ about. When the
kernel sees a disk, it's just a disk. The kernel doesn't know if there is
an UUID embedded on the disk, and the kernel SHOULD NOT HAVE A POLICY to
try to find one.
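As an illustration of what "finding a UUID off a disk" involves in user
space, here is a sketch that pulls the UUID out of an ext2/ext3 superblock
that has already been read into memory. The offsets are those of the ext2
on-disk superblock (magic 0xEF53 at byte 56, 16-byte UUID at byte 104, the
superblock itself starting 1024 bytes into the device). The function name is
invented, and this is not a copy of what udev or its helper programs do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXT2_SB_SIZE       1024    /* bytes; read from device offset 1024 */
#define EXT2_MAGIC_OFFSET  56      /* s_magic, 16-bit little-endian */
#define EXT2_UUID_OFFSET   104     /* s_uuid, 16 bytes */
#define EXT2_MAGIC         0xEF53u

/*
 * sb:  EXT2_SB_SIZE bytes holding the superblock.
 * out: buffer of at least 37 bytes for the textual UUID.
 * Returns 0 on success, -1 if the ext2 magic number is absent.
 */
int ext2_uuid_string(const unsigned char *sb, char *out)
{
    unsigned char u[16];
    unsigned magic;

    assert(sb != NULL && out != NULL);
    magic = (unsigned)sb[EXT2_MAGIC_OFFSET] |
            ((unsigned)sb[EXT2_MAGIC_OFFSET + 1] << 8);
    if (magic != EXT2_MAGIC)
        return -1;              /* not ext2/ext3: some other probe is needed */
    memcpy(u, sb + EXT2_UUID_OFFSET, sizeof u);
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
    return 0;
}
```

Each filesystem type needs its own probe like this one, which is the kind of
format-specific knowledge the message argues belongs outside the kernel.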

But for user space, the thing is trivially done: the kernel will notify
user space about the fact that it found a device (without necessarily
knowing what the heck the device is - quite common with USB or specialty
SCSI devices). The kernel pretty much doesn't know _anything_ about things
like laser range finders, cameras etc. It ends up classifying the device
on a very rough level, nothing more.

And without knowing practically _anything_ about the device, it still has
to allocate a device number. Exactly so that somebody else can come around
and poke at it, and maybe know that "ahh, this device is a USB-attached
camera" or similar.

Do you not see that fundamental issue? The kernel has to allocate a number
before a UUID or anything else is necessarily available.

The UUID/serial number/type policy comes _later_.

Linus

2004-01-05 21:06:50

by Vojtech Pavlik

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 03:11:44PM -0500, Theodore Ts'o wrote:

> On Mon, Jan 05, 2004 at 12:15:56PM +0100, Vojtech Pavlik wrote:
> >
> > Mutt with IMAP is rather bearable even on a GPRS connection (40kbps,
> > 1sec latency). On a 100baseTX it's not distinguishable from local
> > operation.
>
> Hmm... I've tried using mutt/IMAP over GPRS connection, and I find it
> extremely unpleasant, myself. My solution is to use isync to provide
> a local cached copy of the IMAP server on my laptop, and then run mutt
> against the local cached copy.
>
> I have a patch to isync which allows it to issue multiple IMAP
> commands in parallel (instead of operating in lockstep fashion):
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi//tmp/async-imap-patch?bug=226222&msg=3&att=1
>
> With this patch, isync works very well, even over high latency, slow
> speed links.

That looks very nice. Now, if there were a way to make the isync
IMAP connections go over a compressed ssh link (like I'm doing with
Mutt/IMAP) that'd be very cool.

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2004-01-05 22:19:45

by Shawn

Subject: Re: udev and devfs - The final word

Linus is correct. I say this somewhat because I find his arguments to
make perfect sense in a philosophical way, but more because it is his
kernel.

Anyway, I'll weigh in with my 0.02 pesos:
Right now, as things are, hardware devices' numbers are not very stable
as it is. Detection order can and will change, and you should not rely
on them being the same. PERIOD.

Having said that, I will say that they are /somewhat/ stable. You can,
in general, say 'fdisk /dev/hdb' and be editing the same block device's
partition table... That is, if nothing has changed in the BIOS or
hardware or kernel or....

Now, correct me if I'm wrong, but I don't believe we are expecting
device numbers to change nearly every time you reboot given there are no
hardware changes with a dynamic numbering scheme, right?

I would, as an admin, have need for distinguishing between my 4
identical SATA hard drives with identical partition tables without
having to resort to examining UUIDs, serial number or FS labels by hand,
especially if I dd(1) stuff between them. I understand this is not as
simple as with ide(0-N)(pri|sec)(master|slave) (ignoring that ide(0-N)
could be detected in arbitrary order) as SATA is different.

As an admin, would I at least theoretically have /some/ consistency if
merely for my own sanity when dealing with block devices by hand (I do
need to setup LVM stuff from time to time)??

On Mon, 2004-01-05 at 14:38, Linus Torvalds wrote:
> On Mon, 5 Jan 2004, Andries Brouwer wrote:
> >
> > > udev can then use those serial numbers to have a stable pathname
> >
> > True. Provided that it knows how to get them.
>
> And that is the _only_ thing that the "device number" actually is. It is a
> cookie that the kernel has allocated for the device that the kernel knows
> about. Nothing more.
>
> Go back and read my emails. Device numbers cannot have any meaning, they
> literally are _only_ useful as cookies.
>
> > The kernel driver knew all about the device.
>
> No. The kernel driver knows _of_ the device, it does not know "all about"
> the device. And that's a big difference.
>
> Quite often the kernel only knows that it found "a device". It has very
> limited knowledge about what the device is, and what it can do. That's why
> we have tools like "smartd" etc, that know a lot more about devices than
> the kernel often does.
>
> In particular, the kernel driver knows _nothing_ about potential serial
> numbers or how to read them for different classes of devices.
>
> > Must udev also know all about all possible devices? Do I/O to these devices?
> > Or must sysfs export all data that could possibly be used?
>
> There is nothing to export. You seem to imply that the kernel somehow
> knows more than user space, but the reverse is generally true.
>
> In particular, the kernel should never have policy encoded in it, and
> naming of a device is about pretty much nothing _but_ policy. Stuff that
> the kernel literally has _zero_ knowledge about.
>
> Yes, the kernel knows the physical location, but that doesn't actually
> help the kernel itself. It's exported through sysfs, yes, and udev,
> together with the hotplug stuff, can be used to make up the "stable name".
>
> Have you even _tried_ udev? Udev can do exactly things like find UUID's
> off disks - something the kernel doesn't have a _clue_ about. When the
> kernel sees a disk, it's just a disk. The kernel doesn't know if there is
> an UUID embedded on the disk, and the kernel SHOULD NOT HAVE A POLICY to
> try to find one.
>
> But for user space, the thing is trivially done: the kernel will notify
> user space about the fact that it found a device (without necessarily
> knowing what the heck the device is - quite common with USB or specialty
> SCSI devices). The kernel pretty much doesn't know _anything_ about things
> like laser range finders, cameras etc. It ends up classifying the device
> on a very rough level, nothing more.
>
> And without knowing practically _anything_ about the device, it still has
> to allocate a device number. Exactly so that somebody else can come around
> and poke at it, and maybe know that "ahh, this device is a USB-attached
> camera" or similar.
>
> Do you not see that fundamental issue? The kernel has to allocate a number
> before a UUID or anything else is necessarily available.
>
> The UUID/serial number/type policy comes _later_.
>
> Linus

2004-01-05 22:23:07

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 05:36:09PM +0100, Andreas Schwab wrote:
> Mark Mielke <[email protected]> writes:
> > There are a few cases that we might be forced to maintain regular
> > numbers: mkfifo() creates a named pipe, and bind() creates a named
> > socket.
> Neither fifos nor sockets are devices.

Well, then, as long as things like this don't break... :-)

Other than backing up /dev, does anybody have *real* cases where a
program assumes major:minor is consistent across reboots? We should
start notifying the authors now... NFS seems to be one, given the
explanation offered for how fsid's are derived...

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2004-01-05 22:47:35

by Theodore Ts'o

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 10:06:25PM +0100, Vojtech Pavlik wrote:
>
> That looks very nice. Now, if there were a way to make the isync
> IMAP connections go over a compressed ssh link (like I'm doing with
> Mutt/IMAP) that'd be very cool.
>

The following in your .isyncrc file will do the trick:

Mailbox thunk
Box Inbox
Host thunk.org
Tunnel "socat SOCKS4A:127.0.0.1:thunk.org:143 STDIO"

You can also do this via secure IMAP, but then ssh's compression won't
be able to do much. Nevertheless, I do this when synchronizing
against an IMAP server where I don't have ssh access, and where I want
the connection between thunk.org and po14.mit.edu to be secured.
So I use the following syntax in .isyncrc to achieve this:

Mailbox Inbox
Box Inbox
Host imaps:po14.mit.edu
Tunnel "socat SOCKS4A:127.0.0.1:po14.mit.edu:993 STDIO"
UseSSLv2 yes
UseSSLv3 yes
UseTLSv1 yes

- Ted

2004-01-05 23:14:15

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 12:38:54PM -0800, Linus Torvalds wrote:

> Have you even _tried_ udev?

Yes, and it works reasonably well. I have version 012 here.
Some flaws will be fixed in 013 or so. Some difficulties are of a
more fundamental type, not so easy to fix. But udev is an entirely
different discussion. Some other time.

> In particular, the kernel should never have policy encoded in it, and
> naming of a device is about pretty much nothing _but_ policy.

Of course. But this is not about naming.

The kernel invents device numbers, and user space names.

Now compare our setups:

dev_t lbt_devno(void) { return random(); }

dev_t aeb_devno(char *s) { dev_t d = hash(s); while (inuse(d)) d++; return d; }

An earlier fragment of the discussion was concerned with the fact
that random(); is a bad idea. Something reproducible is better.

Let us abbreviate the above function f. Some driver determines that
a disk has serial number A809ADGC. Another driver determines that
some device was produced by HP but otherwise has no opinion.
A third driver has no stable information at all about the device.
They assign device numbers f("A809ADGC"), f("HP"), f("").

What is the result? Yes, device numbers are cookies, but a reasonable
attempt has been made to make the device numbers stable.
No guarantees anywhere - this is best effort. Better than no effort.
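
A minimal sketch of the aeb_devno() idea above, modeled in user-space Python rather than kernel C. The CRC32 hash and the 20-bit number space are assumptions made for illustration; the original only says hash(s). It shows that a device with a distinctive key keeps its number as long as nothing collides with it, while devices that share a key are told apart only by discovery order:

```python
import zlib

SPACE = 1 << 20  # assumed size of the device number space


def aeb_devno(key, in_use):
    """Hash the key, then step forward until a free number is found."""
    d = zlib.crc32(key.encode()) % SPACE
    while d in in_use:
        d = (d + 1) % SPACE
    in_use.add(d)
    return d


in_use = set()
disk = aeb_devno("A809ADGC", in_use)   # serial number known
printer = aeb_devno("HP", in_use)      # only the vendor known
anon_a = aeb_devno("", in_use)         # no stable information
anon_b = aeb_devno("", in_use)         # same key, so it is bumped further along
print(disk, printer, anon_a, anon_b)
```

The two devices with no stable information end up on different numbers, and which of them gets the unbumped value depends purely on which was registered first. That is the "best effort, no guarantees" property under discussion.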

And this information helps udev. It may make a callout superfluous,
or even give udev information that cannot be obtained from userspace.

Andries

2004-01-05 23:08:28

by Shawn

Subject: Re: udev and devfs - The final word

On Mon, 2004-01-05 at 16:25, Mark Mielke wrote:
> On Mon, Jan 05, 2004 at 04:17:57PM -0600, Shawn wrote:
> > ...
> > As an admin, would I at least theoretically have /some/ consistency if
> > merely for my own sanity when dealing with block devices by hand (I do
> > need to setup LVM stuff from time to time)??
>
> If all you care about is that /dev names remain consistent, you need
> not fear. udev and devfs are two different ways of providing this
> consistency. They abstract the device numbers from the /dev names,
> meaning that you don't have to care if the numbers change. The names
> don't.
I'm obviously confused if this is true, as then I do not know how the
great and powerful udev derives the names if not from the numbers, or
some other sysfs info.

Anyway, assuming this is true, I have much less concern.

2004-01-05 23:26:20

by Shawn

Subject: Re: udev and devfs - The final word

And looking back on some of these emails, it seems there was more than
just me being confused. Seems this is a point worth emphasizing.

On Mon, 2004-01-05 at 17:05, Shawn wrote:
> On Mon, 2004-01-05 at 16:25, Mark Mielke wrote:
> > On Mon, Jan 05, 2004 at 04:17:57PM -0600, Shawn wrote:
> > > ...
> > > As an admin, would I at least theoretically have /some/ consistency if
> > > merely for my own sanity when dealing with block devices by hand (I do
> > > need to setup LVM stuff from time to time)??
> >
> > If all you care about is that /dev names remain consistent, you need
> > not fear. udev and devfs are two different ways of providing this
> > consistency. They abstract the device numbers from the /dev names,
> > meaning that you don't have to care if the numbers change. The names
> > don't.
> I'm obviously confused if this is true, as then I do not know how the
> great and powerful udev derives the names if not from the numbers, or
> some other sysfs info.
>
> Anyway, assuming this is true, I have much less concern.

2004-01-05 23:34:26

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Tue, 6 Jan 2004, Andries Brouwer wrote:
>
> Now compare our setups:
>
> dev_t lbt_devno(void) { return random(); }

Actually, I'd have something like

	int nr;

	initialize()
	{
	#ifdef CONFIG_DEBUG_BAD_USERS
		nr = random();
	#endif
	}

	dev_t lbt_devno()
	{
		return nr++;
	}

since the numbers do have to be unique "per boot". They just shouldn't be
considered "stable" _nor_ "meaningful".
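
A user-space sketch of the counter above, written in Python for illustration only. The class name, the size of the number space and the exhaustion check are assumptions, not part of the pseudo-code. It shows the two properties being argued for: numbers are unique within one "boot", and with the debug option enabled they differ from one boot to the next:

```python
import random


class DevnoAllocator:
    """Hand out per-boot unique numbers from a wrapping counter."""

    def __init__(self, debug_bad_users=False, space=1 << 20, rng=None):
        self.space = space  # assumed size of the number space
        self.issued = 0
        if rng is None:
            rng = random.Random()
        # Without the debug option the counter starts at 0, like a C global.
        self.nr = rng.randrange(space) if debug_bad_users else 0

    def next_devno(self):
        if self.issued == self.space:
            raise RuntimeError("device number space exhausted")
        devno = self.nr
        self.nr = (self.nr + 1) % self.space
        self.issued += 1
        return devno


boot1 = DevnoAllocator(debug_bad_users=True)
boot2 = DevnoAllocator(debug_bad_users=True)
print([boot1.next_devno() for _ in range(3)])
print([boot2.next_devno() for _ in range(3)])
```

Anything that stored a number from the first run and tried to reuse it in the second would most likely break at once, which is the point of the debug option.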

> dev_t aeb_devno(char *s) { dev_t d = hash(s); while (inuse(d)) d++; return d; }
>
> An earlier fragment of the discussion was concerned with the fact
> that random(); is a bad idea. Something reproducible is better.

And I've told you why reproducibility is a BAD THING, and why I disagree.

Basically, if you cannot 100% guarantee reproducibility (and nobody can,
not your hashes, not anything else), then the _appearance_ of
reproducibility is literally a mistake. Because it ends up being a bug
waiting to happen - and one that is very very hard to reproduce on a
developer machine.

You seem to continually ignore this issue.

I'm not going to bother arguing this for another week. I'm just going to
state once and for all:

- total device number reproducibility is fundamentally impossible. It's
not just impossible in theory, it is impossible in practice too.
- with that in mind, anything that depends on stable device numbers is a
BUG.
- Thus all your arguments boil down to: "I want to encourage bugs".

My argument is that we should find and fix the bugs. And we should do so
by making the lack of meaning of the device numbers as well-known as
possible. And that shouldn't just be due to long and boring threads on the
kernel mailing list, but by actually actively trying to trigger the bad
cases.

Bugs are good at hiding, and then showing up at the most inopportune
times when you can't debug them. It's much better to try to trigger them
where a developer can see them, and you do that by doing strange things.

Linus

2004-01-06 00:15:11

by Rob Landley

Subject: Re: udev and devfs - The final word

On Monday 05 January 2004 15:06, Vojtech Pavlik wrote:
> On Mon, Jan 05, 2004 at 03:11:44PM -0500, Theodore Ts'o wrote:
> > On Mon, Jan 05, 2004 at 12:15:56PM +0100, Vojtech Pavlik wrote:
> > > Mutt with IMAP is rather bearable even on a GPRS connection (40kbps,
> > > 1sec latency). On a 100baseTX it's not distinguishable from local
> > > operation.
> >
> > Hmm... I've tried using mutt/IMAP over GPRS connection, and I find it
> > extremely unpleasant, myself. My solution is to use isync to provide
> > a local cached copy of the IMAP server on my laptop, and then run mutt
> > against the local cached copy.
> >
> > I have a patch to isync which allows it to issue multiple IMAP
> > commands in parallel (instead of operating in lockstep fashion):
> >
> > > http://bugs.debian.org/cgi-bin/bugreport.cgi//tmp/async-imap-patch?bug=226222&msg=3&att=1
> >
> > With this patch, isync works very well, even over high latency, slow
> > speed links.
>
> That looks very nice. Now, if there were a way how to make the isync
> IMAP connections go over a compressed ssh link (like I'm doing with
> Mutt/IMAP) that'd be very cool.

You can run any tcp/ip service over ssh.

Tell isync that the imap server it's synchronizing with lives on the loopback
interface, and then run a variant of this little python script I use to check my
email (adjusting the last line for your connection info). (Note that the far
end needs netcat. If you haven't got it, try the version in busybox.)

Yeah, the script's a quick and dirty hack, but really easy to modify. I have
a more complicated one using SO_ORIGINAL_DEST and a lookup table if you
prefer to setup some firewall rules and tell your imap server it lives in the
192.168.x.x or 10.x.x.x address range... But I've never gotten around to
configuring my laptop to use it just to tunnel pop. :)

I keep meaning to put the full solution up on http://dvpn.sf.net, but nobody's
pestered me about it. :)

Rob


Attachments:
(No filename) (1.91 kB)
boing.py (605.00 B)

2004-01-06 00:02:43

by Greg KH

Subject: Re: udev and devfs - The final word

On Tue, Jan 06, 2004 at 12:13:26AM +0100, Andries Brouwer wrote:
> On Mon, Jan 05, 2004 at 12:38:54PM -0800, Linus Torvalds wrote:
>
> > Have you even _tried_ udev?
>
> Yes, and it works reasonably well. I have version 012 here.
> Some flaws will be fixed in 013 or so.

What flaws would that be? The short time delay for partitions? Or
something else?

> Some difficulties are of a more fundamental type, not so easy to fix.

Such as?

> But udev is an entirely different discussion. Some other time.

Feel free to bring it up on the linux-hotplug-devel list whenever you
wish.

thanks,

greg k-h

2004-01-06 00:39:09

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 04:17:57PM -0600, Shawn wrote:
> Having said that, I will say that they are /somewhat/ stable. You can,
> in general, say 'fdisk /dev/hdb' and be editing the same block device's
> partition table... That is, if nothing has changed in the BIOS or
> hardware or kernel or....
> ...
> As an admin, would I at least theoretically have /some/ consistency if
> merely for my own sanity when dealing with block devices by hand (I do
> need to setup LVM stuff from time to time)??

If all you care about is that /dev names remain consistent, you need
not fear. udev and devfs are two different ways of providing this
consistency. They abstract the device numbers from the /dev names,
meaning that you don't have to care if the numbers change. The names
don't.

Cheers,
mark


2004-01-06 00:32:21

by Rob Landley

Subject: Re: udev and devfs - The final word

On Monday 05 January 2004 17:13, Andries Brouwer wrote:
> An earlier fragment of the discussion was concerned with the fact
> that random(); is a bad idea. Something reproducible is better.

To find people making bad assumptions that will only break after widespread
deployment, random() is much better than "usually reproducible".

> Let us abbreviate the above function f. Some driver determines that
> a disk has serial number A809ADGC. Another driver determines that
> some device was produced by HP but otherwise has no opinion.
> A third driver has no stable information at all about the device.
> They assign device numbers f("A809ADGC"), f("HP"), f("").
>
> What is the result? Yes, device numbers are cookies, but a reasonable
> attempt has been made to make the device numbers stable.

Should the same argument be made about process ID's? When your system boots
up, your daemons generally start in the same order. But any script that
depends on this is broken.

Or filehandles. They're cookies. There's whole pages on why it's a bad idea
to make assumptions about what filehandles point to:

http://en.tldp.org/HOWTO/Secure-Programs-HOWTO/avoid-race.html

> No guarantees anywhere - this is best effort. Better than no effort.

You're suggesting that it should be easier to write things that are
fundamentally unclean, and bake in assumptions that WILL break, but not on
the developer's machine, only for end-users who aren't expecting it.

What's the advantage? Making it easier for people to do something stupid?
(You can sort of trust this thing we can't make any guarantees about. Since
when is "sort of trust" a condition that's encouraged? At the very least,
those kinds of cases are backed up by a detection and recovery mechanism and
the whole thing's called a heuristic.) Why is there a need for this?

Either the kernel can make a guarantee, or it should very much not make a
guarantee. Where is an example of a middle ground?

> And this information helps udev. It may make a callout superfluous,
> or even give udev information that cannot be obtained from userspace.

I'm waiting for the udev maintainer to weigh in on this and say "no, it
doesn't". If there is information that "cannot be obtained from userspace",
then we should fix the sysfs exports. Encoding something in a semi-stable
cookie and actually trying to USE that information is stupid.

What about kernel upgrades? Future backwards compatibility when developers
change the device enumeration methods? (The sata driver got completely
rewritten from scratch, and now it detects devices in a wildly different
order, but we need this shim layer for backwards compatibility with a
guarantee we never should have made because we encouraged old scripts to
remain broken.) This plants hidden land mines restricting future
development. You're basically proposing a whole "device number stabilization
infrastructure" for future kernels if it's to have ANY meaning at all...

Where's the advantage? Name a single real-world case that's more difficult to
fix than it would be to make the kernel pander to it in perpetuity.

> Andries

Rob

2004-01-06 00:39:09

by Greg KH

Subject: Silly udev script [was Re: udev and devfs - The final word]

On Mon, Jan 05, 2004 at 08:13:26AM -0800, Linus Torvalds wrote:
>
> For example, if you wanted to, you could make udev do a cddb lookup on the
> CD-ROM, and use that as the pathname, so that when you insert your
> favorite audio disk, it will always show up in the same place, regardless
> of whether you put it in the DVD slot or the CD-RW drive.
>
> [ Yeah, that sounds like a singularly silly thing to do, but it's a good
> example of something where there is no actual serial number, but you can
> "identify" it automatically through its contents, and name it stably
> according to that. ]

That was such a silly thing to do, here's a script that does it, along
with the udev rule to add to udev.rules for it. It names your cdrom
Artist_Title, and creates a symlink called cdrom that points to it, just
to be a tiny bit sane :)

I had been saying for a long time that you could have udev make a query
across the network to get a device name, this provides the perfect
example of just that...

thanks,

greg k-h


#!/usr/bin/perl

# a horribly funny script that shows how flexible udev can really be
# This is to be executed by udev with the following rules:
# CALLOUT, BUS="ide", PROGRAM="name_cdrom.pl %M %m", ID="good*", NAME="%1c", SYMLINK="cdrom"
# CALLOUT, BUS="scsi", PROGRAM="name_cdrom.pl %M %m", ID="good*", NAME="%1c", SYMLINK="cdrom"
#
# The scsi rule catches USB cdroms and ide-scsi devices.
#

use CDDB_get qw( get_cddb );

my %config;

$dev_node = "/tmp/cd_foo";

# following variables just need to be declared if different from defaults
$config{CDDB_HOST}="freedb.freedb.org"; # set cddb host
$config{CDDB_PORT}=8880; # set cddb port
$config{CDDB_MODE}="cddb"; # set cddb mode: cddb or http
$config{CD_DEVICE}="$dev_node"; # set cd device

# No user interaction, this is an automated script!
$config{input}=0;

my $major = $ARGV[0];
my $minor = $ARGV[1];

# create our temp device node to read the cd info from
# (list form of system() keeps the arguments away from the shell)
if (system("mknod", $dev_node, "b", $major, $minor)) {
	die "bad mknod failed";
}

# get it on
my %cd=get_cddb(\%config);

# remove the dev node we just created
unlink($dev_node);

# print out our cd name if we have found it
if (defined $cd{title}) {
	print "good $cd{artist}_$cd{title}\n";
} else {
	print "bad unknown cdrom\n";
}

2004-01-06 00:49:13

by Greg KH

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 05:05:16PM -0600, Shawn wrote:
> On Mon, 2004-01-05 at 16:25, Mark Mielke wrote:
> > On Mon, Jan 05, 2004 at 04:17:57PM -0600, Shawn wrote:
> > > ...
> > > As an admin, would I at least theoretically have /some/ consistency if
> > > merely for my own sanity when dealing with block devices by hand (I do
> > > need to setup LVM stuff from time to time)??
> >
> > If all you care about is that /dev names remain consistent, you need
> > not fear. udev and devfs are two different ways of providing this
> > consistency. They abstract the device numbers from the /dev names,
> > meaning that you don't have to care if the numbers change. The names
> > don't.
> I'm obviously confused if this is true, as then I do not know how the
> great and powerful udev derives the names if not from the numbers, or
> some other sysfs info.

udev can derive the names for the /dev entries from just about anything
you can think of:
- sysfs files
- bus topology
- bus ids
- any script/program that you might want to run
- the kernel name

It will default back to the "kernel name" that shows up in sysfs, which is
what we currently use, if it can not match up any other name to it. The
rules that udev uses to create these names are contained in the
udev.rules file. See the udev man page for the syntax and some example
rules. Also see the example udev.rules and udev.rules.devfs files for
lots more example rules that you might want to come up with.

The strength in this is that udev can poke around and try to find a
unique "tag" that a specific device exports (be it UUID, or a CDDB
entry) and use that to match up a name to. That enables your cdrom to
always be called /dev/cdrom no matter where in the scsi chain it happens
to be.

In summary, udev doesn't care squat about the major/minor that the
kernel has used for a device. It merely uses those numbers and creates
a /dev entry with them, assigned to a name that it comes up with.
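
The lookup described above can be sketched roughly as follows. This is a toy model in Python, not udev's actual rule syntax or code: the rule tuples, the attribute names and choose_name() are all invented for illustration. The only behaviour taken from the description is that the first matching rule supplies the name and the kernel name is the fallback:

```python
import fnmatch


def choose_name(kernel_name, attrs, rules):
    """Pick the /dev name for a device.

    rules is a list of (attribute, glob pattern, name) tuples; the first
    rule whose pattern matches the device's attribute value wins.
    """
    for attr, pattern, name in rules:
        value = attrs.get(attr)
        if value is not None and fnmatch.fnmatchcase(value, pattern):
            return name
    return kernel_name  # nothing matched: fall back to the kernel name


# Hypothetical rules and attribute values.
rules = [
    ("serial", "A809ADGC", "backup_disk"),
    ("vendor", "HP*", "printer"),
]

print(choose_name("sda", {"serial": "A809ADGC"}, rules))
print(choose_name("sdb", {"serial": "A809ADGC"}, rules))
print(choose_name("hdc", {}, rules))
```

The same disk gets the same name whether the kernel happened to call it sda or sdb, and the major/minor numbers never enter into the decision.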

Does that help out? The udev OLS paper might also help explain some of
this.

thanks,

greg k-h

2004-01-06 00:53:40

by Shawn

Subject: Re: udev and devfs - The final word

I'm embarrassed to say I did not read that.

I'm starting to wonder what some folks are complaining about. WRT
practicality and usability, udev about covers it once alsa and vmware
;) get sysfs-ified.

My own foray into udev was a little lacking owing to these little
issues.

On Mon, 2004-01-05 at 18:43, Greg KH wrote:
> In summary, udev doesn't care squat about the major/minor that the
> kernel has used for a device. It merely uses those numbers and creates
> a /dev entry with them, assigned to a name that it comes up with.
>
> Does that help out? The udev OLS paper might also help explain some of
> this.

2004-01-06 01:06:58

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 03:32:03PM -0800, Linus Torvalds wrote:

> > Something reproducible is better.
>
> And I've told you why reproducibility is a BAD THING
>
> Basically, if you cannot 100% guarantee reproducibility,
> then the _appearance_ of reproducibility is literally a mistake.

OK. We now understand perfectly each other's point of view.
It was a pleasure to provoke this discussion - can hardly
wait for 2.7 :-)

Andries

2004-01-06 00:59:50

by Al Viro

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 03:32:03PM -0800, Linus Torvalds wrote:
> dev_t lbt_devno()
> {
> return nr++;
> }
>
> since the numbers do have to be unique "per boot". They just shouldn't be
> considered "stable" _nor_ "meaningful".

Cute. There's a little issue of, say it, meaningful relationship between
sda and sda4, completely lost that way. And _that_ has nothing to do with
device enumeration.

2004-01-06 01:17:35

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Tue, 6 Jan 2004, Al Viro wrote:
>
> Cute. There's a little issue of, say it, meaningful relationship between
> sda and sda4, completely lost that way. And _that_ has nothing to do with
> device enumeration.

Oh, don't look too closely at some pseudo-code, it's not like the code
would actually do that for a minor number. But for things like major
number allocation for disk devices, it might not be too far off. And we
might even want to start off the minors at some "random" offset (obviously
while keeping the alignment right for the partition handling)
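
A rough sketch of what "random offset, but aligned for partition handling" could look like, modeled in Python. The constants are assumptions for illustration: 16 minors per disk (as the SCSI disk driver has traditionally used) and a 20-bit minor space. The point is that the whole disk and its partitions stay in one aligned block, so the sda/sda4 relationship survives even though the base is arbitrary:

```python
import random

MINORS_PER_DISK = 16   # assumption: one disk plus up to 15 partitions
MINOR_SPACE = 1 << 20  # assumption: 20-bit minor numbers


def random_base_minor(rng):
    """Pick a random base minor that is a multiple of MINORS_PER_DISK."""
    slots = MINOR_SPACE // MINORS_PER_DISK
    return rng.randrange(slots) * MINORS_PER_DISK


def partition_minor(base, partno):
    """Minor of partition partno on the disk at base (0 is the whole disk)."""
    if not 0 <= partno < MINORS_PER_DISK:
        raise ValueError("partition number out of range")
    return base + partno


def disk_of(minor):
    """Recover the whole-disk minor from any minor in its block."""
    return minor - (minor % MINORS_PER_DISK)


base = random_base_minor(random.Random())
print(base, partition_minor(base, 4), disk_of(partition_minor(base, 4)))
```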

Linus

2004-01-06 01:41:26

by Andries Brouwer

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 04:00:15PM -0800, Greg KH wrote:

> > > Have you even _tried_ udev?
> >
> > Yes, and it works reasonably well. I have version 012 here.
> > Some flaws will be fixed in 013 or so.
>
> What flaws would that be? The short time delay for partitions? Or
> something else?

Yes, partitions are not handled very well.
So far I have never seen udev discover partitions on its own.
I provoke it using "blockdev --rereadpt".
The result is that partitions appear in /proc/partitions and in /udev.
After removing the media another "blockdev --rereadpt" returns
"No such device or address" and the entry in /proc/partitions
disappears, but that in /udev stays.

> > Some difficulties are of a more fundamental type, not so easy to fix.
>
> Such as?

Udev cannot do anything when there are no events.
And media insertion or removal does not always give events.

Andries

[By the way, a compilation warning for every C file:
% make
gcc -pipe -Wall -Wmore.. -Os -fomit-frame-pointer -D_GNU_SOURCE \
-I/usr/lib/gcc-lib/i486-suse-linux/3.2/include -I.../udev-012/libsysfs
-c -o udev.o udev.c
cc1: warning: changing search order for system directory
"/usr/lib/gcc-lib/i486-suse-linux/3.2/include"
cc1: warning: as it has already been specified as a non-system directory]



2004-01-06 04:28:34

by Al Viro

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 05:17:20PM -0800, Linus Torvalds wrote:
>
>
> On Tue, 6 Jan 2004, Al Viro wrote:
> >
> > Cute. There's a little issue of, say it, meaningful relationship between
> > sda and sda4, completely lost that way. And _that_ has nothing to do with
> > device enumeration.
>
> Oh, don't look too closely at some pseudo-code, it's not like the code
> would actually do that for a minor number. But for things like major
> number allocation for disk devices, it might not be too far off. And we
> might even want to start off the minors at some "random" offset (obviously
> while keeping the alignment right for the partition handling)

True, but... Let me put it that way - entire area is a minefield and
I would really like to avoid nasty surprises from "obvious" patches,
what with having just spent 4 months dealing with the fallout from one
such beast.

Let's clean the things up first; then it will be easier to see what can
and should be done. Sure thing, reducing amount of places that deal with
device numbers is a good thing. Let's see how far we can get it, what
obstacles still remain (and during 2.5 a _lot_ of them had been killed)
and what is needed to remove the rest.

Once they are gone (and that will be one-by-one, keeping the list of
things to grep for and checking the results of greps as we go) - then
we'll have cleaner playing field for any experiments in that area.
_And_ there will be less temptation to play the bundling games for
everyone involved (cf. devfs disaster, aka. "my glorious idea allows
to do $NEEDED_THING that way; merge the entire thing and nevermind
the fact that doing $NEEDED_THING essentially the same way is possible
without the rest of patch and can be split out of it").

2004-01-06 05:07:48

by Linus Torvalds

Subject: Re: udev and devfs - The final word



On Tue, 6 Jan 2004, Al Viro wrote:
> >
> > Oh, don't look too closely at some pseudo-code, it's not like the code
> > would actually do that for a minor number. But for things like major
> > number allocation for disk devices, it might not be too far off. And we
> > might even want to start off the minors at some "random" offset (obviously
> > while keeping the alignment right for the partition handling)
>
> True, but... Let me put it that way - entire area is a minefield and
> I would really like to avoid nasty surprises from "obvious" patches,
> what with having just spent 4 months dealing with the fallout from one
> such beast.

Hey, it's entirely possible that we won't be able to do it at _all_ during
2.7.x, since it would require that all the distributions have started
using udev or equivalent. Which is by no means certain at all. It's
possible that just lack of ubiquitous infrastructure will mean that it
would be too painful to even try this in a few months..

So don't worry too much.

Linus

2004-01-06 07:14:53

by Vojtech Pavlik

Subject: Re: udev and devfs - The final word

On Mon, Jan 05, 2004 at 08:52:28PM +0100, Andries Brouwer wrote:

> > udev can then use those serial numbers to have a stable pathname
>
> True. Provided that it knows how to get them.
> The kernel driver knew all about the device.
> Must udev also know all about all possible devices?

No. But it must have rules about what to do with all possible device
types (at least very generic default rules), based on the data the
drivers can provide to identify the device.

> Do I/O to these devices?

If it is using a UUID stored on the device (like the filesystem UUID), yes.

> Or must sysfs export all data that could possibly be used?

Not necessarily. But udev must get all the data that could possibly
be used for assigning a name to the device. It can get them either as
hotplug command line arguments and environment variables or via sysfs,
or by any other means.
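
As an illustration of the two sources named above, here is a small Python sketch of a helper that collects naming data for one hotplug event. ACTION and DEVPATH are the environment variables the hotplug helper is invoked with; the function name, the shape of the returned dictionary and the sysfs_root parameter (added so the code can be tried against a scratch directory) are assumptions of this sketch:

```python
import os


def naming_data(env, sysfs_root="/sys"):
    """Gather what a naming helper could look at for one hotplug event."""
    data = {
        "action": env.get("ACTION"),    # "add" or "remove"
        "devpath": env.get("DEVPATH"),  # device path below sysfs_root
        "attrs": {},
    }
    if not data["devpath"]:
        return data
    dev_dir = os.path.join(sysfs_root, data["devpath"].lstrip("/"))
    if not os.path.isdir(dev_dir):
        return data
    for entry in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, entry)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                data["attrs"][entry] = f.read().strip()
        except (OSError, UnicodeDecodeError):
            pass  # unreadable or binary attribute: skip it
    return data
```

Called as naming_data(os.environ) from a hotplug helper, it would return the event type plus whatever attribute files the driver exported for that device, for example the "dev" file holding major:minor.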

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2004-01-06 16:59:58

by Mark Mielke

Subject: Re: udev and devfs - The final word

On Tue, Jan 06, 2004 at 02:06:48AM +0100, Andries Brouwer wrote:
> On Mon, Jan 05, 2004 at 03:32:03PM -0800, Linus Torvalds wrote:
> > > Something reproducible is better.
> > And I've told you why reproducibility is a BAD THING
> > Basically, if you cannot 100% guarantee reproducibility,
> > then the _appearance_ of reproducibility is literally a mistake.
> OK. We now understand perfectly each other's point of view.
> It was a pleasure to provoke this discussion - can hardly
> wait for 2.7 :-)

Hehe.... s/provoke/perpetuate/g

Cheers,
mark


2004-01-06 17:28:30

by Disconnect

Subject: [OT] Re: udev and devfs - The final word

On Mon, 2004-01-05 at 19:14, Rob Landley wrote:
> You can run any tcp/ip service over ssh.
>
> Yeah, the script's a quick and dirty hack, but really easy to modify. I have
> a more complicated one using SO_ORIGINAL_DEST and a lookup table if you
> prefer to setup some firewall rules and tell your imap server it lives in the
> 192.168.x.x or 10.x.x.x address range... But I've never gotten around to
> configuring my laptop to use it just to tunnel pop. :)

simpler:
ssh -L<local>:127.0.0.1:<remote> [-C] user@host

eg I forward imap from home (not accessible outside localhost) and
jabber from a second machine at home:
ssh -L143:localhost:143 -L5222:jabber:5222 -C dis@home

Then just point the client at localhost:143 (or 5222) and it 'just
works'. No python required.

For an added bonus, run a script on the far side that does something
like:
while /bin/true; do
  if [ -f .dienow ]; then
    rm -f .dienow
    exit
  else
    sleep 60
  fi
done
and on the local side:
while /bin/true; do
  ssh -L1:localhost:2 etc user@host ./remotescript
  sleep 1
  if [ -f .dienow ]; then
    scp .dienow user@host: && rm -f .dienow
  fi
done

..respawn the ssh forwards until it finds ~/.dienow (-and- the current
ssh exits) and then kill off both sides. Don't flood the system by
respawning ssh more than once per 1-2 seconds. (And who said you
couldn't vpn with shell? You can even add/remove ports along the way, if
you get creative.)

--
Disconnect

2004-01-07 10:23:45

by Olaf Hering

Subject: Re: udev and devfs - The final word

On Thu, Jan 01, Rob Landley wrote:

> Fundamental problem: "Unique" depends on the other devices in the system. You
> can't guarantee unique by looking at one device, more or less by definition.

This is certainly not true. (well, maybe for a few device types).

Almost everything can be reached via a well defined bus (or more than
one bus). Each of them does obviously require an identifier. That's the
hardware part.
Software tends to put a unique identifier into the 'logical' stuff, like
filesystem UUIDs.
So you can construct a unique device node for every device in the
system. And this will work even across distributions!
Stuff like sda3, mouse1 or dsp0 will obviously break. It just happened to
work because everyone on this list knows what to do and where to look.

Sure, there are exceptions, like 2 identical mice, or 2 identical USB
audio devices. But this can't be fixed.

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

2004-01-07 13:39:10

by Robin Rosenberg

Subject: Re: udev and devfs - The final word

On Monday 5 January 2004 13.39, Nigel Cunningham wrote:
> Hi.
>
> The suspend to disk implementations all assume that devices are not
> [dis]appearing under us while we're suspended. If you do go adding and
> removing devices while the power is off, you can expect the same
> problems you'd get if you removed them without suspending the machine.
> It would be roughly equivalent to hot[un]plugging devices.

Yes. It's very unclear unless you can read minds, but I had in mind mounted
filesystems such as /home on a USB stick or firewire. Reasonable? Yes! But
such devices have to be rediscovered and allocated in such a way that the
user can resume using the device as soon as it has been found. And it should
not fail miserably if the user forgets to connect the device before resuming
the machine. As you cannot unmount /home (usually), the kernel must remember
the device somehow, or mounting of file systems must become looser than it is
today.

-- robin


2004-01-07 17:14:22

by Greg KH

Subject: Re: udev and devfs - The final word

On Tue, Jan 06, 2004 at 02:41:15AM +0100, Andries Brouwer wrote:
> On Mon, Jan 05, 2004 at 04:00:15PM -0800, Greg KH wrote:
>
> > > > Have you even _tried_ udev?
> > >
> > > Yes, and it works reasonably well. I have version 012 here.
> > > Some flaws will be fixed in 013 or so.
> >
> > What flaws would that be? The short time delay for partitions? Or
> > something else?
>
> Yes, partitions are not handled very well.
> So far I have never seen udev discover partitions on its own.

That is because it can not. Please see the current thread "removable
media revalidation - udev vs. devfs or static /dev" on lkml for a
solution to this.

> > > Some difficulties are of a more fundamental type, not so easy to fix.
> >
> > Such as?
>
> Udev cannot do anything when there are no events.
> And media insertion or removal does not always give events.

Exactly. That's why userspace needs to poll for this.
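
A minimal sketch of one piece of such polling, in Python: diffing successive snapshots of /proc/partitions. This only notices changes the kernel has already recorded; for removable media something still has to make the kernel revalidate the device first, which is not shown here. The function names and the two-second interval are assumptions of this sketch:

```python
import time


def parse_partitions(text):
    """Return the set of device names in /proc/partitions-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows are: major minor #blocks name (the header is skipped)
        if len(fields) == 4 and fields[0].isdigit():
            names.add(fields[3])
    return names


def diff_snapshots(before, after):
    """Return (added, removed) device names as sorted lists."""
    return sorted(after - before), sorted(before - after)


def watch(path="/proc/partitions", interval=2.0):
    """Yield (added, removed) each time the listed devices change."""
    previous = None
    while True:
        with open(path) as f:
            current = parse_partitions(f.read())
        if previous is not None and current != previous:
            yield diff_snapshots(previous, current)
        previous = current
        time.sleep(interval)
```

A caller would loop over watch() and hand each added or removed name to whatever creates or deletes the /dev entries.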

> [By the way, a compilation warning for every C file:
> % make
> gcc -pipe -Wall -Wmore.. -Os -fomit-frame-pointer -D_GNU_SOURCE \
> -I/usr/lib/gcc-lib/i486-suse-linux/3.2/include -I.../udev-012/libsysfs
> -c -o udev.o udev.c
> cc1: warning: changing search order for system directory
> "/usr/lib/gcc-lib/i486-suse-linux/3.2/include"
> cc1: warning: as it has already been specified as a non-system directory]

Odd, it works here just fine on a number of different Red Hat boxes :)

thanks,

greg k-h

2004-01-07 18:19:42

by Nigel Cunningham

Subject: Re: udev and devfs - The final word

Ah. Well if you've unmounted filesystems prior to suspending, I would
expect you should be fine. The device numbers might change - if they can
change between mounts - but that won't be any different because of
suspending. If you're talking about suspending with the file systems
mounted, that ought to work too (once the appropriate power management
support is done). If the user fails to reconnect the device before
resuming, they should expect the same problems that they would encounter
if they pulled it out without suspending. Of course I'm saying 'should'
a lot here. Let me use it one more time... in my mind at least, the fact
that we've suspended should be irrelevant to how things work.

Regards,

Nigel

On Thu, 2004-01-08 at 02:39, Robin Rosenberg wrote:
> On Monday 5 January 2004 13.39, Nigel Cunningham wrote:
> > Hi.
> >
> > The suspend to disk implementations all assume that devices are not
> > [dis]appearing under us while we're suspended. If you do go adding and
> > removing devices while the power is off, you can expect the same
> > problems you'd get if you removed them without suspending the machine.
> > It would be roughly equivalent to hot[un]plugging devices.
>
> Yes. It's very unclear unless you can read minds, but I had in mind mounted
> filesystems such as /home on a USB stick or firewire. Reasonable? Yes! But
> such devices have to be rediscovered and allocated in such a way that the
> user can resume using the device as soon as it has been found. And it should
> not fail miserably if the user forgets to connect the device before resuming
> the machine. As you cannot unmount /home (usually), the kernel must remember
> the device somehow, or mounting of file systems must become looser than it is
> today.
>
> -- robin
--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.



2004-01-12 20:59:44

by Pavel Machek

Subject: Re: udev and devfs - The final word

Hi!

> If nothing else, things like SATA will end up meaning that the device you
> were used to seeing as /dev/hdc will suddenly show up as /dev/scd0
> instead. Just because you changed the cabling while you upgraded to a

I do not see an easy solution for cdroms... UUID is not going to work there...
Pavel