2002-11-10 22:13:08

by Alexander Viro

Subject: [RFC] devfs API

During the last couple of weeks I'd done a lot of digging in
devfs-related code. Results are interesting, and not in a good sense.

1) a _lot_ of functions exported by devfs are never used. At
all.
2) a bunch of functions are used only by the SGI hwgraph "port".
Moreover, a lot of codepaths in functions that *are* used outside of
said port are only exercised by hwgraph. (More on that below)
3) gratuitous arguments (read: all callers pass the same value).
4) semantics of devfs_register() and devfs_unregister() is, er,
suboptimal and leads to rather messy cleanup paths in drivers. (More on that
below).
5) instead of using dev_t, devfs insists on keeping and passing
majors/minors separately, which makes both callers _and_ devfs itself
messier than necessary.
6) devfs_entry is a nightmare. It's a structure that contains a
union (keyed by node type); one of the fields of said union is a structure
that contains void *ops, flags, and a union of dev_t (stored as a major/minor
pair) and size_t. The reason for that (and for charming expressions like
de->u.fcb.u.device.major) is an attempt to use one field for regular files,
character devices and block devices. The *only* thing really common to all of
them is the set of flags. (A simplified sketch of that layout follows this list.)
7) idea of "slave" entries (== unregistered when master is
unregistered) hadn't worked out - it's easier to do explicit unregister
in 3 or 4 places that use these animals.
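
A simplified rendition of the layout in 6), pieced together from the
description above rather than from fs/devfs/base.c itself (type and field
names here are approximate):

struct fcb_type {                       /* shared by regular files, chardevs and blockdevs */
        void *ops;                      /* file_operations / block_device_operations */
        unsigned long flags;
        union {
                struct {
                        unsigned int major;     /* dev_t kept as a split pair */
                        unsigned int minor;
                } device;               /* character and block devices */
                unsigned long size;     /* regular files */
        } u;
};

struct devfs_entry {
        /* ... name, parent, mode, ... */
        union {                         /* selected by node type */
                struct fcb_type fcb;
                /* directory, symlink, fifo variants, ... */
        } u;
};

/* hence expressions like de->u.fcb.u.device.major */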

Overall, I would expect ~50% size reduction in devfs/base.c simply from
dropping unused code. The rest would be much easier to debug.

Note that devfs is *seriously* burdened by hwgraph. To the point where
it would be better to give SGI folks a private copy (they want slightly
different semantics anyway) and merge it with hcl.c and friends. And
drop the unused stuff from devfs proper.

Situation when one obscure caller is responsible for ~30% of exported API
and pretty much gets unrestricted access to internals is Not Good(tm).
Situation when ~40% of exported API is either not used at all or used only
by devfs itself is also not pretty.

Note that there's a large part of devfs that is never used by hwgraph
code. IOW, after such split *both* devfs and hwgraph copy would shrink
a lot. Shared part is actually rather small and both sides would be
better off from clear separation - e.g. locking and refcounting mechanisms
should be different, judging by the tricks hcl.c tries to pull off.

Another problem is the semantics of devfs_register/devfs_unregister.
rm -rf and install(1) do not match each other, even if they were
suitable as primitives (which is a dubious idea, to start with).

First of all, devfs_unregister(devfs_register(...)) is _not_ a no-op.
It may leave you with a bunch of new directories that have to be removed
later. Moreover, you don't even know how many of them were there before
devfs_register() and need to be removed.

What's more, after devfs_register() we are allowed to create objects in
the intermediate directories that might appear (we can call mkdir(2) in
there, to start with). However, devfs_unregister() wipes these out, which
is arguably the wrong thing to do - they were not created by the driver, so
the driver has no business deciding when they should be gone.

The following scheme would give saner behaviour (and deal with devfs_register()
failing partway through, etc. more gracefully); a rough C sketch of the
bookkeeping follows the list:
* all entries get two new fields: integer R and boolean V.
* VFS creation methods (mknod, symlink, etc.) set R on the new node
to 0 and set V to true for that node and all its ancestors. Refcount of
new node is set to 1.
* register creates nodes with V set to false and increments R on the
object we had created and all its ancestors. Refcount of created nodes is
set to 1. If we fail to create a node (out of memory) we undo all increments
of R we had done so far.
* VFS removal methods (unlink and rmdir) fail if R is positive or
V is false. Otherwise they set V to false.
* unregister decrements R on the victim and all its ancestors.
* node is detached from parent whenever (R == 0 && !V) becomes true.
After that refcount of node is decremented.
* node is freed when refcount reaches 0.
* root is originally created with R set to 0 and V set to true.
* places that currently grab/drop refcount still do that.
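
A rough userspace sketch of that bookkeeping (purely illustrative - struct
entry, maybe_detach() etc. are made-up names, not proposed devfs internals,
and the error unwinding in register is left out):

#include <stdbool.h>
#include <stdlib.h>

struct entry {
        struct entry *parent;
        int refcount;           /* node is freed when this drops to 0 */
        int R;                  /* registrations on this node or below it */
        bool V;                 /* created/kept by userland (or the root) */
};

static void put_entry(struct entry *e)
{
        if (--e->refcount == 0)
                free(e);
}

/* detach from the parent once (R == 0 && !V), then drop the reference */
static void maybe_detach(struct entry *e)
{
        if (e->R == 0 && !e->V)
                put_entry(e);   /* unlinking from the parent would go here */
}

/* mknod/mkdir/symlink: the new node and all its ancestors become "kept" */
static void vfs_create(struct entry *e)
{
        struct entry *p;

        e->R = 0;
        e->refcount = 1;
        for (p = e; p != NULL; p = p->parent)
                p->V = true;
}

/* register: new nodes start with V false; R goes up on them and all ancestors */
static void do_register(struct entry *e)
{
        struct entry *p;

        e->V = false;
        e->refcount = 1;
        for (p = e; p != NULL; p = p->parent)
                p->R++;
}

/* unlink/rmdir: refused while registrations are pending or V is already false */
static int vfs_remove(struct entry *e)
{
        if (e->R > 0 || !e->V)
                return -1;      /* -EBUSY in real life */
        e->V = false;
        maybe_detach(e);
        return 0;
}

/* unregister: drop R on the victim and all ancestors, detaching as we go */
static void do_unregister(struct entry *e)
{
        struct entry *p, *parent;

        for (p = e; p != NULL; p = parent) {
                parent = p->parent;
                p->R--;
                maybe_detach(p);        /* may free p, so the parent was saved first */
        }
}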

That will guarantee that
* once userland creates an object, only userland can remove it or
any of its ancestors.
* object can't be removed when the kernel holds it or some of its
descendants are registered.
* unregister(register(...)) is a no-op
* register failing mid-way cleans up after itself
* objects are always removed by those who had created them.
* as long as driver unregisters all objects it had registered,
it doesn't have to worry about intermediate directories, etc.

The price of switching to that scheme is that we will need to switch
drivers to explicit cleanups (i.e. instead of devfs_mk_dir "loop" +
register <n> in that directory + remove "loop" upon exit, we would
register loop/<n> when we initialize struct loop_device and unregister
it when we clean up struct loop_device - actually, that could be done as
a side effect of add_disk()/del_gendisk()).

Transition to explicit cleanups can be done before any changes in devfs
proper - the sequence is
* add explicit devfs_unregister() (or devfs_find_and_unregister())
where needed in drivers; everything keeps working.
* add aforementioned fields to devfs_entry and modify devfs_register()
and friends (see above). No changes in drivers.
* drop a shitload of devfs_mk_dir()/corresponding directory removals
from drivers; everything still works.
* shift most of remaining calls in block device drivers into
add_disk()/del_gendisk(), etc.

I'd estimate that sequence as about a week of work - devfs changes in it can
be kept fairly local. And IMNSHO it is needed, since it will make devfs
users much cleaner.

Aside from that, there is a bunch of obvious cleanups - e.g. the 6th
argument of devfs_find_and_unregister() is (and should be) always 0; 3rd,
4th and 5th arguments are never looked at; the first one is NULL in almost
all cases and getting 3 or 4 exceptions into that form is absolutely trivial.
IOW, 4 arguments out of 6 are completely gratuitous and reducing the thing
to devfs_remove(pathname) is a matter of several one-liners.
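
As a sketch of that reduction - assuming the argument order (dir, name,
major, minor, type, traverse_symlinks) implied by the description above;
devfs_remove() itself is the proposed call, not an existing export:

void devfs_remove(const char *pathname)
{
        devfs_find_and_unregister(NULL,         /* dir: NULL in (almost) all callers */
                                  pathname,     /* the only argument that matters */
                                  0, 0,         /* major/minor: never looked at */
                                  0,            /* type: never looked at */
                                  0);           /* traverse_symlinks: always 0 */
}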

There's a lot of such cases, but they definitely fall into "obvious
cleanups" category. Really critical issues are getting sane model for
register/unregister (doable in small steps and I'm ready to do the entire
series) and separation of hwgraph - preferably giving it a filesystem of
its own with the interface hwgraph wants.


2002-11-10 22:43:09

by Bart De Schuymer

Subject: oops when adding bridge interface, using v2.5.45

Hello,

Ever since I've been using the bridge with a 2.5 kernel I've been getting a
non-fatal oops when adding a bridge interface: adding a bridge works, then
adding the first interface gives the oops below, and adding a second interface
works. Note that the bridge behaves correctly afaik (as in "it works") and both
interfaces get created, which is why this was not a priority.

net/bridge/br_if.c::new_nbp() does
p = kmalloc(sizeof(*p), GFP_KERNEL);
which triggers the oops through mm/slab.c::kmem_flagcheck()
I don't know what is wrong (I have no such problems with 2.4).

The trace is below:

Debug: sleeping function called from illegal context at mm/slab.c:1304
Call Trace:
[<c011a93e>] __might_sleep+0x52/0x54
[<c013b18e>] kmem_flagcheck+0x1e/0x54
[<c013bf04>] kmalloc+0x54/0x134
[<d084864e>] new_nbp+0x22/0xb8 [bridge]
[<d084ba65>] .rodata+0x485/0xf60 [bridge]
[<d0848826>] br_add_if+0x8e/0x1a4 [bridge]
[<d0848ee0>] br_ioctl_device+0x60/0x454 [bridge]
[<d084d92c>] ioctl_mutex+0x0/0x18 [bridge]
[<c0114829>] smp_local_timer_interrupt+0xbd/0xd4
[<d084d92c>] ioctl_mutex+0x0/0x18 [bridge]
[<c0136943>] filemap_nopage+0xeb/0x2b4
[<c0136973>] filemap_nopage+0x11b/0x2b4
[<d08494a4>] br_ioctl+0x68/0x7c [bridge]
[<d08471aa>] br_dev_do_ioctl+0x4a/0x5c [bridge]
[<c0287870>] dev_ifsioc+0x344/0x360
[<c0287ada>] dev_ioctl+0x24e/0x2f4
[<c027fc7d>] sock_ioctl+0x95/0x354
[<c015f09a>] sys_ioctl+0x25e/0x2d5
[<c0108eb7>] syscall_call+0x7/0xb

--
cheers,
Bart

2002-11-10 23:02:23

by Francois Romieu

Subject: Re: oops when adding bridge interface, using v2.5.45

Bart De Schuymer <[email protected]> :
[...]
> net/bridge/br_if.c::new_nbp() does
> p = kmalloc(sizeof(*p), GFP_KERNEL);
> which triggers the oops through mm/slab.c::kmem_flagcheck()
> I don't know what is wrong (I have no such problems with 2.4).
>
> The trace is below:
[sleeping function called with lock held]

[net/bridge/br_if.c:236]
write_lock_bh(&br->lock);
if ((p = new_nbp(br, dev)) == NULL) {
write_unlock_bh(&br->lock);

Following patch should fix it. It looks the same in 2.5.46.

--- net/bridge/br_if.c Sun Nov 10 23:57:28 2002
+++ net/bridge/br_if.c Sun Nov 10 23:57:57 2002
@@ -143,7 +143,7 @@ static struct net_bridge_port *new_nbp(s
int i;
struct net_bridge_port *p;

- p = kmalloc(sizeof(*p), GFP_KERNEL);
+ p = kmalloc(sizeof(*p), GFP_ATOMIC);
if (p == NULL)
return p;


--
Ueimor

2002-11-10 23:40:29

by Bart De Schuymer

Subject: Re: oops when adding bridge interface, using v2.5.45

On Monday 11 November 2002 00:08, [email protected] wrote:
> Following patch should fix it. It looks the same in 2.5.46.

Yes, this fixes it. Btw, I was using 2.5.46, sorry for any confusion...

--
cheers,
Bart

2002-11-12 01:25:56

by Ryan Anderson

Subject: Re: [RFC] devfs API

On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> During the last couple of weeks I'd done a lot of digging in
> devfs-related code. Results are interesting, and not in a good sense.

From an outsider point of view (and because nobody else responded), I
think the big question here would be: Would you use it after this
cleanup?

If you say yes, I'd say that's a good sign in its favor.

--

Ryan Anderson
sometimes Pug Majere

2002-11-12 01:42:35

by Alexander Viro

Subject: Re: [RFC] devfs API



On Mon, 11 Nov 2002, Ryan Anderson wrote:

> On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> > During the last couple of weeks I'd done a lot of digging in
> > devfs-related code. Results are interesting, and not in a good sense.
>
> From an outsider point of view (and because nobody else responded), I
> think the big question here would be: Would you use it after this
> cleanup?
>
> If you say yes, I'd say that's a good sign in its favor.

The only way I'll use devfs is
* on a separate testbox devoid of network interfaces
* with no users
* with no data - disk periodically populated from image on CD.

And that's regardless of that cleanup - fixing the interface doesn't solve
the internal races, so...

2002-11-12 01:43:55

by Robert Love

Subject: Re: [RFC] devfs API

On Mon, 2002-11-11 at 20:32, Ryan Anderson wrote:

> From an outsider point of view (and because nobody else responded),
> I think the big question here would be: Would you use it after this
> cleanup?
>
> If you say yes, I'd say that's a good sign in its favor.

Good heuristic, except Al would not use devfs in either case :)

Personally, as far as I am concerned, they are sane changes.

Robert Love


2002-11-12 07:58:13

by Theodore Ts'o

Subject: Re: [RFC] devfs API

On Mon, Nov 11, 2002 at 08:49:22PM -0500, Alexander Viro wrote:
> The only way I'll use devfs is
> * on a separate testbox devoid of network interfaces
> * with no users
> * with no data - disk periodically populated from image on CD.
>
> And that's regardless of that cleanup - fixing the interface doesn't solve
> the internal races, so...

Hi Al,

It's good that you're trying to clean up the devfs code, but...

How many people are actually using devfs these days? I don't like it
myself, and I've had to add a fair amount of hair to fsck's
mount-by-label/uuid code to deal with interesting cases such as
kernels where devfs is configured, but not actually mounted (it
changes what /proc/partitions exports). So I'm one of those who have
never looked all that kindly on devfs, which shouldn't come as a
surprise to most folks.

In any case, if there aren't all that many people using devfs, I can
think of a really easy way in which we could simplify and clean up its
API by slimming it down by 100%......

- Ted

2002-11-12 08:38:15

by Rudmer van Dijk

Subject: Re: [RFC] devfs API

On Tuesday 12 November 2002 09:04, Theodore Ts'o wrote:
> On Mon, Nov 11, 2002 at 08:49:22PM -0500, Alexander Viro wrote:
> > The only way I'll use devfs is
> > * on a separate testbox devoid of network interfaces
> > * with no users
> > * with no data - disk periodically populated from image on CD.
> >
> > And that's regardless of that cleanup - fixing the interface doesn't solve
> > the internal races, so...
>
> Hi Al,
>
> It's good that you're trying to clean up the devfs code, but...
>
> How many people are actually using devfs these days? I don't like it
> myself, and I've had to add a fair amount of hair to fsck's
> mount-by-label/uuid code to deal with interesting cases such as
> kernels where devfs is configured, but not actually mounted (it
> changes what /proc/partitions exports). So I'm one of those who have
> never looked all that kindly on devfs, which shouldn't come as a
> surprise to most folks.

Well, I like devfs, in the sense that it is really easy to see what you can
use in /dev. Before I used devfs it could be quite difficult, since there were
so many nodes and symlinks in /dev and many had cryptic names... and
sometimes the entries I needed simply were not there, so I had to find the
right major/minor numbers to create them...

From a user's point of view it is better to keep it, because it can really
simplify a user's life - except that IDE should just be in discs as hdX and not as
/dev/ide/hostN/busX/targetY/lunZ/disc ...

>
> In any case, if there aren't all that many people using devfs, I can
> think of a really easy way in which we could simplify and clean up its
> API by slimming it down by 100%......

If the code is really that horrible, then maybe that is the best solution, but
again, I like the concept.

Rudmer

2002-11-12 09:36:19

by Alexander Viro

Subject: Re: [RFC] devfs API



On Tue, 12 Nov 2002, Theodore Ts'o wrote:

> In any case, if there aren't all that many people using devfs, I can
> think of a really easy way in which we could simplify and clean up its
> API by slimming it down by 100%......

Well. If Linus decides to remove devfs, I certainly won't weep for it.
However, I don't see any signs of that happening right now, and cleaned
interface is less PITA than what we have in the current tree. Right now
I'm mostly interested in making the glue in drivers simpler and less
intrusive. The fact that it leads to less/simpler code in devfs proper
is also a Good Thing(tm)...

2002-11-12 09:43:28

by Miles Bader

Subject: Re: [RFC] devfs API

Rudmer van Dijk <[email protected]> writes:
> From a user's point of view it is better to keep it, because it can really
> simplify a user's life - except that IDE should just be in discs as hdX and not as
> /dev/ide/hostN/busX/targetY/lunZ/disc ...

Of course, that should really be `disk' (judging from general kernel usage)...

-Miles
--
.Numeric stability is probably not all that important when you're guessing.

2002-11-12 10:40:33

by Helge Hafting

Subject: Re: [RFC] devfs API

Theodore Ts'o wrote:
>
> On Mon, Nov 11, 2002 at 08:49:22PM -0500, Alexander Viro wrote:

> Hi Al,
>
> It's good that you're trying to clean up the devfs code, but...
>
> How many people are actually using devfs these days? I don't like it
> myself, and I've had to add a fair amount of hair to fsck's
> mount-by-label/uuid code to deal with interesting cases such as
> kernels where devfs is configured, but not actually mounted (it
Either you use devfs, or you don't. Why even support such
an odd case? Those wanting devfs make the transition once.

> changes what /proc/partitions exports). So I'm one of those who have
> never looked all that kindly on devfs, which shouldn't come as a
> surprise to most folks.

I use devfs and like it. Perhaps the current _implementation_ is bad,
but the _idea_ is certainly good. Seeing only devices I actually have
is so much better than seeing all devices I could possibly
connect. And it is updated automatically whenever something
is added to or removed from the machine.
A cleaner, smaller devfs increases the chance of better
maintenance.

I also like having subdirectories instead of a flat /dev, but
that is of course doable with the old way too.

Helge Hafting

2002-11-12 10:50:31

by Helge Hafting

Subject: Re: [RFC] devfs API

Rudmer van Dijk wrote:

>
> From a user's point of view it is better to keep it, because it can really
> simplify a user's life - except that IDE should just be in discs as hdX and not as
> /dev/ide/hostN/busX/targetY/lunZ/disc ...

Use /dev/discs if that's what you like. The /dev/ide/hostN...
alternative is there for a reason though.

If you have, say, 3 IDE controllers and remove a disk
from the second one, then the disks on the third are renumbered
if they're in /dev/discs. The ones in /dev/ide/host2/...
stay put.

Desktop machines with only one disk may not need all that,
but it is useful for servers using IDE drives.

Helge Hafting

2002-11-12 11:40:07

by William Lee Irwin III

Subject: Re: [RFC] devfs API

On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> During the last couple of weeks I'd done a lot of digging in
> devfs-related code. Results are interesting, and not in a good sense.
> 1) a _lot_ of functions exported by devfs are never used. At
> all.

Kill 'em.

On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> 2) a bunch of functions are used only by the SGI hwgraph "port".
> Moreover, a lot of codepaths in functions that *are* used outside of
> said port are only exercised by hwgraph. (More on that below)

Is that in-tree? If so, why? Hmm. IA64 arch code. I'd ask Mosberger for
a maintainer check for whether killing it outright would be acceptable
or something.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> 3) gratuitous arguments (read: all callers pass the same value).

Kill 'em.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> 4) semantics of devfs_register() and devfs_unregister() is, er,
> suboptimal and leads to rather messy cleanup paths in drivers. (More on that
> below).

This is the weird implicit rm -rf stuff?


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> 5) instead of using dev_t, devfs insists on keeping and passing
> majors/minors separately, which makes both callers _and_ devfs itself
> messier than necessary.

Um, wait, ISTR a large exercise in insulating in-kernel API's from such
device numbers. I take this to mean devfs is not cooperating.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> 6) devfs_entry is a nightmare. It's a structure that contains a
> union (keyed by node type); one of the fields of said union is a structure
> that contains void *ops, flags, and a union of dev_t (stored as a major/minor
> pair) and size_t. The reason for that (and for charming expressions like
> de->u.fcb.u.device.major) is an attempt to use one field for regular files,
> character devices and block devices. The *only* thing really common to all of
> them is the set of flags.

I'm having nightmares already.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> 7) idea of "slave" entries (== unregistered when master is
> unregistered) hadn't worked out - it's easier to do explicit unregister
> in 3 or 4 places that use these animals.

Unused and/or unusable API feature == kill at will IMHO. Even in a
stable release and/or codefreeze.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> Overall, I would expect ~50% size reduction in devfs/base.c simply from
> dropping unused code. The rest would be much easier to debug.
> Note that devfs is *seriously* burdened by hwgraph. To the point where
> it would be better to give SGI folks a private copy (they want slightly
> different semantics anyway) and merge it with hcl.c and friends. And
> drop the unused stuff from devfs proper.
> Situation when one obscure caller is responsible for ~30% of exported API
> and pretty much gets unrestricted access to internals is Not Good(tm).
> Situation when ~40% of exported API is either not used at all or used only
> by devfs itself is also not pretty.

Excellent, especially if compiled codesize reduction also follows. I'm
still questioning what something that invasive is doing in arch code,
and why it's there at all if it should be core but does not meet core
code standards.

As far as removing unused crap goes, it's unused. Aside from things
getting smaller, faster, and cleaner, it's a nop. All good.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> Note that there's a large part of devfs that is never used by hwgraph
> code. IOW, after such split *both* devfs and hwgraph copy would shrink
> a lot. Shared part is actually rather small and both sides would be
> better off from clear separation - e.g. locking and refcounting mechanisms
> should be different, judging by the tricks hcl.c tries to pull off.

I haven't seen hcl.c and I'm afraid to look after hearing this. This
is, after all, the same vendor responsible for shub_mmr_t.h.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> Another problem is the semantics of devfs_register/devfs_unregister.
> rm -rf and install(1) do not match each other, even if they were
> suitable as primitives (which is a dubious idea, to start with).
> First of all, devfs_unregister(devfs_register(...)) is _not_ a no-op.
> It may leave you with a bunch of new directories that have to be removed
> later. Moreover, you don't even know how many of them were there before
> devfs_register() and need to be removed.

That is too bogus for words. Reimplementing the mechanics there is a
pure bugfix IMHO.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> What's more, after devfs_register() we are allowed to create objects in
> the intermediate directories that might appear (we can call mkdir(2) in
> there, to start with). However, devfs_unregister() wipes these out, which
> is arguably the wrong thing to do - they were not created by the driver, so
> the driver has no business deciding when they should be gone.

More bugs. Can of Raid == good.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> The following scheme would give saner behaviour (and deal with devfs_register()
> failing partway through, etc. more gracefully):
> * all entries get two new fields: integer R and boolean V.
> * VFS creation methods (mknod, symlink, etc.) set R on the new node
> to 0 and set V to true for that node and all its ancestors. Refcount of
> new node is set to 1.
> * register creates nodes with V set to false and increments R on the
> object we had created and all its ancestors. Refcount of created nodes is
> set to 1. If we fail to create a node (out of memory) we undo all increments
> of R we had done so far.
> * VFS removal methods (unlink and rmdir) fail if R is positive or
> V is false. Otherwise they set V to false.
> * unregister decrements R on the victim and all its ancestors.
> * node is detached from parent whenever (R == 0 && !V) becomes true.
> After that refcount of node is decremented.
> * node is freed when refcount reaches 0.
> * root is originally created with R set to 0 and V set to true.
> * places that currently grab/drop refcount still do that.

Okay, straightforward refcounting and (in)validation scheme.
Seal of approval thus far.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> That will guarantee that
> * once userland creates an object, only userland can remove it or
> any of its ancestors.
> * object can't be removed when the kernel holds it or some of its
> descendants are registered.
> * unregister(register(...)) is a no-op
> * register failing mid-way cleans up after itself
> * objects are always removed by those who had created them.
> * as long as driver unregisters all objects it had registered,
> it doesn't have to worry about intermediate directories, etc.

Okay, this means the bugs are fixed, which pretty much means that proper
(in)validation and refcounting is happening as above.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> The price of switching to that scheme is that we will need to switch
> drivers to explicit cleanups (i.e. instead of devfs_mk_dir "loop" +
> register <n> in that directory + remove "loop" upon exit, we would
> register loop/<n> when we initialize struct loop_device and unregister
> it when we clean up struct loop_device - actually, that could be done as
> a side effect of add_disk()/del_gendisk()).

I personally would balk at this amount of work, but it's the implication
of the handling scheme.


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> Transition to explicit cleanups can be done before any changes in devfs
> proper - the sequence is
> * add explicit devfs_unregister() (or devfs_find_and_unregister())
> where needed in drivers; everything keeps working.
> * add aforementioned fields to devfs_entry and modify devfs_register()
> and friends (see above). No changes in drivers.
> * drop a shitload of devfs_mk_dir()/corresponding directory removals
> from drivers; everything still works.
> * shift most of remaining calls in block device drivers into
> add_disk()/del_gendisk(), etc.

I'm foggy on gendisk mechanics. Sounds like the gendisk bits handle the
disk-specific bits of devfs handling?


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> I'd estimate that sequence as about a week of work - devfs changes in it can
> be kept fairly local. And IMNSHO it is needed, since it will make devfs
> users much cleaner.
> Aside from that, there is a bunch of obvious cleanups - e.g. the 6th
> argument of devfs_find_and_unregister() is (and should be) always 0; 3rd,
> 4th and 5th arguments are never looked at; the first one is NULL in almost
> all cases and getting 3 or 4 exceptions into that form is absolutely trivial.
> IOW, 4 arguments out of 6 are completely gratuitous and reducing the thing
> to devfs_remove(pathname) is a matter of several one-liners.

Cheers to the kernel janitor extraordinaire. =)


On Sun, Nov 10, 2002 at 05:19:42PM -0500, Alexander Viro wrote:
> There's a lot of such cases, but they definitely fall into "obvious
> cleanups" category. Really critical issues are getting sane model for
> register/unregister (doable in small steps and I'm ready to do the entire
> series) and separation of hwgraph - preferably giving it a filesystem of
> its own with the interface hwgraph wants.

This hwgraph stuff sounds very suspicious to me. A quick look makes me
relatively suspicious about what it's doing with devfs since it looks
like it's doing a lot more gyrations centered around its own private
structures than anything to do with devfs aside from twiddling the
occasional dentry or two.

And frankly, it looks bizarre especially considering it's attempting to
represent notions of topology not present in Linux at all.


Cheers,
Bill

2002-11-12 21:53:07

by Jon Portnoy

Subject: Re: [RFC] devfs API



On Tue, 12 Nov 2002, Theodore Ts'o wrote:

> On Mon, Nov 11, 2002 at 08:49:22PM -0500, Alexander Viro wrote:
[snip]
>
> Hi Al,
>
> It's good that you're trying to clean up the devfs code, but...
>
> How many people are actually using devfs these days? I don't like it
> myself, and I've had to add a fair amount of hair to fsck's
> mount-by-label/uuid code to deal with interesting cases such as
> kernels where devfs is configured, but not actually mounted (it
> changes what /proc/partitions exports). So I'm one of those who have
> never looked all that kindly on devfs, which shouldn't come as a
> surprise to most folks.
>
> In any case, if there aren't all that many people using devfs, I can
> think of a really easy way in which we could simplify and clean up its
> API by slimming it down by 100%......
>

Actually, a lot of people are. Gentoo, at least, uses it by default.

As a user of it, I think devfs is a good idea.

2002-11-13 10:23:09

by Denis Vlasenko

Subject: Re: [RFC] devfs API

On 12 November 2002 06:04, Theodore Ts'o wrote:
> On Mon, Nov 11, 2002 at 08:49:22PM -0500, Alexander Viro wrote:
> > The only way I'll use devfs is
> > * on a separate testbox devoid of network interfaces
> > * with no users
> > * with no data - disk periodically populated from image on CD.
> >
> > And that's regardless of that cleanup - fixing the interface
> > doesn't solve the internal races, so...
>
> Hi Al,
>
> It's good that you're trying to clean up the devfs code, but...
>
> How many people are actually using devfs these days? I don't like it
> myself, and I've had to add a fair amount of hair to fsck's
> mount-by-label/uuid code to deal with interesting cases such as
> kernels where devfs is configured, but not actually mounted (it
> changes what /proc/partitions exports). So I'm one of those who have
> never looked all that kindly on devfs, which shouldn't come as a
> surprise to most folks.
>
> In any case, if there aren't all that many people using devfs, I can
> think of a really easy way in which we could simplify and clean up
> its API by slimming it down by 100%......

I do use it.
--
vda

2002-11-14 17:58:00

by Alexander Viro

Subject: Re: [RFC] devfs API



On Thu, 14 Nov 2002, Richard Gooch wrote:

> - I'm leery of changing the API and breaking compatibility between 2.4
> and 2.5 drivers. I also don't want to break out-of-tree drivers
> without giving maintainers plenty of warning. There are a number
> of such drivers out there.

It's a bit late for that. Compatibility between 2.4 and 2.5 in the
drivers is already broken and devfs won't be anywhere near the top of
the list.

2002-11-14 17:44:32

by Richard Gooch

Subject: Re: [RFC] devfs API

Alexander Viro writes:
> During the last couple of weeks I'd done a lot of digging in
> devfs-related code. Results are interesting, and not in a good sense.
>
> 1) a _lot_ of functions exported by devfs are never used. At
> all.
[...]

I don't have time right now to deal with all the points you raised;
I'll deal with each of them over the next week or
so. However, I'll make a couple of quick points:

- I'm leery of changing the API and breaking compatibility between 2.4
and 2.5 drivers. I also don't want to break out-of-tree drivers
without giving maintainers plenty of warning. There are a number
of such drivers out there.

- I have far more drastic plans for code reduction in devfs. The plan
I've mentioned since OLS is to leverage sysfs so that devfsd can be
used to populate devfs from user-space. For the root FS device, I
figure on writing a mini devfsd for initramfs. From the perspective
of user-space, this will provide functional compatibility. From
kernel-space it will effectively be an API change, so I don't want
to do this twice.

I'd hoped to get the new major version of devfs ready before the
freeze, but limited time (read: funding) has been available. Oh, well.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]