On Tue, Aug 30, 2005 at 04:28:46PM -0700, Andrew Morton wrote:
> Sure, but all that copying-and-pasting really sucks. I'm sure there's some
> way of providing the slightly different semantics from the same codebase?
First, let's look at sharing the primary structures.
[kobject vs config_item]
When I first started configfs, I tried to make it part of sysfs.
That was obviously a bad idea before I got very far at all (more on this
as we go along). So I forked the codebase. I wanted, however, to
preserve kobject and kset, because I figured we didn't need another
kernel object type. For the longest time usysfs (user-sysfs, as it was
originally called) used kobjects and jumped through hoops to make it work.
The base structures, kobject and config_item, are indeed
identical in physical layout. But in use, they are very different. A
kobject, when created, is expected to appear sysfs. You mistakenly
kobject_add(), whoops, there it is. kobject_init() tries to link a
kobject to the kobject->kset pointer, because it's expecting the kernel
to be handling the relationship.
In configfs, the relationship is handled by the VFS. The
linkage is purposefully controlled by the ->mkdir() codepath. The
client driver using this relationship can follow it, but never modify it
or call anything that modifies it.
In the earlier configfs incarnations that used kobject/kset, I
found myself having to fool the kobject infrastructure to get
kobject->kset correct. In addition, it was a complexity I had to always
think about. In configfs with config_item, it's not an issue because it
doesn't exist.
[kobj_type vs config_item_type]
kobj_type has three pointers. A release function for the
object, a set of sysfs_ops for attribute functions, and a set of
default_attrs.
config_item_type needs to know the module owner to pin the
client module. It has 5 item operations (this includes the parallels
for sysfs_ops) and 4 group operations (this is for config_group, the
parallel of kset). Because operation structures are often shared, these
are made into structures.
kobj_type would have to know all the features of
config_item_type, even though it is useless stuff for sysfs. Waste of
memory and code.
[subsystem vs configfs_subsystem]
The sysfs folks are trying to lose the semaphore from a sysfs
subsystem, because it makes sense to have locking that is finer grained.
It should apply to the granularity of the specific device or so.
configfs_subsystem needs the semaphore, because it protects only
one thing: the config_group/config_item hierarchy. When a client module
wants to navigate the tree of its groups and items, it merely takes the
semaphore and navigates. Once it finds and grabs a kref to what it
needs, it can drop it. There is no need for any more complex locking.
[kset vs config_group]
Here is where it gets really different. A kset may or may not
represent a directory exactly. kobjects can be part of a kset they
aren't under in the sysfs tree. configfs_items must be part of the
parent config_group, becuase the VFS tree (the dcache) is the exact
arbiter of the relationship.
Then there is the physical structure. A kset contains a pointer
to a kobj_type that its children will be assigned. This field is
useless in configfs, and more importantly, it could result in an
assignemnt that isn't related to the real object type, becuase of kset's
ability to have non-children members. This adds more complexity to the
code, as you have to handle, prevent, or kludge around this case.
kset contains a lock around its list of children. configfs
doesn't need this because the subsystem lock is protecting the hierarchy
here.
kset contains a hotplug ops pointer. configfs has nothing to do
with hotplug. So here are three pointers any configfs object has to
carry around for no reason other than sharing a structure.
Conversely, a config_group contains a list of default subgroups.
They are not attributes, they are groups. This is not the same as
config_item's default_attrs. Here, kset would carry around an extra
pointer for every object.
Given the stress everyone places on the memory usage of these
objects, adding useless pointers to both sides just to share the
structure seems a very bad idea.
--
"Well-timed silence hath more eloquence than speech."
- Martin Fraquhar Tupper
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [email protected]
Phone: (650) 506-8127
On Tue, Aug 30, 2005 at 04:28:46PM -0700, Andrew Morton wrote:
> Sure, but all that copying-and-pasting really sucks. I'm sure there's some
> way of providing the slightly different semantics from the same codebase?
What about the backing store? Specifically, sysfs_dirent vs
configfs_dirent.
The structures are almost identical. What's different?
configfs has a list of symlinks, as these are hard linkages and involve
pinning and reference counting. So, to merge the structures, you have
to add two pointers (a list_head) to every sysfs object.
Allocating, initializing, and freeing them really does appear to
be virtually identical. The functions that call the creation are very
different, but they could call the same thing. There are more types of
things in configfs, so all shared calls would have to be able to handle
them.
Oh, but the get_name() functions, the one that return the string
name of a _dirent, are very different. So you'd have to add another
pointer to the structure, a ->get_name() callback. That's an additional
pointer for every sysfs object.
The attach_attr() functions are different. Some of that is the
BIN_ATTR type of sysfs, which configfs doesn't and shouldn't have. In
that case, the code still works, as BIN_ATTR test wouldn't succeed.
They configure dentry_ops, which are different in sysfs and configfs.
So the API would have to change to specify the appropriate dentry_ops.
This is certainly not insurmountable. I don't know what you'd
call it, fs/libfs/backing_store.c? I'm interested in what the sysfs
folks have to say on this, and how much they'd like to help.
Joel
--
A good programming language should have features that make the
kind of people who use the phrase "software engineering" shake
their heads disapprovingly.
- Paul Graham
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [email protected]
Phone: (650) 506-8127
On Sat, Sep 03, 2005 at 09:12:24PM -0700, Joel Becker wrote:
> On Tue, Aug 30, 2005 at 04:28:46PM -0700, Andrew Morton wrote:
> > Sure, but all that copying-and-pasting really sucks. I'm sure there's some
> > way of providing the slightly different semantics from the same codebase?
The final piece of similar code is the buffered I/O setup for
attribute files. Here, the major difference is how config_items and
kobjects refer to their show/store operations. The functions have
different rules on this.
A kobject doesn't need show/store, it can provide one for the
entire kset or subsystem. A config_item necessarily has one for its
type, and cannot chain up. So somehow the code would need to know which
was the case.
A kobject directly has a sysfs_ops structure. The config_item
has show/store in a config_item_operations strucutre. If you split them
out, you add a pointer and some needless complexity. So the different
code paths need to refer to the functions differently.
The attribute and configfs_attribute structures are physically
identical, but the sysfs one has poor naming (IMHO). If struct
attribute wasn't going to change, and no magic would be added to its
usage (magic like kobject_add()'s intertwining with sysfs), they could
probably be shared.
Joel
--
"The doctrine of human equality reposes on this: that there is no
man really clever who has not found that he is stupid."
- Gilbert K. Chesterson
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [email protected]
Phone: (650) 506-8127
On Sat, Sep 03, 2005 at 09:41:36PM -0700, Joel Becker wrote:
> On Sat, Sep 03, 2005 at 09:12:24PM -0700, Joel Becker wrote:
> > On Tue, Aug 30, 2005 at 04:28:46PM -0700, Andrew Morton wrote:
> > > Sure, but all that copying-and-pasting really sucks. I'm sure there's some
> > > way of providing the slightly different semantics from the same codebase?
>
So, what conclusions can we draw from looking at this again?
First, merging the kobject/kset structures with the
config_item/config_group structures has one benefit: they have the same
name. However, that's also the first problem. As it currently stands,
you know from the name what you are working with. If they have the same
name, you don't know if you are working with an object in sysfs or
configfs. We call a pid "pid_t" rather than "unsigned int" for the same
reason, clarity of usage.
Merging them has a lot of other problems. Far from removing
code bloat (a configfs object and a sysfs object need different
initialization, etc, so there would be two paths anyway), it adds
significant memory bloat, as configfs objects and sysfs objects are
carrying around members for the other filesystem. Finally, there are
surprises in store if you try to use some of the API that isn't
appropriate. With a different name and type, you just _can't_ do that.
We could perhaps share the attribute structure. This has the
same name confusion problem above, but that's about it, assuming that
sysfs never wants to do anything smarter with them. I'm hoping the
sysfs folks comment on this.
Sharing the attribute file code is much harder. They operate on
different types, and those types are laid out differently. I think we'd
have to make the API far more invasive. Instead of black-boxing the API
and having only show/store to implement, we'd have to provide some sort
of translation from the native structure to a shared type. Basically,
repeating what the VFS has already done for us.
Finally, assuming that the sysfs_dirent/configfs_dirent
arrangement is pretty fleshed out, I think that perhaps this backing
store could be joined. Again, no more magic could be added, and it
would have to handle the sysfs and configfs types in concurrence, but I
think it could be done. Again, sysfs folks, please comment.
Joel
--
"Hell is oneself, hell is alone, the other figures in it, merely projections."
- T. S. Eliot
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [email protected]
Phone: (650) 506-8127
On Sat, Sep 03, 2005 at 09:54:45PM -0700, Joel Becker wrote:
> Finally, assuming that the sysfs_dirent/configfs_dirent
> arrangement is pretty fleshed out, I think that perhaps this backing
> store could be joined. Again, no more magic could be added, and it
> would have to handle the sysfs and configfs types in concurrence, but I
> think it could be done. Again, sysfs folks, please comment.
Thanks for the good explainations. I still agree with Joel that these
need to be separate fses.
thanks,
greg k-h