Right now I have about 80+ patches reworking the kset/ktype mess in the
-mm tree, cleaning things up and hopefully making it all easier for
people to both use, and understand.
So, while it is all relativly fresh in my mind, I thought it would be
good to also document the whole mess, and provide some solid example
code for others to use in the future.
Jonathan, I used your old lwn.net article about kobjects as the basis
for this document, I hope you don't mind (if you do, I'll be glad to
start over). I've updated it to what is going on in the -mm tree, and
added new information.
This file should replace the existing Documentation/kobject.txt which is
woefully out of date and obsolete now.
I also have two example kernel modules showing how to use a simple
kobject and attributes, as well as a more complex kset/ktype/kobject
interaction. I'll reply to this message with them as well, and I am
going to place them in the samples/kobject/ directory unless someone
really objects.
Any review comments that people might have on both the document, and the
two sample modules would be greatly appreciated.
thanks,
greg k-h
-----------------------------
Everything you never wanted to know about kobjects, ksets, and ktypes
Greg Kroah-Hartman <[email protected]>
Based on an original article by Jon Corbet for lwn.net written October 1,
2003 and located at http://lwn.net/Articles/51437/
Last updated November 27, 2008
Part of the difficulty in understanding the driver model - and the kobject
abstraction upon which it is built - is that there is no obvious starting
place. Dealing with kobjects requires understanding a few different types,
all of which make reference to each other. In an attempt to make things
easier, we'll take a multi-pass approach, starting with vague terms and
adding detail as we go. To that end, here are some quick definitions of
some terms we will be working with.
- A kobject is an object of type struct kobject. Kobjects have a name
and a reference count. A kobject also has a parent pointer (allowing
objects to be arranged into hierarchies), a specific type, and,
usually, a representation in the sysfs virtual filesystem.
Kobjects are generally not interesting on their own; instead, they are
usually embedded within some other structure which contains the stuff
the code is really interested in.
No structure should EVER have more than one kobject embedded within it.
If it does, the reference counting for the object is sure to be messed
up and incorrect, and your code will be buggy. So do not do this.
- A ktype is the type of object that embeds a kobject. Every structure
that embeds a kobject needs a corresponding ktype. The ktype controls
what happens when a kobject is no longer referenced and the kobject's
default representation in sysfs.
- A kset is a group of kobjects. These kobjects can be of the same ktype
or belong to different ktypes. The kset is the basic container type for
collections of kobjects. Ksets contain their own kobjects, but you can
safely ignore that implementation detail as the kset core code handles
this kobject automatically.
When you see a sysfs directory full of other directories, generally each
of those directories corresponds to a kobject in the same kset.
We'll look at how to create and manipulate all of these types. A bottom-up
approach will be taken, so we'll go back to kobjects.
Embedding kobjects
It is rare (even unknown) for kernel code to create a standalone kobject;
with one major exception explained below. Instead, kobjects are used to
control access to a larger, domain-specific object. To this end, kobjects
will be found embedded in other structures. If you are used to thinking of
things in object-oriented terms, kobjects can be seen as a top-level,
abstract class from which other classes are derived. A kobject implements
a set of capabilities which are not particularly useful by themselves, but
which are nice to have in other objects. The C language does not allow for
the direct expression of inheritance, so other techniques - such as
structure embedding - must be used.
So, for example, UIO code has a structure that defines the memory region
associated with a uio device:
struct uio_mem {
struct kobject kobj;
unsigned long addr;
unsigned long size;
int memtype;
void __iomem *internal_addr;
};
If you have a struct uio_mem structure, finding its embedded kobject is just a
matter of using the kobj pointer. Code that works with kobjects will often
have the opposite problem, however: given a struct kobject pointer, what is
the pointer to the containing structure? You must avoid tricks (such as
assuming that the kobject is at the beginning of the structure) and,
instead, use the container_of() macro, found in <linux/kernel.h>:
container_of(pointer, type, member)
where pointer is the pointer to the embedded kobject, type is the type of
the containing structure, and member is the name of the structure field to
which pointer points. The return value from container_of() is a pointer to
the given type. So, for example, a pointer to a struct kobject embedded
within a struct cdev called "kp" could be converted to a pointer to the
containing structure with:
struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj);
Programmers will often define a simple macro for "back-casting" kobject
pointers to the containing type.
Initialization of kobjects
Code which creates a kobject must, of course, initialize that object. Some
of the internal fields are setup with a (mandatory) call to kobject_init():
void kobject_init(struct kobject *kobj);
Among other things, kobject_init() sets the kobject's reference count to
one. Calling kobject_init() is not sufficient, however. Kobject users
must, at a minimum, set the name of the kobject; this is the name that will
be used in sysfs entries. To set the name of a kobject properly, do not
attempt to manipulate the internal name field, but instead use:
int kobject_set_name(struct kobject *kobj, const char *format, ...);
This function takes a printk-style variable argument list. Believe it or
not, it is actually possible for this operation to fail; conscientious code
should check the return value and react accordingly.
The other kobject fields which should be set, directly or indirectly, by
the creator are its ktype, kset, and parent. We will get to those shortly,
however please note that the ktype and kset must be set before the
kobject_init() function is called.
Reference counts
One of the key functions of a kobject is to serve as a reference counter
for the object in which it is embedded. As long as references to the object
exist, the object (and the code which supports it) must continue to exist.
The low-level functions for manipulating a kobject's reference counts are:
struct kobject *kobject_get(struct kobject *kobj);
void kobject_put(struct kobject *kobj);
A successful call to kobject_get() will increment the kobject's reference
counter and return the pointer to the kobject. If, however, the kobject is
already in the process of being destroyed, the operation will fail and
kobject_get() will return NULL. This return value must always be tested, or
no end of unpleasant race conditions could result.
When a reference is released, the call to kobject_put() will decrement the
reference count and, possibly, free the object. Note that kobject_init()
sets the reference count to one, so the code which sets up the kobject will
need to do a kobject_put() eventually to release that reference.
Because kobjects are dynamic, they must not be declared statically or on
the stack, but instead, always from the heap. Future versions of the
kernel will contain a run-time check for kobjects that are created
statically and will warn the developer of this improper usage.
Hooking into sysfs
An initialized kobject will perform reference counting without trouble, but
it will not appear in sysfs. To create sysfs entries, kernel code must pass
the object to kobject_add():
int kobject_add(struct kobject *kobj);
As always, this operation can fail. The function:
void kobject_del(struct kobject *kobj);
will remove the kobject from sysfs.
There is a kobject_register() function, which is really just the
combination of the calls to kobject_init() and kobject_add(). Similarly,
kobject_unregister() will call kobject_del(), then call kobject_put() to
release the initial reference created with kobject_register() (or really
kobject_init()).
Creating "simple" kobjects
Sometimes all that a developer wants is a way to create a simple directory
in the sysfs heirachy, and not have to mess with the whole complication of
ksets, show and store functions, and other details. To create such an
entry, use the function:
struct kobject *kobject_create_and_register(char *name, struct kobject *parent);
This function will create a kobject and place it in sysfs in the location
underneath the specified parent kobject. To create simple attributes
associated with this kobject, use:
int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
or
int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
Both types of attributes used here, with a kobject that has been created
with the kobject_create_and_register() can be of type kobj_attribute, no
special custom attribute is needed to be created.
See the example module, samples/kobject/kobject-example.c for an
implementation of a simple kobject and attributes.
ktypes and release methods
One important thing still missing from the discussion is what happens to a
kobject when its reference count reaches zero. The code which created the
kobject generally does not know when that will happen; if it did, there
would be little point in using a kobject in the first place. Even
predicatable object lifecycles become more complicated when sysfs is
brought in; user-space programs can keep a reference to a kobject (by
keeping one of its associated sysfs files open) for an arbitrary period of
time.
The end result is that a structure protected by a kobject cannot be freed
before its reference count goes to zero. The reference count is not under
the direct control of the code which created the kobject. So that code must
be notified asynchronously whenever the last reference to one of its
kobjects goes away.
This notification is done through a kobject's release() method. Usually
such a method has a form like:
void my_object_release(struct kobject *kobj)
{
struct my_object *mine = container_of(kobj, struct my_object, kobj);
/* Perform any additional cleanup on this object, then... */
kfree (mine);
}
One important point cannot be overstated: every kobject must have a
release() method, and the kobject must persist (in a consistent state)
until that method is called. If these constraints are not met, the code is
flawed. Note that the kernel will warn you if you forget to provide a
release() method. Do not try to get rid of this warning by providing an
"empty" release function, you will be mocked merciously by the kobject
maintainer if you attempt this.
Interestingly, the release() method is not stored in the kobject itself;
instead, it is associated with the ktype. So let us introduce struct
kobj_type:
struct kobj_type {
void (*release)(struct kobject *);
struct sysfs_ops *sysfs_ops;
struct attribute **default_attrs;
};
This structure is used to describe a particular type of kobject (or, more
correctly, of containing object). Every kobject needs to have an associated
kobj_type structure; a pointer to that structure can be placed in the
kobject's ktype field at initialization time, or (more likely) it can be
defined by the kobject's containing kset.
The release field in struct kobj_type is, of course, a pointer to the
release() method for this type of kobject. The other two fields (sysfs_ops
and default_attrs) control how objects of this type are represented in
sysfs; they are beyond the scope of this document.
ksets
A kset is merely a collection of kobjects that want to be associated with
each other. There is no restriction that they be of the same ktype, but be
very careful if they are not.
A kset serves these functions:
- It serves as a bag containing a group of objects. A kset can be used by
the kernel to track "all block devices" or "all PCI device drivers."
- A kset is also a subdirectory in sysfs, where the associated kobjects
with the kset can show up. Every kset contains a kobject which can be
set up to be the parent of other kobjects; in this way the device model
hierarchy is constructed.
- Ksets can support the "hotplugging" of kobjects and influence how
uevent events are reported to user space.
- A kset can provide a set of default attributes that all kobjects that
belong to it automatically inherit and have created whenever a kobject
is registered belonging to the kset.
In object-oriented terms, "kset" is the top-level container class; ksets
contain their own kobject, but that kobject is managed by the kset code and
should not be manipulated by any other user.
A kset keeps its children in a standard kernel linked list. Kobjects point
back to their containing kset via their kset field. In almost all cases,
the contained kobjects also have a pointer to the kset (or, strictly, its
embedded kobject) in their parent field.
As a kset contains a kobject within it, it should always be dynamically
created and never declared statically or on the stack. To create a new
kset use:
struct kset *kset_create_and_register(char *name,
struct kset_uevent_ops *u,
struct kobject *parent);
When you are finished with the kset, call:
void kset_unregister(struct kset *kset);
to destroy it.
An example of using a kset can be seen in the
samples/kobject/kset-example.c file in the kernel tree.
If a kset wishes to control the uevent operations of the kobjects
associated with it, it can use the struct kset_uevent_ops to handle it:
struct kset_uevent_ops {
int (*filter)(struct kset *kset, struct kobject *kobj);
const char *(*name)(struct kset *kset, struct kobject *kobj);
int (*uevent)(struct kset *kset, struct kobject *kobj,
struct kobj_uevent_env *env);
};
The filter function allows a kset to prevent a uevent from being emitted to
userspace for a specific kobject. If the function returns 0, the uevent
will not be emitted.
The name function will be called to override the default name of the kset
that the uevent sends to userspace. By default, the name will be the same
as the kset itself, but this function, if present, can override that name.
The uevent function will be called when the uevent is about to be sent to
userspace to allow more environment variables to be added to the uevent.
One might ask how, exactly, a kobject is added to a kset, given that no
functions which perform that function have been presented. The answer is
that this task is handled by kobject_add(). When a kobject is passed to
kobject_add(), its kset member should point to the kset to which the
kobject will belong. kobject_add() will handle the rest. There is currently
no other way to add a kobject to a kset without directly messing with the
list pointers.
Kobject initialization again
Now that we have covered all of that stuff, we can talk in detail about how
a kobject should be prepared for its existence in the kernel. Here are all
of the struct kobject fields which must be initialized somehow:
- k_name - the name of the object. This fields should always be
initialized with kobject_set_name(), or specified in the original call
to kobject_create_and_register().
- refcount is the kobject's reference count; it is initialized by kobject_init()
- parent is the kobject's parent in whatever hierarchy it belongs to. It
can be set explicitly by the creator. If parent is NULL when
kobject_add() is called, it will be set to the kobject of the containing
kset.
- kset is a pointer to the kset which will contain this kobject; it should
be set prior to calling kobject_init().
- ktype is the type of the kobject; it should be set prior to calling
kobject_init().
Often, much of the initialization of a kobject is handled by the layer that
manages the containing kset. See the sample/kobject/kset-example.c for how
this is usually handled.
/*
* Sample kobject implementation
*
* Copyright (C) 2004-2007 Greg Kroah-Hartman <[email protected]>
* Copyright (C) 2007 Novell Inc.
*
* Released under the GPL version 2 only.
*
*/
#include <linux/kobject.h>
#include <linux/string.h>
#include <linux/sysfs.h>
#include <linux/module.h>
#include <linux/init.h>
/*
* This module shows how to create a simple subdirectory in sysfs called
* /sys/kernel/kobject-example In that directory, 3 files are created:
* "foo", "baz", and "bar". If an integer is written to these files, it can be
* later read out of it.
*/
static int foo;
static int baz;
static int bar;
/*
* The "foo" file where a static variable is read from and written to.
*/
static ssize_t foo_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
return sprintf(buf, "%d\n", foo);
}
static ssize_t foo_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
sscanf(buf, "%du", &foo);
return count;
}
static struct kobj_attribute foo_attribute =
__ATTR(foo, 0666, foo_show, foo_store);
/*
* More complex function where we determine which varible is being accessed by
* looking at the attribute for the "baz" and "bar" files.
*/
static ssize_t b_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
int var;
if (strcmp(attr->attr.name, "baz") == 0)
var = baz;
else
var = bar;
return sprintf(buf, "%d\n", var);
}
static ssize_t b_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
int var;
sscanf(buf, "%du", &var);
if (strcmp(attr->attr.name, "baz") == 0)
baz = var;
else
bar = var;
return count;
}
static struct kobj_attribute baz_attribute =
__ATTR(baz, 0666, b_show, b_store);
static struct kobj_attribute bar_attribute =
__ATTR(bar, 0666, b_show, b_store);
/*
* Create a group of attributes so that we can create and destory them all
* at once.
*/
static struct attribute *attrs[] = {
&foo_attribute.attr,
&baz_attribute.attr,
&bar_attribute.attr,
NULL, /* need to NULL terminate the list of attributes */
};
/*
* An unnamed attribute group will put all of the attributes directly in
* the kobject directory. If we specify a name, a subdirectory will be
* created for the attributes with the directory being the name of the
* attribute group.
*/
static struct attribute_group attr_group = {
.attrs = attrs,
};
static struct kobject *example_kobj;
static int example_init(void)
{
int retval;
/*
* Create a simple kobject with the name of "kobject_example",
* located under /sys/kernel/
*/
example_kobj = kobject_create_and_register("kobject_example",
kernel_kobj);
if (!example_kobj)
return -ENOMEM;
/*
* Note, these files will be created _after_ the kobject above is
* created. This can cause userspace to be looking around in sysfs
* for these files before they are really created. If you are
* worried about something like this, perhaps you really need to
* create your own kset and have a default attribute group for your
* kobject.
*/
retval = sysfs_create_group(example_kobj, &attr_group);
if (retval)
kobject_unregister(example_kobj);
return retval;
}
static void example_exit(void)
{
kobject_unregister(example_kobj);
}
module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Greg Kroah-Hartman <[email protected]>");
On Tue, Nov 27, 2007 at 03:02:52PM -0800, Greg KH wrote:
> Right now I have about 80+ patches reworking the kset/ktype mess in the
> -mm tree, cleaning things up and hopefully making it all easier for
> people to both use, and understand.
>
> So, while it is all relativly fresh in my mind, I thought it would be
> good to also document the whole mess, and provide some solid example
> code for others to use in the future.
>
> Jonathan, I used your old lwn.net article about kobjects as the basis
> for this document, I hope you don't mind (if you do, I'll be glad to
> start over). I've updated it to what is going on in the -mm tree, and
> added new information.
>
> This file should replace the existing Documentation/kobject.txt which is
> woefully out of date and obsolete now.
>
> I also have two example kernel modules showing how to use a simple
> kobject and attributes, as well as a more complex kset/ktype/kobject
> interaction. I'll reply to this message with them as well, and I am
> going to place them in the samples/kobject/ directory unless someone
> really objects.
>
> Any review comments that people might have on both the document, and the
> two sample modules would be greatly appreciated.
>
> thanks,
>
> greg k-h
>
> -----------------------------
>
> Everything you never wanted to know about kobjects, ksets, and ktypes
>
> Greg Kroah-Hartman <[email protected]>
>
> Based on an original article by Jon Corbet for lwn.net written October 1,
> 2003 and located at http://lwn.net/Articles/51437/
>
> Last updated November 27, 2008
>
>
> Part of the difficulty in understanding the driver model - and the kobject
> abstraction upon which it is built - is that there is no obvious starting
> place. Dealing with kobjects requires understanding a few different types,
> all of which make reference to each other. In an attempt to make things
> easier, we'll take a multi-pass approach, starting with vague terms and
> adding detail as we go. To that end, here are some quick definitions of
> some terms we will be working with.
>
> - A kobject is an object of type struct kobject. Kobjects have a name
> and a reference count. A kobject also has a parent pointer (allowing
> objects to be arranged into hierarchies), a specific type, and,
> usually, a representation in the sysfs virtual filesystem.
>
> Kobjects are generally not interesting on their own; instead, they are
> usually embedded within some other structure which contains the stuff
> the code is really interested in.
>
> No structure should EVER have more than one kobject embedded within it.
> If it does, the reference counting for the object is sure to be messed
> up and incorrect, and your code will be buggy. So do not do this.
>
> - A ktype is the type of object that embeds a kobject. Every structure
> that embeds a kobject needs a corresponding ktype. The ktype controls
> what happens when a kobject is no longer referenced and the kobject's
> default representation in sysfs.
>
> - A kset is a group of kobjects. These kobjects can be of the same ktype
> or belong to different ktypes. The kset is the basic container type for
> collections of kobjects. Ksets contain their own kobjects, but you can
> safely ignore that implementation detail as the kset core code handles
> this kobject automatically.
>
> When you see a sysfs directory full of other directories, generally each
> of those directories corresponds to a kobject in the same kset.
>
> We'll look at how to create and manipulate all of these types. A bottom-up
> approach will be taken, so we'll go back to kobjects.
>
>
> Embedding kobjects
>
> It is rare (even unknown) for kernel code to create a standalone kobject;
> with one major exception explained below. Instead, kobjects are used to
> control access to a larger, domain-specific object. To this end, kobjects
> will be found embedded in other structures. If you are used to thinking of
> things in object-oriented terms, kobjects can be seen as a top-level,
> abstract class from which other classes are derived. A kobject implements
> a set of capabilities which are not particularly useful by themselves, but
> which are nice to have in other objects. The C language does not allow for
> the direct expression of inheritance, so other techniques - such as
> structure embedding - must be used.
>
> So, for example, UIO code has a structure that defines the memory region
> associated with a uio device:
>
> struct uio_mem {
> struct kobject kobj;
> unsigned long addr;
> unsigned long size;
> int memtype;
> void __iomem *internal_addr;
> };
>
> If you have a struct uio_mem structure, finding its embedded kobject is just a
> matter of using the kobj pointer. Code that works with kobjects will often
> have the opposite problem, however: given a struct kobject pointer, what is
> the pointer to the containing structure? You must avoid tricks (such as
> assuming that the kobject is at the beginning of the structure) and,
> instead, use the container_of() macro, found in <linux/kernel.h>:
>
> container_of(pointer, type, member)
>
> where pointer is the pointer to the embedded kobject, type is the type of
> the containing structure, and member is the name of the structure field to
> which pointer points. The return value from container_of() is a pointer to
> the given type. So, for example, a pointer to a struct kobject embedded
> within a struct cdev called "kp" could be converted to a pointer to the
> containing structure with:
>
> struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj);
>
> Programmers will often define a simple macro for "back-casting" kobject
> pointers to the containing type.
>
>
> Initialization of kobjects
>
> Code which creates a kobject must, of course, initialize that object. Some
> of the internal fields are setup with a (mandatory) call to kobject_init():
>
> void kobject_init(struct kobject *kobj);
>
> Among other things, kobject_init() sets the kobject's reference count to
> one. Calling kobject_init() is not sufficient, however. Kobject users
> must, at a minimum, set the name of the kobject; this is the name that will
> be used in sysfs entries. To set the name of a kobject properly, do not
> attempt to manipulate the internal name field, but instead use:
>
> int kobject_set_name(struct kobject *kobj, const char *format, ...);
>
> This function takes a printk-style variable argument list. Believe it or
> not, it is actually possible for this operation to fail; conscientious code
> should check the return value and react accordingly.
>
> The other kobject fields which should be set, directly or indirectly, by
> the creator are its ktype, kset, and parent. We will get to those shortly,
> however please note that the ktype and kset must be set before the
> kobject_init() function is called.
>
>
>
> Reference counts
>
> One of the key functions of a kobject is to serve as a reference counter
> for the object in which it is embedded. As long as references to the object
> exist, the object (and the code which supports it) must continue to exist.
> The low-level functions for manipulating a kobject's reference counts are:
>
> struct kobject *kobject_get(struct kobject *kobj);
> void kobject_put(struct kobject *kobj);
>
> A successful call to kobject_get() will increment the kobject's reference
> counter and return the pointer to the kobject. If, however, the kobject is
> already in the process of being destroyed, the operation will fail and
> kobject_get() will return NULL. This return value must always be tested, or
> no end of unpleasant race conditions could result.
>
> When a reference is released, the call to kobject_put() will decrement the
> reference count and, possibly, free the object. Note that kobject_init()
> sets the reference count to one, so the code which sets up the kobject will
> need to do a kobject_put() eventually to release that reference.
>
> Because kobjects are dynamic, they must not be declared statically or on
> the stack, but instead, always from the heap. Future versions of the
> kernel will contain a run-time check for kobjects that are created
> statically and will warn the developer of this improper usage.
>
>
> Hooking into sysfs
>
> An initialized kobject will perform reference counting without trouble, but
> it will not appear in sysfs. To create sysfs entries, kernel code must pass
> the object to kobject_add():
>
> int kobject_add(struct kobject *kobj);
>
> As always, this operation can fail. The function:
>
> void kobject_del(struct kobject *kobj);
>
> will remove the kobject from sysfs.
>
> There is a kobject_register() function, which is really just the
> combination of the calls to kobject_init() and kobject_add(). Similarly,
> kobject_unregister() will call kobject_del(), then call kobject_put() to
> release the initial reference created with kobject_register() (or really
> kobject_init()).
>
>
> Creating "simple" kobjects
>
> Sometimes all that a developer wants is a way to create a simple directory
> in the sysfs heirachy, and not have to mess with the whole complication of
> ksets, show and store functions, and other details. To create such an
> entry, use the function:
>
> struct kobject *kobject_create_and_register(char *name, struct kobject *parent);
>
> This function will create a kobject and place it in sysfs in the location
> underneath the specified parent kobject. To create simple attributes
> associated with this kobject, use:
>
> int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
> or
> int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
>
> Both types of attributes used here, with a kobject that has been created
> with the kobject_create_and_register() can be of type kobj_attribute, no
> special custom attribute is needed to be created.
>
> See the example module, samples/kobject/kobject-example.c for an
> implementation of a simple kobject and attributes.
>
>
>
> ktypes and release methods
>
> One important thing still missing from the discussion is what happens to a
> kobject when its reference count reaches zero. The code which created the
> kobject generally does not know when that will happen; if it did, there
> would be little point in using a kobject in the first place. Even
> predicatable object lifecycles become more complicated when sysfs is
> brought in; user-space programs can keep a reference to a kobject (by
> keeping one of its associated sysfs files open) for an arbitrary period of
> time.
>
> The end result is that a structure protected by a kobject cannot be freed
> before its reference count goes to zero. The reference count is not under
> the direct control of the code which created the kobject. So that code must
> be notified asynchronously whenever the last reference to one of its
> kobjects goes away.
>
> This notification is done through a kobject's release() method. Usually
> such a method has a form like:
>
> void my_object_release(struct kobject *kobj)
> {
> struct my_object *mine = container_of(kobj, struct my_object, kobj);
>
> /* Perform any additional cleanup on this object, then... */
> kfree (mine);
> }
>
> One important point cannot be overstated: every kobject must have a
> release() method, and the kobject must persist (in a consistent state)
> until that method is called. If these constraints are not met, the code is
> flawed. Note that the kernel will warn you if you forget to provide a
> release() method. Do not try to get rid of this warning by providing an
> "empty" release function, you will be mocked merciously by the kobject
> maintainer if you attempt this.
>
> Interestingly, the release() method is not stored in the kobject itself;
> instead, it is associated with the ktype. So let us introduce struct
> kobj_type:
>
> struct kobj_type {
> void (*release)(struct kobject *);
> struct sysfs_ops *sysfs_ops;
> struct attribute **default_attrs;
> };
>
> This structure is used to describe a particular type of kobject (or, more
> correctly, of containing object). Every kobject needs to have an associated
> kobj_type structure; a pointer to that structure can be placed in the
> kobject's ktype field at initialization time, or (more likely) it can be
> defined by the kobject's containing kset.
>
> The release field in struct kobj_type is, of course, a pointer to the
> release() method for this type of kobject. The other two fields (sysfs_ops
> and default_attrs) control how objects of this type are represented in
> sysfs; they are beyond the scope of this document.
>
>
> ksets
>
> A kset is merely a collection of kobjects that want to be associated with
> each other. There is no restriction that they be of the same ktype, but be
> very careful if they are not.
>
> A kset serves these functions:
>
> - It serves as a bag containing a group of objects. A kset can be used by
> the kernel to track "all block devices" or "all PCI device drivers."
>
> - A kset is also a subdirectory in sysfs, where the associated kobjects
> with the kset can show up. Every kset contains a kobject which can be
> set up to be the parent of other kobjects; in this way the device model
> hierarchy is constructed.
>
> - Ksets can support the "hotplugging" of kobjects and influence how
> uevent events are reported to user space.
>
> - A kset can provide a set of default attributes that all kobjects that
> belong to it automatically inherit and have created whenever a kobject
> is registered belonging to the kset.
>
> In object-oriented terms, "kset" is the top-level container class; ksets
> contain their own kobject, but that kobject is managed by the kset code and
> should not be manipulated by any other user.
>
> A kset keeps its children in a standard kernel linked list. Kobjects point
> back to their containing kset via their kset field. In almost all cases,
> the contained kobjects also have a pointer to the kset (or, strictly, its
> embedded kobject) in their parent field.
>
> As a kset contains a kobject within it, it should always be dynamically
> created and never declared statically or on the stack. To create a new
> kset use:
> struct kset *kset_create_and_register(char *name,
> struct kset_uevent_ops *u,
> struct kobject *parent);
>
> When you are finished with the kset, call:
> void kset_unregister(struct kset *kset);
> to destroy it.
>
> An example of using a kset can be seen in the
> samples/kobject/kset-example.c file in the kernel tree.
>
> If a kset wishes to control the uevent operations of the kobjects
> associated with it, it can use the struct kset_uevent_ops to handle it:
>
> struct kset_uevent_ops {
> int (*filter)(struct kset *kset, struct kobject *kobj);
> const char *(*name)(struct kset *kset, struct kobject *kobj);
> int (*uevent)(struct kset *kset, struct kobject *kobj,
> struct kobj_uevent_env *env);
> };
>
>
> The filter function allows a kset to prevent a uevent from being emitted to
> userspace for a specific kobject. If the function returns 0, the uevent
> will not be emitted.
>
> The name function will be called to override the default name of the kset
> that the uevent sends to userspace. By default, the name will be the same
> as the kset itself, but this function, if present, can override that name.
>
> The uevent function will be called when the uevent is about to be sent to
> userspace to allow more environment variables to be added to the uevent.
>
> One might ask how, exactly, a kobject is added to a kset, given that no
> functions which perform that function have been presented. The answer is
> that this task is handled by kobject_add(). When a kobject is passed to
> kobject_add(), its kset member should point to the kset to which the
> kobject will belong. kobject_add() will handle the rest. There is currently
> no other way to add a kobject to a kset without directly messing with the
> list pointers.
>
>
> Kobject initialization again
>
> Now that we have covered all of that stuff, we can talk in detail about how
> a kobject should be prepared for its existence in the kernel. Here are all
> of the struct kobject fields which must be initialized somehow:
>
> - k_name - the name of the object. This fields should always be
> initialized with kobject_set_name(), or specified in the original call
> to kobject_create_and_register().
>
> - refcount is the kobject's reference count; it is initialized by kobject_init()
>
> - parent is the kobject's parent in whatever hierarchy it belongs to. It
> can be set explicitly by the creator. If parent is NULL when
> kobject_add() is called, it will be set to the kobject of the containing
> kset.
>
> - kset is a pointer to the kset which will contain this kobject; it should
> be set prior to calling kobject_init().
>
> - ktype is the type of the kobject; it should be set prior to calling
> kobject_init().
>
> Often, much of the initialization of a kobject is handled by the layer that
> manages the containing kset. See the sample/kobject/kset-example.c for how
> this is usually handled.
>
/*
* Sample kset and ktype implementation
*
* Copyright (C) 2004-2007 Greg Kroah-Hartman <[email protected]>
* Copyright (C) 2007 Novell Inc.
*
* Released under the GPL version 2 only.
*
*/
#include <linux/kobject.h>
#include <linux/string.h>
#include <linux/sysfs.h>
#include <linux/module.h>
#include <linux/init.h>
/*
* This module shows how to create a kset in sysfs called
* /sys/kernel/kset-example
* Then tree kobjects are created and assigned to this kset, "foo", "baz",
* and "bar". In those kobjects, attributes of the same name are also
* created and if an integer is written to these files, it can be later
* read out of it.
*/
/*
* This is our "object" that we will create a few of and register them with
* sysfs.
*/
struct foo_obj {
struct kobject kobj;
int foo;
int baz;
int bar;
};
#define to_foo_obj(x) container_of(x, struct foo_obj, kobj)
/* a custom attribute that works just for a struct foo_obj. */
struct foo_attribute {
struct attribute attr;
ssize_t (*show)(struct foo_obj *foo, struct foo_attribute *attr, char *buf);
ssize_t (*store)(struct foo_obj *foo, struct foo_attribute *attr, const char *buf, size_t count);
};
#define to_foo_attr(x) container_of(x, struct foo_attribute, attr)
/*
* The default show function that must be passed to sysfs. This will be
* called by sysfs for whenever a show function is called by the user on a
* sysfs file associated with the kobjects we have registered. We need to
* transpose back from a "default" kobject to our custom struct foo_obj and
* then call the show function for that specific object.
*/
static ssize_t foo_attr_show(struct kobject *kobj,
struct attribute *attr,
char *buf)
{
struct foo_attribute *attribute;
struct foo_obj *foo;
attribute = to_foo_attr(attr);
foo = to_foo_obj(kobj);
if (!attribute->show)
return -EIO;
return attribute->show(foo, attribute, buf);
}
/*
* Just like the default show function above, but this one is for when the
* sysfs "store" is requested (when a value is written to a file.)
*/
static ssize_t foo_attr_store(struct kobject *kobj,
struct attribute *attr,
const char *buf, size_t len)
{
struct foo_attribute *attribute;
struct foo_obj *foo;
attribute = to_foo_attr(attr);
foo = to_foo_obj(kobj);
if (!attribute->store)
return -EIO;
return attribute->store(foo, attribute, buf, len);
}
/* Our custom sysfs_ops that we will associate with our ktype later on */
static struct sysfs_ops foo_sysfs_ops = {
.show = foo_attr_show,
.store = foo_attr_store,
};
/*
* The release function for our object. This is REQUIRED by the kernel to
* have. We free the memory held in our object here.
*
* NEVER try to get away with just a "blank" release function to try to be
* smarter than the kernel. Turns out, no one ever is...
*/
static void foo_release(struct kobject *kobj)
{
struct foo_obj *foo;
foo = to_foo_obj(kobj);
kfree(foo);
}
/*
* The "foo" file where the .foo variable is read from and written to.
*/
static ssize_t foo_show(struct foo_obj *foo_obj, struct foo_attribute *attr,
char *buf)
{
return sprintf(buf, "%d\n", foo_obj->foo);
}
static ssize_t foo_store(struct foo_obj *foo_obj, struct foo_attribute *attr,
const char *buf, size_t count)
{
sscanf(buf, "%du", &foo_obj->foo);
return count;
}
static struct foo_attribute foo_attribute =
__ATTR(foo, 0666, foo_show, foo_store);
/*
* More complex function where we determine which varible is being accessed by
* looking at the attribute for the "baz" and "bar" files.
*/
static ssize_t b_show(struct foo_obj *foo_obj, struct foo_attribute *attr,
char *buf)
{
int var;
if (strcmp(attr->attr.name, "baz") == 0)
var = foo_obj->baz;
else
var = foo_obj->bar;
return sprintf(buf, "%d\n", var);
}
static ssize_t b_store(struct foo_obj *foo_obj, struct foo_attribute *attr,
const char *buf, size_t count)
{
int var;
sscanf(buf, "%du", &var);
if (strcmp(attr->attr.name, "baz") == 0)
foo_obj->baz = var;
else
foo_obj->bar = var;
return count;
}
static struct foo_attribute baz_attribute =
__ATTR(baz, 0666, b_show, b_store);
static struct foo_attribute bar_attribute =
__ATTR(bar, 0666, b_show, b_store);
/*
* Create a group of attributes so that we can create and destory them all
* at once.
*/
static struct attribute *foo_default_attrs[] = {
&foo_attribute.attr,
&baz_attribute.attr,
&bar_attribute.attr,
NULL, /* need to NULL terminate the list of attributes */
};
/*
* Our own ktype for our kobjects. Here we specify our sysfs ops, the
* release function, and the set of default attributes we want created
* whenever a kobject of this type is registered with the kernel.
*/
static struct kobj_type foo_ktype = {
.sysfs_ops = &foo_sysfs_ops,
.release = foo_release,
.default_attrs = foo_default_attrs,
};
static struct kset *example_kset;
static struct foo_obj *foo_obj;
static struct foo_obj *bar_obj;
static struct foo_obj *baz_obj;
static struct foo_obj *create_foo_obj(const char *name)
{
struct foo_obj *foo;
int retval;
/* allocate the memory for the whole object */
foo = kzalloc(sizeof(*foo), GFP_KERNEL);
if (!foo)
return NULL;
/* initialize the kobject portion of the object properly */
kobject_set_name(&foo->kobj, "%s", name);
foo->kobj.kset = example_kset;
foo->kobj.ktype = &foo_ktype;
/*
* Register the kobject with the kernel, all the default files will
* be created here and the uevent will be sent out. If we were to
* call kobject_init() and then kobject_add() we would be
* responsible for sending out the initial KOBJ_ADD uevent.
*/
retval = kobject_register(&foo->kobj);
if (retval) {
kfree(foo);
return NULL;
}
return foo;
}
static void destroy_foo_obj(struct foo_obj *foo)
{
kobject_unregister(&foo->kobj);
}
static int example_init(void)
{
/*
* Create a kset with the name of "kset_example",
* located under /sys/kernel/
*/
example_kset = kset_create_and_register("kset_example", NULL, kernel_kobj);
if (!example_kset)
return -ENOMEM;
/*
* Create three objects and register them with our kset
*/
foo_obj = create_foo_obj("foo");
if (!foo_obj)
goto foo_error;
bar_obj = create_foo_obj("bar");
if (!bar_obj)
goto bar_error;
baz_obj = create_foo_obj("baz");
if (!baz_obj)
goto baz_error;
return 0;
baz_error:
destroy_foo_obj(bar_obj);
bar_error:
destroy_foo_obj(foo_obj);
foo_error:
return -EINVAL;
}
static void example_exit(void)
{
destroy_foo_obj(baz_obj);
destroy_foo_obj(bar_obj);
destroy_foo_obj(foo_obj);
kset_unregister(example_kset);
}
module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Greg Kroah-Hartman <[email protected]>");
On Tue, Nov 27, 2007 at 03:02:52PM -0800, Greg KH wrote:
> Last updated November 27, 2008
>
The future is now! ;-)
cheers,
Kyle
Greg KH wrote:
> Based on an original article by Jon Corbet for lwn.net written October 1,
> 2003 and located at http://lwn.net/Articles/51437/
>
> Last updated November 27, 2008
^^^
Wow, that's impressive: both kobjects de-mystified and time travel made possible!
;-)
On Tue, Nov 27, 2007 at 06:10:42PM -0500, Kyle McMartin wrote:
> On Tue, Nov 27, 2007 at 03:02:52PM -0800, Greg KH wrote:
> > Last updated November 27, 2008
> >
>
> The future is now! ;-)
/me goes off to recharge the flux generator...
Heh, thanks, I'll go fix that.
greg k-h
Greg KH <[email protected]> wrote:
> Jonathan, I used your old lwn.net article about kobjects as the basis
> for this document, I hope you don't mind
Certainly I have no objections, I'm glad it was useful.
A few little things...
> It is rare (even unknown) for kernel code to create a standalone kobject;
> with one major exception explained below.
You don't keep this promise - bet you thought we wouldn't notice...
Actually I guess you do, in the "creating simple kobjects" section.
When you get to that point, you should mention that this is a situation
where standalone kobjects make sense.
Given that there are quite a few standalone kobjects created by this
patch set (kernel_kobj, security_kobj, s390_kobj, etc.), the "(even
unknown)" should probably come out.
> So, for example, UIO code has a structure that defines the memory region
> associated with a uio device:
*The* UIO code, presumably.
> the given type. So, for example, a pointer to a struct kobject embedded
> within a struct cdev called "kp" could be converted to a pointer to the
> containing structure with:
That should be "struct uio_mem", I think.
> one. Calling kobject_init() is not sufficient, however. Kobject users
> must, at a minimum, set the name of the kobject; this is the name that will
> be used in sysfs entries.
Is setting the name mandatory now, or are there still places where
kobjects (which do not appear in sysfs) do have - and do not need - a
name?
> Because kobjects are dynamic, they must not be declared statically or on
> the stack, but instead, always from the heap. Future versions of the
"always be allocated from the heap"?
> "empty" release function, you will be mocked merciously by the kobject
> maintainer if you attempt this.
So just how should severely should we mock kobject maintainers who can't
spell "mercilessly"? :)
> - A kset can provide a set of default attributes that all kobjects that
> belong to it automatically inherit and have created whenever a kobject
> is registered belonging to the kset.
Can we try that one again?
- A kset can provide a set of default attributes for all kobjects which
belong to it.
> There is currently
> no other way to add a kobject to a kset without directly messing with the
> list pointers.
Presumably the latter way is not recommended; I would either say so or
not mention this possibility at all.
jon
On Tue, 27 Nov 2007 15:02:52 -0800,
Greg KH <[email protected]> wrote:
> So, while it is all relativly fresh in my mind, I thought it would be
> good to also document the whole mess, and provide some solid example
> code for others to use in the future.
Cool.
> Embedding kobjects
>
> It is rare (even unknown) for kernel code to create a standalone kobject;
Unknown? It's the less common case, I'd say.
> with one major exception explained below. Instead, kobjects are used to
> control access to a larger, domain-specific object. To this end, kobjects
> will be found embedded in other structures. If you are used to thinking of
> things in object-oriented terms, kobjects can be seen as a top-level,
> abstract class from which other classes are derived. A kobject implements
> a set of capabilities which are not particularly useful by themselves, but
> which are nice to have in other objects.
Hm, hierarchy and representation in sysfs are useful things by
themselves.
> So, for example, UIO code has a structure that defines the memory region
> associated with a uio device:
>
> struct uio_mem {
> struct kobject kobj;
> unsigned long addr;
> unsigned long size;
> int memtype;
> void __iomem *internal_addr;
> };
>
> If you have a struct uio_mem structure, finding its embedded kobject is just a
> matter of using the kobj pointer.
Pointer may be a confusing term, how about "structure member"?
> Code that works with kobjects will often
> have the opposite problem, however: given a struct kobject pointer, what is
> the pointer to the containing structure? You must avoid tricks (such as
> assuming that the kobject is at the beginning of the structure) and,
> instead, use the container_of() macro, found in <linux/kernel.h>:
>
> container_of(pointer, type, member)
>
> where pointer is the pointer to the embedded kobject, type is the type of
> the containing structure, and member is the name of the structure field to
> which pointer points. The return value from container_of() is a pointer to
> the given type. So, for example, a pointer to a struct kobject embedded
> within a struct cdev called "kp" could be converted to a pointer to the
"struct uio_mem", I guess.
> containing structure with:
>
> struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj);
>
> Programmers will often define a simple macro for "back-casting" kobject
> pointers to the containing type.
>
>
> Initialization of kobjects
>
> Code which creates a kobject must, of course, initialize that object. Some
> of the internal fields are setup with a (mandatory) call to kobject_init():
>
> void kobject_init(struct kobject *kobj);
>
> Among other things, kobject_init() sets the kobject's reference count to
> one. Calling kobject_init() is not sufficient, however. Kobject users
> must, at a minimum, set the name of the kobject; this is the name that will
> be used in sysfs entries.
Unless they don't register their kobject. (But they should always set a
name anyway to avoid funny debug messages, so it is probably a good
idea to call this a "must").
> To set the name of a kobject properly, do not
> attempt to manipulate the internal name field, but instead use:
>
> int kobject_set_name(struct kobject *kobj, const char *format, ...);
>
> This function takes a printk-style variable argument list. Believe it or
> not, it is actually possible for this operation to fail; conscientious code
> should check the return value and react accordingly.
>
> The other kobject fields which should be set, directly or indirectly, by
> the creator are its ktype, kset, and parent. We will get to those shortly,
> however please note that the ktype and kset must be set before the
> kobject_init() function is called.
>
>
>
> Reference counts
>
> One of the key functions of a kobject is to serve as a reference counter
> for the object in which it is embedded.
Hm, I thought that was the purpose of struct kref?
> As long as references to the object
> exist, the object (and the code which supports it) must continue to exist.
> The low-level functions for manipulating a kobject's reference counts are:
>
> struct kobject *kobject_get(struct kobject *kobj);
> void kobject_put(struct kobject *kobj);
>
> A successful call to kobject_get() will increment the kobject's reference
> counter and return the pointer to the kobject. If, however, the kobject is
> already in the process of being destroyed, the operation will fail and
> kobject_get() will return NULL.
Eh, no. We'll always return !NULL if the kobject is !NULL to start
with. If the reference count is already 0, the code will moan, but the
caller will still get a pointer.
> This return value must always be tested, or
> no end of unpleasant race conditions could result.
>
> When a reference is released, the call to kobject_put() will decrement the
> reference count and, possibly, free the object. Note that kobject_init()
> sets the reference count to one, so the code which sets up the kobject will
> need to do a kobject_put() eventually to release that reference.
>
> Because kobjects are dynamic, they must not be declared statically or on
> the stack, but instead, always from the heap. Future versions of the
> kernel will contain a run-time check for kobjects that are created
> statically and will warn the developer of this improper usage.
>
>
> Hooking into sysfs
>
> An initialized kobject will perform reference counting without trouble, but
> it will not appear in sysfs. To create sysfs entries, kernel code must pass
> the object to kobject_add():
>
> int kobject_add(struct kobject *kobj);
>
> As always, this operation can fail. The function:
>
> void kobject_del(struct kobject *kobj);
>
> will remove the kobject from sysfs.
>
> There is a kobject_register() function, which is really just the
> combination of the calls to kobject_init() and kobject_add().
Plus the uevent.
> Similarly,
> kobject_unregister() will call kobject_del(), then call kobject_put() to
> release the initial reference created with kobject_register() (or really
> kobject_init()).
And also throw a uevent :)
>
>
> Creating "simple" kobjects
>
> Sometimes all that a developer wants is a way to create a simple directory
> in the sysfs heirachy, and not have to mess with the whole complication of
hierarchy
> ksets, show and store functions, and other details. To create such an
> entry, use the function:
>
> struct kobject *kobject_create_and_register(char *name, struct kobject *parent);
>
> This function will create a kobject and place it in sysfs in the location
> underneath the specified parent kobject. To create simple attributes
> associated with this kobject, use:
>
> int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
> or
> int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
>
> Both types of attributes used here, with a kobject that has been created
> with the kobject_create_and_register() can be of type kobj_attribute, no
> special custom attribute is needed to be created.
>
> See the example module, samples/kobject/kobject-example.c for an
> implementation of a simple kobject and attributes.
>
>
>
> ktypes and release methods
>
> One important thing still missing from the discussion is what happens to a
> kobject when its reference count reaches zero. The code which created the
> kobject generally does not know when that will happen; if it did, there
> would be little point in using a kobject in the first place. Even
> predicatable object lifecycles become more complicated when sysfs is
> brought in; user-space programs can keep a reference to a kobject (by
> keeping one of its associated sysfs files open) for an arbitrary period of
> time.
This is no longer true?
The major problem is that registered kobjects can be looked-up by other
kernel code that can get a reference.
>
> The end result is that a structure protected by a kobject cannot be freed
> before its reference count goes to zero. The reference count is not under
> the direct control of the code which created the kobject. So that code must
> be notified asynchronously whenever the last reference to one of its
> kobjects goes away.
We should perhaps add a bit fat warning here:
Note that once you registered your kobject via kobject_add(), you must
never use kfree() to free it directly. The only safe way is to use
kobject_put(). It is good practice to always use kobject_put() after
kobject_init() to avoid errors creeping in.
>
> This notification is done through a kobject's release() method. Usually
> such a method has a form like:
>
> void my_object_release(struct kobject *kobj)
> {
> struct my_object *mine = container_of(kobj, struct my_object, kobj);
>
> /* Perform any additional cleanup on this object, then... */
> kfree (mine);
> }
>
> One important point cannot be overstated: every kobject must have a
> release() method, and the kobject must persist (in a consistent state)
> until that method is called.
Which is especially hurting if you use kobjects in modules. (Which
reminds me: Must dig up the patchset that fixes the module unload vs.
release problem.)
> If these constraints are not met, the code is
> flawed. Note that the kernel will warn you if you forget to provide a
> release() method. Do not try to get rid of this warning by providing an
> "empty" release function, you will be mocked merciously by the kobject
I think you meant "mercilessly" :)
> maintainer if you attempt this.
>
> Interestingly, the release() method is not stored in the kobject itself;
> instead, it is associated with the ktype. So let us introduce struct
> kobj_type:
>
> struct kobj_type {
> void (*release)(struct kobject *);
> struct sysfs_ops *sysfs_ops;
> struct attribute **default_attrs;
> };
>
> This structure is used to describe a particular type of kobject (or, more
> correctly, of containing object). Every kobject needs to have an associated
> kobj_type structure; a pointer to that structure can be placed in the
> kobject's ktype field at initialization time, or (more likely) it can be
> defined by the kobject's containing kset.
>
> The release field in struct kobj_type is, of course, a pointer to the
> release() method for this type of kobject. The other two fields (sysfs_ops
> and default_attrs) control how objects of this type are represented in
> sysfs; they are beyond the scope of this document.
Attention span ran out, will comment on the rest of this document later.
On Tue, 27 Nov 2007 15:02:52 -0800,
Greg KH <[email protected]> wrote:
>
> ksets
>
> A kset is merely a collection of kobjects that want to be associated with
> each other. There is no restriction that they be of the same ktype, but be
> very careful if they are not.
>
> A kset serves these functions:
>
> - It serves as a bag containing a group of objects. A kset can be used by
> the kernel to track "all block devices" or "all PCI device drivers."
>
> - A kset is also a subdirectory in sysfs, where the associated kobjects
> with the kset can show up.
Perhaps better wording:
A kset is also represented via a subdirectory in sysfs, under which the
kobjects associated with the kset can show up.
> Every kset contains a kobject which can be
> set up to be the parent of other kobjects; in this way the device model
> hierarchy is constructed.
>
> - Ksets can support the "hotplugging" of kobjects and influence how
> uevent events are reported to user space.
>
> - A kset can provide a set of default attributes that all kobjects that
> belong to it automatically inherit and have created whenever a kobject
> is registered belonging to the kset.
Hm, the default attributes are provided by the ktype?
>
> In object-oriented terms, "kset" is the top-level container class; ksets
> contain their own kobject, but that kobject is managed by the kset code and
> should not be manipulated by any other user.
>
> A kset keeps its children in a standard kernel linked list. Kobjects point
> back to their containing kset via their kset field. In almost all cases,
> the contained kobjects also have a pointer to the kset (or, strictly, its
> embedded kobject) in their parent field.
>
> As a kset contains a kobject within it, it should always be dynamically
> created and never declared statically or on the stack. To create a new
> kset use:
> struct kset *kset_create_and_register(char *name,
> struct kset_uevent_ops *u,
> struct kobject *parent);
>
> When you are finished with the kset, call:
> void kset_unregister(struct kset *kset);
> to destroy it.
Maybe also mention kset_get()/kset_put() here? kset_unregister() will
only destroy the kset if the caller holds the last reference to it.
>
> An example of using a kset can be seen in the
> samples/kobject/kset-example.c file in the kernel tree.
>
> If a kset wishes to control the uevent operations of the kobjects
> associated with it, it can use the struct kset_uevent_ops to handle it:
>
> struct kset_uevent_ops {
> int (*filter)(struct kset *kset, struct kobject *kobj);
> const char *(*name)(struct kset *kset, struct kobject *kobj);
> int (*uevent)(struct kset *kset, struct kobject *kobj,
> struct kobj_uevent_env *env);
> };
>
>
> The filter function allows a kset to prevent a uevent from being emitted to
> userspace for a specific kobject. If the function returns 0, the uevent
> will not be emitted.
>
> The name function will be called to override the default name of the kset
> that the uevent sends to userspace. By default, the name will be the same
> as the kset itself, but this function, if present, can override that name.
>
> The uevent function will be called when the uevent is about to be sent to
> userspace to allow more environment variables to be added to the uevent.
It may be helpful to mention which uevents are by default created by
the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
>
> One might ask how, exactly, a kobject is added to a kset, given that no
> functions which perform that function have been presented. The answer is
> that this task is handled by kobject_add(). When a kobject is passed to
> kobject_add(), its kset member should point to the kset to which the
> kobject will belong. kobject_add() will handle the rest. There is currently
> no other way to add a kobject to a kset without directly messing with the
> list pointers.
>
>
> Kobject initialization again
>
> Now that we have covered all of that stuff, we can talk in detail about how
> a kobject should be prepared for its existence in the kernel. Here are all
> of the struct kobject fields which must be initialized somehow:
>
> - k_name - the name of the object. This fields should always be
> initialized with kobject_set_name(), or specified in the original call
> to kobject_create_and_register().
>
> - refcount is the kobject's reference count; it is initialized by kobject_init()
There is no field called "refcount"; the embedded struct kref kref is
initialized by kobject_init().
>
> - parent is the kobject's parent in whatever hierarchy it belongs to. It
> can be set explicitly by the creator. If parent is NULL when
> kobject_add() is called, it will be set to the kobject of the containing
> kset.
>
> - kset is a pointer to the kset which will contain this kobject; it should
> be set prior to calling kobject_init().
>
> - ktype is the type of the kobject; it should be set prior to calling
> kobject_init().
>
> Often, much of the initialization of a kobject is handled by the layer that
> manages the containing kset. See the sample/kobject/kset-example.c for how
> this is usually handled.
Do we also want to mention kobject_rename() and kobject_move(), or are
those functions so esoteric that most people don't want to know about
them?
On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > A kset serves these functions:
> >
> > - It serves as a bag containing a group of objects. A kset can be used by
> > the kernel to track "all block devices" or "all PCI device drivers."
> >
> > - A kset is also a subdirectory in sysfs, where the associated kobjects
> > with the kset can show up.
>
> Perhaps better wording:
>
> A kset is also represented via a subdirectory in sysfs, under which the
> kobjects associated with the kset can show up.
This draws a misleading picture. A member of a kset shows up where the
"parent" pointer points to. Like /sys/block is a kset, the kset contains
disks and partitions, but partitions do not live at the kset, and tons
of other kset directories where this is the case.
"If the kobject belonging to a kset has no parent kobject set, it will
be added to the kset's directory. Not all members of a kset do
necessarily live in the kset directory. If an explicit parent kobject is
assigned before the kobject is added, the kobject is registered with the
kset, but added below the parent kobject."
> > Every kset contains a kobject which can be
> > set up to be the parent of other kobjects; in this way the device model
> > hierarchy is constructed.
> >
> > - Ksets can support the "hotplugging" of kobjects and influence how
> > uevent events are reported to user space.
> >
> > - A kset can provide a set of default attributes that all kobjects that
> > belong to it automatically inherit and have created whenever a kobject
> > is registered belonging to the kset.
>
> Hm, the default attributes are provided by the ktype?
Yeah, it's the type, not the set.
> > In object-oriented terms, "kset" is the top-level container class; ksets
> > contain their own kobject, but that kobject is managed by the kset code and
> > should not be manipulated by any other user.
> >
> > A kset keeps its children in a standard kernel linked list. Kobjects point
> > back to their containing kset via their kset field. In almost all cases,
> > the contained kobjects also have a pointer to the kset (or, strictly, its
> > embedded kobject) in their parent field.
It's by far not "almost all cases". Can we please drop that? It's very
common that kobjects are a member of a kset but have a different parent.
> > The uevent function will be called when the uevent is about to be sent to
> > userspace to allow more environment variables to be added to the uevent.
>
> It may be helpful to mention which uevents are by default created by
> the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
I think, we should remove all these default events from the kobject
core. We will not be able to manage the timing issues and "raw" kobject
users should request the events on their own, when they are finished
adding stuff to the kobject. I see currently no way to solve the
"attributes created after the event" problem. The new
*_create_and_register functions do not allow default attributes to be
created, which will just lead to serious trouble when someone wants to
use udev to set defaults and such things. We may just want to require an
explicit call to send the event?
Thanks,
Kay
On Wed, 2007-11-28 at 10:01 +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > Among other things, kobject_init() sets the kobject's reference count to
> > one. Calling kobject_init() is not sufficient, however. Kobject users
> > must, at a minimum, set the name of the kobject; this is the name that will
> > be used in sysfs entries.
>
> Unless they don't register their kobject. (But they should always set a
> name anyway to avoid funny debug messages, so it is probably a good
> idea to call this a "must").
Yeah, we should require it. And kobject_cleanup() needs to be called to
free the allocated name, if the object is not already added and needs to
be deleted (common in rewinding on error).
Btw, do you have a good example for unregistered/unnamed kobjects?
Thanks,
Kay
On Wed, 28 Nov 2007 13:23:02 +0100,
Kay Sievers <[email protected]> wrote:
> On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
>
> > > A kset serves these functions:
> > >
> > > - It serves as a bag containing a group of objects. A kset can be used by
> > > the kernel to track "all block devices" or "all PCI device drivers."
> > >
> > > - A kset is also a subdirectory in sysfs, where the associated kobjects
> > > with the kset can show up.
> >
> > Perhaps better wording:
> >
> > A kset is also represented via a subdirectory in sysfs, under which the
> > kobjects associated with the kset can show up.
>
> This draws a misleading picture. A member of a kset shows up where the
> "parent" pointer points to. Like /sys/block is a kset, the kset contains
> disks and partitions, but partitions do not live at the kset, and tons
> of other kset directories where this is the case.
>
> "If the kobject belonging to a kset has no parent kobject set, it will
> be added to the kset's directory. Not all members of a kset do
> necessarily live in the kset directory. If an explicit parent kobject is
> assigned before the kobject is added, the kobject is registered with the
> kset, but added below the parent kobject."
Yes, that's better.
> > > The uevent function will be called when the uevent is about to be sent to
> > > userspace to allow more environment variables to be added to the uevent.
> >
> > It may be helpful to mention which uevents are by default created by
> > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
>
> I think, we should remove all these default events from the kobject
> core. We will not be able to manage the timing issues and "raw" kobject
> users should request the events on their own, when they are finished
> adding stuff to the kobject. I see currently no way to solve the
> "attributes created after the event" problem. The new
> *_create_and_register functions do not allow default attributes to be
> created, which will just lead to serious trouble when someone wants to
> use udev to set defaults and such things. We may just want to require an
> explicit call to send the event?
There will always be attributes that will show up later (for example,
after a device is activated). Probably the best approach is to keep the
default uevents, but have the attribute-adder send another uevent when
they are done?
On Wed, 28 Nov 2007 13:35:54 +0100,
Kay Sievers <[email protected]> wrote:
> On Wed, 2007-11-28 at 10:01 +0100, Cornelia Huck wrote:
> > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
>
> > > Among other things, kobject_init() sets the kobject's reference count to
> > > one. Calling kobject_init() is not sufficient, however. Kobject users
> > > must, at a minimum, set the name of the kobject; this is the name that will
> > > be used in sysfs entries.
> >
> > Unless they don't register their kobject. (But they should always set a
> > name anyway to avoid funny debug messages, so it is probably a good
> > idea to call this a "must").
>
> Yeah, we should require it. And kobject_cleanup() needs to be called to
> free the allocated name, if the object is not already added and needs to
> be deleted (common in rewinding on error).
Always using kobject_put() should take care of that.
>
> Btw, do you have a good example for unregistered/unnamed kobjects?
Unfortunately not (I can only think of devices that might never be
registered).
On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 13:23:02 +0100,
> Kay Sievers <[email protected]> wrote:
> > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > > > The uevent function will be called when the uevent is about to be sent to
> > > > userspace to allow more environment variables to be added to the uevent.
> > >
> > > It may be helpful to mention which uevents are by default created by
> > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> >
> > I think, we should remove all these default events from the kobject
> > core. We will not be able to manage the timing issues and "raw" kobject
> > users should request the events on their own, when they are finished
> > adding stuff to the kobject. I see currently no way to solve the
> > "attributes created after the event" problem. The new
> > *_create_and_register functions do not allow default attributes to be
> > created, which will just lead to serious trouble when someone wants to
> > use udev to set defaults and such things. We may just want to require an
> > explicit call to send the event?
>
> There will always be attributes that will show up later (for example,
> after a device is activated). Probably the best approach is to keep the
> default uevents, but have the attribute-adder send another uevent when
> they are done?
Uh, that's more an exception where we can't give guarantees because of
very specific hardware setups, and it would be an additional "change"
event. There are valid cases for this, but only a _very_ few.
There is absolutely no reason not to do it right with the "add" event,
just because we are too lazy to solve it proper the current code. It's
just so broken by design, what we are doing today. :)
Kay
On Wed, 2007-11-28 at 16:52 +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 13:35:54 +0100,
> Kay Sievers <[email protected]> wrote:
>
> > On Wed, 2007-11-28 at 10:01 +0100, Cornelia Huck wrote:
> > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> >
> > > > Among other things, kobject_init() sets the kobject's reference count to
> > > > one. Calling kobject_init() is not sufficient, however. Kobject users
> > > > must, at a minimum, set the name of the kobject; this is the name that will
> > > > be used in sysfs entries.
> > >
> > > Unless they don't register their kobject. (But they should always set a
> > > name anyway to avoid funny debug messages, so it is probably a good
> > > idea to call this a "must").
> >
> > Yeah, we should require it. And kobject_cleanup() needs to be called to
> > free the allocated name, if the object is not already added and needs to
> > be deleted (common in rewinding on error).
>
> Always using kobject_put() should take care of that.
Ok, so we can always require a kobject_put() after a kobject_init(), to
cleanup what happened after _init() and before _add()?
> > Btw, do you have a good example for unregistered/unnamed kobjects?
>
> Unfortunately not (I can only think of devices that might never be
> registered).
This is good, we should just get rid of that use case, it seems it
doesn't make any sense to use kobjects which will never be registered.
Thanks,
Kay
On Wed, 28 Nov 2007 17:03:07 +0100,
Kay Sievers <[email protected]> wrote:
> On Wed, 2007-11-28 at 16:52 +0100, Cornelia Huck wrote:
> > On Wed, 28 Nov 2007 13:35:54 +0100,
> > Kay Sievers <[email protected]> wrote:
> >
> > > On Wed, 2007-11-28 at 10:01 +0100, Cornelia Huck wrote:
> > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > >
> > > > > Among other things, kobject_init() sets the kobject's reference count to
> > > > > one. Calling kobject_init() is not sufficient, however. Kobject users
> > > > > must, at a minimum, set the name of the kobject; this is the name that will
> > > > > be used in sysfs entries.
> > > >
> > > > Unless they don't register their kobject. (But they should always set a
> > > > name anyway to avoid funny debug messages, so it is probably a good
> > > > idea to call this a "must").
> > >
> > > Yeah, we should require it. And kobject_cleanup() needs to be called to
> > > free the allocated name, if the object is not already added and needs to
> > > be deleted (common in rewinding on error).
> >
> > Always using kobject_put() should take care of that.
>
> Ok, so we can always require a kobject_put() after a kobject_init(), to
> cleanup what happened after _init() and before _add()?
That's a sensible requirement, I think.
On Wed, 28 Nov 2007 16:57:48 +0100,
Kay Sievers <[email protected]> wrote:
> On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > On Wed, 28 Nov 2007 13:23:02 +0100,
> > Kay Sievers <[email protected]> wrote:
> > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
>
> > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > userspace to allow more environment variables to be added to the uevent.
> > > >
> > > > It may be helpful to mention which uevents are by default created by
> > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > >
> > > I think, we should remove all these default events from the kobject
> > > core. We will not be able to manage the timing issues and "raw" kobject
> > > users should request the events on their own, when they are finished
> > > adding stuff to the kobject. I see currently no way to solve the
> > > "attributes created after the event" problem. The new
> > > *_create_and_register functions do not allow default attributes to be
> > > created, which will just lead to serious trouble when someone wants to
> > > use udev to set defaults and such things. We may just want to require an
> > > explicit call to send the event?
> >
> > There will always be attributes that will show up later (for example,
> > after a device is activated). Probably the best approach is to keep the
> > default uevents, but have the attribute-adder send another uevent when
> > they are done?
>
> Uh, that's more an exception where we can't give guarantees because of
> very specific hardware setups, and it would be an additional "change"
> event. There are valid cases for this, but only a _very_ few.
>
> There is absolutely no reason not to do it right with the "add" event,
> just because we are too lazy to solve it proper the current code. It's
> just so broken by design, what we are doing today. :)
I'm worrying a bit about changes that impact the whole code tree in
lots of places. I'd be fine with the device layer doing its uevent
manually in device_add() at the very end, though. (This would allow
drivers to add attributes in their probe function before the uevent,
for example.)
On Tue, 27 Nov 2007 15:04:06 -0800,
Greg KH <[email protected]> wrote:
> static struct foo_obj *create_foo_obj(const char *name)
> {
> struct foo_obj *foo;
> int retval;
>
> /* allocate the memory for the whole object */
> foo = kzalloc(sizeof(*foo), GFP_KERNEL);
> if (!foo)
> return NULL;
>
> /* initialize the kobject portion of the object properly */
> kobject_set_name(&foo->kobj, "%s", name);
Returncode not checked :)
> foo->kobj.kset = example_kset;
> foo->kobj.ktype = &foo_ktype;
>
> /*
> * Register the kobject with the kernel, all the default files will
> * be created here and the uevent will be sent out. If we were to
> * call kobject_init() and then kobject_add() we would be
> * responsible for sending out the initial KOBJ_ADD uevent.
> */
> retval = kobject_register(&foo->kobj);
> if (retval) {
> kfree(foo);
kobject_put(foo) is needed since it gets you through kobject_cleanup()
where the name can be freed.
> return NULL;
> }
>
> return foo;
> }
On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 16:57:48 +0100,
> Kay Sievers <[email protected]> wrote:
>
> > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > On Wed, 28 Nov 2007 13:23:02 +0100,
> > > Kay Sievers <[email protected]> wrote:
> > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> >
> > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > >
> > > > > It may be helpful to mention which uevents are by default created by
> > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > >
> > > > I think, we should remove all these default events from the kobject
> > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > users should request the events on their own, when they are finished
> > > > adding stuff to the kobject. I see currently no way to solve the
> > > > "attributes created after the event" problem. The new
> > > > *_create_and_register functions do not allow default attributes to be
> > > > created, which will just lead to serious trouble when someone wants to
> > > > use udev to set defaults and such things. We may just want to require an
> > > > explicit call to send the event?
> > >
> > > There will always be attributes that will show up later (for example,
> > > after a device is activated). Probably the best approach is to keep the
> > > default uevents, but have the attribute-adder send another uevent when
> > > they are done?
> >
> > Uh, that's more an exception where we can't give guarantees because of
> > very specific hardware setups, and it would be an additional "change"
> > event. There are valid cases for this, but only a _very_ few.
> >
> > There is absolutely no reason not to do it right with the "add" event,
> > just because we are too lazy to solve it proper the current code. It's
> > just so broken by design, what we are doing today. :)
>
> I'm worrying a bit about changes that impact the whole code tree in
> lots of places. I'd be fine with the device layer doing its uevent
> manually in device_add() at the very end, though. (This would allow
> drivers to add attributes in their probe function before the uevent,
> for example.)
The driver core does use the split already in most places, I did that
long ago. There are not too many (~20) users of kobject_register(), and
it's a pretty straight-forward change to change that to _init, _add,
_uevent, and get rid of that totally useless "convenience api".
I think there is no longer any excuse to keep that broken code around,
and even require to document that it's broken. The whole purpose of the
uevent is userspace consumption, which just doesn't work correctly with
the code we offer. The fix is trivial, and should be done now, and we no
longer need to fiddle around timing issues, just because we are too
lazy.
I propose the removal of _all_ funtions that have *register* in their
name, and always require the following sequence:
_init()
_add()
_uevent(_ADD)
_uevent(_REMOVE)
_del()
_put()
The _create_and_register() functions would become _create_ and_add()
and will need an additional _uevent() call after they populated the
object.
Thanks,
Kay
On Wed, 28 Nov 2007 17:36:29 +0100,
Kay Sievers <[email protected]> wrote:
>
> On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> > On Wed, 28 Nov 2007 16:57:48 +0100,
> > Kay Sievers <[email protected]> wrote:
> >
> > > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > > On Wed, 28 Nov 2007 13:23:02 +0100,
> > > > Kay Sievers <[email protected]> wrote:
> > > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > >
> > > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > > >
> > > > > > It may be helpful to mention which uevents are by default created by
> > > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > > >
> > > > > I think, we should remove all these default events from the kobject
> > > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > > users should request the events on their own, when they are finished
> > > > > adding stuff to the kobject. I see currently no way to solve the
> > > > > "attributes created after the event" problem. The new
> > > > > *_create_and_register functions do not allow default attributes to be
> > > > > created, which will just lead to serious trouble when someone wants to
> > > > > use udev to set defaults and such things. We may just want to require an
> > > > > explicit call to send the event?
> > > >
> > > > There will always be attributes that will show up later (for example,
> > > > after a device is activated). Probably the best approach is to keep the
> > > > default uevents, but have the attribute-adder send another uevent when
> > > > they are done?
> > >
> > > Uh, that's more an exception where we can't give guarantees because of
> > > very specific hardware setups, and it would be an additional "change"
> > > event. There are valid cases for this, but only a _very_ few.
> > >
> > > There is absolutely no reason not to do it right with the "add" event,
> > > just because we are too lazy to solve it proper the current code. It's
> > > just so broken by design, what we are doing today. :)
> >
> > I'm worrying a bit about changes that impact the whole code tree in
> > lots of places. I'd be fine with the device layer doing its uevent
> > manually in device_add() at the very end, though. (This would allow
> > drivers to add attributes in their probe function before the uevent,
> > for example.)
<Looks at device_add() again: It already throws the uevent manually...>
>
> The driver core does use the split already in most places, I did that
> long ago. There are not too many (~20) users of kobject_register(), and
> it's a pretty straight-forward change to change that to _init, _add,
> _uevent, and get rid of that totally useless "convenience api".
>
> I think there is no longer any excuse to keep that broken code around,
> and even require to document that it's broken. The whole purpose of the
> uevent is userspace consumption, which just doesn't work correctly with
> the code we offer. The fix is trivial, and should be done now, and we no
> longer need to fiddle around timing issues, just because we are too
> lazy.
>
> I propose the removal of _all_ funtions that have *register* in their
> name, and always require the following sequence:
> _init()
> _add()
> _uevent(_ADD)
>
> _uevent(_REMOVE)
> _del()
> _put()
>
> The _create_and_register() functions would become _create_ and_add()
> and will need an additional _uevent() call after they populated the
> object.
I'm absolutely fine with doing that at the kobject level (after all,
it's a quite contained change, and the uevent function explicitely
works on a kobject).
For the other _register()/_unregister() functions, it's a different
piece of cake. They are:
- distributed through lot of different code
- at a higher level than kobjects, and kobject_uevent() acts on the
kobject
- usually encapsulating a sequence that wants to be used by almost all
callers, and that includes a uevent
I don't think we want people registering a higher level object and then
wondering why udev doesn't seem to take notice of it.
On Wed, 2007-11-28 at 17:51 +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 17:36:29 +0100,
> Kay Sievers <[email protected]> wrote:
>
> >
> > On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> > > On Wed, 28 Nov 2007 16:57:48 +0100,
> > > Kay Sievers <[email protected]> wrote:
> > >
> > > > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > > > On Wed, 28 Nov 2007 13:23:02 +0100,
> > > > > Kay Sievers <[email protected]> wrote:
> > > > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > > >
> > > > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > > > >
> > > > > > > It may be helpful to mention which uevents are by default created by
> > > > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > > > >
> > > > > > I think, we should remove all these default events from the kobject
> > > > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > > > users should request the events on their own, when they are finished
> > > > > > adding stuff to the kobject. I see currently no way to solve the
> > > > > > "attributes created after the event" problem. The new
> > > > > > *_create_and_register functions do not allow default attributes to be
> > > > > > created, which will just lead to serious trouble when someone wants to
> > > > > > use udev to set defaults and such things. We may just want to require an
> > > > > > explicit call to send the event?
> > > > >
> > > > > There will always be attributes that will show up later (for example,
> > > > > after a device is activated). Probably the best approach is to keep the
> > > > > default uevents, but have the attribute-adder send another uevent when
> > > > > they are done?
> > > >
> > > > Uh, that's more an exception where we can't give guarantees because of
> > > > very specific hardware setups, and it would be an additional "change"
> > > > event. There are valid cases for this, but only a _very_ few.
> > > >
> > > > There is absolutely no reason not to do it right with the "add" event,
> > > > just because we are too lazy to solve it proper the current code. It's
> > > > just so broken by design, what we are doing today. :)
> > >
> > > I'm worrying a bit about changes that impact the whole code tree in
> > > lots of places. I'd be fine with the device layer doing its uevent
> > > manually in device_add() at the very end, though. (This would allow
> > > drivers to add attributes in their probe function before the uevent,
> > > for example.)
>
> <Looks at device_add() again: It already throws the uevent manually...>
I think I still remember what I did 2.5 years ago :)
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e57cd73e2e844a3da25cc6b420674c81bbe1b387
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=18c3d5271b472c096adfc856e107c79f6fd30d7d
> > The driver core does use the split already in most places, I did that
> > long ago. There are not too many (~20) users of kobject_register(), and
> > it's a pretty straight-forward change to change that to _init, _add,
> > _uevent, and get rid of that totally useless "convenience api".
> >
> > I think there is no longer any excuse to keep that broken code around,
> > and even require to document that it's broken. The whole purpose of the
> > uevent is userspace consumption, which just doesn't work correctly with
> > the code we offer. The fix is trivial, and should be done now, and we no
> > longer need to fiddle around timing issues, just because we are too
> > lazy.
> >
> > I propose the removal of _all_ funtions that have *register* in their
> > name, and always require the following sequence:
> > _init()
> > _add()
> > _uevent(_ADD)
> >
> > _uevent(_REMOVE)
> > _del()
> > _put()
> >
> > The _create_and_register() functions would become _create_ and_add()
> > and will need an additional _uevent() call after they populated the
> > object.
>
> I'm absolutely fine with doing that at the kobject level (after all,
> it's a quite contained change, and the uevent function explicitely
> works on a kobject).
>
> For the other _register()/_unregister() functions, it's a different
> piece of cake. They are:
> - distributed through lot of different code
> - at a higher level than kobjects, and kobject_uevent() acts on the
> kobject
> - usually encapsulating a sequence that wants to be used by almost all
> callers, and that includes a uevent
>
> I don't think we want people registering a higher level object and then
> wondering why udev doesn't seem to take notice of it.
Oh, I'm just talking about lib/kobject.c. And the new kobj/kset stuff we
added which is currently in the -mm tree.
It suffers from the same old problem, and even gets documentend as
"broken" now. I really think that should be fixed proper instead, and
it's the right time to do it now.
Thanks,
Kay
On Wed, Nov 28, 2007 at 05:09:50PM +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 17:03:07 +0100,
> Kay Sievers <[email protected]> wrote:
>
> > On Wed, 2007-11-28 at 16:52 +0100, Cornelia Huck wrote:
> > > On Wed, 28 Nov 2007 13:35:54 +0100,
> > > Kay Sievers <[email protected]> wrote:
> > >
> > > > On Wed, 2007-11-28 at 10:01 +0100, Cornelia Huck wrote:
> > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > > >
> > > > > > Among other things, kobject_init() sets the kobject's reference count to
> > > > > > one. Calling kobject_init() is not sufficient, however. Kobject users
> > > > > > must, at a minimum, set the name of the kobject; this is the name that will
> > > > > > be used in sysfs entries.
> > > > >
> > > > > Unless they don't register their kobject. (But they should always set a
> > > > > name anyway to avoid funny debug messages, so it is probably a good
> > > > > idea to call this a "must").
> > > >
> > > > Yeah, we should require it. And kobject_cleanup() needs to be called to
> > > > free the allocated name, if the object is not already added and needs to
> > > > be deleted (common in rewinding on error).
> > >
> > > Always using kobject_put() should take care of that.
> >
> > Ok, so we can always require a kobject_put() after a kobject_init(), to
> > cleanup what happened after _init() and before _add()?
>
> That's a sensible requirement, I think.
Yes, there are a few usages of unregistered kobjects in the kernel
today, but I think that was people just not realizing they could use a
simple kref instead. I'm working on cleaning them up right now...
thanks,
greg k-h
On Tue, 27 Nov 2007, Greg KH wrote:
> Part of the difficulty in understanding the driver model - and the kobject
> abstraction upon which it is built - is that there is no obvious starting
> place. Dealing with kobjects requires understanding a few different types,
> all of which make reference to each other. In an attempt to make things
> easier, we'll take a multi-pass approach, starting with vague terms and
> adding detail as we go. To that end, here are some quick definitions of
> some terms we will be working with.
>
> - A kobject is an object of type struct kobject. Kobjects have a name
> and a reference count. A kobject also has a parent pointer (allowing
> objects to be arranged into hierarchies), a specific type, and,
> usually, a representation in the sysfs virtual filesystem.
As Cornelia said, it would be worthwhile mentioning krefs in this
document as well. They are simple enough to explain, after all.
> Initialization of kobjects
>
> Code which creates a kobject must, of course, initialize that object. Some
> of the internal fields are setup with a (mandatory) call to kobject_init():
kobject_init() isn't mandatory if you use kobject_register(). But then
Kay wants to do away with kobject_register()...
> The other kobject fields which should be set, directly or indirectly, by
> the creator are its ktype, kset, and parent. We will get to those shortly,
> however please note that the ktype and kset must be set before the
> kobject_init() function is called.
In fact kset, ktype, and parent are optional, right? You might mention
at this point that not all those fields are needed, and explain later
which combinations are legal.
> When a reference is released, the call to kobject_put() will decrement the
> reference count and, possibly, free the object. Note that kobject_init()
> sets the reference count to one, so the code which sets up the kobject will
> need to do a kobject_put() eventually to release that reference.
It's worth mentioning here (and perhaps elsewhere too) that all of the
function calls described here can sleep and hence must be made in
process context, with the exception of the *_get() routines. It's
possible to call *_put() in atomic context; the SCSI core does this
(with device_put, not kobject_put) and has to jump through hoops to run
the corresponding release routine in a waitqueue task. In general,
though, it isn't safe.
> Because kobjects are dynamic, they must not be declared statically or on
> the stack, but instead, always from the heap. Future versions of the
> kernel will contain a run-time check for kobjects that are created
> statically and will warn the developer of this improper usage.
Why not? What's wrong with static kobjects? I've never understood this.
> ktypes and release methods
>
> One important thing still missing from the discussion is what happens to a
> kobject when its reference count reaches zero. The code which created the
> kobject generally does not know when that will happen; if it did, there
> would be little point in using a kobject in the first place. Even
> predicatable object lifecycles become more complicated when sysfs is
predictable
> One important point cannot be overstated: every kobject must have a
> release() method, and the kobject must persist (in a consistent state)
> until that method is called. If these constraints are not met, the code is
> flawed. Note that the kernel will warn you if you forget to provide a
> release() method. Do not try to get rid of this warning by providing an
> "empty" release function, you will be mocked merciously by the kobject
> maintainer if you attempt this.
Not to mention that doing this will leak memory. Unless the kobject
is static...
> struct kobj_type {
> void (*release)(struct kobject *);
> struct sysfs_ops *sysfs_ops;
> struct attribute **default_attrs;
> };
>
> This structure is used to describe a particular type of kobject (or, more
> correctly, of containing object). Every kobject needs to have an associated
> kobj_type structure; a pointer to that structure can be placed in the
> kobject's ktype field at initialization time, or (more likely) it can be
> defined by the kobject's containing kset.
Actually the current code doesn't seem to check whether kobj->ktype is
NULL or to use the value of kobj->kset->kobj.ktype. Is this an oversight?
> - A kset is also a subdirectory in sysfs, where the associated kobjects
> with the kset can show up. Every kset contains a kobject which can be
That's where a kobject shows up if its parent field isn't set when
kobject_add() is called. But if the parent field _is_ set, does
anything (such as a symbolic link) show up in the kset's directory?
> A kset keeps its children in a standard kernel linked list. Kobjects point
> back to their containing kset via their kset field. In almost all cases,
> the contained kobjects also have a pointer to the kset (or, strictly, its
> embedded kobject) in their parent field.
"almost all" isn't right. "In some cases" would be more realistic.
> Kobject initialization again
>
> Now that we have covered all of that stuff, we can talk in detail about how
> a kobject should be prepared for its existence in the kernel. Here are all
> of the struct kobject fields which must be initialized somehow:
>
> - k_name - the name of the object. This fields should always be
> initialized with kobject_set_name(), or specified in the original call
> to kobject_create_and_register().
>
> - refcount is the kobject's reference count; it is initialized by kobject_init()
There's no need to mention this here. Not only is it wrong, more
importantly users of kobjects never have to initialize the refcount.
> - parent is the kobject's parent in whatever hierarchy it belongs to. It
> can be set explicitly by the creator. If parent is NULL when
> kobject_add() is called, it will be set to the kobject of the containing
> kset.
If there is no containing kset, the parent remains NULL. What happens
then? Does the kobject show up in the sysfs top-level directory?
> - kset is a pointer to the kset which will contain this kobject; it should
> be set prior to calling kobject_init().
>
> - ktype is the type of the kobject; it should be set prior to calling
> kobject_init().
There are no checks for this! At least, not until the cleanup occurs.
Here's a question for you: The code in kobject.c is full of odd things
like this (from the start of kobject_add):
if (!(kobj = kobject_get(kobj)))
return -ENOENT;
What's the point of the assignment? We know that kobject_get() always
returns its argument. This sort of thing happens all over the place.
Alan Stern
On Wed, 28 Nov 2007, Cornelia Huck wrote:
> We should perhaps add a bit fat warning here:
>
> Note that once you registered your kobject via kobject_add(), you must
> never use kfree() to free it directly. The only safe way is to use
Slightly ambiguous. Instead just say:
If you have initialized your kobject via kobject_init() or
kobject_register(), you must not deallocate the kobject anywhere other
than its release() method (which is invoked during the final
kobject_put() call). Otherwise the kernel will leak memory.
> > One important point cannot be overstated: every kobject must have a
> > release() method, and the kobject must persist (in a consistent state)
> > until that method is called.
>
> Which is especially hurting if you use kobjects in modules. (Which
> reminds me: Must dig up the patchset that fixes the module unload vs.
> release problem.)
In theory modules shouldn't present a problem -- especially if Greg
merges the "Kobjects: drop child->parent ref at unregistration" patch.
When a module is unloaded, it has to unregister all its kobjects, which
should force all their children to be unregistered too. At that time
the children's drivers should drop all their references to the parent
kobject, leaving only references held by the module being unloaded.
Presumably it can arrange to drop its own references before its exit()
routine returns.
The only problem arises when a child's driver retains a reference to
the parent kobject. If things are done properly, this reference should
involve incrementing the module count -- which would prevent the module
from being unloaded in the first place.
Alan Stern
On Wed, 2007-11-28 at 14:03 -0500, Alan Stern wrote:
> On Tue, 27 Nov 2007, Greg KH wrote:
>
> > Part of the difficulty in understanding the driver model - and the kobject
> > abstraction upon which it is built - is that there is no obvious starting
> > place. Dealing with kobjects requires understanding a few different types,
> > all of which make reference to each other. In an attempt to make things
> > easier, we'll take a multi-pass approach, starting with vague terms and
> > adding detail as we go. To that end, here are some quick definitions of
> > some terms we will be working with.
> >
> > - A kobject is an object of type struct kobject. Kobjects have a name
> > and a reference count. A kobject also has a parent pointer (allowing
> > objects to be arranged into hierarchies), a specific type, and,
> > usually, a representation in the sysfs virtual filesystem.
>
> As Cornelia said, it would be worthwhile mentioning krefs in this
> document as well. They are simple enough to explain, after all.
>
> > Initialization of kobjects
> >
> > Code which creates a kobject must, of course, initialize that object. Some
> > of the internal fields are setup with a (mandatory) call to kobject_init():
>
> kobject_init() isn't mandatory if you use kobject_register(). But then
> Kay wants to do away with kobject_register()...
Yeah, it's a kind of convenience that causes far more problems than is
solves. Kobjects are low-level and we depend entirely on proper events
today, so I vote for removal of all the *register* crap.
> > The other kobject fields which should be set, directly or indirectly, by
> > the creator are its ktype, kset, and parent. We will get to those shortly,
> > however please note that the ktype and kset must be set before the
> > kobject_init() function is called.
>
> In fact kset, ktype, and parent are optional, right? You might mention
> at this point that not all those fields are needed, and explain later
> which combinations are legal.
Ksets are optional, yes. Ktypes are required for releasing the object,
so we should say they are mandatory.
> > When a reference is released, the call to kobject_put() will decrement the
> > reference count and, possibly, free the object. Note that kobject_init()
> > sets the reference count to one, so the code which sets up the kobject will
> > need to do a kobject_put() eventually to release that reference.
>
> It's worth mentioning here (and perhaps elsewhere too) that all of the
> function calls described here can sleep and hence must be made in
> process context, with the exception of the *_get() routines. It's
> possible to call *_put() in atomic context; the SCSI core does this
> (with device_put, not kobject_put) and has to jump through hoops to run
> the corresponding release routine in a waitqueue task. In general,
> though, it isn't safe.
Yes, it's an important point.
> Actually the current code doesn't seem to check whether kobj->ktype is
> NULL or to use the value of kobj->kset->kobj.ktype. Is this an oversight?
We just require the ktype.
> > - A kset is also a subdirectory in sysfs, where the associated kobjects
> > with the kset can show up. Every kset contains a kobject which can be
>
> That's where a kobject shows up if its parent field isn't set when
> kobject_add() is called. But if the parent field _is_ set, does
> anything (such as a symbolic link) show up in the kset's directory?
>
> > A kset keeps its children in a standard kernel linked list. Kobjects point
> > back to their containing kset via their kset field. In almost all cases,
> > the contained kobjects also have a pointer to the kset (or, strictly, its
> > embedded kobject) in their parent field.
>
> "almost all" isn't right. "In some cases" would be more realistic.
It all depends on the parent pointer, and I think we have far more kobjects
no showing up at the kset like all devices belong to the devices_kset but
only one or two show up directly in the kset directory, all the other are
just childs.
> If there is no containing kset, the parent remains NULL. What happens
> then? Does the kobject show up in the sysfs top-level directory?
"If the kobject belonging to a kset has no parent kobject set, it will
be added to the kset's directory. Not all members of a kset do
necessarily live in the kset directory. If an explicit parent kobject is
assigned before the kobject is added, the kobject is registered with the
kset, but added below the parent kobject."
> > - kset is a pointer to the kset which will contain this kobject; it should
> > be set prior to calling kobject_init().
> >
> > - ktype is the type of the kobject; it should be set prior to calling
> > kobject_init().
>
> There are no checks for this! At least, not until the cleanup occurs.
It does not matter anymore to assign the values before kobject_init(),
but we should mention, that we require a ktype before we add the object.
> Here's a question for you: The code in kobject.c is full of odd things
> like this (from the start of kobject_add):
>
> if (!(kobj = kobject_get(kobj)))
> return -ENOENT;
>
> What's the point of the assignment? We know that kobject_get() always
> returns its argument. This sort of thing happens all over the place.
Right, that looks not too useful.
Kay
On Wed, 28 Nov 2007, Kay Sievers wrote:
> > Actually the current code doesn't seem to check whether kobj->ktype is
> > NULL or to use the value of kobj->kset->kobj.ktype. Is this an oversight?
>
> We just require the ktype.
No -- we should but we don't. Look at the code for kobject_init() and
kobject_add() in Greg's tree and you'll see. Neither of them checks
that kobj->ktype is set.
> > If there is no containing kset, the parent remains NULL. What happens
> > then? Does the kobject show up in the sysfs top-level directory?
>
> "If the kobject belonging to a kset has no parent kobject set, it will
> be added to the kset's directory. Not all members of a kset do
> necessarily live in the kset directory. If an explicit parent kobject is
> assigned before the kobject is added, the kobject is registered with the
> kset, but added below the parent kobject."
Yes, but what if neither kobj->parent nor kobj->kset is set?
Alan Stern
On Wed, 2007-11-28 at 14:36 -0500, Alan Stern wrote:
> On Wed, 28 Nov 2007, Kay Sievers wrote:
>
> > > Actually the current code doesn't seem to check whether kobj->ktype is
> > > NULL or to use the value of kobj->kset->kobj.ktype. Is this an oversight?
> >
> > We just require the ktype.
>
> No -- we should but we don't. Look at the code for kobject_init() and
> kobject_add() in Greg's tree and you'll see. Neither of them checks
> that kobj->ktype is set.
Yeah, it was another "magic" that was built into the core, the code even
tried to find sysfs_ops for completely untyped kobjects. That is gone
now, and we should just require a ktype, I think.
> > > If there is no containing kset, the parent remains NULL. What happens
> > > then? Does the kobject show up in the sysfs top-level directory?
> >
> > "If the kobject belonging to a kset has no parent kobject set, it will
> > be added to the kset's directory. Not all members of a kset do
> > necessarily live in the kset directory. If an explicit parent kobject is
> > assigned before the kobject is added, the kobject is registered with the
> > kset, but added below the parent kobject."
>
> Yes, but what if neither kobj->parent nor kobj->kset is set?
It will show up in the root of sysfs, yes.
Kay
This patch (as1020) adds a check to kobject_init() to insure that the
ktype field is not NULL. This is just for safety's sake; as far as I
know there are no remaining places where the field is left unset. But
ironically, kset_init() did fail to set it! The patch fixes that and
removes some redundant initialization in kset_createa().
The patch also fixes up elevator_init(), where ktype was set after
calling kobject_init() instead of before.
Signed-off-by: Alan Stern <[email protected]>
CC: Kay Sievers <[email protected]>
---
Here is the check we discussed. I'm a little behind on Kay's most
recent updates; I hope this doesn't clash with them.
There may be some other places where ktype gets set after
kobject_init() rather than before. My testing can't be termendously
thorough. But they should all be easy to fix (like elevator.c was).
Index: usb-2.6/lib/kobject.c
===================================================================
--- usb-2.6.orig/lib/kobject.c
+++ usb-2.6/lib/kobject.c
@@ -188,6 +188,7 @@ void kobject_init(struct kobject * kobj)
return;
WARN_ON(atomic_read(&kobj->kref.refcount));
verify_dynamic_kobject_allocation(kobj);
+ WARN_ON(!kobj->ktype);
kref_init(&kobj->kref);
INIT_LIST_HEAD(&kobj->entry);
kobj->kset = kset_get(kobj->kset);
@@ -624,18 +625,6 @@ struct kobject *kobject_create_and_regis
}
EXPORT_SYMBOL_GPL(kobject_create_and_register);
-/**
- * kset_init - initialize a kset for use
- * @k: kset
- */
-
-void kset_init(struct kset * k)
-{
- kobject_init(&k->kobj);
- INIT_LIST_HEAD(&k->list);
- spin_lock_init(&k->list_lock);
-}
-
/* default kobject attribute operations */
static ssize_t kobj_attr_show(struct kobject *kobj, struct attribute *attr,
char *buf)
@@ -751,6 +740,19 @@ static struct kobj_type kset_ktype = {
};
/**
+ * kset_init - initialize a kset for use
+ * @k: kset
+ */
+
+void kset_init(struct kset * k)
+{
+ k->kobj.ktype = &kset_ktype;
+ kobject_init(&k->kobj);
+ INIT_LIST_HEAD(&k->list);
+ spin_lock_init(&k->list_lock);
+}
+
+/**
* kset_create - create a struct kset dynamically
*
* @name: the name for the kset
@@ -779,12 +781,10 @@ static struct kset *kset_create(const ch
kset->kobj.parent = parent_kobj;
/*
- * The kobject of this kset will have a type of kset_ktype and belong to
- * no kset itself. That way we can properly free it when it is
+ * The kobject of this kset will have a type of kset_ktype and belong
+ * to no kset itself. That way we can properly free it when it is
* finished being used.
*/
- kset->kobj.ktype = &kset_ktype;
- kset->kobj.kset = NULL;
return kset;
}
Index: usb-2.6/block/elevator.c
===================================================================
--- usb-2.6.orig/block/elevator.c
+++ usb-2.6/block/elevator.c
@@ -185,9 +185,9 @@ static elevator_t *elevator_alloc(struct
eq->ops = &e->ops;
eq->elevator_type = e;
- kobject_init(&eq->kobj);
kobject_set_name(&eq->kobj, "%s", "iosched");
eq->kobj.ktype = &elv_ktype;
+ kobject_init(&eq->kobj);
mutex_init(&eq->sysfs_lock);
eq->hash = kmalloc_node(sizeof(struct hlist_head) * ELV_HASH_ENTRIES,
On Wed, 2007-11-28 at 15:42 -0500, Alan Stern wrote:
> This patch (as1020) adds a check to kobject_init() to insure that the
> ktype field is not NULL.
That's good.
> This is just for safety's sake; as far as I
> know there are no remaining places where the field is left unset. But
> ironically, kset_init() did fail to set it!
Nice to have that fixed.
> The patch fixes that and
> removes some redundant initialization in kset_createa().
>
> The patch also fixes up elevator_init(), where ktype was set after
> calling kobject_init() instead of before.
It should not harm, only the kset was needed before _init(), but even
that should be history with Greg's latest patch:
http://git.kernel.org/?p=linux/kernel/git/gregkh/patches.git;a=blob;f=driver/kobject-grab-the-kset-reference-in-kobject_add-not-kobject_init.patch;hb=HEAD
> Here is the check we discussed. I'm a little behind on Kay's most
> recent updates; I hope this doesn't clash with them.
>
> There may be some other places where ktype gets set after
> kobject_init() rather than before. My testing can't be termendously
> thorough. But they should all be easy to fix (like elevator.c was)
I think it's fine and elevator does not need to be changed, and we can
remove the requirement now from the documentation, as _init() now does
not depend on any pre-assigned values, right?
Kay
On Wed, Nov 28, 2007 at 03:42:00PM -0500, Alan Stern wrote:
> This patch (as1020) adds a check to kobject_init() to insure that the
> ktype field is not NULL. This is just for safety's sake; as far as I
> know there are no remaining places where the field is left unset. But
> ironically, kset_init() did fail to set it! The patch fixes that and
> removes some redundant initialization in kset_createa().
>
> The patch also fixes up elevator_init(), where ktype was set after
> calling kobject_init() instead of before.
No, it's safe to set ktype after kobject_init, it was just not safe to
set the kset. As Kay pointed out, I just added a patch to my tree to
resolve this issue, and I'll go back and update the documentation now.
I do like the "check for a ktype" warning, I'll go add that to my local
tree and see what breaks. Odds are, all the static kobjects will :(
thanks,
greg k-h
On Wed, 28 Nov 2007, Greg KH wrote:
> On Wed, Nov 28, 2007 at 03:42:00PM -0500, Alan Stern wrote:
> > This patch (as1020) adds a check to kobject_init() to insure that the
> > ktype field is not NULL. This is just for safety's sake; as far as I
> > know there are no remaining places where the field is left unset. But
> > ironically, kset_init() did fail to set it! The patch fixes that and
> > removes some redundant initialization in kset_createa().
> >
> > The patch also fixes up elevator_init(), where ktype was set after
> > calling kobject_init() instead of before.
>
> No, it's safe to set ktype after kobject_init, it was just not safe to
> set the kset. As Kay pointed out, I just added a patch to my tree to
> resolve this issue, and I'll go back and update the documentation now.
>
> I do like the "check for a ktype" warning, I'll go add that to my local
> tree and see what breaks. Odds are, all the static kobjects will :(
You have to be careful. The ktype check I wrote lives in
kobject_init() -- that's why I had to make elevator_init() assign the
ktype before calling kobject_init(). If you put the check into
kobject_add() instead then you won't end up checking objects that get
initialized but not added.
Yes, nobody would deliberately use a kobject without adding it, but it
could happen as the result of an failure between the _init and _add
calls.
In any case, the change to kset_init() in my patch fixes a real bug.
It never did set the ktype at all.
Alan Stern
On Wed, Nov 28, 2007 at 05:00:57PM -0500, Alan Stern wrote:
> On Wed, 28 Nov 2007, Greg KH wrote:
>
> > On Wed, Nov 28, 2007 at 03:42:00PM -0500, Alan Stern wrote:
> > > This patch (as1020) adds a check to kobject_init() to insure that the
> > > ktype field is not NULL. This is just for safety's sake; as far as I
> > > know there are no remaining places where the field is left unset. But
> > > ironically, kset_init() did fail to set it! The patch fixes that and
> > > removes some redundant initialization in kset_createa().
> > >
> > > The patch also fixes up elevator_init(), where ktype was set after
> > > calling kobject_init() instead of before.
> >
> > No, it's safe to set ktype after kobject_init, it was just not safe to
> > set the kset. As Kay pointed out, I just added a patch to my tree to
> > resolve this issue, and I'll go back and update the documentation now.
> >
> > I do like the "check for a ktype" warning, I'll go add that to my local
> > tree and see what breaks. Odds are, all the static kobjects will :(
>
> You have to be careful. The ktype check I wrote lives in
> kobject_init() -- that's why I had to make elevator_init() assign the
> ktype before calling kobject_init(). If you put the check into
> kobject_add() instead then you won't end up checking objects that get
> initialized but not added.
>
> Yes, nobody would deliberately use a kobject without adding it, but it
> could happen as the result of an failure between the _init and _add
> calls.
No, people do deliberately do this. Right now, after a few cleanup
patches in my tree, there are only two users who create kobjects but do
not register them, struct cdev and the kobj_map stuff.
The cdev code I'll work on cleaning up, there's just some historical
usages there that I've never felt all that comfortable with, and the
kobj_map code I'd like to convert to just use a list and a kref and not
a kobject at all so that I stop getting emails about it over time :)
thanks,
greg k-h
On Tue, Nov 27, 2007 at 08:50:14PM -0700, Jonathan Corbet wrote:
> Greg KH <[email protected]> wrote:
>
> > Jonathan, I used your old lwn.net article about kobjects as the basis
> > for this document, I hope you don't mind
>
> Certainly I have no objections, I'm glad it was useful.
Thanks, it was a great framework to work with.
> > It is rare (even unknown) for kernel code to create a standalone kobject;
> > with one major exception explained below.
>
> You don't keep this promise - bet you thought we wouldn't notice...
> Actually I guess you do, in the "creating simple kobjects" section.
> When you get to that point, you should mention that this is a situation
> where standalone kobjects make sense.
Sorry, yes, that is where I tried to explain it. I'll flush it out some
more.
> Given that there are quite a few standalone kobjects created by this
> patch set (kernel_kobj, security_kobj, s390_kobj, etc.), the "(even
> unknown)" should probably come out.
Ok.
> > So, for example, UIO code has a structure that defines the memory region
> > associated with a uio device:
>
> *The* UIO code, presumably.
fixed.
> > the given type. So, for example, a pointer to a struct kobject embedded
> > within a struct cdev called "kp" could be converted to a pointer to the
> > containing structure with:
>
> That should be "struct uio_mem", I think.
fixed.
> > one. Calling kobject_init() is not sufficient, however. Kobject users
> > must, at a minimum, set the name of the kobject; this is the name that will
> > be used in sysfs entries.
>
> Is setting the name mandatory now, or are there still places where
> kobjects (which do not appear in sysfs) do have - and do not need - a
> name?
Any kobject that is registered needs to have a name. If someone tries
to call kobject_register() or kobject_add() without a name set they will
find out that it is not allowed :)
And yes, there are a few places in the kernel with kobjects that are
never registered. I'm working on trying to get rid of them...
> > Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always from the heap. Future versions of the
>
> "always be allocated from the heap"?
thanks.
> > "empty" release function, you will be mocked merciously by the kobject
> > maintainer if you attempt this.
>
> So just how should severely should we mock kobject maintainers who can't
> spell "mercilessly"? :)
Heh, turns out that a lot of people sent me this privately :)
> > - A kset can provide a set of default attributes that all kobjects that
> > belong to it automatically inherit and have created whenever a kobject
> > is registered belonging to the kset.
>
> Can we try that one again?
>
> - A kset can provide a set of default attributes for all kobjects which
> belong to it.
No, it's the ktype that does this, I'll go fix that up...
> > There is currently
> > no other way to add a kobject to a kset without directly messing with the
> > list pointers.
>
> Presumably the latter way is not recommended; I would either say so or
> not mention this possibility at all.
Ah, yes, now removed.
Thanks for the review, I really appreciate it.
greg k-h
On Wed, Nov 28, 2007 at 10:01:08AM +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:02:52 -0800,
> Greg KH <[email protected]> wrote:
> > So, for example, UIO code has a structure that defines the memory region
> > associated with a uio device:
> >
> > struct uio_mem {
> > struct kobject kobj;
> > unsigned long addr;
> > unsigned long size;
> > int memtype;
> > void __iomem *internal_addr;
> > };
> >
> > If you have a struct uio_mem structure, finding its embedded kobject is just a
> > matter of using the kobj pointer.
>
> Pointer may be a confusing term, how about "structure member"?
thanks, now fixed.
> > Code that works with kobjects will often
> > have the opposite problem, however: given a struct kobject pointer, what is
> > the pointer to the containing structure? You must avoid tricks (such as
> > assuming that the kobject is at the beginning of the structure) and,
> > instead, use the container_of() macro, found in <linux/kernel.h>:
> >
> > container_of(pointer, type, member)
> >
> > where pointer is the pointer to the embedded kobject, type is the type of
> > the containing structure, and member is the name of the structure field to
> > which pointer points. The return value from container_of() is a pointer to
> > the given type. So, for example, a pointer to a struct kobject embedded
> > within a struct cdev called "kp" could be converted to a pointer to the
>
> "struct uio_mem", I guess.
yes, now fixed.
> > containing structure with:
> >
> > struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj);
> >
> > Programmers will often define a simple macro for "back-casting" kobject
> > pointers to the containing type.
> >
> >
> > Initialization of kobjects
> >
> > Code which creates a kobject must, of course, initialize that object. Some
> > of the internal fields are setup with a (mandatory) call to kobject_init():
> >
> > void kobject_init(struct kobject *kobj);
> >
> > Among other things, kobject_init() sets the kobject's reference count to
> > one. Calling kobject_init() is not sufficient, however. Kobject users
> > must, at a minimum, set the name of the kobject; this is the name that will
> > be used in sysfs entries.
>
> Unless they don't register their kobject. (But they should always set a
> name anyway to avoid funny debug messages, so it is probably a good
> idea to call this a "must").
Yeah, I'll leave this in.
> > To set the name of a kobject properly, do not
> > attempt to manipulate the internal name field, but instead use:
> >
> > int kobject_set_name(struct kobject *kobj, const char *format, ...);
> >
> > This function takes a printk-style variable argument list. Believe it or
> > not, it is actually possible for this operation to fail; conscientious code
> > should check the return value and react accordingly.
> >
> > The other kobject fields which should be set, directly or indirectly, by
> > the creator are its ktype, kset, and parent. We will get to those shortly,
> > however please note that the ktype and kset must be set before the
> > kobject_init() function is called.
> >
> >
> >
> > Reference counts
> >
> > One of the key functions of a kobject is to serve as a reference counter
> > for the object in which it is embedded.
>
> Hm, I thought that was the purpose of struct kref?
Yes, I'll add a reference to kref now.
> > As long as references to the object
> > exist, the object (and the code which supports it) must continue to exist.
> > The low-level functions for manipulating a kobject's reference counts are:
> >
> > struct kobject *kobject_get(struct kobject *kobj);
> > void kobject_put(struct kobject *kobj);
> >
> > A successful call to kobject_get() will increment the kobject's reference
> > counter and return the pointer to the kobject. If, however, the kobject is
> > already in the process of being destroyed, the operation will fail and
> > kobject_get() will return NULL.
>
> Eh, no. We'll always return !NULL if the kobject is !NULL to start
> with. If the reference count is already 0, the code will moan, but the
> caller will still get a pointer.
Good point, this was the way things used to work a long time ago, I'll remove
this.
> > This return value must always be tested, or
> > no end of unpleasant race conditions could result.
> >
> > When a reference is released, the call to kobject_put() will decrement the
> > reference count and, possibly, free the object. Note that kobject_init()
> > sets the reference count to one, so the code which sets up the kobject will
> > need to do a kobject_put() eventually to release that reference.
> >
> > Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always from the heap. Future versions of the
> > kernel will contain a run-time check for kobjects that are created
> > statically and will warn the developer of this improper usage.
> >
> >
> > Hooking into sysfs
> >
> > An initialized kobject will perform reference counting without trouble, but
> > it will not appear in sysfs. To create sysfs entries, kernel code must pass
> > the object to kobject_add():
> >
> > int kobject_add(struct kobject *kobj);
> >
> > As always, this operation can fail. The function:
> >
> > void kobject_del(struct kobject *kobj);
> >
> > will remove the kobject from sysfs.
> >
> > There is a kobject_register() function, which is really just the
> > combination of the calls to kobject_init() and kobject_add().
>
> Plus the uevent.
>
> > Similarly,
> > kobject_unregister() will call kobject_del(), then call kobject_put() to
> > release the initial reference created with kobject_register() (or really
> > kobject_init()).
>
> And also throw a uevent :)
Heh, ok, I've now added this for both :)
> > Creating "simple" kobjects
> >
> > Sometimes all that a developer wants is a way to create a simple directory
> > in the sysfs heirachy, and not have to mess with the whole complication of
>
> hierarchy
thanks.
> > ktypes and release methods
> >
> > One important thing still missing from the discussion is what happens to a
> > kobject when its reference count reaches zero. The code which created the
> > kobject generally does not know when that will happen; if it did, there
> > would be little point in using a kobject in the first place. Even
> > predicatable object lifecycles become more complicated when sysfs is
> > brought in; user-space programs can keep a reference to a kobject (by
> > keeping one of its associated sysfs files open) for an arbitrary period of
> > time.
>
> This is no longer true?
>
> The major problem is that registered kobjects can be looked-up by other
> kernel code that can get a reference.
yes, you are right.
> > The end result is that a structure protected by a kobject cannot be freed
> > before its reference count goes to zero. The reference count is not under
> > the direct control of the code which created the kobject. So that code must
> > be notified asynchronously whenever the last reference to one of its
> > kobjects goes away.
>
> We should perhaps add a bit fat warning here:
>
> Note that once you registered your kobject via kobject_add(), you must
> never use kfree() to free it directly. The only safe way is to use
> kobject_put(). It is good practice to always use kobject_put() after
> kobject_init() to avoid errors creeping in.
Now added, thanks.
thanks for the review, I really appreciate it.
greg k-h
On Wed, Nov 28, 2007 at 01:23:02PM +0100, Kay Sievers wrote:
> On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
>
> > > A kset serves these functions:
> > >
> > > - It serves as a bag containing a group of objects. A kset can be used by
> > > the kernel to track "all block devices" or "all PCI device drivers."
> > >
> > > - A kset is also a subdirectory in sysfs, where the associated kobjects
> > > with the kset can show up.
> >
> > Perhaps better wording:
> >
> > A kset is also represented via a subdirectory in sysfs, under which the
> > kobjects associated with the kset can show up.
>
> This draws a misleading picture. A member of a kset shows up where the
> "parent" pointer points to. Like /sys/block is a kset, the kset contains
> disks and partitions, but partitions do not live at the kset, and tons
> of other kset directories where this is the case.
>
> "If the kobject belonging to a kset has no parent kobject set, it will
> be added to the kset's directory. Not all members of a kset do
> necessarily live in the kset directory. If an explicit parent kobject is
> assigned before the kobject is added, the kobject is registered with the
> kset, but added below the parent kobject."
Nice, thanks, I've added this :)
greg k-h
On Wed, Nov 28, 2007 at 12:45:45PM +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:02:52 -0800,
> Greg KH <[email protected]> wrote:
> > - A kset can provide a set of default attributes that all kobjects that
> > belong to it automatically inherit and have created whenever a kobject
> > is registered belonging to the kset.
>
> Hm, the default attributes are provided by the ktype?
Yes, now fixed.
> > The uevent function will be called when the uevent is about to be sent to
> > userspace to allow more environment variables to be added to the uevent.
>
> It may be helpful to mention which uevents are by default created by
> the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
Is this really needed?
> > - refcount is the kobject's reference count; it is initialized by kobject_init()
>
> There is no field called "refcount"; the embedded struct kref kref is
> initialized by kobject_init().
now removed, thanks.
> > Often, much of the initialization of a kobject is handled by the layer that
> > manages the containing kset. See the sample/kobject/kset-example.c for how
> > this is usually handled.
>
> Do we also want to mention kobject_rename() and kobject_move(), or are
> those functions so esoteric that most people don't want to know about
> them?
They can be found in the kerneldoc api reference if they are needed :)
thanks,
greg k-h
On Wed, Nov 28, 2007 at 06:00:27PM +0100, Kay Sievers wrote:
> On Wed, 2007-11-28 at 17:51 +0100, Cornelia Huck wrote:
> > On Wed, 28 Nov 2007 17:36:29 +0100,
> > Kay Sievers <[email protected]> wrote:
> >
> > >
> > > On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> > > > On Wed, 28 Nov 2007 16:57:48 +0100,
> > > > Kay Sievers <[email protected]> wrote:
> > > >
> > > > > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > > > > On Wed, 28 Nov 2007 13:23:02 +0100,
> > > > > > Kay Sievers <[email protected]> wrote:
> > > > > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > > > >
> > > > > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > > > > >
> > > > > > > > It may be helpful to mention which uevents are by default created by
> > > > > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > > > > >
> > > > > > > I think, we should remove all these default events from the kobject
> > > > > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > > > > users should request the events on their own, when they are finished
> > > > > > > adding stuff to the kobject. I see currently no way to solve the
> > > > > > > "attributes created after the event" problem. The new
> > > > > > > *_create_and_register functions do not allow default attributes to be
> > > > > > > created, which will just lead to serious trouble when someone wants to
> > > > > > > use udev to set defaults and such things. We may just want to require an
> > > > > > > explicit call to send the event?
> > > > > >
> > > > > > There will always be attributes that will show up later (for example,
> > > > > > after a device is activated). Probably the best approach is to keep the
> > > > > > default uevents, but have the attribute-adder send another uevent when
> > > > > > they are done?
> > > > >
> > > > > Uh, that's more an exception where we can't give guarantees because of
> > > > > very specific hardware setups, and it would be an additional "change"
> > > > > event. There are valid cases for this, but only a _very_ few.
> > > > >
> > > > > There is absolutely no reason not to do it right with the "add" event,
> > > > > just because we are too lazy to solve it proper the current code. It's
> > > > > just so broken by design, what we are doing today. :)
> > > >
> > > > I'm worrying a bit about changes that impact the whole code tree in
> > > > lots of places. I'd be fine with the device layer doing its uevent
> > > > manually in device_add() at the very end, though. (This would allow
> > > > drivers to add attributes in their probe function before the uevent,
> > > > for example.)
> >
> > <Looks at device_add() again: It already throws the uevent manually...>
>
> I think I still remember what I did 2.5 years ago :)
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e57cd73e2e844a3da25cc6b420674c81bbe1b387
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=18c3d5271b472c096adfc856e107c79f6fd30d7d
>
>
> > > The driver core does use the split already in most places, I did that
> > > long ago. There are not too many (~20) users of kobject_register(), and
> > > it's a pretty straight-forward change to change that to _init, _add,
> > > _uevent, and get rid of that totally useless "convenience api".
> > >
> > > I think there is no longer any excuse to keep that broken code around,
> > > and even require to document that it's broken. The whole purpose of the
> > > uevent is userspace consumption, which just doesn't work correctly with
> > > the code we offer. The fix is trivial, and should be done now, and we no
> > > longer need to fiddle around timing issues, just because we are too
> > > lazy.
> > >
> > > I propose the removal of _all_ funtions that have *register* in their
> > > name, and always require the following sequence:
> > > _init()
> > > _add()
> > > _uevent(_ADD)
> > >
> > > _uevent(_REMOVE)
> > > _del()
> > > _put()
> > >
> > > The _create_and_register() functions would become _create_ and_add()
> > > and will need an additional _uevent() call after they populated the
> > > object.
> >
> > I'm absolutely fine with doing that at the kobject level (after all,
> > it's a quite contained change, and the uevent function explicitely
> > works on a kobject).
> >
> > For the other _register()/_unregister() functions, it's a different
> > piece of cake. They are:
> > - distributed through lot of different code
> > - at a higher level than kobjects, and kobject_uevent() acts on the
> > kobject
> > - usually encapsulating a sequence that wants to be used by almost all
> > callers, and that includes a uevent
> >
> > I don't think we want people registering a higher level object and then
> > wondering why udev doesn't seem to take notice of it.
>
> Oh, I'm just talking about lib/kobject.c. And the new kobj/kset stuff we
> added which is currently in the -mm tree.
>
> It suffers from the same old problem, and even gets documentend as
> "broken" now. I really think that should be fixed proper instead, and
> it's the right time to do it now.
Ok, how should it be fixed?
thanks,
greg k-h
On Wed, Nov 28, 2007 at 05:35:32PM +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:04:06 -0800,
> Greg KH <[email protected]> wrote:
>
> > static struct foo_obj *create_foo_obj(const char *name)
> > {
> > struct foo_obj *foo;
> > int retval;
> >
> > /* allocate the memory for the whole object */
> > foo = kzalloc(sizeof(*foo), GFP_KERNEL);
> > if (!foo)
> > return NULL;
> >
> > /* initialize the kobject portion of the object properly */
> > kobject_set_name(&foo->kobj, "%s", name);
>
> Returncode not checked :)
good catch. Hm, I don't think anyone checks that function :)
> > foo->kobj.kset = example_kset;
> > foo->kobj.ktype = &foo_ktype;
> >
> > /*
> > * Register the kobject with the kernel, all the default files will
> > * be created here and the uevent will be sent out. If we were to
> > * call kobject_init() and then kobject_add() we would be
> > * responsible for sending out the initial KOBJ_ADD uevent.
> > */
> > retval = kobject_register(&foo->kobj);
> > if (retval) {
> > kfree(foo);
>
> kobject_put(foo) is needed since it gets you through kobject_cleanup()
> where the name can be freed.
No, kobject_register() should have handled that for us, right?
thanks,
greg k-h
On Wed, Nov 28, 2007 at 02:03:28PM -0500, Alan Stern wrote:
> On Tue, 27 Nov 2007, Greg KH wrote:
>
> > Part of the difficulty in understanding the driver model - and the kobject
> > abstraction upon which it is built - is that there is no obvious starting
> > place. Dealing with kobjects requires understanding a few different types,
> > all of which make reference to each other. In an attempt to make things
> > easier, we'll take a multi-pass approach, starting with vague terms and
> > adding detail as we go. To that end, here are some quick definitions of
> > some terms we will be working with.
> >
> > - A kobject is an object of type struct kobject. Kobjects have a name
> > and a reference count. A kobject also has a parent pointer (allowing
> > objects to be arranged into hierarchies), a specific type, and,
> > usually, a representation in the sysfs virtual filesystem.
>
> As Cornelia said, it would be worthwhile mentioning krefs in this
> document as well. They are simple enough to explain, after all.
Now added, thanks.
> > Initialization of kobjects
> >
> > Code which creates a kobject must, of course, initialize that object. Some
> > of the internal fields are setup with a (mandatory) call to kobject_init():
>
> kobject_init() isn't mandatory if you use kobject_register(). But then
> Kay wants to do away with kobject_register()...
>
> > The other kobject fields which should be set, directly or indirectly, by
> > the creator are its ktype, kset, and parent. We will get to those shortly,
> > however please note that the ktype and kset must be set before the
> > kobject_init() function is called.
>
> In fact kset, ktype, and parent are optional, right? You might mention
> at this point that not all those fields are needed, and explain later
> which combinations are legal.
They are optional, but if you want to do anything, you need to set them :)
> > When a reference is released, the call to kobject_put() will decrement the
> > reference count and, possibly, free the object. Note that kobject_init()
> > sets the reference count to one, so the code which sets up the kobject will
> > need to do a kobject_put() eventually to release that reference.
>
> It's worth mentioning here (and perhaps elsewhere too) that all of the
> function calls described here can sleep and hence must be made in
> process context, with the exception of the *_get() routines. It's
> possible to call *_put() in atomic context; the SCSI core does this
> (with device_put, not kobject_put) and has to jump through hoops to run
> the corresponding release routine in a waitqueue task. In general,
> though, it isn't safe.
Is this really needed? If anyone calls them from non-process context,
they will get a nasty run-time warning, right?
> > Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always from the heap. Future versions of the
> > kernel will contain a run-time check for kobjects that are created
> > statically and will warn the developer of this improper usage.
>
> Why not? What's wrong with static kobjects? I've never understood this.
They are reference counted. Other portions of the kernel can grab them
and think they are safe to use. If you do this with a static object,
what happens when the code goes away?
Most of the nasty race conditions that require this are now cleaned up
with Tejun's great sysfs work, so you will probably not see problems if
you do this, but in general, it's not a good thing to do.
> > ktypes and release methods
> >
> > One important thing still missing from the discussion is what happens to a
> > kobject when its reference count reaches zero. The code which created the
> > kobject generally does not know when that will happen; if it did, there
> > would be little point in using a kobject in the first place. Even
> > predicatable object lifecycles become more complicated when sysfs is
>
> predictable
thanks.
> > One important point cannot be overstated: every kobject must have a
> > release() method, and the kobject must persist (in a consistent state)
> > until that method is called. If these constraints are not met, the code is
> > flawed. Note that the kernel will warn you if you forget to provide a
> > release() method. Do not try to get rid of this warning by providing an
> > "empty" release function, you will be mocked merciously by the kobject
> > maintainer if you attempt this.
>
> Not to mention that doing this will leak memory. Unless the kobject
> is static...
heh.
I think your other questions are already answered, right?
thanks,
greg k-h
On Wed, 2007-11-28 at 22:08 -0800, Greg KH wrote:
> On Wed, Nov 28, 2007 at 06:00:27PM +0100, Kay Sievers wrote:
> > On Wed, 2007-11-28 at 17:51 +0100, Cornelia Huck wrote:
> > > On Wed, 28 Nov 2007 17:36:29 +0100, Kay Sievers <[email protected]> wrote:
> > > > On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> > > > > On Wed, 28 Nov 2007 16:57:48 +0100, Kay Sievers <[email protected]> wrote:
> > > > > > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > > > > > On Wed, 28 Nov 2007 13:23:02 +0100, Kay Sievers <[email protected]> wrote:
> > > > > > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[email protected]> wrote:
> > > > > >
> > > > > > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > > > > > >
> > > > > > > > > It may be helpful to mention which uevents are by default created by
> > > > > > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > > > > > >
> > > > > > > > I think, we should remove all these default events from the kobject
> > > > > > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > > > > > users should request the events on their own, when they are finished
> > > > > > > > adding stuff to the kobject. I see currently no way to solve the
> > > > > > > > "attributes created after the event" problem. The new
> > > > > > > > *_create_and_register functions do not allow default attributes to be
> > > > > > > > created, which will just lead to serious trouble when someone wants to
> > > > > > > > use udev to set defaults and such things. We may just want to require an
> > > > > > > > explicit call to send the event?
> > > > > > >
> > > > > > > There will always be attributes that will show up later (for example,
> > > > > > > after a device is activated). Probably the best approach is to keep the
> > > > > > > default uevents, but have the attribute-adder send another uevent when
> > > > > > > they are done?
> > > > > >
> > > > > > Uh, that's more an exception where we can't give guarantees because of
> > > > > > very specific hardware setups, and it would be an additional "change"
> > > > > > event. There are valid cases for this, but only a _very_ few.
> > > > > >
> > > > > > There is absolutely no reason not to do it right with the "add" event,
> > > > > > just because we are too lazy to solve it proper the current code. It's
> > > > > > just so broken by design, what we are doing today. :)
> > > > >
> > > > > I'm worrying a bit about changes that impact the whole code tree in
> > > > > lots of places. I'd be fine with the device layer doing its uevent
> > > > > manually in device_add() at the very end, though. (This would allow
> > > > > drivers to add attributes in their probe function before the uevent,
> > > > > for example.)
> > >
> > > <Looks at device_add() again: It already throws the uevent manually...>
> >
> > I think I still remember what I did 2.5 years ago :)
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e57cd73e2e844a3da25cc6b420674c81bbe1b387
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=18c3d5271b472c096adfc856e107c79f6fd30d7d
> >
> > > > The driver core does use the split already in most places, I did that
> > > > long ago. There are not too many (~20) users of kobject_register(), and
> > > > it's a pretty straight-forward change to change that to _init, _add,
> > > > _uevent, and get rid of that totally useless "convenience api".
> > > >
> > > > I think there is no longer any excuse to keep that broken code around,
> > > > and even require to document that it's broken. The whole purpose of the
> > > > uevent is userspace consumption, which just doesn't work correctly with
> > > > the code we offer. The fix is trivial, and should be done now, and we no
> > > > longer need to fiddle around timing issues, just because we are too
> > > > lazy.
> > > >
> > > > I propose the removal of _all_ funtions that have *register* in their
> > > > name, and always require the following sequence:
> > > > _init()
> > > > _add()
> > > > _uevent(_ADD)
> > > >
> > > > _uevent(_REMOVE)
> > > > _del()
> > > > _put()
> > > >
> > > > The _create_and_register() functions would become _create_ and_add()
> > > > and will need an additional _uevent() call after they populated the
> > > > object.
> > >
> > > I'm absolutely fine with doing that at the kobject level (after all,
> > > it's a quite contained change, and the uevent function explicitely
> > > works on a kobject).
> > >
> > > For the other _register()/_unregister() functions, it's a different
> > > piece of cake. They are:
> > > - distributed through lot of different code
> > > - at a higher level than kobjects, and kobject_uevent() acts on the
> > > kobject
> > > - usually encapsulating a sequence that wants to be used by almost all
> > > callers, and that includes a uevent
> > >
> > > I don't think we want people registering a higher level object and then
> > > wondering why udev doesn't seem to take notice of it.
> >
> > Oh, I'm just talking about lib/kobject.c. And the new kobj/kset stuff we
> > added which is currently in the -mm tree.
> >
> > It suffers from the same old problem, and even gets documentend as
> > "broken" now. I really think that should be fixed proper instead, and
> > it's the right time to do it now.
>
> Ok, how should it be fixed?
Accept that the kobject API is that low-level, that we can't have a
proper convenience API - it just does the wrong thing.
Do the same for all other in-kernel users what we did 2.5 years ago for
the driver core, and no longer use the kobject/kset_register()
functions, and let the caller send the events manually when the object
is ready, or removed.
We have ~20 callers of kobject_(un)register(), convert them to
kobject_init() + kobject_add() + kobject_uevent(), and just delete the
broken kobject_(un)register() functions from the kobject code.
Rename the new kobject/kset_create_and_register() to _create_and_add()
and also require the uevent to be sent manually when the caller is ready
with the object.
That way we get rid of a broken API which causes far more problems than
it solves. And we will no longer need to carry this in the example code:
/*
* Note, these files will be created _after_ the kobject above
* created. This can cause userspace to be looking around in sysfs
* for these files before they are really created. If you are
* worried about something like this, perhaps you really need to
* create your own kset and have a default attribute group for your
* kobject.
*/
We just get one single API, and we document that the caller needs to
send an event when it has finished populating the object, or deletes the
object.
If we are worrying about users who might forget to send events - I
really prefer missing uevents, which we can simply add, over ones with
the broken timing the current API causes, and failing userspace where
nobody understands what's going wrong at event time. :)
Thanks,
Kay
On Thu, 29 Nov 2007 08:50:31 +0100,
Kay Sievers <[email protected]> wrote:
> We have ~20 callers of kobject_(un)register(), convert them to
> kobject_init() + kobject_add() + kobject_uevent(), and just delete the
> broken kobject_(un)register() functions from the kobject code.
I find a bit more callers in my tree, but it still isn't too many.
This may also be a good time to evaluate which of those kobjects really
need a uevent. For example, I can imagine that you don't need a uevent
for a directory being created, but only for the objects created in that
directory.
On Wed, 28 Nov 2007 22:11:39 -0800,
Greg KH <[email protected]> wrote:
> > > foo->kobj.kset = example_kset;
> > > foo->kobj.ktype = &foo_ktype;
> > >
> > > /*
> > > * Register the kobject with the kernel, all the default files will
> > > * be created here and the uevent will be sent out. If we were to
> > > * call kobject_init() and then kobject_add() we would be
> > > * responsible for sending out the initial KOBJ_ADD uevent.
> > > */
> > > retval = kobject_register(&foo->kobj);
> > > if (retval) {
> > > kfree(foo);
> >
> > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > where the name can be freed.
>
> No, kobject_register() should have handled that for us, right?
kobject_register() doesn't do a kobject_put() if kobject_add() failed.
On Wed, 28 Nov 2007 22:04:58 -0800,
Greg KH <[email protected]> wrote:
> > > The uevent function will be called when the uevent is about to be sent to
> > > userspace to allow more environment variables to be added to the uevent.
> >
> > It may be helpful to mention which uevents are by default created by
> > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
>
> Is this really needed?
If we go forward with Kay's suggestion of removing kobject_register()
and friends, I guess no.
> > > Often, much of the initialization of a kobject is handled by the layer that
> > > manages the containing kset. See the sample/kobject/kset-example.c for how
> > > this is usually handled.
> >
> > Do we also want to mention kobject_rename() and kobject_move(), or are
> > those functions so esoteric that most people don't want to know about
> > them?
>
> They can be found in the kerneldoc api reference if they are needed :)
OK.
On Wed, 28 Nov 2007 17:00:57 -0500 (EST),
Alan Stern <[email protected]> wrote:
> On Wed, 28 Nov 2007, Greg KH wrote:
>
> > On Wed, Nov 28, 2007 at 03:42:00PM -0500, Alan Stern wrote:
> > > This patch (as1020) adds a check to kobject_init() to insure that the
> > > ktype field is not NULL. This is just for safety's sake; as far as I
> > > know there are no remaining places where the field is left unset. But
> > > ironically, kset_init() did fail to set it! The patch fixes that and
> > > removes some redundant initialization in kset_createa().
> > >
> > > The patch also fixes up elevator_init(), where ktype was set after
> > > calling kobject_init() instead of before.
> >
> > No, it's safe to set ktype after kobject_init, it was just not safe to
> > set the kset. As Kay pointed out, I just added a patch to my tree to
> > resolve this issue, and I'll go back and update the documentation now.
> >
> > I do like the "check for a ktype" warning, I'll go add that to my local
> > tree and see what breaks. Odds are, all the static kobjects will :(
>
> You have to be careful. The ktype check I wrote lives in
> kobject_init() -- that's why I had to make elevator_init() assign the
> ktype before calling kobject_init(). If you put the check into
> kobject_add() instead then you won't end up checking objects that get
> initialized but not added.
>
> Yes, nobody would deliberately use a kobject without adding it, but it
> could happen as the result of an failure between the _init and _add
> calls.
And if someone calls kobject_put() after kobject_init() to clean up,
their release function will not be called if they didn't set the ktype.
So the check really belongs into kobject_init() IMO.
On Wed, 28 Nov 2007 14:18:18 -0500 (EST),
Alan Stern <[email protected]> wrote:
> On Wed, 28 Nov 2007, Cornelia Huck wrote:
>
> > We should perhaps add a bit fat warning here:
> >
> > Note that once you registered your kobject via kobject_add(), you must
> > never use kfree() to free it directly. The only safe way is to use
>
> Slightly ambiguous. Instead just say:
>
> If you have initialized your kobject via kobject_init() or
> kobject_register(), you must not deallocate the kobject anywhere other
> than its release() method (which is invoked during the final
> kobject_put() call). Otherwise the kernel will leak memory.
And references to the kobject's kset...
>
> > > One important point cannot be overstated: every kobject must have a
> > > release() method, and the kobject must persist (in a consistent state)
> > > until that method is called.
> >
> > Which is especially hurting if you use kobjects in modules. (Which
> > reminds me: Must dig up the patchset that fixes the module unload vs.
> > release problem.)
>
> In theory modules shouldn't present a problem -- especially if Greg
> merges the "Kobjects: drop child->parent ref at unregistration" patch.
>
> When a module is unloaded, it has to unregister all its kobjects, which
> should force all their children to be unregistered too. At that time
> the children's drivers should drop all their references to the parent
> kobject, leaving only references held by the module being unloaded.
> Presumably it can arrange to drop its own references before its exit()
> routine returns.
>
> The only problem arises when a child's driver retains a reference to
> the parent kobject. If things are done properly, this reference should
> involve incrementing the module count -- which would prevent the module
> from being unloaded in the first place.
This still leaves the possibility that random code may grab a reference
once the kobject is present in the tree and lookupable. You gave up the
control of the number of references to your object once you made it
public.
On Thu, 2007-11-29 at 10:35 +0100, Cornelia Huck wrote:
> On Thu, 29 Nov 2007 08:50:31 +0100,
> Kay Sievers <[email protected]> wrote:
>
> > We have ~20 callers of kobject_(un)register(), convert them to
> > kobject_init() + kobject_add() + kobject_uevent(), and just delete the
> > broken kobject_(un)register() functions from the kobject code.
>
> I find a bit more callers in my tree, but it still isn't too many.
You are right, there are 27 files.
> This may also be a good time to evaluate which of those kobjects really
> need a uevent. For example, I can imagine that you don't need a uevent
> for a directory being created, but only for the objects created in that
> directory.
True, most plain directories don't need an event (and usually don't have
one today), because the kset does not match, like the "holders/slaves"
directories, or the "glue directories" between bus and class devices.
But for example the bus/class directories, there we really want an
uevent (and have one today) to set bus specific defaults when the
bus/class gets created.
Thanks,
Kay
On Thu, 2007-11-29 at 11:05 +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 17:00:57 -0500 (EST),
> Alan Stern <[email protected]> wrote:
> > On Wed, 28 Nov 2007, Greg KH wrote:
> > > On Wed, Nov 28, 2007 at 03:42:00PM -0500, Alan Stern wrote:
> > > > This patch (as1020) adds a check to kobject_init() to insure that the
> > > > ktype field is not NULL. This is just for safety's sake; as far as I
> > > > know there are no remaining places where the field is left unset. But
> > > > ironically, kset_init() did fail to set it! The patch fixes that and
> > > > removes some redundant initialization in kset_createa().
> > > >
> > > > The patch also fixes up elevator_init(), where ktype was set after
> > > > calling kobject_init() instead of before.
> > >
> > > No, it's safe to set ktype after kobject_init, it was just not safe to
> > > set the kset. As Kay pointed out, I just added a patch to my tree to
> > > resolve this issue, and I'll go back and update the documentation now.
> > >
> > > I do like the "check for a ktype" warning, I'll go add that to my local
> > > tree and see what breaks. Odds are, all the static kobjects will :(
> >
> > You have to be careful. The ktype check I wrote lives in
> > kobject_init() -- that's why I had to make elevator_init() assign the
> > ktype before calling kobject_init(). If you put the check into
> > kobject_add() instead then you won't end up checking objects that get
> > initialized but not added.
> >
> > Yes, nobody would deliberately use a kobject without adding it, but it
> > could happen as the result of an failure between the _init and _add
> > calls.
>
> And if someone calls kobject_put() after kobject_init() to clean up,
> their release function will not be called if they didn't set the ktype.
> So the check really belongs into kobject_init() IMO.
Hmm, will one expect that the whole object will also be free'd when we
suggest to call kobject_put() to cleanup? That might be pretty
unexpected, right?
Kay
On Thu, 29 Nov 2007 11:59:06 +0100,
Kay Sievers <[email protected]> wrote:
> On Thu, 2007-11-29 at 11:05 +0100, Cornelia Huck wrote:
> > On Wed, 28 Nov 2007 17:00:57 -0500 (EST),
> > Alan Stern <[email protected]> wrote:
> > > On Wed, 28 Nov 2007, Greg KH wrote:
> > > > On Wed, Nov 28, 2007 at 03:42:00PM -0500, Alan Stern wrote:
> > > > > This patch (as1020) adds a check to kobject_init() to insure that the
> > > > > ktype field is not NULL. This is just for safety's sake; as far as I
> > > > > know there are no remaining places where the field is left unset. But
> > > > > ironically, kset_init() did fail to set it! The patch fixes that and
> > > > > removes some redundant initialization in kset_createa().
> > > > >
> > > > > The patch also fixes up elevator_init(), where ktype was set after
> > > > > calling kobject_init() instead of before.
> > > >
> > > > No, it's safe to set ktype after kobject_init, it was just not safe to
> > > > set the kset. As Kay pointed out, I just added a patch to my tree to
> > > > resolve this issue, and I'll go back and update the documentation now.
> > > >
> > > > I do like the "check for a ktype" warning, I'll go add that to my local
> > > > tree and see what breaks. Odds are, all the static kobjects will :(
> > >
> > > You have to be careful. The ktype check I wrote lives in
> > > kobject_init() -- that's why I had to make elevator_init() assign the
> > > ktype before calling kobject_init(). If you put the check into
> > > kobject_add() instead then you won't end up checking objects that get
> > > initialized but not added.
> > >
> > > Yes, nobody would deliberately use a kobject without adding it, but it
> > > could happen as the result of an failure between the _init and _add
> > > calls.
> >
> > And if someone calls kobject_put() after kobject_init() to clean up,
> > their release function will not be called if they didn't set the ktype.
> > So the check really belongs into kobject_init() IMO.
>
> Hmm, will one expect that the whole object will also be free'd when we
> suggest to call kobject_put() to cleanup? That might be pretty
> unexpected, right?
I'd expect the kobject to be freed if I called kobject_put() on my last
reference (but that may be because I'm familiar with the code :)).
OTOH, if getting the reference on the kset is moved from kobject_init()
to kobject_add(), we aren't forced to use kobject_put() but may use
kfree().
On Wed, 28 Nov 2007, Greg KH wrote:
> > > The other kobject fields which should be set, directly or indirectly, by
> > > the creator are its ktype, kset, and parent. We will get to those shortly,
> > > however please note that the ktype and kset must be set before the
> > > kobject_init() function is called.
> >
> > In fact kset, ktype, and parent are optional, right? You might mention
> > at this point that not all those fields are needed, and explain later
> > which combinations are legal.
>
> They are optional, but if you want to do anything, you need to set them :)
You don't need to set kset, do you? Or parent -- especially if kset is
set instead. Certainly kobjects without one or the other of those are
useful.
> > It's worth mentioning here (and perhaps elsewhere too) that all of the
> > function calls described here can sleep and hence must be made in
> > process context, with the exception of the *_get() routines. It's
> > possible to call *_put() in atomic context; the SCSI core does this
> > (with device_put, not kobject_put) and has to jump through hoops to run
> > the corresponding release routine in a waitqueue task. In general,
> > though, it isn't safe.
>
> Is this really needed? If anyone calls them from non-process context,
> they will get a nasty run-time warning, right?
IMO it's better to let people know up front. Otherwise they might get
some misconception fixed in their brain, misdesign their drivers, and
have to change a large hunk of code when they discover it doesn't work
at runtime.
That's the whole point of documentation in the first place, right? To
tell people how the code is supposed to be used.
> > Why not? What's wrong with static kobjects? I've never understood this.
>
> They are reference counted. Other portions of the kernel can grab them
> and think they are safe to use. If you do this with a static object,
> what happens when the code goes away?
Let's see. The error scenario you propose is that a loadable module
contains a static kobject and the kobject's refcount is still positive
when the module is unloaded. Then the holder of the remaining
reference tries to access the kobject and crashes. Yes, that's bad.
But turn it around: Suppose the kobject were allocated dynamically
instead. It would remain in memory when the owning module was
unloaded, but its release method would go away along with the rest of
the module. When the last remaining reference is dropped there would
be a crash anyway.
Evidently the bug here is allowing the module to be unloaded while one
of its kobjects still has a positive refcount. It doesn't matter
whether the kobject was allocated statically or dynamically.
> Most of the nasty race conditions that require this are now cleaned up
> with Tejun's great sysfs work, so you will probably not see problems if
> you do this, but in general, it's not a good thing to do.
What about static kobjects in built-in (non-modular) kernel code?
Especially if the kobject in question is never released?
Alan Stern
On Thu, 29 Nov 2007, Cornelia Huck wrote:
> > The only problem arises when a child's driver retains a reference to
> > the parent kobject. If things are done properly, this reference should
> > involve incrementing the module count -- which would prevent the module
> > from being unloaded in the first place.
>
> This still leaves the possibility that random code may grab a reference
> once the kobject is present in the tree and lookupable. You gave up the
> control of the number of references to your object once you made it
> public.
I don't agree with this argument. Code should never grab random
references without insuring that the owner of the referenced object is
pinned. This rule applies to everything, not just kobjects.
Unfortunately kobjects don't have an owner field. In practice this
means that it isn't possible to pin the owner of some random kobject --
you have to know where the kobject came from or what it's embedded in.
All users of kobjects need to work this way.
Alan Stern
On Thu, 29 Nov 2007, Kay Sievers wrote:
> > And if someone calls kobject_put() after kobject_init() to clean up,
> > their release function will not be called if they didn't set the ktype.
> > So the check really belongs into kobject_init() IMO.
Right. And even though cleaning up no longer needs to drop a reference
to the kset, it still might need to free the kobject's name. So for
example, either of these sequences:
kobject_init(); kobject_set_name();
kobject_set_name(); kobject_init();
... ...
kobject_free(); kobject_free();
would leak memory.
In fact, if we were designing the kobject API from scratch, I'd suggest
making the ktype value an argument to kobject_init() so that it
_couldn't_ be omitted.
> Hmm, will one expect that the whole object will also be free'd when we
> suggest to call kobject_put() to cleanup? That might be pretty
> unexpected, right?
I don't understand the question. People _already_ expect the cleanup
routine to free the kobject when the last reference is dropped. Why
should there be any confusion over this?
Alan Stern
On Thu, 2007-11-29 at 10:54 -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Kay Sievers wrote:
>
> > > And if someone calls kobject_put() after kobject_init() to clean up,
> > > their release function will not be called if they didn't set the ktype.
> > > So the check really belongs into kobject_init() IMO.
>
> Right. And even though cleaning up no longer needs to drop a reference
> to the kset, it still might need to free the kobject's name. So for
> example, either of these sequences:
>
> kobject_init(); kobject_set_name();
> kobject_set_name(); kobject_init();
> ... ...
> kobject_free(); kobject_free();
>
> would leak memory.
Yeah, only the kobject_put() would free the name.
> In fact, if we were designing the kobject API from scratch, I'd suggest
> making the ktype value an argument to kobject_init() so that it
> _couldn't_ be omitted.
Sounds fine, maybe we should also pass the name along, so it will be
obvious what happens here:
int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
> > Hmm, will one expect that the whole object will also be free'd when we
> > suggest to call kobject_put() to cleanup? That might be pretty
> > unexpected, right?
>
> I don't understand the question. People _already_ expect the cleanup
> routine to free the kobject when the last reference is dropped. Why
> should there be any confusion over this?
Oh, if you want to rewind on error and have an initialized but still
unregistered kobject, and just want to free the allocated name by
calling kobject_cleanup() or kobject_put() you might not expect, that
your whole object that embeds the kobject will be gone. Just something
we need to document ...
Kay
On Thu, 29 Nov 2007 17:04:40 +0100,
Kay Sievers <[email protected]> wrote:
> On Thu, 2007-11-29 at 10:54 -0500, Alan Stern wrote:
> > On Thu, 29 Nov 2007, Kay Sievers wrote:
> >
> > > > And if someone calls kobject_put() after kobject_init() to clean up,
> > > > their release function will not be called if they didn't set the ktype.
> > > > So the check really belongs into kobject_init() IMO.
> >
> > Right. And even though cleaning up no longer needs to drop a reference
> > to the kset, it still might need to free the kobject's name. So for
> > example, either of these sequences:
> >
> > kobject_init(); kobject_set_name();
> > kobject_set_name(); kobject_init();
> > ... ...
> > kobject_free(); kobject_free();
> >
> > would leak memory.
>
> Yeah, only the kobject_put() would free the name.
>
> > In fact, if we were designing the kobject API from scratch, I'd suggest
> > making the ktype value an argument to kobject_init() so that it
> > _couldn't_ be omitted.
>
> Sounds fine, maybe we should also pass the name along, so it will be
> obvious what happens here:
> int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
Agreed. Better don't hide too much.
On Thu, 29 Nov 2007 10:47:19 -0500 (EST),
Alan Stern <[email protected]> wrote:
> On Thu, 29 Nov 2007, Cornelia Huck wrote:
>
> > > The only problem arises when a child's driver retains a reference to
> > > the parent kobject. If things are done properly, this reference should
> > > involve incrementing the module count -- which would prevent the module
> > > from being unloaded in the first place.
> >
> > This still leaves the possibility that random code may grab a reference
> > once the kobject is present in the tree and lookupable. You gave up the
> > control of the number of references to your object once you made it
> > public.
>
> I don't agree with this argument. Code should never grab random
> references without insuring that the owner of the referenced object is
> pinned. This rule applies to everything, not just kobjects.
Of course. You don't necessarily need prevent module unloading, but pin
the module in memory until the last reference is gone.
> Unfortunately kobjects don't have an owner field. In practice this
> means that it isn't possible to pin the owner of some random kobject --
> you have to know where the kobject came from or what it's embedded in.
> All users of kobjects need to work this way.
Yeah, that is what the-patchset-I-have-to-dig-around-for does:
Introduce an owner field in the kobject for pinning the module. (IIRC,
you even did some of the patches?)
On Thu, 29 Nov 2007, Cornelia Huck wrote:
> > Unfortunately kobjects don't have an owner field. In practice this
> > means that it isn't possible to pin the owner of some random kobject --
> > you have to know where the kobject came from or what it's embedded in.
> > All users of kobjects need to work this way.
>
> Yeah, that is what the-patchset-I-have-to-dig-around-for does:
> Introduce an owner field in the kobject for pinning the module. (IIRC,
> you even did some of the patches?)
Are these two patches what you mean? They are a little out-of-date
now, but the main idea hasn't changed.
Alan Stern
Index: usb-2.6/include/linux/kobject.h
===================================================================
--- usb-2.6.orig/include/linux/kobject.h
+++ usb-2.6/include/linux/kobject.h
@@ -71,7 +71,8 @@ static inline const char * kobject_name(
return kobj->k_name;
}
-extern void kobject_init(struct kobject *);
+extern void kobject_init_owner(struct kobject *, struct module *owner);
+#define kobject_init(kobj) kobject_init_owner(kobj, THIS_MODULE)
extern void kobject_cleanup(struct kobject *);
extern int __must_check kobject_add(struct kobject *);
@@ -84,7 +85,9 @@ extern int __must_check kobject_shadow_r
const char *new_name);
extern int __must_check kobject_move(struct kobject *, struct kobject *);
-extern int __must_check kobject_register(struct kobject *);
+extern int __must_check kobject_register_owner(struct kobject *,
+ struct module *owner);
+#define kobject_register(kobj) kobject_register_owner(kobj, THIS_MODULE)
extern void kobject_unregister(struct kobject *);
extern struct kobject * kobject_get(struct kobject *);
Index: usb-2.6/lib/kobject.c
===================================================================
--- usb-2.6.orig/lib/kobject.c
+++ usb-2.6/lib/kobject.c
@@ -164,10 +164,11 @@ static void verify_dynamic_kobject_alloc
#endif
/**
- * kobject_init - initialize object.
+ * kobject_init_onwer - initialize object.
* @kobj: object in question.
+ * @owner: module owning @kobj.
*/
-void kobject_init(struct kobject * kobj)
+void kobject_init_owner(struct kobject * kobj, struct module *owner)
{
if (!kobj)
return;
@@ -178,6 +179,7 @@ void kobject_init(struct kobject * kobj)
init_waitqueue_head(&kobj->poll);
kobj->kset = kset_get(kobj->kset);
/* Attempt to grab reference of owning module's kobject. */
+ kobj->owner = owner;
mod_kobject_get(kobj->owner);
}
@@ -272,15 +274,16 @@ int kobject_add(struct kobject * kobj)
/**
- * kobject_register - initialize and add an object.
+ * kobject_register_owner - initialize and add an object.
* @kobj: object in question.
+ * @owner: module owning @kobj.
*/
-int kobject_register(struct kobject * kobj)
+int kobject_register_owner(struct kobject * kobj, struct module *owner)
{
int error = -EINVAL;
if (kobj) {
- kobject_init(kobj);
+ kobject_init_owner(kobj, owner);
error = kobject_add(kobj);
if (!error)
kobject_uevent(kobj, KOBJ_ADD);
@@ -750,8 +753,8 @@ void subsys_remove_file(struct subsystem
}
#endif /* 0 */
-EXPORT_SYMBOL(kobject_init);
-EXPORT_SYMBOL(kobject_register);
+EXPORT_SYMBOL(kobject_init_owner);
+EXPORT_SYMBOL(kobject_register_owner);
EXPORT_SYMBOL(kobject_unregister);
EXPORT_SYMBOL(kobject_get);
EXPORT_SYMBOL(kobject_put);
Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -479,8 +479,9 @@ static void klist_children_put(struct kl
/**
- * device_initialize - init device structure.
+ * device_initialize_owner - init device structure.
* @dev: device.
+ * @owner: module owing the release routine.
*
* This prepares the device for use by other layers,
* including adding it to the device hierarchy.
@@ -489,10 +490,10 @@ static void klist_children_put(struct kl
* may use @dev's fields (e.g. the refcount).
*/
-void device_initialize(struct device *dev)
+void device_initialize_owner(struct device *dev, struct module *owner)
{
kobj_set_kset_s(dev, devices_subsys);
- kobject_init(&dev->kobj);
+ kobject_init_owner(&dev->kobj, owner);
klist_init(&dev->klist_children, klist_children_get,
klist_children_put);
INIT_LIST_HEAD(&dev->dma_pools);
@@ -756,8 +757,9 @@ int device_add(struct device *dev)
/**
- * device_register - register a device with the system.
+ * device_register_owner - register a device with the system.
* @dev: pointer to the device structure
+ * @owner: module owning the release routine
*
* This happens in two clean steps - initialize the device
* and add it to the system. The two steps can be called
@@ -767,9 +769,9 @@ int device_add(struct device *dev)
* before it is added to the hierarchy.
*/
-int device_register(struct device *dev)
+int device_register_owner(struct device *dev, struct module *owner)
{
- device_initialize(dev);
+ device_initialize_owner(dev, owner);
return device_add(dev);
}
@@ -1013,9 +1015,9 @@ int __init devices_init(void)
EXPORT_SYMBOL_GPL(device_for_each_child);
EXPORT_SYMBOL_GPL(device_find_child);
-EXPORT_SYMBOL_GPL(device_initialize);
+EXPORT_SYMBOL_GPL(device_initialize_owner);
EXPORT_SYMBOL_GPL(device_add);
-EXPORT_SYMBOL_GPL(device_register);
+EXPORT_SYMBOL_GPL(device_register_owner);
EXPORT_SYMBOL_GPL(device_del);
EXPORT_SYMBOL_GPL(device_unregister);
Index: usb-2.6/include/linux/device.h
===================================================================
--- usb-2.6.orig/include/linux/device.h
+++ usb-2.6/include/linux/device.h
@@ -509,9 +509,12 @@ void driver_init(void);
/*
* High level routines for use by the bus drivers
*/
-extern int __must_check device_register(struct device * dev);
+extern int __must_check device_register_owner(struct device * dev,
+ struct module *owner);
+#define device_register(dev) device_register_owner(dev, THIS_MODULE)
extern void device_unregister(struct device * dev);
-extern void device_initialize(struct device * dev);
+extern void device_initialize_owner(struct device * dev, struct module *owner);
+#define device_initialize(dev) device_initialize_owner(dev, THIS_MODULE)
extern int __must_check device_add(struct device * dev);
extern void device_del(struct device * dev);
extern int device_for_each_child(struct device *, void *,
Index: usb-2.6/drivers/base/platform.c
===================================================================
--- usb-2.6.orig/drivers/base/platform.c
+++ usb-2.6/drivers/base/platform.c
@@ -106,13 +106,15 @@ EXPORT_SYMBOL_GPL(platform_get_irq_bynam
* platform_add_devices - add a numbers of platform devices
* @devs: array of platform devices to add
* @num: number of platform devices in array
+ * @owner: module owning the devices in @devs
*/
-int platform_add_devices(struct platform_device **devs, int num)
+int platform_add_devices_owner(struct platform_device **devs, int num,
+ struct module *owner)
{
int i, ret = 0;
for (i = 0; i < num; i++) {
- ret = platform_device_register(devs[i]);
+ ret = platform_device_register_owner(devs[i], owner);
if (ret) {
while (--i >= 0)
platform_device_unregister(devs[i]);
@@ -122,7 +124,7 @@ int platform_add_devices(struct platform
return ret;
}
-EXPORT_SYMBOL_GPL(platform_add_devices);
+EXPORT_SYMBOL_GPL(platform_add_devices_owner);
struct platform_object {
struct platform_device pdev;
@@ -311,16 +313,18 @@ void platform_device_del(struct platform
EXPORT_SYMBOL_GPL(platform_device_del);
/**
- * platform_device_register - add a platform-level device
+ * platform_device_register_owner - add a platform-level device
* @pdev: platform device we're adding
+ * @owner: module owning @pdev
*
*/
-int platform_device_register(struct platform_device * pdev)
+int platform_device_register_owner(struct platform_device * pdev,
+ struct module *owner)
{
- device_initialize(&pdev->dev);
+ device_initialize_owner(&pdev->dev, owner);
return platform_device_add(pdev);
}
-EXPORT_SYMBOL_GPL(platform_device_register);
+EXPORT_SYMBOL_GPL(platform_device_register_owner);
/**
* platform_device_unregister - unregister a platform-level device
Index: usb-2.6/include/linux/platform_device.h
===================================================================
--- usb-2.6.orig/include/linux/platform_device.h
+++ usb-2.6/include/linux/platform_device.h
@@ -23,7 +23,9 @@ struct platform_device {
#define to_platform_device(x) container_of((x), struct platform_device, dev)
-extern int platform_device_register(struct platform_device *);
+extern int platform_device_register_owner(struct platform_device *,
+ struct module *owner);
+#define platform_device_register(pdev) platform_device_register_owner(pdev, THIS_MODULE)
extern void platform_device_unregister(struct platform_device *);
extern struct bus_type platform_bus_type;
@@ -33,7 +35,9 @@ extern struct resource *platform_get_res
extern int platform_get_irq(struct platform_device *, unsigned int);
extern struct resource *platform_get_resource_byname(struct platform_device *, unsigned int, char *);
extern int platform_get_irq_byname(struct platform_device *, char *);
-extern int platform_add_devices(struct platform_device **, int);
+extern int platform_add_devices_owner(struct platform_device **, int,
+ struct module *owner);
+#define platform_add_devices(devs, n) platform_add_devices_owner(devs, n, THIS_MODULE)
extern struct platform_device *platform_device_register_simple(char *, unsigned int, struct resource *, unsigned int);
On Thu, 29 Nov 2007, Kay Sievers wrote:
> > In fact, if we were designing the kobject API from scratch, I'd suggest
> > making the ktype value an argument to kobject_init() so that it
> > _couldn't_ be omitted.
>
> Sounds fine, maybe we should also pass the name along, so it will be
> obvious what happens here:
> int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
I don't know... Normally *_init() routines can't fail, but this could.
Then things like device_register() would run into trouble: The caller
wouldn't know whether a failure occurred before or after the
kobject_init() call, so it wouldn't know what sort of cleanup action
was needed: kfree() or device_put().
> Oh, if you want to rewind on error and have an initialized but still
> unregistered kobject, and just want to free the allocated name by
> calling kobject_cleanup() or kobject_put() you might not expect, that
> your whole object that embeds the kobject will be gone. Just something
> we need to document ...
When that sort of thing happens, the unwinding should be done by the
code responsible for whole object. For example, if device_add() fails
then the caller should go on to call device_put() rather than
kfree(dev).
That's how you would expect things to work in most cases. There aren't
many bare kobjects in the kernel.
I agree that documenting this behavior would be good.
Alan Stern
On Thu, 2007-11-29 at 12:06 -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Kay Sievers wrote:
>
> > > In fact, if we were designing the kobject API from scratch, I'd suggest
> > > making the ktype value an argument to kobject_init() so that it
> > > _couldn't_ be omitted.
> >
> > Sounds fine, maybe we should also pass the name along, so it will be
> > obvious what happens here:
> > int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
>
> I don't know... Normally *_init() routines can't fail, but this could.
> Then things like device_register() would run into trouble: The caller
> wouldn't know whether a failure occurred before or after the
> kobject_init() call, so it wouldn't know what sort of cleanup action
> was needed: kfree() or device_put().
But wouldn't device_register() do the kobject cleanup for you when it
fails? Why would a caller of device_register() care about the state of
the kobject?
> > Oh, if you want to rewind on error and have an initialized but still
> > unregistered kobject, and just want to free the allocated name by
> > calling kobject_cleanup() or kobject_put() you might not expect, that
> > your whole object that embeds the kobject will be gone. Just something
> > we need to document ...
>
> When that sort of thing happens, the unwinding should be done by the
> code responsible for whole object. For example, if device_add() fails
> then the caller should go on to call device_put() rather than
> kfree(dev).
>
> That's how you would expect things to work in most cases. There aren't
> many bare kobjects in the kernel.
>
> I agree that documenting this behavior would be good.
Ok, fine. Hope we will collect all that information in the end. :)
Kay
On Thu, 29 Nov 2007 11:55:43 -0500 (EST),
Alan Stern <[email protected]> wrote:
> On Thu, 29 Nov 2007, Cornelia Huck wrote:
>
> > > Unfortunately kobjects don't have an owner field. In practice this
> > > means that it isn't possible to pin the owner of some random kobject --
> > > you have to know where the kobject came from or what it's embedded in.
> > > All users of kobjects need to work this way.
> >
> > Yeah, that is what the-patchset-I-have-to-dig-around-for does:
> > Introduce an owner field in the kobject for pinning the module. (IIRC,
> > you even did some of the patches?)
>
> Are these two patches what you mean? They are a little out-of-date
> now, but the main idea hasn't changed.
Yes, thanks. I had also a module-reference-count part that I need to
resurrect.
On Thu, 29 Nov 2007, Kay Sievers wrote:
> > > Sounds fine, maybe we should also pass the name along, so it will be
> > > obvious what happens here:
> > > int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
> >
> > I don't know... Normally *_init() routines can't fail, but this could.
> > Then things like device_register() would run into trouble: The caller
> > wouldn't know whether a failure occurred before or after the
> > kobject_init() call, so it wouldn't know what sort of cleanup action
> > was needed: kfree() or device_put().
>
> But wouldn't device_register() do the kobject cleanup for you when it
> fails? Why would a caller of device_register() care about the state of
> the kobject?
Let's say device_register() calls device_init(), which calls
kobject_init(), which fails. Then there's no cleanup to do --
device_register() returns -ENOMEM or some such code and the caller has
to do the kfree().
Now let's say device_register() calls device_init(), which succeeds,
and then calls device_add(), which fails. To recover properly,
somebody then has to call device_put(). That "somebody" can't be the
original caller -- according to the previous paragraph the original
caller won't do anything but kfree(). So the "somebody" has to be
device_register() itself.
But the device_put() will call kobject_put(), which will invoke the
device's cleanup routine, which will deallocate the structure. Now the
original caller gets an error code (perhaps -ENOMEM again) but must
_not_ call kfree().
So what should the original caller do when an error occurs?
Alan Stern
On Thu, 2007-11-29 at 13:04 -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Kay Sievers wrote:
>
> > > > Sounds fine, maybe we should also pass the name along, so it will be
> > > > obvious what happens here:
> > > > int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
> > >
> > > I don't know... Normally *_init() routines can't fail, but this could.
> > > Then things like device_register() would run into trouble: The caller
> > > wouldn't know whether a failure occurred before or after the
> > > kobject_init() call, so it wouldn't know what sort of cleanup action
> > > was needed: kfree() or device_put().
> >
> > But wouldn't device_register() do the kobject cleanup for you when it
> > fails? Why would a caller of device_register() care about the state of
> > the kobject?
>
> Let's say device_register() calls device_init(), which calls
> kobject_init(), which fails. Then there's no cleanup to do --
> device_register() returns -ENOMEM or some such code and the caller has
> to do the kfree().
>
> Now let's say device_register() calls device_init(), which succeeds,
> and then calls device_add(), which fails. To recover properly,
> somebody then has to call device_put(). That "somebody" can't be the
> original caller -- according to the previous paragraph the original
> caller won't do anything but kfree(). So the "somebody" has to be
> device_register() itself.
>
> But the device_put() will call kobject_put(), which will invoke the
> device's cleanup routine, which will deallocate the structure. Now the
> original caller gets an error code (perhaps -ENOMEM again) but must
> _not_ call kfree().
>
> So what should the original caller do when an error occurs?
Right, and that is not covered today, the current code just leaks the
allocated name.
Your error scenario confirmed my initial concern about suggesting
kobject_put() to clean up an initialized kobject.
We should probably make kobject_cleanup() free only the resources taken
by kobject_init(), and use kobject_cleanup() instead of kobject_put()?
Kay
On Thu, 29 Nov 2007, Kay Sievers wrote:
> Your error scenario confirmed my initial concern about suggesting
> kobject_put() to clean up an initialized kobject.
>
> We should probably make kobject_cleanup() free only the resources taken
> by kobject_init(), and use kobject_cleanup() instead of kobject_put()?
My conclusion is different. We should make kobject_init() not consume
any resources at all; just initialize various fields. That way it
would be okay to call either kfree() or kobject_put() on an initialized
kobject. And then when something like device_register() fails, the
caller would know the proper thing to do would be to call the put()
routine, always.
Of course, once the name has been assigned, only kobject_put() should
be used.
There's another good reason for not assigning the name in
kobject_init(): Code that uses kobjects (like the driver core) doesn't
set the name until later.
Alan Stern
On Thu, 2007-11-29 at 14:05 -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Kay Sievers wrote:
>
> > Your error scenario confirmed my initial concern about suggesting
> > kobject_put() to clean up an initialized kobject.
> >
> > We should probably make kobject_cleanup() free only the resources taken
> > by kobject_init(), and use kobject_cleanup() instead of kobject_put()?
>
> My conclusion is different. We should make kobject_init() not consume
> any resources at all; just initialize various fields. That way it
> would be okay to call either kfree() or kobject_put() on an initialized
> kobject. And then when something like device_register() fails, the
> caller would know the proper thing to do would be to call the put()
> routine, always.
>
> Of course, once the name has been assigned, only kobject_put() should
> be used.
Now we just move the exactly the same problem from _init() to
_set_name(). To free the name of an unregistered we would need to call
_put() which free()'s the whole object again. :)
> There's another good reason for not assigning the name in
> kobject_init(): Code that uses kobjects (like the driver core) doesn't
> set the name until later.
That can be done at any stage, I guess. We will rip out the name in the
struct device anyway.
Kay
On Thu, 29 Nov 2007, Kay Sievers wrote:
> > My conclusion is different. We should make kobject_init() not consume
> > any resources at all; just initialize various fields. That way it
> > would be okay to call either kfree() or kobject_put() on an initialized
> > kobject. And then when something like device_register() fails, the
> > caller would know the proper thing to do would be to call the put()
> > routine, always.
> >
> > Of course, once the name has been assigned, only kobject_put() should
> > be used.
>
> Now we just move the exactly the same problem from _init() to
> _set_name(). To free the name of an unregistered we would need to call
> _put() which free()'s the whole object again. :)
I don't see that as a problem and it's not clear why you do.
It doesn't matter whether a kobject has been registered or not; once
it has been initialized you _should_ call kobject_put(). (Although
it's okay to call kfree() if the name hasn't been set yet.)
The same is true of larger objects. Once you have called
device_initialize(), you _should_ call device_put() (although it's okay
to call kfree()). Provided init routines don't consume resources, this
will work.
The only remaining problem is that somebody might set the name first
and then decide to abandon the object before calling kobject_init().
However this probably never happens anywhere.
> > There's another good reason for not assigning the name in
> > kobject_init(): Code that uses kobjects (like the driver core) doesn't
> > set the name until later.
>
> That can be done at any stage, I guess. We will rip out the name in the
> struct device anyway.
Are you also going to change all the places in the kernel where the
device name (.bus_id) isn't set until after device_initialize() has
been called?
Alan Stern
On Thu, 2007-11-29 at 15:09 -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Kay Sievers wrote:
>
> > > My conclusion is different. We should make kobject_init() not consume
> > > any resources at all; just initialize various fields. That way it
> > > would be okay to call either kfree() or kobject_put() on an initialized
> > > kobject. And then when something like device_register() fails, the
> > > caller would know the proper thing to do would be to call the put()
> > > routine, always.
> > >
> > > Of course, once the name has been assigned, only kobject_put() should
> > > be used.
> >
> > Now we just move the exactly the same problem from _init() to
> > _set_name(). To free the name of an unregistered we would need to call
> > _put() which free()'s the whole object again. :)
>
> I don't see that as a problem and it's not clear why you do.
>
> It doesn't matter whether a kobject has been registered or not; once
> it has been initialized you _should_ call kobject_put(). (Although
> it's okay to call kfree() if the name hasn't been set yet.)
I just say, it's exactly the same problem and it does not really make a
difference that kobject_init() does not do anything, if we require
another function to be called before we can call kobject_add(). A
kobject without a name will not be valid, and we will need a way for an
easy cleanup of allocated resources I think.
> The same is true of larger objects. Once you have called
> device_initialize(), you _should_ call device_put() (although it's okay
> to call kfree()). Provided init routines don't consume resources, this
> will work.
>
> The only remaining problem is that somebody might set the name first
> and then decide to abandon the object before calling kobject_init().
> However this probably never happens anywhere.
What is the whole point of kobject_init() then? You can just do the same
stuff in kobject_add() if you require not to allocate anything there.
None of the initialized fields can be used before we have called
kobject_add(), right?
> > > There's another good reason for not assigning the name in
> > > kobject_init(): Code that uses kobjects (like the driver core) doesn't
> > > set the name until later.
> >
> > That can be done at any stage, I guess. We will rip out the name in the
> > struct device anyway.
>
> Are you also going to change all the places in the kernel where the
> device name (.bus_id) isn't set until after device_initialize() has
> been called?
Yes, I already have a patch that does that for all stuff that was needed
on my boxes.
Kay
On Thu, 2007-11-29 at 21:20 +0100, Kay Sievers wrote:
> On Thu, 2007-11-29 at 15:09 -0500, Alan Stern wrote:
> > On Thu, 29 Nov 2007, Kay Sievers wrote:
> > > > There's another good reason for not assigning the name in
> > > > kobject_init(): Code that uses kobjects (like the driver core) doesn't
> > > > set the name until later.
> > >
> > > That can be done at any stage, I guess. We will rip out the name in the
> > > struct device anyway.
> >
> > Are you also going to change all the places in the kernel where the
> > device name (.bus_id) isn't set until after device_initialize() has
> > been called?
>
> Yes, I already have a patch that does that for all stuff that was needed
> on my boxes.
To be exact, bus_id will be removed from the device, and the device name
will be set and retrieved directly from the kobject.
Kay
On Thu, Nov 29, 2007 at 10:39:07AM +0100, Cornelia Huck wrote:
> On Wed, 28 Nov 2007 22:11:39 -0800,
> Greg KH <[email protected]> wrote:
>
> > > > foo->kobj.kset = example_kset;
> > > > foo->kobj.ktype = &foo_ktype;
> > > >
> > > > /*
> > > > * Register the kobject with the kernel, all the default files will
> > > > * be created here and the uevent will be sent out. If we were to
> > > > * call kobject_init() and then kobject_add() we would be
> > > > * responsible for sending out the initial KOBJ_ADD uevent.
> > > > */
> > > > retval = kobject_register(&foo->kobj);
> > > > if (retval) {
> > > > kfree(foo);
> > >
> > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > where the name can be freed.
> >
> > No, kobject_register() should have handled that for us, right?
>
> kobject_register() doesn't do a kobject_put() if kobject_add() failed.
Crap. If I can't get this code right in an example, the API is messed
up. Time to take Kay seriously and start to revamp the basic kobject
api :)
thanks,
greg k-h
On Thu, Nov 29, 2007 at 05:21:34PM +0100, Cornelia Huck wrote:
> Kay Sievers <[email protected]> wrote:
> > int kobject_init(struct kobject *kobj, struct kobj_type *type, const char *fmt, ...)
>
> Agreed. Better don't hide too much.
Ok, how about this function. If it errors out, it is free to just call
kfree() on the kobject. Seems simple enough to me, any objections? If
not, I'll go through and fix up the whole tree...
thanks,
greg k-h
-----------------
/**
* kobject_init - initialize a kobject structure
* @kobj: pointer to the kobject to initialize
* @ktype: pointer to the ktype for this kobject.
* @fmt: the name of the kobject
*
* This function will properly initialize a kobject such that it can then
* be passed to the kobject_add() call.
*
* If the function returns an error, the memory allocated by the kobject
* can be safely freed, no other functions need to be called.
*/
int kobject_init(struct kobject *kobj, struct kobj_type *ktype, const char *fmt, ...)
{
va_list args;
int retval;
if (!kobj)
return -EINVAL;
if (!ktype)
return -EINVAL;
WARN_ON(atomic_read(&kobj->kref.refcount));
kref_init(&kobj->kref);
INIT_LIST_HEAD(&kobj->entry);
kobj->ktype = ktype;
va_start(args, fmt);
retval = kobject_set_name_vargs(kobj, fmt, args);
va_end(args);
return retval;
}
One minor documentation update:
> /**
> * kobject_init - initialize a kobject structure
> * @kobj: pointer to the kobject to initialize
> * @ktype: pointer to the ktype for this kobject.
> * @fmt: the name of the kobject
> *
> * This function will properly initialize a kobject such that it can then
> * be passed to the kobject_add() call.
> *
> * If the function returns an error, the memory allocated by the kobject
> * can be safely freed, no other functions need to be called.
If the function does not return an error, the only way the kobject's
memory can be properly freed is with a call to kobject_put(). kfree()
should never be called on a kobject after this function has been called,
execept from the ktype's release function.
> */
> int kobject_init(struct kobject *kobj, struct kobj_type *ktype, const char *fmt, ...)
> {
> va_list args;
> int retval;
>
> if (!kobj)
> return -EINVAL;
>
> if (!ktype)
> return -EINVAL;
>
> WARN_ON(atomic_read(&kobj->kref.refcount));
> kref_init(&kobj->kref);
> INIT_LIST_HEAD(&kobj->entry);
> kobj->ktype = ktype;
>
> va_start(args, fmt);
> retval = kobject_set_name_vargs(kobj, fmt, args);
> va_end(args);
>
> return retval;
> }
>
On Thu, 29 Nov 2007, Greg KH wrote:
> > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > where the name can be freed.
> > >
> > > No, kobject_register() should have handled that for us, right?
> >
> > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
>
> Crap. If I can't get this code right in an example, the API is messed
> up. Time to take Kay seriously and start to revamp the basic kobject
> api :)
The rule is simple enough. After calling kobject_register() you should
always use kobject_put() -- even if kobject_register() failed.
In fact, after calling kobject_init() you should use kobject_put().
The first rule follows from this one, since kobject_register() calls
kobject_init() internally.
Alan Stern
On Thu, 29 Nov 2007, Greg KH wrote:
> Ok, how about this function. If it errors out, it is free to just call
> kfree() on the kobject. Seems simple enough to me, any objections? If
> not, I'll go through and fix up the whole tree...
Looks good, _provided_ you also get rid of kobject_register(). Which
you pretty much would have to do anyway, since it doesn't accept the
ktype or the name as an argument.
Alan Stern
On Thu, Nov 29, 2007 at 05:16:28PM -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
>
> > Ok, how about this function. If it errors out, it is free to just call
> > kfree() on the kobject. Seems simple enough to me, any objections? If
> > not, I'll go through and fix up the whole tree...
>
> Looks good, _provided_ you also get rid of kobject_register(). Which
> you pretty much would have to do anyway, since it doesn't accept the
> ktype or the name as an argument.
Yes, I would do that at the same time.
Well, not at the _exact_ same time, it might take a few patches to get
there :)
thanks for the review,
greg k-h
On Nov 30, 2007 6:11 AM, Alan Stern <[email protected]> wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
>
> > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > where the name can be freed.
> > > >
> > > > No, kobject_register() should have handled that for us, right?
> > >
> > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> >
> > Crap. If I can't get this code right in an example, the API is messed
> > up. Time to take Kay seriously and start to revamp the basic kobject
> > api :)
>
> The rule is simple enough. After calling kobject_register() you should
> always use kobject_put() -- even if kobject_register() failed.
>
> In fact, after calling kobject_init() you should use kobject_put().
> The first rule follows from this one, since kobject_register() calls
> kobject_init() internally.
>
Hi,
The behavior is not very clear here, the root problem is that :
1. Should we call kobject_put so cleanup work can be done by refcount
touch zero or call kfree every time after kobject_register failed?
2. If kobject_put calling is true, should this be done in
kobject_register error handling codes or by hand after
kobject_register failed?
Regards
dave
> Alan Stern
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On Fri, Nov 30, 2007 at 01:07:37PM +0800, Dave Young wrote:
> On Nov 30, 2007 6:11 AM, Alan Stern <[email protected]> wrote:
> > On Thu, 29 Nov 2007, Greg KH wrote:
> >
> > > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > > where the name can be freed.
> > > > >
> > > > > No, kobject_register() should have handled that for us, right?
> > > >
> > > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> > >
> > > Crap. If I can't get this code right in an example, the API is messed
> > > up. Time to take Kay seriously and start to revamp the basic kobject
> > > api :)
> >
> > The rule is simple enough. After calling kobject_register() you should
> > always use kobject_put() -- even if kobject_register() failed.
> >
> > In fact, after calling kobject_init() you should use kobject_put().
> > The first rule follows from this one, since kobject_register() calls
> > kobject_init() internally.
> >
> Hi,
> The behavior is not very clear here, the root problem is that :
>
> 1. Should we call kobject_put so cleanup work can be done by refcount
> touch zero or call kfree every time after kobject_register failed?
>
> 2. If kobject_put calling is true, should this be done in
> kobject_register error handling codes or by hand after
> kobject_register failed?
>
IMO,I'd rather select kobject_put due to the kobj name should also be released.
After searching for kobject_register, I found one leaks as this issue in pktcdvd.
Signed-off-by: Dave Young <[email protected]>
---
drivers/block/pktcdvd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -upr linux/drivers/block/pktcdvd.c linux.new/drivers/block/pktcdvd.c
--- linux/drivers/block/pktcdvd.c 2007-11-30 13:13:44.000000000 +0800
+++ linux.new/drivers/block/pktcdvd.c 2007-11-30 13:24:08.000000000 +0800
@@ -117,8 +117,10 @@ static struct pktcdvd_kobj* pkt_kobj_cre
p->kobj.parent = parent;
p->kobj.ktype = ktype;
p->pd = pd;
- if (kobject_register(&p->kobj) != 0)
+ if (kobject_register(&p->kobj) != 0) {
+ kobject_put(&p->kobj);
return NULL;
+ }
return p;
}
/*
> Regards
> dave
> > Alan Stern
> >
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
On Thu, Nov 29, 2007 at 05:11:35PM -0500, Alan Stern wrote:
> On Thu, 29 Nov 2007, Greg KH wrote:
>
> > > > > kobject_put(foo) is needed since it gets you through kobject_cleanup()
> > > > > where the name can be freed.
> > > >
> > > > No, kobject_register() should have handled that for us, right?
> > >
> > > kobject_register() doesn't do a kobject_put() if kobject_add() failed.
> >
> > Crap. If I can't get this code right in an example, the API is messed
> > up. Time to take Kay seriously and start to revamp the basic kobject
> > api :)
>
> The rule is simple enough. After calling kobject_register() you should
> always use kobject_put() -- even if kobject_register() failed.
Yes.
> In fact, after calling kobject_init() you should use kobject_put().
> The first rule follows from this one, since kobject_register() calls
> kobject_init() internally.
Yes, that makes sense, time to write it all down :)
thanks,
greg k-h
On Thu, 29 Nov 2007 21:26:40 +0100,
Kay Sievers <[email protected]> wrote:
> On Thu, 2007-11-29 at 21:20 +0100, Kay Sievers wrote:
> > On Thu, 2007-11-29 at 15:09 -0500, Alan Stern wrote:
> > > On Thu, 29 Nov 2007, Kay Sievers wrote:
>
> > > > > There's another good reason for not assigning the name in
> > > > > kobject_init(): Code that uses kobjects (like the driver core) doesn't
> > > > > set the name until later.
> > > >
> > > > That can be done at any stage, I guess. We will rip out the name in the
> > > > struct device anyway.
> > >
> > > Are you also going to change all the places in the kernel where the
> > > device name (.bus_id) isn't set until after device_initialize() has
> > > been called?
Hmpf, ccw does this as well...
> >
> > Yes, I already have a patch that does that for all stuff that was needed
> > on my boxes.
>
> To be exact, bus_id will be removed from the device, and the device name
> will be set and retrieved directly from the kobject.
I guess you'll have accessor functions for the bus id for use by higher
level code?
On Thu, 29 Nov 2007 13:54:55 -0800,
Greg KH <[email protected]> wrote:
> One minor documentation update:
>
> > /**
> > * kobject_init - initialize a kobject structure
> > * @kobj: pointer to the kobject to initialize
> > * @ktype: pointer to the ktype for this kobject.
> > * @fmt: the name of the kobject
> > *
> > * This function will properly initialize a kobject such that it can then
> > * be passed to the kobject_add() call.
> > *
> > * If the function returns an error, the memory allocated by the kobject
> > * can be safely freed, no other functions need to be called.
>
> If the function does not return an error, the only way the kobject's
> memory can be properly freed is with a call to kobject_put(). kfree()
> should never be called on a kobject after this function has been called,
> execept from the ktype's release function.
>
> > */
> > int kobject_init(struct kobject *kobj, struct kobj_type *ktype, const char *fmt, ...)
> > {
> > va_list args;
> > int retval;
> >
> > if (!kobj)
> > return -EINVAL;
> >
> > if (!ktype)
> > return -EINVAL;
> >
> > WARN_ON(atomic_read(&kobj->kref.refcount));
> > kref_init(&kobj->kref);
> > INIT_LIST_HEAD(&kobj->entry);
> > kobj->ktype = ktype;
> >
> > va_start(args, fmt);
> > retval = kobject_set_name_vargs(kobj, fmt, args);
> > va_end(args);
> >
> > return retval;
> > }
> >
Yes, this looks fine.
On Fri, 30 Nov 2007, Dave Young wrote:
> > The rule is simple enough. After calling kobject_register() you should
> > always use kobject_put() -- even if kobject_register() failed.
> >
> > In fact, after calling kobject_init() you should use kobject_put().
> > The first rule follows from this one, since kobject_register() calls
> > kobject_init() internally.
> >
> Hi,
> The behavior is not very clear here, the root problem is that :
>
> 1. Should we call kobject_put so cleanup work can be done by refcount
> touch zero or call kfree every time after kobject_register failed?
Call kobject_put so that the cleanup work will be done when the
refcount goes to 0. Don't call kfree.
> 2. If kobject_put calling is true, should this be done in
> kobject_register error handling codes or by hand after
> kobject_register failed?
It should be done by hand after kobject_register fails. Otherwise
kobject_register would end up dropping a reference that it never
acquired.
Alan Stern