2005-02-26 15:56:27

by Corey Minyard

[permalink] [raw]
Subject: [PATCH] New operation for kref to help avoid locks

Add a routine to kref that allows the kref_put() routine to be
unserialized even when the get routine attempts to kref_get()
an object without first holding a valid reference to it. This is
useful in situations where this happens multiple times without
freeing the object, as it will avoid having to do a lock/semaphore
except on the final kref_put().

This also adds some kref documentation to the Documentation
directory.

Signed-off-by: Corey Minyard <[email protected]>

Index: linux-2.6.11-rc4/include/linux/kref.h
===================================================================
--- linux-2.6.11-rc4.orig/include/linux/kref.h
+++ linux-2.6.11-rc4/include/linux/kref.h
@@ -28,5 +28,11 @@
void kref_get(struct kref *kref);
void kref_put(struct kref *kref, void (*release) (struct kref *kref));

+/* Get a kref if the count is not already zero. This can be used to
+ avoid a lock in the routine that calls kref_put(). Returns 1 if
+ successful or zero if the count was already zero. See
+ Documentation/kref.txt for details on how to use this. */
+int kref_get_with_check(struct kref *kref);
+
#endif /* __KERNEL__ */
#endif /* _KREF_H_ */
Index: linux-2.6.11-rc4/lib/kref.c
===================================================================
--- linux-2.6.11-rc4.orig/lib/kref.c
+++ linux-2.6.11-rc4/lib/kref.c
@@ -52,6 +52,21 @@
release(kref);
}

+/**
+ * kref_get_with_check - increment refcount if the refcount is not already 0.
+ * @kref: object.
+ */
+int kref_get_with_check(struct kref *kref)
+{
+ atomic_inc(&kref->refcount);
+ if (atomic_read(&kref->refcount) == 1) {
+ atomic_dec(&kref->refcount);
+ return 0;
+ }
+ return 1;
+}
+
EXPORT_SYMBOL(kref_init);
EXPORT_SYMBOL(kref_get);
EXPORT_SYMBOL(kref_put);
+EXPORT_SYMBOL(kref_get_with_check);
Index: linux-2.6.11-rc4/Documentation/kref.txt
===================================================================
--- /dev/null
+++ linux-2.6.11-rc4/Documentation/kref.txt
@@ -0,0 +1,274 @@
+krefs allow you to add reference counters to your objects. If you
+have objects that are used in multiple places and passed around, and
+you don't have refcounts, your code is almost certainly broken. If
+you want refcounts, krefs are the way to go.
+
+To use a kref, add a one to your data structures like:
+
+struct my_data
+{
+ .
+ .
+ struct kref refcount;
+ .
+ .
+};
+
+The kref can occur anywhere within the data structure.
+
+You must initialize the kref after you allocate it. To do this, call
+kref init as so:
+
+ struct my_data *data;
+
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ kref_init(&data->refcount);
+
+This sets the refcount in the kref to 1.
+
+Once you have a refcount, you must follow the following rules:
+
+1) If you make a non-temporary copy of a pointer, especially if
+ it can be passed to another thread of execution, you must
+ increment the refcount with kref_get() before passing it off:
+ kref_get(&data->refcount);
+ If you already have a valid pointer to a kref-ed structure (the
+ refcount cannot go to zero) you may do this without a lock.
+
+2) When you are done with a pointer, you must call kref_put():
+ kref_put(&data->refcount, data_release);
+ If this is the last reference to the pointer, the release
+ routine will be called. If the code never tries to get
+ a valid pointer to a kref-ed structure without already
+ holding a valid pointer, it is safe to do this without
+ a lock.
+
+3) If the code attempts to gain a reference to a kref-ed structure
+ without already holding a valid pointer, it must serialize access
+ where a kref_put() cannot occur during the kref_get(), and the
+ structure must remain valid during the kref_get().
+
+For example, if you allocate some data and then pass it to another
+thread to process:
+
+void data_release(struct kref *ref)
+{
+ struct my_data *data = container_of(ref, struct my_data, refcount);
+ kfree(data);
+}
+
+void more_data_handling(void *cb_data)
+{
+ struct my_data *data = cb_data;
+ .
+ . do stuff with data here
+ .
+ kref_put(data, data_release);
+}
+
+void my_data_handler(void)
+{
+ int rv = 0;
+ struct my_data *data;
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ kref_init(&data->refcount);
+
+ kref_get(&data->refcount);
+ rv = kthread_run(more_data_handling, data, "more_data_handling");
+ if (rv) {
+ kref_put(&data->refcount);
+ goto out;
+ }
+
+ .
+ . do stuff with data here
+ .
+ out:
+ kref_put(data, data_release);
+ return rv;
+}
+
+This way, it doesn't matter what order the two threads handle the
+data, the put handles knowing when the data is free and releasing it.
+The kref_get() does not require a lock, since we already have a valid
+pointer that we own a refcount for. The put needs no lock because
+nothing tries to get the data without already holding a pointer.
+
+Note that the "before" in rule 1 is very important. You should never
+do something like:
+
+ rv = kthread_run(more_data_handling, data, "more_data_handling");
+ if (rv)
+ goto out;
+ else
+ /* BAD BAD BAD - get is after the handoff */
+ kref_get(&data->refcount);
+
+Don't assume you know what you are doing and use the above construct.
+First of all, you may not know what you are doing. Second, you may
+know what you are doing (there are some situations where locking is
+involved where the above may be legal) but someone else who doesn't
+know what they are doing may change the code or copy the code. It's
+bad style. Don't do it.
+
+There are some situations where you can optimize the gets and puts.
+For instance, if you are done with an object and enqueuing it for
+something else or passing it off to something else, there is no reason
+to do a get then a put:
+
+ /* Silly extra get and put */
+ kref_get(&obj->ref);
+ enqueue(obj);
+ kref_put(&obj->ref, obj_cleanup);
+
+Just do the enqueue. A comment about this is always welcome:
+
+ enqueue(obj);
+ /* We are done with obj, so we pass our refcount off
+ to the queue. DON'T TOUCH obj AFTER HERE! */
+
+The last rule (rule 3) is the nastiest one to handle. Say, for
+instance, you have a list of items that are each kref-ed, and you wish
+to get the first one. You can't just pull the first item off the list
+and kref_get() it. That violates rule 3 because you are not already
+holding a valid pointer. You must add locks or semaphores. For
+instance:
+
+static DECLARE_MUTEX(sem);
+static LIST_HEAD(q);
+struct my_data
+{
+ struct kref refcount;
+ struct list_head link;
+};
+
+static struct my_data *get_entry()
+{
+ struct my_data *entry = NULL;
+ down(&sem);
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_q_entry, link);
+ kref_get(&entry->refcount);
+ }
+ up(&sem);
+ return entry;
+}
+
+static void put_entry(struct my_data *entry)
+{
+ down(&sem);
+ kref_put(&entry->refcount);
+ list_del(&entry->link);
+ up(&sem);
+}
+
+
+There is one exception to rule 3. kref_get_with_check() can be used
+to avoid a lock/semaphore in your routines that call kref_put(), except
+in the case that release is called. It is useful in situations where
+you might be trying to get an object that might have just gone free,
+as in our list example above.
+
+For instance, suppose you have a queue of items to process in your
+driver. The head of the queue is the current item being worked on.
+You driver has timers, interrupts, and user context threads that may
+all attempt to claim the head of the queue.
+
+So you define three routines, get_working_entry(), put_entry(), and
+entry_completed(), one to get the current queue entry that is being
+worked on, one to tell the system you are done working on that
+entry for now, and one to tell that the entry is completed and
+should be dequeued. Here is some example code:
+
+static LIST_HEAD(q);
+DEFINE_SPINLOCK(q_lock);
+
+struct my_q_entry
+{
+ struct list_head link;
+ struct kref refcount;
+ char data[10];
+ void (*done)(struct my_q_entry *);
+};
+
+struct my_q_entry *get_working_entry(void)
+{
+ unsigned long flags;
+ struct my_q_entry *entry = NULL;
+
+ spin_lock_irqsave(&q_lock, flags);
+ retry:
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_q_entry, link);
+ if (!kref_get_with_check(&entry->refcount)) {
+ /* Entry's refcount has been deleted, but it
+ * has not yet been removed from the list. We
+ * raced with put_entry and are between the
+ * kref_put() decrementing the count and the
+ * release routine claiming the lock. Just
+ * remove it and retry. */
+ list_del(&entry->link);
+ entry = NULL;
+ goto retry;
+ }
+ }
+ spin_lock_irqrestore(&q_lock, flags);
+ return entry;
+}
+
+void release_entry(struct kref *ref)
+{
+ unsigned long flags;
+ struct my_q_entry *entry = container_of(ref, struct my_q_entry,
+ refcount);
+
+ spin_lock_irqsave(&q_lock, flags);
+ list_del(&entry->link);
+ spin_lock_irqrestore(&q_lock, flags);
+ entry->done(entry);
+
+ entry = get_working_entry();
+ if (entry) {
+ /* Start next entry here. */
+ }
+ put_entry(entry)
+}
+
+void put_entry(struct my_q_entry *entry)
+{
+ kref_put(&entry->refcount, release_entry);
+}
+
+void entry_completed(struct my_q_entry *entry)
+{
+ kref_put(&entry->refcount, release_entry);
+}
+
+
+Note that there is no lock around kref_put(), which should avoid a
+lock on most put operations, except for the last one. This works
+because if kref_put() sets the refcount to zero, one and only one
+thread of execution can be doing the kref_get(). If the kref_get()
+beats the kref_put(), everything is fine, the get routine has a
+valid entry. If the kref_put() beats the kref_get(), the kref
+value will be zero and kref_get_with_check() will fail, but the
+release_entry() code will block on the spinlock. So the entry will
+be dequeued and everything still works.
+
+This is generally only useful when you expect do have this happen many
+times for a single object. If 99% of the time you have one get and
+one put, it's probably not worth the complexity. If you generally
+have 20 gets and puts, it saves 19 locking operations, which is
+probably worth it.
+
+Other than this, normal kref rules apply. You can do a kref_get()
+without a lock if you are holding a valid kref pointer.
+
+
+Corey Minyard <[email protected]>
+
+A lot of this was lifted from Greg KH's OLS presentation on krefs.


Attachments:
kref_checked.diff (10.19 kB)

2005-02-26 20:22:22

by Sergey Vlasov

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

On Sat, 26 Feb 2005 09:55:41 -0600 Corey Minyard wrote:

> Greg,
>
> This is the patch for krefs that we talked about. If you don't like it
> but like the docs, feel free just to take the documentation and cut out
> the stuff at the end about the new operation.

See below for comments to the patch.

>
> Thanks,
>
> -Corey
>
>
> [kref_checked.diff text/plain (10576 bytes)]
> Add a routine to kref that allows the kref_put() routine to be
> unserialized even when the get routine attempts to kref_get()
> an object without first holding a valid reference to it. This is
> useful in situations where this happens multiple times without
> freeing the object, as it will avoid having to do a lock/semaphore
> except on the final kref_put().
>
> This also adds some kref documentation to the Documentation
> directory.
>
> Signed-off-by: Corey Minyard <[email protected]>
>
> Index: linux-2.6.11-rc4/include/linux/kref.h
> ===================================================================
> --- linux-2.6.11-rc4.orig/include/linux/kref.h
> +++ linux-2.6.11-rc4/include/linux/kref.h
> @@ -28,5 +28,11 @@
> void kref_get(struct kref *kref);
> void kref_put(struct kref *kref, void (*release) (struct kref *kref));
>
> +/* Get a kref if the count is not already zero. This can be used to
> + avoid a lock in the routine that calls kref_put(). Returns 1 if
> + successful or zero if the count was already zero. See
> + Documentation/kref.txt for details on how to use this. */
> +int kref_get_with_check(struct kref *kref);
> +
> #endif /* __KERNEL__ */
> #endif /* _KREF_H_ */
> Index: linux-2.6.11-rc4/lib/kref.c
> ===================================================================
> --- linux-2.6.11-rc4.orig/lib/kref.c
> +++ linux-2.6.11-rc4/lib/kref.c
> @@ -52,6 +52,21 @@
> release(kref);
> }
>
> +/**
> + * kref_get_with_check - increment refcount if the refcount is not already 0.
> + * @kref: object.
> + */
> +int kref_get_with_check(struct kref *kref)
> +{
> + atomic_inc(&kref->refcount);
> + if (atomic_read(&kref->refcount) == 1) {
> + atomic_dec(&kref->refcount);
> + return 0;
> + }
> + return 1;
> +}

Consider this situation:

initially refcount is 1

CPU1 does kref_get_with_check() CPU2 does kref_put()

atomic_inc(&kref->refcount)
(increments refcount to 2)

atomic_dec_and_test(&kref->refcount)
(decrements refcount to 1)

(kref_put() does nothing more
because the refcount did not
reach zero)

atomic_read(&kref->refcount)
(finds refcount == 1 and decides
that the entry was being deleted)

atomic_dec(&kref->refcount);
(decrements refcount to 0)

(then proceeds thinking that the
entry was about to be finalized,
thus creating a leak)

Because of this, separate atomic_inc+atomic_read sequence is unsafe -
you need atomic_inc_return() here instead:

int kref_get_with_check(struct kref *kref)
{
if (atomic_inc_return(&kref->refcount) == 1) {
atomic_dec(&kref->refcount);
return 0;
}
return 1;
}

> +
> EXPORT_SYMBOL(kref_init);
> EXPORT_SYMBOL(kref_get);
> EXPORT_SYMBOL(kref_put);
> +EXPORT_SYMBOL(kref_get_with_check);
> Index: linux-2.6.11-rc4/Documentation/kref.txt
> ===================================================================
> --- /dev/null
> +++ linux-2.6.11-rc4/Documentation/kref.txt
> @@ -0,0 +1,274 @@
> +krefs allow you to add reference counters to your objects. If you
> +have objects that are used in multiple places and passed around, and
> +you don't have refcounts, your code is almost certainly broken. If
> +you want refcounts, krefs are the way to go.
> +
> +To use a kref, add a one to your data structures like:
> +
> +struct my_data
> +{
> + .
> + .
> + struct kref refcount;
> + .
> + .
> +};
> +
> +The kref can occur anywhere within the data structure.
> +
> +You must initialize the kref after you allocate it. To do this, call
> +kref init as so:
> +
> + struct my_data *data;
> +
> + data = kmalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> + kref_init(&data->refcount);
> +
> +This sets the refcount in the kref to 1.
> +
> +Once you have a refcount, you must follow the following rules:
> +
> +1) If you make a non-temporary copy of a pointer, especially if
> + it can be passed to another thread of execution, you must
> + increment the refcount with kref_get() before passing it off:
> + kref_get(&data->refcount);
> + If you already have a valid pointer to a kref-ed structure (the
> + refcount cannot go to zero) you may do this without a lock.
> +
> +2) When you are done with a pointer, you must call kref_put():
> + kref_put(&data->refcount, data_release);
> + If this is the last reference to the pointer, the release
> + routine will be called. If the code never tries to get
> + a valid pointer to a kref-ed structure without already
> + holding a valid pointer, it is safe to do this without
> + a lock.
> +
> +3) If the code attempts to gain a reference to a kref-ed structure
> + without already holding a valid pointer, it must serialize access
> + where a kref_put() cannot occur during the kref_get(), and the
> + structure must remain valid during the kref_get().
> +
> +For example, if you allocate some data and then pass it to another
> +thread to process:
> +
> +void data_release(struct kref *ref)
> +{
> + struct my_data *data = container_of(ref, struct my_data, refcount);
> + kfree(data);
> +}
> +
> +void more_data_handling(void *cb_data)
> +{
> + struct my_data *data = cb_data;
> + .
> + . do stuff with data here
> + .
> + kref_put(data, data_release);
> +}
> +
> +void my_data_handler(void)

Should be int - you return something from it below.

> +{
> + int rv = 0;
> + struct my_data *data;
> + data = kmalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> + kref_init(&data->refcount);
> +
> + kref_get(&data->refcount);
> + rv = kthread_run(more_data_handling, data, "more_data_handling");
> + if (rv) {

This code is wrong - kthread_run() returns a "struct task_struct *"
(which might be ERR_PTR(-ENOMEM) if it fails).

> + kref_put(&data->refcount);
> + goto out;
> + }
> +
> + .
> + . do stuff with data here
> + .
> + out:
> + kref_put(data, data_release);
> + return rv;
> +}
> +
> +This way, it doesn't matter what order the two threads handle the
> +data, the put handles knowing when the data is free and releasing it.
> +The kref_get() does not require a lock, since we already have a valid
> +pointer that we own a refcount for. The put needs no lock because
> +nothing tries to get the data without already holding a pointer.
> +
> +Note that the "before" in rule 1 is very important. You should never
> +do something like:
> +
> + rv = kthread_run(more_data_handling, data, "more_data_handling");
> + if (rv)
> + goto out;
> + else
> + /* BAD BAD BAD - get is after the handoff */
> + kref_get(&data->refcount);
> +
> +Don't assume you know what you are doing and use the above construct.
> +First of all, you may not know what you are doing. Second, you may
> +know what you are doing (there are some situations where locking is
> +involved where the above may be legal) but someone else who doesn't
> +know what they are doing may change the code or copy the code. It's
> +bad style. Don't do it.
> +
> +There are some situations where you can optimize the gets and puts.
> +For instance, if you are done with an object and enqueuing it for
> +something else or passing it off to something else, there is no reason
> +to do a get then a put:
> +
> + /* Silly extra get and put */
> + kref_get(&obj->ref);
> + enqueue(obj);
> + kref_put(&obj->ref, obj_cleanup);
> +
> +Just do the enqueue. A comment about this is always welcome:
> +
> + enqueue(obj);
> + /* We are done with obj, so we pass our refcount off
> + to the queue. DON'T TOUCH obj AFTER HERE! */
> +
> +The last rule (rule 3) is the nastiest one to handle. Say, for
> +instance, you have a list of items that are each kref-ed, and you wish
> +to get the first one. You can't just pull the first item off the list
> +and kref_get() it. That violates rule 3 because you are not already
> +holding a valid pointer. You must add locks or semaphores. For
> +instance:
> +
> +static DECLARE_MUTEX(sem);
> +static LIST_HEAD(q);
> +struct my_data
> +{
> + struct kref refcount;
> + struct list_head link;
> +};
> +
> +static struct my_data *get_entry()
> +{
> + struct my_data *entry = NULL;
> + down(&sem);
> + if (!list_empty(&q)) {
> + entry = container_of(q.next, struct my_q_entry, link);
> + kref_get(&entry->refcount);
> + }
> + up(&sem);
> + return entry;
> +}
> +
> +static void put_entry(struct my_data *entry)
> +{
> + down(&sem);
> + kref_put(&entry->refcount);
> + list_del(&entry->link);

1) kref_put(&entry->refcount, release_entry);
2) Wrong order for kref_put() and list_del() - should be opposite.
3) In this case you could safely up(&sem) after list_del(&entry->link),
because get_entry() cannot get to this entry anymore, and kref_put()
will ensure that only one invocation will actually call the release
function. Probably the example where the call to list_del() is inside
the release_entry callback (which was omitted here) will illustrate the
rule 3 better.

static void release_entry(struct kref *ref)
{
struct my_data *entry = container_of(ref, struct my_data, refcount);

list_del(&entry->link);
kfree(entry);
}

static void put_entry(struct my_data *entry)
{
down(&sem);
kref_put(&entry->refcount, release_entry);
up(&sem);
}

> + up(&sem);
> +}
> +
> +
> +There is one exception to rule 3. kref_get_with_check() can be used
> +to avoid a lock/semaphore in your routines that call kref_put(), except
> +in the case that release is called. It is useful in situations where
> +you might be trying to get an object that might have just gone free,
> +as in our list example above.
> +
> +For instance, suppose you have a queue of items to process in your
> +driver. The head of the queue is the current item being worked on.
> +You driver has timers, interrupts, and user context threads that may
> +all attempt to claim the head of the queue.
> +
> +So you define three routines, get_working_entry(), put_entry(), and
> +entry_completed(), one to get the current queue entry that is being
> +worked on, one to tell the system you are done working on that
> +entry for now, and one to tell that the entry is completed and
> +should be dequeued. Here is some example code:
> +
> +static LIST_HEAD(q);
> +DEFINE_SPINLOCK(q_lock);
> +
> +struct my_q_entry
> +{
> + struct list_head link;
> + struct kref refcount;
> + char data[10];
> + void (*done)(struct my_q_entry *);
> +};
> +
> +struct my_q_entry *get_working_entry(void)
> +{
> + unsigned long flags;
> + struct my_q_entry *entry = NULL;
> +
> + spin_lock_irqsave(&q_lock, flags);
> + retry:
> + if (!list_empty(&q)) {
> + entry = container_of(q.next, struct my_q_entry, link);
> + if (!kref_get_with_check(&entry->refcount)) {
> + /* Entry's refcount has been deleted, but it
> + * has not yet been removed from the list. We
> + * raced with put_entry and are between the
> + * kref_put() decrementing the count and the
> + * release routine claiming the lock. Just
> + * remove it and retry. */
> + list_del(&entry->link);

This must be list_del_init(&entry->link) - otherwise release_entry()
will oops on its list_del() call.

> + entry = NULL;
> + goto retry;
> + }
> + }
> + spin_lock_irqrestore(&q_lock, flags);
> + return entry;
> +}
> +
> +void release_entry(struct kref *ref)
> +{
> + unsigned long flags;
> + struct my_q_entry *entry = container_of(ref, struct my_q_entry,
> + refcount);
> +
> + spin_lock_irqsave(&q_lock, flags);
> + list_del(&entry->link);
> + spin_lock_irqrestore(&q_lock, flags);
> + entry->done(entry);
> +
> + entry = get_working_entry();
> + if (entry) {
> + /* Start next entry here. */
> + }
> + put_entry(entry)

Obviously this call must be inside the if branch, because your example
put_entry() implementation does not accept NULL.

> +}
> +
> +void put_entry(struct my_q_entry *entry)
> +{
> + kref_put(&entry->refcount, release_entry);
> +}
> +
> +void entry_completed(struct my_q_entry *entry)
> +{
> + kref_put(&entry->refcount, release_entry);
> +}
> +
> +
> +Note that there is no lock around kref_put(), which should avoid a
> +lock on most put operations, except for the last one. This works
> +because if kref_put() sets the refcount to zero, one and only one
> +thread of execution can be doing the kref_get(). If the kref_get()
> +beats the kref_put(), everything is fine, the get routine has a
> +valid entry. If the kref_put() beats the kref_get(), the kref
> +value will be zero and kref_get_with_check() will fail, but the
> +release_entry() code will block on the spinlock. So the entry will
> +be dequeued and everything still works.
> +
> +This is generally only useful when you expect do have this happen many
> +times for a single object. If 99% of the time you have one get and
> +one put, it's probably not worth the complexity. If you generally
> +have 20 gets and puts, it saves 19 locking operations, which is
> +probably worth it.
> +
> +Other than this, normal kref rules apply. You can do a kref_get()
> +without a lock if you are holding a valid kref pointer.
> +
> +
> +Corey Minyard <[email protected]>
> +
> +A lot of this was lifted from Greg KH's OLS presentation on krefs.


Attachments:
(No filename) (13.29 kB)
(No filename) (189.00 B)
Download all attachments

2005-02-26 22:23:51

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

Add a routine to kref that allows the kref_put() routine to be
unserialized even when the get routine attempts to kref_get()
an object without first holding a valid reference to it. This is
useful in situations where this happens multiple times without
freeing the object, as it will avoid having to do a lock/semaphore
except on the final kref_put().

This also adds some kref documentation to the Documentation
directory.

Signed-off-by: Corey Minyard <[email protected]>

Index: linux-2.6.11-rc4/include/linux/kref.h
===================================================================
--- linux-2.6.11-rc4.orig/include/linux/kref.h
+++ linux-2.6.11-rc4/include/linux/kref.h
@@ -28,5 +28,11 @@
void kref_get(struct kref *kref);
void kref_put(struct kref *kref, void (*release) (struct kref *kref));

+/* Get a kref if the count is not already zero. This can be used to
+ avoid a lock in the routine that calls kref_put(). Returns 1 if
+ successful or zero if the count was already zero. See
+ Documentation/kref.txt for details on how to use this. */
+int kref_get_with_check(struct kref *kref);
+
#endif /* __KERNEL__ */
#endif /* _KREF_H_ */
Index: linux-2.6.11-rc4/lib/kref.c
===================================================================
--- linux-2.6.11-rc4.orig/lib/kref.c
+++ linux-2.6.11-rc4/lib/kref.c
@@ -52,6 +52,20 @@
release(kref);
}

+/**
+ * kref_get_with_check - increment refcount if the refcount is not already 0.
+ * @kref: object.
+ */
+int kref_get_with_check(struct kref *kref)
+{
+ if (atomic_inc_return(&kref->refcount) == 1) {
+ atomic_dec(&kref->refcount);
+ return 0;
+ }
+ return 1;
+}
+
EXPORT_SYMBOL(kref_init);
EXPORT_SYMBOL(kref_get);
EXPORT_SYMBOL(kref_put);
+EXPORT_SYMBOL(kref_get_with_check);
Index: linux-2.6.11-rc4/Documentation/kref.txt
===================================================================
--- /dev/null
+++ linux-2.6.11-rc4/Documentation/kref.txt
@@ -0,0 +1,283 @@
+krefs allow you to add reference counters to your objects. If you
+have objects that are used in multiple places and passed around, and
+you don't have refcounts, your code is almost certainly broken. If
+you want refcounts, krefs are the way to go.
+
+To use a kref, add a one to your data structures like:
+
+struct my_data
+{
+ .
+ .
+ struct kref refcount;
+ .
+ .
+};
+
+The kref can occur anywhere within the data structure.
+
+You must initialize the kref after you allocate it. To do this, call
+kref init as so:
+
+ struct my_data *data;
+
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ kref_init(&data->refcount);
+
+This sets the refcount in the kref to 1.
+
+Once you have a refcount, you must follow the following rules:
+
+1) If you make a non-temporary copy of a pointer, especially if
+ it can be passed to another thread of execution, you must
+ increment the refcount with kref_get() before passing it off:
+ kref_get(&data->refcount);
+ If you already have a valid pointer to a kref-ed structure (the
+ refcount cannot go to zero) you may do this without a lock.
+
+2) When you are done with a pointer, you must call kref_put():
+ kref_put(&data->refcount, data_release);
+ If this is the last reference to the pointer, the release
+ routine will be called. If the code never tries to get
+ a valid pointer to a kref-ed structure without already
+ holding a valid pointer, it is safe to do this without
+ a lock.
+
+3) If the code attempts to gain a reference to a kref-ed structure
+ without already holding a valid pointer, it must serialize access
+ where a kref_put() cannot occur during the kref_get(), and the
+ structure must remain valid during the kref_get().
+
+For example, if you allocate some data and then pass it to another
+thread to process:
+
+void data_release(struct kref *ref)
+{
+ struct my_data *data = container_of(ref, struct my_data, refcount);
+ kfree(data);
+}
+
+void more_data_handling(void *cb_data)
+{
+ struct my_data *data = cb_data;
+ .
+ . do stuff with data here
+ .
+ kref_put(data, data_release);
+}
+
+int my_data_handler(void)
+{
+ int rv = 0;
+ struct my_data *data;
+ struct task_struct *task;
+ data = kmalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+ kref_init(&data->refcount);
+
+ kref_get(&data->refcount);
+ task = kthread_run(more_data_handling, data, "more_data_handling");
+ if (task == ERR_PTR(-ENOMEM)) {
+ rv = -ENOMEM;
+ kref_put(&data->refcount);
+ goto out;
+ }
+
+ .
+ . do stuff with data here
+ .
+ out:
+ kref_put(data, data_release);
+ return rv;
+}
+
+This way, it doesn't matter what order the two threads handle the
+data, the put handles knowing when the data is free and releasing it.
+The kref_get() does not require a lock, since we already have a valid
+pointer that we own a refcount for. The put needs no lock because
+nothing tries to get the data without already holding a pointer.
+
+Note that the "before" in rule 1 is very important. You should never
+do something like:
+
+ rv = kthread_run(more_data_handling, data, "more_data_handling");
+ if (rv)
+ goto out;
+ else
+ /* BAD BAD BAD - get is after the handoff */
+ kref_get(&data->refcount);
+
+Don't assume you know what you are doing and use the above construct.
+First of all, you may not know what you are doing. Second, you may
+know what you are doing (there are some situations where locking is
+involved where the above may be legal) but someone else who doesn't
+know what they are doing may change the code or copy the code. It's
+bad style. Don't do it.
+
+There are some situations where you can optimize the gets and puts.
+For instance, if you are done with an object and enqueuing it for
+something else or passing it off to something else, there is no reason
+to do a get then a put:
+
+ /* Silly extra get and put */
+ kref_get(&obj->ref);
+ enqueue(obj);
+ kref_put(&obj->ref, obj_cleanup);
+
+Just do the enqueue. A comment about this is always welcome:
+
+ enqueue(obj);
+ /* We are done with obj, so we pass our refcount off
+ to the queue. DON'T TOUCH obj AFTER HERE! */
+
+The last rule (rule 3) is the nastiest one to handle. Say, for
+instance, you have a list of items that are each kref-ed, and you wish
+to get the first one. You can't just pull the first item off the list
+and kref_get() it. That violates rule 3 because you are not already
+holding a valid pointer. You must add locks or semaphores. For
+instance:
+
+static DECLARE_MUTEX(sem);
+static LIST_HEAD(q);
+struct my_data
+{
+ struct kref refcount;
+ struct list_head link;
+};
+
+static struct my_data *get_entry()
+{
+ struct my_data *entry = NULL;
+ down(&sem);
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_q_entry, link);
+ kref_get(&entry->refcount);
+ }
+ up(&sem);
+ return entry;
+}
+
+static void release_entry(struct kref *ref)
+{
+ struct my_data *entry = container_of(ref, struct my_data, refcount);
+
+ list_del(&entry->link);
+ kfree(entry);
+}
+
+static void put_entry(struct my_data *entry)
+{
+ down(&sem);
+ kref_put(&entry->refcount, release_entry);
+ up(&sem);
+}
+
+
+There is one exception to rule 3. kref_get_with_check() can be used
+to avoid a lock/semaphore in your routines that call kref_put(), except
+in the case that release is called. It is useful in situations where
+you might be trying to get an object that might have just gone free,
+as in our list example above.
+
+For instance, suppose you have a queue of items to process in your
+driver. The head of the queue is the current item being worked on.
+You driver has timers, interrupts, and user context threads that may
+all attempt to claim the head of the queue.
+
+So you define three routines, get_working_entry(), put_entry(), and
+entry_completed(), one to get the current queue entry that is being
+worked on, one to tell the system you are done working on that
+entry for now, and one to tell that the entry is completed and
+should be dequeued. Here is some example code:
+
+static LIST_HEAD(q);
+DEFINE_SPINLOCK(q_lock);
+
+struct my_q_entry
+{
+ struct list_head link;
+ struct kref refcount;
+ char data[10];
+ void (*done)(struct my_q_entry *);
+};
+
+struct my_q_entry *get_working_entry(void)
+{
+ unsigned long flags;
+ struct my_q_entry *entry = NULL;
+
+ spin_lock_irqsave(&q_lock, flags);
+ retry:
+ if (!list_empty(&q)) {
+ entry = container_of(q.next, struct my_q_entry, link);
+ if (!kref_get_with_check(&entry->refcount)) {
+ /* Entry's refcount has been deleted, but it
+ * has not yet been removed from the list. We
+ * raced with put_entry and are between the
+ * kref_put() decrementing the count and the
+ * release routine claiming the lock. Just
+ * remove it and retry. */
+ list_del_init(&entry->link);
+ entry = NULL;
+ goto retry;
+ }
+ }
+ spin_lock_irqrestore(&q_lock, flags);
+ return entry;
+}
+
+void release_entry(struct kref *ref)
+{
+ unsigned long flags;
+ struct my_q_entry *entry = container_of(ref, struct my_q_entry,
+ refcount);
+
+ spin_lock_irqsave(&q_lock, flags);
+ list_del(&entry->link);
+ spin_lock_irqrestore(&q_lock, flags);
+ entry->done(entry);
+
+ entry = get_working_entry();
+ if (entry) {
+ /* Start next entry here. */
+ put_entry(entry)
+ }
+}
+
+void put_entry(struct my_q_entry *entry)
+{
+ kref_put(&entry->refcount, release_entry);
+}
+
+void entry_completed(struct my_q_entry *entry)
+{
+ kref_put(&entry->refcount, release_entry);
+}
+
+
+Note that there is no lock around kref_put(), which should avoid a
+lock on most put operations, except for the last one. This works
+because if kref_put() sets the refcount to zero, one and only one
+thread of execution can be doing the kref_get(). If the kref_get()
+beats the kref_put(), everything is fine, the get routine has a
+valid entry. If the kref_put() beats the kref_get(), the kref
+value will be zero and kref_get_with_check() will fail, but the
+release_entry() code will block on the spinlock. So the entry will
+be dequeued and everything still works.
+
+This is generally only useful when you expect do have this happen many
+times for a single object. If 99% of the time you have one get and
+one put, it's probably not worth the complexity. If you generally
+have 20 gets and puts, it saves 19 locking operations, which is
+probably worth it.
+
+Other than this, normal kref rules apply. You can do a kref_get()
+without a lock if you are holding a valid kref pointer.
+
+
+Corey Minyard <[email protected]>
+
+A lot of this was lifted from Greg KH's OLS presentation on krefs.


Attachments:
kref_checked.diff (10.39 kB)

2005-03-01 20:55:15

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
> Add a routine to kref that allows the kref_put() routine to be
> unserialized even when the get routine attempts to kref_get()
> an object without first holding a valid reference to it. This is
> useful in situations where this happens multiple times without
> freeing the object, as it will avoid having to do a lock/semaphore
> except on the final kref_put().
>
> This also adds some kref documentation to the Documentation
> directory.

I like the first part of the documentation, that's nice.

But I don't like the new kref_get_with_check() function that you
implemented. If you look in the -mm tree, kref_put() now returns if
this was the last put on the reference count or not, to help with lists
of objects with a kref in it.

Perhaps you can use that to implement what you need instead?

thanks,

greg k-h

2005-03-01 21:02:55

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

On Tue, 2005-03-01 at 12:15 -0800, Greg KH wrote:
> On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
> > Add a routine to kref that allows the kref_put() routine to be
> > unserialized even when the get routine attempts to kref_get()
> > an object without first holding a valid reference to it. This is
> > useful in situations where this happens multiple times without
> > freeing the object, as it will avoid having to do a lock/semaphore
> > except on the final kref_put().
> >
> > This also adds some kref documentation to the Documentation
> > directory.
>
> I like the first part of the documentation, that's nice.
>
> But I don't like the new kref_get_with_check() function that you
> implemented. If you look in the -mm tree, kref_put() now returns if
> this was the last put on the reference count or not, to help with lists
> of objects with a kref in it.
>
> Perhaps you can use that to implement what you need instead?

note that I'm not convinced the "lockless" implementation actually is
faster. It still uses an atomic variable, which is just as expensive as
taking a lock normally...

2005-03-01 21:28:20

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

On Tue, Mar 01, 2005 at 10:02:43PM +0100, Arjan van de Ven wrote:
> On Tue, 2005-03-01 at 12:15 -0800, Greg KH wrote:
> > On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
> > > Add a routine to kref that allows the kref_put() routine to be
> > > unserialized even when the get routine attempts to kref_get()
> > > an object without first holding a valid reference to it. This is
> > > useful in situations where this happens multiple times without
> > > freeing the object, as it will avoid having to do a lock/semaphore
> > > except on the final kref_put().
> > >
> > > This also adds some kref documentation to the Documentation
> > > directory.
> >
> > I like the first part of the documentation, that's nice.
> >
> > But I don't like the new kref_get_with_check() function that you
> > implemented. If you look in the -mm tree, kref_put() now returns if
> > this was the last put on the reference count or not, to help with lists
> > of objects with a kref in it.
> >
> > Perhaps you can use that to implement what you need instead?
>
> note that I'm not convinced the "lockless" implementation actually is
> faster. It still uses an atomic variable, which is just as expensive as
> taking a lock normally...

I have never stated it would be "faster" that I know of, and you still
need a lock to protect some of the paths. But that is documented in my
2004 ols paper about kref.

thanks,

greg k-h

2005-03-01 21:54:39

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

Arjan van de Ven wrote:

>On Tue, 2005-03-01 at 12:15 -0800, Greg KH wrote:
>
>
>>On Sat, Feb 26, 2005 at 04:23:04PM -0600, Corey Minyard wrote:
>>
>>
>>>Add a routine to kref that allows the kref_put() routine to be
>>>unserialized even when the get routine attempts to kref_get()
>>>an object without first holding a valid reference to it. This is
>>>useful in situations where this happens multiple times without
>>>freeing the object, as it will avoid having to do a lock/semaphore
>>>except on the final kref_put().
>>>
>>>This also adds some kref documentation to the Documentation
>>>directory.
>>>
>>>
>>I like the first part of the documentation, that's nice.
>>
>>But I don't like the new kref_get_with_check() function that you
>>implemented. If you look in the -mm tree, kref_put() now returns if
>>this was the last put on the reference count or not, to help with lists
>>of objects with a kref in it.
>>
>>Perhaps you can use that to implement what you need instead?
>>
>>
Yes, that helps a lot. I had actually already implemented something
like that :). But that's a different thing than avoiding the lock.

It's just that with the I2C stuff, you may be calling kref_put() 20-30
times for a single operation. That's a lot of lock/unlock operations.
But it is wierd, so I understand. Thanks.

>
>note that I'm not convinced the "lockless" implementation actually is
>faster. It still uses an atomic variable, which is just as expensive as
>taking a lock normally...
>
>
Just doing an atomic operation is not faster than doing a lock, an
atomic operation, then an unlock? Am I missing something?

-Corey

2005-03-01 22:14:25

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks


> Just doing an atomic operation is not faster than doing a lock, an
> atomic operation, then an unlock? Am I missing something?

if the lock and the atomic are on the same cacheline they're the same
cost on most modern cpus...



2005-03-01 23:35:39

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

Arjan van de Ven wrote:

>>Just doing an atomic operation is not faster than doing a lock, an
>>atomic operation, then an unlock? Am I missing something?
>>
>>
>
>if the lock and the atomic are on the same cacheline they're the same
>cost on most modern cpus...
>
>
Ah, I see. Not likely to ever be the case with this. The lock will
likely be with the main data structure (the list, or whatever) and the
refcount will be in the individual item in the main data structure (list
entry).

-Corey

2005-03-02 00:02:42

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

Corey Minyard wrote:
> Arjan van de Ven wrote:
>
>>> Just doing an atomic operation is not faster than doing a lock, an
>>> atomic operation, then an unlock? Am I missing something?
>>>
>>
>>
>> if the lock and the atomic are on the same cacheline they're the same
>> cost on most modern cpus...
>>
>>
> Ah, I see. Not likely to ever be the case with this. The lock will
> likely be with the main data structure (the list, or whatever) and the
> refcount will be in the individual item in the main data structure (list
> entry).
>

Is get_with_check actually going to be useful for anything? It
seems like it promotes complex and potentially unsafe schemes.

eg. In your queue example, it would usually be better to have
a refcount for being on queue, and entry_completed would remove
the entry from the queue and accordingly drop the refcount. The
release function would then just free it.

2005-03-02 00:28:40

by Corey Minyard

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

Nick Piggin wrote:

> Corey Minyard wrote:
>
>> Arjan van de Ven wrote:
>>
>>>> Just doing an atomic operation is not faster than doing a lock, an
>>>> atomic operation, then an unlock? Am I missing something?
>>>>
>>>
>>>
>>>
>>> if the lock and the atomic are on the same cacheline they're the same
>>> cost on most modern cpus...
>>>
>>>
>> Ah, I see. Not likely to ever be the case with this. The lock will
>> likely be with the main data structure (the list, or whatever) and
>> the refcount will be in the individual item in the main data
>> structure (list entry).
>>
>
> Is get_with_check actually going to be useful for anything? It
> seems like it promotes complex and potentially unsafe schemes.

It is certainly more complex to use this, and I'm guessing that's why
Greg rejected it. Certainly a valid problem.

>
> eg. In your queue example, it would usually be better to have
> a refcount for being on queue, and entry_completed would remove
> the entry from the queue and accordingly drop the refcount. The
> release function would then just free it.

True. But if things picked up entries of the queue and incremented
their refcount, then you would need a lock. The same technique would
apply. But your example would be the more common one, I would think.

Thanks,

-Corey

2005-03-02 00:34:53

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] New operation for kref to help avoid locks

Corey Minyard wrote:
> Nick Piggin wrote:

>> Is get_with_check actually going to be useful for anything? It
>> seems like it promotes complex and potentially unsafe schemes.
>
>
> It is certainly more complex to use this, and I'm guessing that's why
> Greg rejected it. Certainly a valid problem.
>
>>
>> eg. In your queue example, it would usually be better to have
>> a refcount for being on queue, and entry_completed would remove
>> the entry from the queue and accordingly drop the refcount. The
>> release function would then just free it.
>
>
> True. But if things picked up entries of the queue and incremented
> their refcount, then you would need a lock. The same technique would
> apply. But your example would be the more common one, I would think.
>

Well, but you take a lock in your system too, to protect the
queue (ie. in get_working_entry()).

Nick