2005-04-01 01:34:19

by Andrew Morton

Subject: cn_queue.c


> /*
> * cn_queue.c
> *
> * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
> * All rights reserved.
> *
> * This program is free software; you can redistribute it and/or modify
> * it under the terms of the GNU General Public License as published by
> * the Free Software Foundation; either version 2 of the License, or
> * (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> * GNU General Public License for more details.
> *
> * You should have received a copy of the GNU General Public License
> * along with this program; if not, write to the Free Software
> * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> *
> */

It's conventional to add at the start of the file a description of what it
all does.

> static void cn_queue_wrapper(void *data)
> {
> struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
>
> smp_mb__before_atomic_inc();
> atomic_inc(&cbq->cb->refcnt);
> smp_mb__after_atomic_inc();
> cbq->cb->callback(cbq->cb->priv);
> smp_mb__before_atomic_inc();
> atomic_dec(&cbq->cb->refcnt);
> smp_mb__after_atomic_inc();
>
> cbq->destruct_data(cbq->ddata);
> }

What on earth are all these barriers for? Barriers should have an
associated comment describing why they were added, and what they are
defending against, because it is very hard to tell that from reading the
code.

Please describe (via code comments) what the refcounting rules are for
these data structures. It all looks very strange. You basically have:


atomic_inc(refcnt);
use(some_object);
atomic_dec(refcnt);

Which looks very racy. Some other CPU could see a zero refcount just
before this CPU did the atomic_inc() and the other CPU will go and free up
some_object. There should be locking associated with refcounting to
prevent such races, and I don't see any.
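
Something like the following is the usual shape - a rough sketch against the
structures quoted above, with a made-up helper name: the reference is taken
while still holding the lock that protects the list, so a zero refcount can
only be observed once the entry is no longer reachable.

/* look up a callback entry and pin it under dev->queue_lock */
static struct cn_callback_entry *cn_queue_get_callback(struct cn_queue_dev *dev,
						       struct cb_id *id)
{
	struct cn_callback_entry *cbq, *found = NULL;

	spin_lock_bh(&dev->queue_lock);
	list_for_each_entry(cbq, &dev->queue_list, callback_entry) {
		if (cn_cb_equal(&cbq->cb->id, id)) {
			/* ref taken under the lock, before anyone can free it */
			atomic_inc(&cbq->cb->refcnt);
			found = cbq;
			break;
		}
	}
	spin_unlock_bh(&dev->queue_lock);

	return found;
}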

> static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)

80 cols again.

> static void cn_queue_free_callback(struct cn_callback_entry *cbq)
> {
> cancel_delayed_work(&cbq->work);
>
> while (atomic_read(&cbq->cb->refcnt)) {
> printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
>
> msleep_interruptible(1000);
>
> if (current->flags & PF_FREEZE)
> refrigerator(PF_FREEZE);
>
> if (signal_pending(current))
> flush_signals(current);
> }
>
> kfree(cbq);
> }

erp. Can you please do this properly?

Free the object on the refcount going to zero (with appropriate locking)?
If you want (for some reason) to wait until all users of an object have
gone away then you should take a ref on the object (inside locking), then
sleep on a waitqueue_head embedded in that object. Make all
refcount-droppers do a wake_up.
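
For instance - this is only a sketch, and the object layout and helper names
here are hypothetical, not something in the posted code:

struct some_object {
	atomic_t refcnt;
	wait_queue_head_t wait;
	/* ... */
};

static void some_object_put(struct some_object *obj)
{
	/* every reference-dropper wakes any waiter */
	if (atomic_dec_and_test(&obj->refcnt))
		wake_up(&obj->wait);
}

static void some_object_release(struct some_object *obj)
{
	/* caller has already unlinked obj, so no new references can appear */
	wait_event(obj->wait, atomic_read(&obj->refcnt) == 0);
	kfree(obj);
}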

Why is this function playing with signals?

> int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
> {
> #if 0
> printk(KERN_INFO "%s: comparing %04x.%04x and %04x.%04x\n",
> __func__,
> i1->idx, i1->val,
> i2->idx, i2->val);
> #endif
> return ((i1->idx == i2->idx) && (i1->val == i2->val));
> }

Please review all functions, check that they really need kernel-wide scope.

Please comment all functions.

> int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
> {
> struct cn_callback_entry *cbq, *n, *__cbq;
> int found = 0;
>
> cbq = cn_queue_alloc_callback_entry(cb);
> if (!cbq)
> return -ENOMEM;
>
> atomic_inc(&dev->refcnt);
> smp_mb__after_atomic_inc();
> cbq->pdev = dev;
>
> spin_lock_bh(&dev->queue_lock);
> list_for_each_entry_safe(__cbq, n, &dev->queue_list, callback_entry) {
> if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
> found = 1;
> break;
> }
> }
> if (!found) {
> atomic_set(&cbq->cb->refcnt, 1);
> list_add_tail(&cbq->callback_entry, &dev->queue_list);
> }
> spin_unlock_bh(&dev->queue_lock);

Why use spin_lock_bh()? Does none of this code ever get called from IRQ
context?

> if (found) {
> smp_mb__before_atomic_inc();
> atomic_dec(&dev->refcnt);
> smp_mb__after_atomic_inc();
> atomic_set(&cbq->cb->refcnt, 0);
> cn_queue_free_callback(cbq);
> return -EINVAL;
> }

More strange barriers. This practice seems to be unique to this part of the
kernel. That's probably a bad sign.


> struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
> {
> struct cn_queue_dev *dev;
>
> dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> if (!dev) {
> printk(KERN_ERR "%s: Failed to allocte new struct cn_queue_dev.\n",
> name);
> return NULL;
> }
>
> memset(dev, 0, sizeof(*dev));
>
> snprintf(dev->name, sizeof(dev->name), "%s", name);

scnprintf?

> void cn_queue_free_dev(struct cn_queue_dev *dev)
> {
> struct cn_callback_entry *cbq, *n;
>
> flush_workqueue(dev->cn_queue);
> destroy_workqueue(dev->cn_queue);
>
> spin_lock_bh(&dev->queue_lock);
> list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
> list_del(&cbq->callback_entry);
> smp_mb__before_atomic_inc();
> atomic_dec(&cbq->cb->refcnt);
> smp_mb__after_atomic_inc();
> }
> spin_unlock_bh(&dev->queue_lock);
>
> while (atomic_read(&dev->refcnt)) {
> printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> dev->name, atomic_read(&dev->refcnt));
>
> msleep_interruptible(1000);
>
> if (current->flags & PF_FREEZE)
> refrigerator(PF_FREEZE);
>
> if (signal_pending(current))
> flush_signals(current);
> }
>
> memset(dev, 0, sizeof(*dev));
> kfree(dev);
> dev = NULL;
> }

Again, why is this code playing with signals? Hopefully it is never called
by user tasks - that would be very bad.

With proper refcounting and lifetime management and the use of the kthread
API we should be able to make all this go away.

Once we're trying to free up the controlling device, what prevents new
messages from being queued?


2005-04-01 07:34:55

by Evgeniy Polyakov

Subject: Re: cn_queue.c

On Thu, 2005-03-31 at 17:32 -0800, Andrew Morton wrote:
> > /*
> > * cn_queue.c
> > *
> > * 2004 Copyright (c) Evgeniy Polyakov <[email protected]>
> > * All rights reserved.
> > *
> > * This program is free software; you can redistribute it and/or modify
> > * it under the terms of the GNU General Public License as published by
> > * the Free Software Foundation; either version 2 of the License, or
> > * (at your option) any later version.
> > *
> > * This program is distributed in the hope that it will be useful,
> > * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > * GNU General Public License for more details.
> > *
> > * You should have received a copy of the GNU General Public License
> > * along with this program; if not, write to the Free Software
> > * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > *
> > */
>
> It's conventional to add at the start of the file a description of what it
> all does.

Yep, I need to force myself to write better docs...

> > static void cn_queue_wrapper(void *data)
> > {
> > struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> >
> > smp_mb__before_atomic_inc();
> > atomic_inc(&cbq->cb->refcnt);
> > smp_mb__after_atomic_inc();
> > cbq->cb->callback(cbq->cb->priv);
> > smp_mb__before_atomic_inc();
> > atomic_dec(&cbq->cb->refcnt);
> > smp_mb__after_atomic_inc();
> >
> > cbq->destruct_data(cbq->ddata);
> > }
>
> What on earth are all these barriers for? Barriers should have an
> associated comment describing why they were added, and what they are
> defending against, because it is very hard to tell that from reading the
> code.
>
> Please describe (via code comments) what the refcounting rules are for
> these data structures. It all looks very strange. You basically have:
>
>
> atomic_inc(refcnt);
> use(some_object);
> atomic_dec(refcnt);
>
> Which looks very racy. Some other CPU could see a zero refcount just
> before this CPU did the atomic_inc() and the other CPU will go and free up
> some_object. There should be locking associated with refcounting to
> prevent such races, and I don't see any.

It cannot happen in the above case.
It can race with callback removal, but the removal path calls
cancel_delayed_work(), which calls del_timer_sync(), so the above code
either cannot run at all or has already finished; only after that is the
refcnt checked to be zero.
It is probably overhead, since synchronization is done in other places.
All the barriers above were introduced to guard against atomic vs.
non-atomic reordering, which is no longer merely theoretical [ppc64 is
very fond of it]: the reference counter must be decremented after the
object has been used, but due to reordering the decrement can become
visible before the use [and it will be seen atomically on all CPUs], so
another CPU may free the object based on the knowledge that refcnt is
zero.
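
The intent, spelled out on the decrement side, looks roughly like this (just
a sketch of the quoted wrapper with the matching *_dec barrier; the posted
code uses the *_inc barrier variants around atomic_dec()):

static void cn_queue_wrapper_sketch(void *data)
{
	struct cn_callback_entry *cbq = data;

	atomic_inc(&cbq->cb->refcnt);
	cbq->cb->callback(cbq->cb->priv);
	/*
	 * Order the callback's stores before the decrement, so a CPU that
	 * polls refcnt and sees zero cannot still observe the object in use.
	 */
	smp_mb__before_atomic_dec();
	atomic_dec(&cbq->cb->refcnt);

	cbq->destruct_data(cbq->ddata);
}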

> > static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
>
> 80 cols again.
>
> > static void cn_queue_free_callback(struct cn_callback_entry *cbq)
> > {
> > cancel_delayed_work(&cbq->work);
> >
> > while (atomic_read(&cbq->cb->refcnt)) {
> > printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> > cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
> >
> > msleep_interruptible(1000);
> >
> > if (current->flags & PF_FREEZE)
> > refrigerator(PF_FREEZE);
> >
> > if (signal_pending(current))
> > flush_signals(current);
> > }
> >
> > kfree(cbq);
> > }
>
> erp. Can you please do this properly?
>
> Free the object on the refcount going to zero (with appropriate locking)?
> If you want (for some reason) to wait until all users of an object have
> gone away then you should take a ref on the object (inside locking), then
> sleep on a waitqueue_head embedded in that object. Make all
> refcount-droppers do a wake_up.
>
> Why is this function playing with signals?

It is not; it just can be interrupted so that it checks the state before
the timeout expires.
It is not a problem to remove the signal handling here.

> > int cn_cb_equal(struct cb_id *i1, struct cb_id *i2)
> > {
> > #if 0
> > printk(KERN_INFO "%s: comparing %04x.%04x and %04x.%04x\n",
> > __func__,
> > i1->idx, i1->val,
> > i2->idx, i2->val);
> > #endif
> > return ((i1->idx == i2->idx) && (i1->val == i2->val));
> > }
>
> Please review all functions, check that they really need kernel-wide scope.

It is not exported, but it is used in the other files that make up the
connector.
Only the callback add/remove and message-sending functions are exported.

> Please comment all functions.

Yep...

> > int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb)
> > {
> > struct cn_callback_entry *cbq, *n, *__cbq;
> > int found = 0;
> >
> > cbq = cn_queue_alloc_callback_entry(cb);
> > if (!cbq)
> > return -ENOMEM;
> >
> > atomic_inc(&dev->refcnt);
> > smp_mb__after_atomic_inc();
> > cbq->pdev = dev;
> >
> > spin_lock_bh(&dev->queue_lock);
> > list_for_each_entry_safe(__cbq, n, &dev->queue_list, callback_entry) {
> > if (cn_cb_equal(&__cbq->cb->id, &cb->id)) {
> > found = 1;
> > break;
> > }
> > }
> > if (!found) {
> > atomic_set(&cbq->cb->refcnt, 1);
> > list_add_tail(&cbq->callback_entry, &dev->queue_list);
> > }
> > spin_unlock_bh(&dev->queue_lock);
>
> Why use spin_lock_bh()? Does none of this code ever get called from IRQ
> context?

The test/documentation module that lives in
Documentation/connector/cn_test.c describes different usage cases for
the connector, and it uses cn_netlink_send() from IRQ context, which may
race with callback addition/removal.

> > if (found) {
> > smp_mb__before_atomic_inc();
> > atomic_dec(&dev->refcnt);
> > smp_mb__after_atomic_inc();
> > atomic_set(&cbq->cb->refcnt, 0);
> > cn_queue_free_callback(cbq);
> > return -EINVAL;
> > }
>
> More strange barriers. This practice seems to be unique to this part of the
> kernel. That's probably a bad sign.

This was introduced recently in the network stack, where an skb could be
freed erroneously due to atomic reordering.
This is a similar case.
It could be just smp_mb(), but that is more expensive.

>
> > struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
> > {
> > struct cn_queue_dev *dev;
> >
> > dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> > if (!dev) {
> > printk(KERN_ERR "%s: Failed to allocte new struct cn_queue_dev.\n",
> > name);
> > return NULL;
> > }
> >
> > memset(dev, 0, sizeof(*dev));
> >
> > snprintf(dev->name, sizeof(dev->name), "%s", name);
>
> scnprintf?

The return value is ignored here, but I will change the function anyway.

> > void cn_queue_free_dev(struct cn_queue_dev *dev)
> > {
> > struct cn_callback_entry *cbq, *n;
> >
> > flush_workqueue(dev->cn_queue);
> > destroy_workqueue(dev->cn_queue);
> >
> > spin_lock_bh(&dev->queue_lock);
> > list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
> > list_del(&cbq->callback_entry);
> > smp_mb__before_atomic_inc();
> > atomic_dec(&cbq->cb->refcnt);
> > smp_mb__after_atomic_inc();
> > }
> > spin_unlock_bh(&dev->queue_lock);
> >
> > while (atomic_read(&dev->refcnt)) {
> > printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> > dev->name, atomic_read(&dev->refcnt));
> >
> > msleep_interruptible(1000);
> >
> > if (current->flags & PF_FREEZE)
> > refrigerator(PF_FREEZE);
> >
> > if (signal_pending(current))
> > flush_signals(current);
> > }
> >
> > memset(dev, 0, sizeof(*dev));
> > kfree(dev);
> > dev = NULL;
> > }
>
> Again, why is this code playing with signals? Hopefully it is never called
> by user tasks - that would be very bad.

I will remove signal interrupting.

As for a wait queue inside the object - it can have one, but it would be
used only for waiting and repeated checking in a slow path.
Let's postpone that for now until the other issues are resolved.

> With proper refcounting and lifetime management and the use of the kthread
> API we should be able to make all this go away.
>
> Once we're trying to free up the controlling device, what prevents new
> messages from being queued?

Before the callback device is removed it is unregistered from the
callback subsystem, so all messages will be discarded until the main
socket is removed (actually only released, so that it can be properly
freed by the network subsystem later).


Andrew, thank you for your comments,
I will cook up patches against your tree soon.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



2005-04-01 07:58:03

by Andrew Morton

Subject: Re: cn_queue.c

Evgeniy Polyakov <[email protected]> wrote:
>
> > > static void cn_queue_wrapper(void *data)
> > > {
> > > struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> > >
> > > smp_mb__before_atomic_inc();
> > > atomic_inc(&cbq->cb->refcnt);
> > > smp_mb__after_atomic_inc();
> > > cbq->cb->callback(cbq->cb->priv);
> > > smp_mb__before_atomic_inc();
> > > atomic_dec(&cbq->cb->refcnt);
> > > smp_mb__after_atomic_inc();
> > >
> > > cbq->destruct_data(cbq->ddata);
> > > }
> >
> > What on earth are all these barriers for? Barriers should have an
> > associated comment describing why they were added, and what they are
> > defending against, because it is very hard to tell that from reading the
> > code.
> >
> > Please describe (via code comments) what the refcounting rules are for
> > these data structures. It all looks very strange. You basically have:
> >
> >
> > atomic_inc(refcnt);
> > use(some_object);
> > atomic_dec(refcnt);
> >
> > Which looks very racy. Some other CPU could see a zero refcount just
> > before this CPU did the atomic_inc() and the other CPU will go and free up
> > some_object. There should be locking associated with refcounting to
> > prevent such races, and I don't see any.
>
> It can not happen in the above case.
> It can race with callback removing, but remove path
> call cancel_delayed_work() which calls del_timer_sync(),
> so above code can not be run or will not run,
> only after it refcnt is being checked to be zero.
> It is probably an overhead since synchronization is done
> in other places.
> All above barriers are introduced to remove already not theoretical
> atomic vs. non-atomic reordering [ppc64 very like it],
> when reference counter must be decremented after object was used,
> but due to reordering it can happen before use [and it will be seen
> atomically on all CPUs],
> and other CPU may free object based on knowledge that
> refcnt is zero.

But all the above seems to be tied into the
poll-until-refcount-goes-to-zero cleanup code, I think - it's quite
unclear what the refcount protocol is for these objects.

Why cannot we manage object lifetimes here in the same manner as dentries,
inodes, skbuffs, etc?
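
In other words the usual free-on-last-put model, roughly (sketch only - the
helper names are invented, and this is not what the posted code does):

static inline struct cn_callback *cn_cb_get(struct cn_callback *cb)
{
	/* only called while the lookup lock still pins the object */
	atomic_inc(&cb->refcnt);
	return cb;
}

static inline void cn_cb_put(struct cn_callback *cb)
{
	/* whoever drops the last reference frees the object - nobody polls */
	if (atomic_dec_and_test(&cb->refcnt))
		kfree(cb);
}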

> > > static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
> >
> > 80 cols again.
> >
> > > static void cn_queue_free_callback(struct cn_callback_entry *cbq)
> > > {
> > > cancel_delayed_work(&cbq->work);
> > >
> > > while (atomic_read(&cbq->cb->refcnt)) {
> > > printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> > > cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
> > >
> > > msleep_interruptible(1000);
> > >
> > > if (current->flags & PF_FREEZE)
> > > refrigerator(PF_FREEZE);
> > >
> > > if (signal_pending(current))
> > > flush_signals(current);
> > > }
> > >
> > > kfree(cbq);
> > > }
> >
> > erp. Can you please do this properly?
> >
> > Free the object on the refcount going to zero (with appropriate locking)?
> > If you want (for some reason) to wait until all users of an object have
> > gone away then you should take a ref on the object (inside locking), then
> > sleep on a waitqueue_head embedded in that object. Make all
> > refcount-droppers do a wake_up.
> >
> > Why is this function playing with signals?
>
> It is not, just can be interrupted to check state before timeout expires.
> It is not a problem to remove signal interrupting here.

If this function is called by userspace tasks then we've just dumped any
signals which that task might have had pending.

> > > break;
> > > }
> > > }
> > > if (!found) {
> > > atomic_set(&cbq->cb->refcnt, 1);
> > > list_add_tail(&cbq->callback_entry, &dev->queue_list);
> > > }
> > > spin_unlock_bh(&dev->queue_lock);
> >
> > Why use spin_lock_bh()? Does none of this code ever get called from IRQ
> > context?
>
> Test documentation module that lives in
> Documentation/connector/cn_test.c
> describes different usage cases for connector, and it uses
> cn_netlink_send()
> from irq context, which may race with the callback adding/removing.

But cn_netlink_send() takes ->queue_lock. If this CPU happened to be
holding the same lock in process or bh context it will deadlock in IRQ
context.
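
In other words (a sketch of the scenario, assuming cn_netlink_send() really
can be reached from a hard-IRQ handler as the test module suggests):

static void example_list_update(struct cn_queue_dev *dev)
{
	spin_lock_bh(&dev->queue_lock);	/* softirqs off, hard IRQs still enabled */
	/*
	 * If a hard IRQ arrives here on this CPU and its handler ends up in
	 * cn_netlink_send(), which takes dev->queue_lock again, the CPU
	 * spins on a lock it already owns: deadlock.  spin_lock_irqsave()
	 * on every path, or keeping all senders out of hard-IRQ context,
	 * avoids this.
	 */
	spin_unlock_bh(&dev->queue_lock);
}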

>
> >
> > > struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
> > > {
> > > struct cn_queue_dev *dev;
> > >
> > > dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> > > if (!dev) {
> > > printk(KERN_ERR "%s: Failed to allocte new struct cn_queue_dev.\n",
> > > name);
> > > return NULL;
> > > }
> > >
> > > memset(dev, 0, sizeof(*dev));
> > >
> > > snprintf(dev->name, sizeof(dev->name), "%s", name);
> >
> > scnprintf?
>
> Return value is ignored here, but I will change function though.

scnprintf() null-terminates the destination even if it hits the end of its
buffer.

> >
> > Again, why is this code playing with signals? Hopefully it is never called
> > by user tasks - that would be very bad.
>
> I will remove signal interrupting.

And use the kthread API, please.

> According to wait queue inside the object - it can have it, but it will
> be used only for waiting and repeated checking in a slow path.
> Let's postpone it for now until other issues are resolved first.

This all ties into the refcounting and object lifetime design. What you
have there appears quite unconventional. Maybe there's good reason for
that and maybe the code is race-free. I don't understand what you've done
there sufficiently well to be able to say.

> > With proper refcounting and lifetime management and the use of the kthread
> > API we should be able to make all this go away.
> >
> > Once we're trying to free up the controlling device, what prevents new
> > messages from being queued?
>
> Before callback device is removed it is unregistered from the callback
> subsystem,
> so all messages will be discarded, until main socket is removed
> (actually only released so it can be properly freed in network subsystem
> later).
>

OK, I missed that.

2005-04-01 08:33:57

by Evgeniy Polyakov

Subject: Re: cn_queue.c

On Thu, 2005-03-31 at 23:57 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > > > static void cn_queue_wrapper(void *data)
> > > > {
> > > > struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> > > >
> > > > smp_mb__before_atomic_inc();
> > > > atomic_inc(&cbq->cb->refcnt);
> > > > smp_mb__after_atomic_inc();
> > > > cbq->cb->callback(cbq->cb->priv);
> > > > smp_mb__before_atomic_inc();
> > > > atomic_dec(&cbq->cb->refcnt);
> > > > smp_mb__after_atomic_inc();
> > > >
> > > > cbq->destruct_data(cbq->ddata);
> > > > }
> > >
> > > What on earth are all these barriers for? Barriers should have an
> > > associated comment describing why they were added, and what they are
> > > defending against, because it is very hard to tell that from reading the
> > > code.
> > >
> > > Please describe (via code comments) what the refcounting rules are for
> > > these data structures. It all looks very strange. You basically have:
> > >
> > >
> > > atomic_inc(refcnt);
> > > use(some_object);
> > > atomic_dec(refcnt);
> > >
> > > Which looks very racy. Some other CPU could see a zero refcount just
> > > before this CPU did the atomic_inc() and the other CPU will go and free up
> > > some_object. There should be locking associated with refcounting to
> > > prevent such races, and I don't see any.
> >
> > It can not happen in the above case.
> > It can race with callback removing, but remove path
> > call cancel_delayed_work() which calls del_timer_sync(),
> > so above code can not be run or will not run,
> > only after it refcnt is being checked to be zero.
> > It is probably an overhead since synchronization is done
> > in other places.
> > All above barriers are introduced to remove already not theoretical
> > atomic vs. non-atomic reordering [ppc64 very like it],
> > when reference counter must be decremented after object was used,
> > but due to reordering it can happen before use [and it will be seen
> > atomically on all CPUs],
> > and other CPU may free object based on knowledge that
> > refcnt is zero.
>
> But all the above seems to be tied into the
> poll-until-refcount-goes-to-zero cleanup code. I think - it's quite
> unclear what the refcount protocol is for these objects.
>
> Why cannot we manage object lifetimes here in the same manner as dentries,
> inodes, skbuffs, etc?

It was easier to implement it that way.
The difference would only be in function names - instead of msleep() it
would be sleep_on().
Since it is an atomic refcnt that is being checked, wait_event() cannot
be used there directly.
I can change its logic; it is quite simple.
Is it OK to postpone that until the documentation changes are made in
the source code?

> > > > static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
> > >
> > > 80 cols again.
> > >
> > > > static void cn_queue_free_callback(struct cn_callback_entry *cbq)
> > > > {
> > > > cancel_delayed_work(&cbq->work);
> > > >
> > > > while (atomic_read(&cbq->cb->refcnt)) {
> > > > printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> > > > cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
> > > >
> > > > msleep_interruptible(1000);
> > > >
> > > > if (current->flags & PF_FREEZE)
> > > > refrigerator(PF_FREEZE);
> > > >
> > > > if (signal_pending(current))
> > > > flush_signals(current);
> > > > }
> > > >
> > > > kfree(cbq);
> > > > }
> > >
> > > erp. Can you please do this properly?
> > >
> > > Free the object on the refcount going to zero (with appropriate locking)?
> > > If you want (for some reason) to wait until all users of an object have
> > > gone away then you should take a ref on the object (inside locking), then
> > > sleep on a waitqueue_head embedded in that object. Make all
> > > refcount-droppers do a wake_up.
> > >
> > > Why is this function playing with signals?
> >
> > It is not, just can be interrupted to check state before timeout expires.
> > It is not a problem to remove signal interrupting here.
>
> If this function is called by userspace tasks then we've just dumped any
> signals which that task might have had pending.

Only rmmod/insmod can call it
[it is called from the module unloading/error path].
The signal handling was implemented just to check the state when Ctrl+C
is pressed.

> > > > break;
> > > > }
> > > > }
> > > > if (!found) {
> > > > atomic_set(&cbq->cb->refcnt, 1);
> > > > list_add_tail(&cbq->callback_entry, &dev->queue_list);
> > > > }
> > > > spin_unlock_bh(&dev->queue_lock);
> > >
> > > Why use spin_lock_bh()? Does none of this code ever get called from IRQ
> > > context?
> >
> > Test documentation module that lives in
> > Documentation/connector/cn_test.c
> > describes different usage cases for connector, and it uses
> > cn_netlink_send()
> > from irq context, which may race with the callback adding/removing.
>
> But cn_netlink_send() takes ->queue_lock. If this CPU happened to be
> holding the same lock in process or bh context it will deadlock in IRQ
> context.

I need to state that sending must be limited to BH context too.
I thought about it when creating the locking scheme and decided not to
allow hard-IRQ context, since receiving is already limited to BH
context.
I definitely need to put that into the documentation.

> >
> > >
> > > > struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
> > > > {
> > > > struct cn_queue_dev *dev;
> > > >
> > > > dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> > > > if (!dev) {
> > > > printk(KERN_ERR "%s: Failed to allocte new struct cn_queue_dev.\n",
> > > > name);
> > > > return NULL;
> > > > }
> > > >
> > > > memset(dev, 0, sizeof(*dev));
> > > >
> > > > snprintf(dev->name, sizeof(dev->name), "%s", name);
> > >
> > > scnprintf?
> >
> > Return value is ignored here, but I will change function though.
>
> scnprintf() null-terminates the destination even if it hits the end of its
> buffer.

Hmm...
snprintf() differs from scnprintf() only in its return value:

int snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	return i;
}

int scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	return (i >= size) ? (size - 1) : i;
}

> > >
> > > Again, why is this code playing with signals? Hopefully it is never called
> > > by user tasks - that would be very bad.
> >
> > I will remove signal interrupting.
>
> And use the kthread API, please.

Yep.

> > According to wait queue inside the object - it can have it, but it will
> > be used only for waiting and repeated checking in a slow path.
> > Let's postpone it for now until other issues are resolved first.
>
> This all ties into the refcounting and object lifetime design. What you
> have there appears quite unconventional. Maybe there's good reason for
> that and maybe the code is race-free. I don't understand what you've done
> there sufficiently well to be able to say.

A new object has a reference count of 0 when it is created.
If some work is assigned to the object, its counter is atomically
incremented. It is decremented when the work is finished.
If the object is to be removed while work may still be assigned to it,
the core first ensures that no work _is_ assigned and atomically
disallows any further work assignment for the given object [for example
by removing the workqueue or removing the callback, all with the
appropriate locks held].
After that [when no work can be assigned to the object], if the object
still has pending work [and thus a non-zero refcount], the removal path
waits until the refcnt hits zero.
Since no _new_ work can be assigned at that point, it is just

	while (atomic_read(refcnt) != 0)
		msleep();


--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



2005-04-01 08:49:00

by Andrew Morton

Subject: Re: cn_queue.c

Evgeniy Polyakov <[email protected]> wrote:
>
> New object has 0 reference counter when created.
> If some work is appointed to the object, then it's counter is atomically
> incremented. It is decremented when the work is finished.
> If object is supposed to be removed while some work
> may be appointed to it, core ensures that no work _is_ appointed,
> and atomically disallows[for example removing workqueue, removing
> callback, all with appropriate locks being hold]
> any other work appointment for the given object.
> After it [when no work can be appointed to the object] if object
> still has pending work [and thus has it's refcounter not zero],
> removing path waits until appropriate refcnt hits zero.
> Since no _new_ work can be appointed at that level it is just
> while (atomic_read(refcnt) != 0)
> msleep();

More like:

	while (atomic_read(&obj->refcnt))
		msleep();
	kfree(obj);

which introduces the possibility of someone grabbing a new ref on the
object just before the kfree(). If there is no means by which any other
actor can acquire a ref to this object then OK, no race.

But it's rather surprising that such a thing can be achieved without any
locking. What happens if another CPU has just entered
cn_queue_del_callback(), for example? It has a live cn_callback_entry in
`cbq' which has a zero refcount - cn_queue_free_dev() can throw it away.

2005-04-01 09:28:06

by Evgeniy Polyakov

Subject: Re: cn_queue.c

On Fri, 2005-04-01 at 00:48 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > New object has 0 reference counter when created.
> > If some work is appointed to the object, then it's counter is atomically
> > incremented. It is decremented when the work is finished.
> > If object is supposed to be removed while some work
> > may be appointed to it, core ensures that no work _is_ appointed,
> > and atomically disallows[for example removing workqueue, removing
> > callback, all with appropriate locks being hold]
> > any other work appointment for the given object.
> > After it [when no work can be appointed to the object] if object
> > still has pending work [and thus has it's refcounter not zero],
> > removing path waits until appropriate refcnt hits zero.
> > Since no _new_ work can be appointed at that level it is just
> > while (atomic_read(refcnt) != 0)
> > msleep();
>
> More like:
>
> while (atomic_read(&obj->refcnt))
> msleep();
> kfree(obj);

Yep :)

> which introduces the possibility of someone grabbing a new ref on the
> object just before the kfree(). If there is no means by which any other
> actor can acquire a ref to this object then OK, no race.

No, the object has already been removed from the paths where someone
could access it.
It is only waiting until already-assigned work has finished.

> But it's rather surprising that such a thing can be achieved without any
> locking. What happens if another CPU has just entered
> cn_queue_del_callback(), for example? It has a live cn_callback_entry in
> `cbq' which has a zero refcount - cn_queue_free_dev() can throw it away.

cn_queue_free_dev() will wait until dev->refcnt hits zero before freeing
any resources, but that can happen only after cn_queue_del_callback()
has done its work on the given callback device [actually only once all
callbacks have been removed].
When a new callback is added to the device, the device's refcnt is
incremented [before the addition, by the way; if the addition fails in
the middle, the reference is decremented], and when a callback is
removed, the device's reference counter is decremented atomically after
all work is finished.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



2005-04-01 09:51:02

by Andrew Morton

Subject: Re: cn_queue.c

Evgeniy Polyakov <[email protected]> wrote:
>
> cn_queue_free_dev() will wait until dev->refcnt hits zero
> before freeing any resources,
> but it can happen only after cn_queue_del_callback() does
> it's work on given callback device [actually when all callbacks
> are removed].
> When new callback is added into device, it's refcnt is incremented
> [before addition btw, if addition fails in the middle, reference is
> decremented], when callback is removed, device's reference counter
> is decremented atomically after all work is finished.

hm.

How come cn_queue_del_callback() uses all those barriers if no other CPU
can grab new references against cbq->cb->refcnt?

cn_queue_free_callback() forgot to do flush_workqueue(), so
cn_queue_wrapper() can still be running while cn_queue_free_callback()
frees up the cn_callback_entry, I think.
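
Roughly (a sketch of the ordering being suggested, assuming cbq->pdev points
at the cn_queue_dev that owns the workqueue):

static void cn_queue_free_callback(struct cn_callback_entry *cbq)
{
	/* stop the delayed work's timer if it has not fired yet */
	cancel_delayed_work(&cbq->work);
	/* and wait for an already-running cn_queue_wrapper() to finish */
	flush_workqueue(cbq->pdev->cn_queue);

	kfree(cbq);
}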

2005-04-01 10:30:20

by Evgeniy Polyakov

Subject: Re: cn_queue.c

On Fri, 2005-04-01 at 01:50 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > cn_queue_free_dev() will wait until dev->refcnt hits zero
> > before freeing any resources,
> > but it can happen only after cn_queue_del_callback() does
> > it's work on given callback device [actually when all callbacks
> > are removed].
> > When new callback is added into device, it's refcnt is incremented
> > [before addition btw, if addition fails in the middle, reference is
> > decremented], when callback is removed, device's reference counter
> > is decremented atomically after all work is finished.
>
> hm.
>
> How come cn_queue_del_callback() uses all those barriers if no other CPU
> can grab new references against cbq->cb->refcnt?

The work may already be assigned to that callback device; new work
can't be. The barriers are there to ensure that the reference counters
are updated in the proper places, and not before.
It would be a bug to update dev->refcnt before the assigned work is
finished and the callback removed.

> cn_queue_free_callback() forgot to do flush_workqueue(), so
> cn_queue_wrapper() can still be running while cn_queue_free_callback()
> frees up the cn_callback_entry, I think.

cn_queue_wrapper() atomically increments cbq->cb->refcnt if it runs, so
it will be caught by

	while (atomic_read(&cbq->cb->refcnt))
		msleep(1000);

in cn_queue_free_callback().
If it does not run, then everything will be OK.

Btw, it looks like the comments for del_timer_sync() and
cancel_delayed_work() are controversial - del_timer_sync() says that a
pending timer cannot run on a different CPU after returning, but
cancel_delayed_work() says that the work to be cancelled can still run
after returning.

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



2005-04-01 10:43:44

by Andrew Morton

Subject: Re: cn_queue.c

Evgeniy Polyakov <[email protected]> wrote:
>
> On Fri, 2005-04-01 at 01:50 -0800, Andrew Morton wrote:
> > Evgeniy Polyakov <[email protected]> wrote:
> > >
> > > cn_queue_free_dev() will wait until dev->refcnt hits zero
> > > before freeing any resources,
> > > but it can happen only after cn_queue_del_callback() does
> > > it's work on given callback device [actually when all callbacks
> > > are removed].
> > > When new callback is added into device, it's refcnt is incremented
> > > [before addition btw, if addition fails in the middle, reference is
> > > decremented], when callback is removed, device's reference counter
> > > is decremented atomically after all work is finished.
> >
> > hm.
> >
> > How come cn_queue_del_callback() uses all those barriers if no other CPU
> > can grab new references against cbq->cb->refcnt?
>
> The work may be already assigned to that callback device,
> new work can't, barriers are there to ensure that
> reference counters are updated in proper places, but not
> before.

What are the "proper places"? What other control paths could be inspecting
the refcount at this time? (That's the problem with barriers - you can't
tell what they are barriering against).

> It would be a bug to update dev->refcnt before assigned work is finished
> and callback removed.
>
> > cn_queue_free_callback() forgot to do flush_workqueue(), so
> > cn_queue_wrapper() can still be running while cn_queue_free_callback()
> > frees up the cn_callback_entry, I think.
>
> cn_queue_wrapper() atomically increments cbq->cb->refcnt if runs, so it
> will
> be caught in
> while (atomic_read(&cbq->cb->refcnt))
> msleep(1000);
> in cn_queue_free_callback().
> If it does not run, then all will be ok.

But there's a time window on entry to cn_queue_wrapper() where the refcount
hasn't been incremented yet, and there's no locking. If
cn_queue_free_callback() inspects the refcount in that window it will free
the cn_callback_entry while cn_queue_wrapper() is playing with it?

> Btw, it looks like comments for del_timer_sync() and cancel_delayed_work
> ()
> are controversial - del_timer_sync() says that pending timer
> can not run on different CPU after returning,
> but cancel_delayed_work() says, that work to be cancelled still
> can run after returning.

Not controversial - the timer can have expired and have been successfully
deleted but the work_struct which the timer handler scheduled is still
pending, or has just started to run.

2005-04-01 11:06:24

by Evgeniy Polyakov

Subject: Re: cn_queue.c

On Fri, 2005-04-01 at 02:43 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <[email protected]> wrote:
> >
> > On Fri, 2005-04-01 at 01:50 -0800, Andrew Morton wrote:
> > > Evgeniy Polyakov <[email protected]> wrote:
> > > >
> > > > cn_queue_free_dev() will wait until dev->refcnt hits zero
> > > > before freeing any resources,
> > > > but it can happen only after cn_queue_del_callback() does
> > > > it's work on given callback device [actually when all callbacks
> > > > are removed].
> > > > When new callback is added into device, it's refcnt is incremented
> > > > [before addition btw, if addition fails in the middle, reference is
> > > > decremented], when callback is removed, device's reference counter
> > > > is decremented atomically after all work is finished.
> > >
> > > hm.
> > >
> > > How come cn_queue_del_callback() uses all those barriers if no other CPU
> > > can grab new references against cbq->cb->refcnt?
> >
> > The work may be already assigned to that callback device,
> > new work can't, barriers are there to ensure that
> > reference counters are updated in proper places, but not
> > before.
>
> What are the "proper places"? What other control paths could be inspecting
> the refcount at this time? (That's the problem with barriers - you can't
> tell what they are barriering against).

The device's refcnt must be updated after the work is done, not before.
The callback's refcnt must be incremented before the work is done, and
decremented after.

Without barriers, all of the above can be reordered.

> > It would be a bug to update dev->refcnt before assigned work is finished
> > and callback removed.
> >
> > > cn_queue_free_callback() forgot to do flush_workqueue(), so
> > > cn_queue_wrapper() can still be running while cn_queue_free_callback()
> > > frees up the cn_callback_entry, I think.
> >
> > cn_queue_wrapper() atomically increments cbq->cb->refcnt if runs, so it
> > will
> > be caught in
> > while (atomic_read(&cbq->cb->refcnt))
> > msleep(1000);
> > in cn_queue_free_callback().
> > If it does not run, then all will be ok.
>
> But there's a time window on entry to cn_queue_wrapper() where the refcount
> hasn't been incremented yet, and there's no locking. If
> cn_queue_free_callback() inspects the refcount in that window it will free
> the cn_callback_entry() while cn_queue_wrapper() is playing with it?

If cn_queue_wrapper() is already running [even before the refcnt has
been incremented - the increment is probably not even needed there],
then cancel_delayed_work() will sleep, since the appropriate timer is
deleted by del_timer_sync(), which will wait until it has finished on
the other CPU.

> > Btw, it looks like comments for del_timer_sync() and cancel_delayed_work
> > ()
> > are controversial - del_timer_sync() says that pending timer
> > can not run on different CPU after returning,
> > but cancel_delayed_work() says, that work to be cancelled still
> > can run after returning.
>
> Not controversial - the timer can have expired and have been successfully
> deleted but the work_struct which the timer handler scheduled is still
> pending, or has just started to run.

Ugh, I see now.
There are two levels of work deferral - onto the cpu_workqueue_struct
and, in the case of delayed work, onto work->timer.


Yes, I need to place flush_workqueue() in cn_queue_free_callback().

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski



2005-04-01 11:09:28

by Evgeniy Polyakov

Subject: Re: cn_queue.c

On Fri, 2005-04-01 at 15:12 +0400, Evgeniy Polyakov wrote:

> > > cn_queue_wrapper() atomically increments cbq->cb->refcnt if runs, so it
> > > will
> > > be caught in
> > > while (atomic_read(&cbq->cb->refcnt))
> > > msleep(1000);
> > > in cn_queue_free_callback().
> > > If it does not run, then all will be ok.
> >
> > But there's a time window on entry to cn_queue_wrapper() where the refcount
> > hasn't been incremented yet, and there's no locking. If
> > cn_queue_free_callback() inspects the refcount in that window it will free
> > the cn_callback_entry() while cn_queue_wrapper() is playing with it?
>
> If we already run cn_queue_wrapper() [even before refcnt incrementing,
> probably it is not even needed there], then cancel_delayed_work() will
> sleep,
> since appropriate timer will be deleted in del_timer_sync(), which
> will wait until it is finished on the different CPU.
>
> > > Btw, it looks like comments for del_timer_sync() and cancel_delayed_work
> > > ()
> > > are controversial - del_timer_sync() says that pending timer
> > > can not run on different CPU after returning,
> > > but cancel_delayed_work() says, that work to be cancelled still
> > > can run after returning.
> >
> > Not controversial - the timer can have expired and have been successfully
> > deleted but the work_struct which the timer handler scheduled is still
> > pending, or has just started to run.
>
> Ugh, I see now.
> There are two levels of work deferring - into cpu_workqueue_struct,
> and, in case of delayed work, into work->timer.
>
>
> Yes, I need to place flush_workqueue() in cn_queue_free_callback();
>

That actually nullifies the above statement about sleeping in
cancel_delayed_work().

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

