2012-01-23 15:23:14

by Jeff Layton

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Sun, 22 Jan 2012 00:46:14 -0800
Stephen Boyd <[email protected]> wrote:

> On 1/20/2012 10:56 AM, Jeff Layton wrote:
> > What I'm finding though is that if I stop and start nfsd multiple
> > times, then I eventually get a warning like this on start:
> >
> > [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > [ 5157.128742] Hardware name: Bochs
> > [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G W O 3.3.0-rc1+ #1
> > [ 5157.128742] Call Trace:
> > [ 5157.128742] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > [ 5157.128742] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > [ 5157.128742] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > [ 5157.128742] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > [ 5157.128742] [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> > [ 5157.128742] [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> > [ 5157.128742] [<ffffffff81074676>] mod_timer+0xf6/0x450
> > [ 5157.128742] [<ffffffff810749e8>] add_timer+0x18/0x20
> > [ 5157.128742] [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> > [ 5157.128742] [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> > [ 5157.128742] [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> > [ 5157.128742] [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> > [ 5157.128742] [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> > [ 5157.128742] [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> > [ 5157.128742] [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> > [ 5157.128742] [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> > [ 5157.128742] [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> > [ 5157.128742] [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> > [ 5157.128742] [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> > [ 5157.128742] [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> > [ 5157.128742] [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> > [ 5157.128742] [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> > [ 5157.128742] [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> > [ 5157.128742] [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
>
> This one is telling you that the timer you're about to run hasn't been
> registered with debug_objects. Most likely the work item hasn't been
> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
>
> >
> > ...or this on stop:
> >
> > [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > [ 5200.808205] Hardware name: Bochs
> > [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G W O 3.3.0-rc1+ #1
> > [ 5200.822409] Call Trace:
> > [ 5200.822726] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > [ 5200.823509] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > [ 5200.824243] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > [ 5200.824935] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > [ 5200.825665] [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> > [ 5200.826474] [<ffffffff810737b4>] del_timer+0x24/0x1b0
> > [ 5200.827129] [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> > [ 5200.827829] [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> > [ 5200.832905] [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> > [ 5200.833716] [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> > [ 5200.834558] [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> > [ 5200.835434] [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> > [ 5200.836271] [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> > [ 5200.837045] [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> > [ 5200.837778] [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > [ 5200.838545] [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> > [ 5200.839221] [<ffffffffa0102000>] ? 0xffffffffa0101fff
> > [ 5200.839820] [<ffffffff8108a747>] kthread+0xb7/0xc0
> > [ 5200.840431] [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> > [ 5200.850883] [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> > [ 5200.851654] [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> > [ 5200.852418] [<ffffffff816a48b0>] ? gs_change+0x13/0x13
>
> This is similar. Now you're deleting a timer that debug_objects doesn't
> know about.
>
> >
> > The problem is that even after staring at this code for a while, I'm
> > still clueless as to what this is telling me. It doesn't happen every
> > time either, so maybe there's a race of some sort involved.
> >
> > It's possible that my patch is broken and doing something wrong, but
> > I'm starting not to think so. I'm not really using timers directly --
> > I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> > that is manipulating the timer. The delayed_work (and hence the timer)
> > are part of an rpc_inode.
> >
> > Can anyone shed a little light on what this is complaining about?
> >
>
> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I
> would hope that work item debugging would say the same things and then
> we would know that the work item itself wasn't initialized properly.

(adding linux-nfs to cc list...)

Ok, I think I sort of see what's happening, but I'm not sure if it's a
bug in the debug objects code or something else. To answer the question
that I didn't before, the kernels I've been testing have this set:

CONFIG_DEBUG_OBJECTS_WORK=y

...so workqueue debugging is on but I haven't seen any messages that
come from it AFAICT. The rpc_inode objects have their delayed_work
fields initialized via a slab "constructor". Here's the rpc_inode_cache
creation call:

rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
sizeof(struct rpc_inode),
0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
SLAB_MEM_SPREAD),
init_once);

...and the init_once() routine does this:

INIT_DELAYED_WORK(&rpci->queue_timeout,
rpc_timeout_upcall_queue);

On the first use of an rpc_inode object, everything works fine. I think
the problem comes in when rpc_inode objects get recycled without the
INIT_DELAYED_WORK() getting called on it again.

Before the object is freed it gets cancel_delayed_work_sync() called on
it, but that's apparently not enough to convince the debugobjects code
that it's already correctly initialized. As a Q&D check, the following
patch seems to stop the warnings. Is there some better way to do this
that doesn't require the reinitialization of the delayed work on each
inode allocation?

--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
if (!rpci)
return NULL;
+
+ INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
return &rpci->vfs_inode;
}

--
Jeff Layton <[email protected]>


2012-01-24 07:45:21

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
> On Sun, 22 Jan 2012 00:46:14 -0800
> Stephen Boyd <[email protected]> wrote:
>
> > On 1/20/2012 10:56 AM, Jeff Layton wrote:
> > > What I'm finding though is that if I stop and start nfsd multiple
> > > times, then I eventually get a warning like this on start:
> > >
> > > [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > > [ 5157.128742] Hardware name: Bochs
> > > [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > > [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > > [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G W O 3.3.0-rc1+ #1
> > > [ 5157.128742] Call Trace:
> > > [ 5157.128742] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > > [ 5157.128742] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > > [ 5157.128742] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > > [ 5157.128742] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > > [ 5157.128742] [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> > > [ 5157.128742] [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> > > [ 5157.128742] [<ffffffff81074676>] mod_timer+0xf6/0x450
> > > [ 5157.128742] [<ffffffff810749e8>] add_timer+0x18/0x20
> > > [ 5157.128742] [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> > > [ 5157.128742] [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> > > [ 5157.128742] [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> > > [ 5157.128742] [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> > > [ 5157.128742] [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> > > [ 5157.128742] [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> > > [ 5157.128742] [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> > > [ 5157.128742] [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> > > [ 5157.128742] [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> > > [ 5157.128742] [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> > > [ 5157.128742] [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> > > [ 5157.128742] [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> > > [ 5157.128742] [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> > > [ 5157.128742] [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> > > [ 5157.128742] [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> > > [ 5157.128742] [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
> >
> > This one is telling you that the timer you're about to run hasn't been
> > registered with debug_objects. Most likely the work item hasn't been
> > initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
> >
> > >
> > > ...or this on stop:
> > >
> > > [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > > [ 5200.808205] Hardware name: Bochs
> > > [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > > [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > > [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G W O 3.3.0-rc1+ #1
> > > [ 5200.822409] Call Trace:
> > > [ 5200.822726] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > > [ 5200.823509] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > > [ 5200.824243] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > > [ 5200.824935] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > > [ 5200.825665] [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> > > [ 5200.826474] [<ffffffff810737b4>] del_timer+0x24/0x1b0
> > > [ 5200.827129] [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> > > [ 5200.827829] [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> > > [ 5200.832905] [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> > > [ 5200.833716] [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> > > [ 5200.834558] [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> > > [ 5200.835434] [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> > > [ 5200.836271] [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> > > [ 5200.837045] [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> > > [ 5200.837778] [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > > [ 5200.838545] [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> > > [ 5200.839221] [<ffffffffa0102000>] ? 0xffffffffa0101fff
> > > [ 5200.839820] [<ffffffff8108a747>] kthread+0xb7/0xc0
> > > [ 5200.840431] [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> > > [ 5200.850883] [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> > > [ 5200.851654] [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> > > [ 5200.852418] [<ffffffff816a48b0>] ? gs_change+0x13/0x13
> >
> > This is similar. Now you're deleting a timer that debug_objects doesn't
> > know about.
> >
> > >
> > > The problem is that even after staring at this code for a while, I'm
> > > still clueless as to what this is telling me. It doesn't happen every
> > > time either, so maybe there's a race of some sort involved.
> > >
> > > It's possible that my patch is broken and doing something wrong, but
> > > I'm starting not to think so. I'm not really using timers directly --
> > > I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> > > that is manipulating the timer. The delayed_work (and hence the timer)
> > > are part of an rpc_inode.
> > >
> > > Can anyone shed a little light on what this is complaining about?
> > >
> >
> > Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I
> > would hope that work item debugging would say the same things and then
> > we would know that the work item itself wasn't initialized properly.
>
> (adding linux-nfs to cc list...)
>
> Ok, I think I sort of see what's happening, but I'm not sure if it's a
> bug in the debug objects code or something else. To answer the question
> that I didn't before, the kernels I've been testing have this set:
>
> CONFIG_DEBUG_OBJECTS_WORK=y
>
> ...so workqueue debugging is on but I haven't seen any messages that
> come from it AFAICT. The rpc_inode objects have their delayed_work
> fields initialized via a slab "constructor". Here's the rpc_inode_cache
> creation call:
>
> rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
> sizeof(struct rpc_inode),
> 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
> SLAB_MEM_SPREAD),
> init_once);
>
> ...and the init_once() routine does this:
>
> INIT_DELAYED_WORK(&rpci->queue_timeout,
> rpc_timeout_upcall_queue);
>
> On the first use of an rpc_inode object, everything works fine. I think
> the problem comes in when rpc_inode objects get recycled without the
> INIT_DELAYED_WORK() getting called on it again.
>
> Before the object is freed it gets cancel_delayed_work_sync() called on
> it, but that's apparently not enough to convince the debugobjects code
> that it's already correctly initialized. As a Q&D check, the following
> patch seems to stop the warnings. Is there some better way to do this
> that doesn't require the reinitialization of the delayed work on each
> inode allocation?

If this is debug objects false positive, it should be rather fixed in
debug object code itself, not workaround in nfs code, but I do not have
any idea how to do this.

Stanislaw

> --- a/net/sunrpc/rpc_pipe.c
> +++ b/net/sunrpc/rpc_pipe.c
> @@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
> rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
> if (!rpci)
> return NULL;
> +
> + INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
> return &rpci->vfs_inode;
> }
>
> --
> Jeff Layton <[email protected]>

2012-01-24 12:36:57

by Jeff Layton

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Tue, 24 Jan 2012 11:51:59 +0200
Boaz Harrosh <[email protected]> wrote:

> On 01/24/2012 09:45 AM, Stanislaw Gruszka wrote:
> > On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
> >> On Sun, 22 Jan 2012 00:46:14 -0800
> >> Stephen Boyd <[email protected]> wrote:
> >>
> >>> On 1/20/2012 10:56 AM, Jeff Layton wrote:
> >>>> What I'm finding though is that if I stop and start nfsd multiple
> >>>> times, then I eventually get a warning like this on start:
> >>>>
> >>>> [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> >>>> [ 5157.128742] Hardware name: Bochs
> >>>> [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> >>>> [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> >>>> [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G W O 3.3.0-rc1+ #1
> >>>> [ 5157.128742] Call Trace:
> >>>> [ 5157.128742] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> >>>> [ 5157.128742] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> >>>> [ 5157.128742] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> >>>> [ 5157.128742] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> >>>> [ 5157.128742] [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> >>>> [ 5157.128742] [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> >>>> [ 5157.128742] [<ffffffff81074676>] mod_timer+0xf6/0x450
> >>>> [ 5157.128742] [<ffffffff810749e8>] add_timer+0x18/0x20
> >>>> [ 5157.128742] [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> >>>> [ 5157.128742] [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> >>>> [ 5157.128742] [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> >>>> [ 5157.128742] [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> >>>> [ 5157.128742] [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> >>>> [ 5157.128742] [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> >>>> [ 5157.128742] [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> >>>> [ 5157.128742] [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> >>>> [ 5157.128742] [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> >>>> [ 5157.128742] [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> >>>> [ 5157.128742] [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> >>>> [ 5157.128742] [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> >>>> [ 5157.128742] [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> >>>> [ 5157.128742] [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> >>>> [ 5157.128742] [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> >>>> [ 5157.128742] [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
> >>>
> >>> This one is telling you that the timer you're about to run hasn't been
> >>> registered with debug_objects. Most likely the work item hasn't been
> >>> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
> >>>
> >>>>
> >>>> ...or this on stop:
> >>>>
> >>>> [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> >>>> [ 5200.808205] Hardware name: Bochs
> >>>> [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> >>>> [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> >>>> [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G W O 3.3.0-rc1+ #1
> >>>> [ 5200.822409] Call Trace:
> >>>> [ 5200.822726] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> >>>> [ 5200.823509] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> >>>> [ 5200.824243] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> >>>> [ 5200.824935] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> >>>> [ 5200.825665] [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> >>>> [ 5200.826474] [<ffffffff810737b4>] del_timer+0x24/0x1b0
> >>>> [ 5200.827129] [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> >>>> [ 5200.827829] [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> >>>> [ 5200.832905] [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> >>>> [ 5200.833716] [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> >>>> [ 5200.834558] [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> >>>> [ 5200.835434] [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> >>>> [ 5200.836271] [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> >>>> [ 5200.837045] [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> >>>> [ 5200.837778] [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> >>>> [ 5200.838545] [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> >>>> [ 5200.839221] [<ffffffffa0102000>] ? 0xffffffffa0101fff
> >>>> [ 5200.839820] [<ffffffff8108a747>] kthread+0xb7/0xc0
> >>>> [ 5200.840431] [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> >>>> [ 5200.850883] [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> >>>> [ 5200.851654] [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> >>>> [ 5200.852418] [<ffffffff816a48b0>] ? gs_change+0x13/0x13
> >>>
> >>> This is similar. Now you're deleting a timer that debug_objects doesn't
> >>> know about.
> >>>
> >>>>
> >>>> The problem is that even after staring at this code for a while, I'm
> >>>> still clueless as to what this is telling me. It doesn't happen every
> >>>> time either, so maybe there's a race of some sort involved.
> >>>>
> >>>> It's possible that my patch is broken and doing something wrong, but
> >>>> I'm starting not to think so. I'm not really using timers directly --
> >>>> I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> >>>> that is manipulating the timer. The delayed_work (and hence the timer)
> >>>> are part of an rpc_inode.
> >>>>
> >>>> Can anyone shed a little light on what this is complaining about?
> >>>>
> >>>
> >>> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I
> >>> would hope that work item debugging would say the same things and then
> >>> we would know that the work item itself wasn't initialized properly.
> >>
> >> (adding linux-nfs to cc list...)
> >>
> >> Ok, I think I sort of see what's happening, but I'm not sure if it's a
> >> bug in the debug objects code or something else. To answer the question
> >> that I didn't before, the kernels I've been testing have this set:
> >>
> >> CONFIG_DEBUG_OBJECTS_WORK=y
> >>
> >> ...so workqueue debugging is on but I haven't seen any messages that
> >> come from it AFAICT. The rpc_inode objects have their delayed_work
> >> fields initialized via a slab "constructor". Here's the rpc_inode_cache
> >> creation call:
> >>
> >> rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
> >> sizeof(struct rpc_inode),
> >> 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
> >> SLAB_MEM_SPREAD),
> >> init_once);
> >>
> >> ...and the init_once() routine does this:
> >>
> >> INIT_DELAYED_WORK(&rpci->queue_timeout,
> >> rpc_timeout_upcall_queue);
> >>
> >> On the first use of an rpc_inode object, everything works fine. I think
> >> the problem comes in when rpc_inode objects get recycled without the
> >> INIT_DELAYED_WORK() getting called on it again.
> >>
> >> Before the object is freed it gets cancel_delayed_work_sync() called on
> >> it, but that's apparently not enough to convince the debugobjects code
> >> that it's already correctly initialized. As a Q&D check, the following
> >> patch seems to stop the warnings. Is there some better way to do this
> >> that doesn't require the reinitialization of the delayed work on each
> >> inode allocation?
> >
> > If this is debug objects false positive, it should be rather fixed in
> > debug object code itself, not workaround in nfs code, but I do not have
> > any idea how to do this.
> >
> > Stanislaw
> >
>
> I don't think so. Look at what INIT_DELAYED_WORK does.
> INIT_WORK - Set some start values, register with the objects-debugger if on.
> init_timer -
>
> I'd imagine that cancel_delayed_work_sync() would: 1st cancel some of those
> registrations, specifically with the objects-debugger. 2nd will leave some
> state undefined as left from the last round.
>
> I think this patch is cardinal and you might have found a potential bug.
>
> Also the cost of INIT_DELAYED_WORK is marginal compare to the extra safety.
>
> Recomended-by: Boaz Harrosh <[email protected]>
>

No, I don't think the state would be undefined after
cancel_delayed_work_sync. In principle you could requeue that work
again if you like without needing to reinitialize it.

I think this is a problem in the debugobjects code. It doesn't have
any way to know that when the object is recycled out of the slab that
the work is already initialized.

Certainly it's simple enough to reinitialize the work every time we
allocate an inode here, but I don't think this is really a rpc_pipefs
bug per-se. I can send a patch that works around this problem, but
if there are plans to fix this in the debugobjects code, I won't
bother...

--
Jeff Layton <[email protected]>

2012-01-25 14:46:33

by Jeff Layton

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Wed, 25 Jan 2012 15:05:03 +0100 (CET)
Thomas Gleixner <[email protected]> wrote:

> On Tue, 24 Jan 2012, Jeff Layton wrote:
> > > Still, I wonder if there are other problems like this around. The slab
> > > allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
> > > but parts of the objects themselves (like the timer in the work object
> > > here) get initialized in other places and aren't necessarily
> > > reinitialized when they're recycled out of the slab...
> > >
> >
> > On second thought...getting rid of the ctor function here might be
> > problematic. We have to call inode_init_once, etc...
> >
> > Almost all of the inode slabs have one, so I've settled for just moving
> > the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
> > sent a patch to Trond and linux-nfs to do that. That will fix this
> > case, but I do wonder if there are other places in the kernel that have
> > similar problems with debugobject initialization.
>
> The problem is that debugobject requires that a newly allocated object
> is reinitialized and made available to the debugobjects code again
> simply because we remove it from the debugobjects core on
> kmem_cache_free().
>
> The real question is why the heck kmem_cache_alloc() does not call the
> ctor on each allocation and just expects the previously used and freed
> object to be in a consistent initialiazed state.
>
> Thanks,
>
> tglx


I believe that's by design. The comments at the top of slab.c say:

* This means, that your constructor is used only for newly allocated
* slabs and you must pass objects with the same initializations to
* kmem_cache_free.

I assume that it was done that way for efficiency, but not passing
"clean" objects to kmem_cache_free has been the source of bugs in the
past.

Rerunning the ctor should be safe. SLAB already does that when memory
poisoning is enabled. Perhaps we could make sure all the allocators do
that when debug objects are enabled?

OTOH, doing that might paper over bugs in some cases...

--
Jeff Layton <[email protected]>

2012-01-24 16:32:47

by Jeff Layton

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Tue, 24 Jan 2012 17:01:29 +0200
Boaz Harrosh <[email protected]> wrote:

> On 01/24/2012 02:36 PM, Jeff Layton wrote:
> >
> > No, I don't think the state would be undefined after
> > cancel_delayed_work_sync. In principle you could requeue that work
> > again if you like without needing to reinitialize it.
> >
> > I think this is a problem in the debugobjects code. It doesn't have
> > any way to know that when the object is recycled out of the slab that
> > the work is already initialized.
> >
>
> The only difference between your above example of requeue after
> cancel_delayed_work_sync, and this here is the visit back to the
> slab. Does the slab (Maybe in debug mode) stumps over some of the
> record memory?
>
> If the memory is constant what is then the difference between the two
> cases?
>
> > Certainly it's simple enough to reinitialize the work every time we
> > allocate an inode here, but I don't think this is really a rpc_pipefs
> > bug per-se.
>
> That depends on the API intention. If an init is intended after
> SLAB free then yes if not then not. We should ask for the intention
> of this API.
>
> > I can send a patch that works around this problem, but
> > if there are plans to fix this in the debugobjects code, I won't
> > bother...
> >
>
> You mean other fix then calling INIT_DELAYED_WORK? is that so
> bad that we need more code to avoid it?
>

I'm not opposed to a patch that sidesteps this problem, but I want to
make sure we understand it so that we don't get bitten by it in other
places. That's a good point. I hadn't considered whether memory
poisoning is a factor. In the kernel I was testing:

CONFIG_SLUB=y
CONFIG_SLUB_DEBUG_ON=y

...just to be sure:

# cat /sys/kernel/slab/rpc_inode_cache/poison
1

Looking at the code...

It looks like SLAB will call the ctor on every object when it's
allocated, even if it was recycled from an existing slab. SLUB doesn't
do that however -- as best I can tell it avoids poisoning objects when
there is a ctor function, so they don't get reinitialized like they
would with SLAB.

Probably the best solution here is to eliminate the ctor function and
just initialize the objects whenever they're allocated. Since these
objects aren't frequently recycled then there's little benefit to
keeping that around, IMO. I'll spin up a patch for that soon.

Still, I wonder if there are other problems like this around. The slab
allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
but parts of the objects themselves (like the timer in the work object
here) get initialized in other places and aren't necessarily
reinitialized when they're recycled out of the slab...

--
Jeff Layton <[email protected]>

2012-01-25 14:05:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Tue, 24 Jan 2012, Jeff Layton wrote:
> > Still, I wonder if there are other problems like this around. The slab
> > allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
> > but parts of the objects themselves (like the timer in the work object
> > here) get initialized in other places and aren't necessarily
> > reinitialized when they're recycled out of the slab...
> >
>
> On second thought...getting rid of the ctor function here might be
> problematic. We have to call inode_init_once, etc...
>
> Almost all of the inode slabs have one, so I've settled for just moving
> the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
> sent a patch to Trond and linux-nfs to do that. That will fix this
> case, but I do wonder if there are other places in the kernel that have
> similar problems with debugobject initialization.

The problem is that debugobject requires that a newly allocated object
is reinitialized and made available to the debugobjects code again
simply because we remove it from the debugobjects core on
kmem_cache_free().

The real question is why the heck kmem_cache_alloc() does not call the
ctor on each allocation and just expects the previously used and freed
object to be in a consistent initialiazed state.

Thanks,

tglx

2012-01-24 17:44:09

by Jeff Layton

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On Tue, 24 Jan 2012 11:32:34 -0500
Jeff Layton <[email protected]> wrote:

> On Tue, 24 Jan 2012 17:01:29 +0200
> Boaz Harrosh <[email protected]> wrote:
>
> > On 01/24/2012 02:36 PM, Jeff Layton wrote:
> > >
> > > No, I don't think the state would be undefined after
> > > cancel_delayed_work_sync. In principle you could requeue that work
> > > again if you like without needing to reinitialize it.
> > >
> > > I think this is a problem in the debugobjects code. It doesn't have
> > > any way to know that when the object is recycled out of the slab that
> > > the work is already initialized.
> > >
> >
> > The only difference between your above example of requeue after
> > cancel_delayed_work_sync, and this here is the visit back to the
> > slab. Does the slab (Maybe in debug mode) stumps over some of the
> > record memory?
> >
> > If the memory is constant what is then the difference between the two
> > cases?
> >
> > > Certainly it's simple enough to reinitialize the work every time we
> > > allocate an inode here, but I don't think this is really a rpc_pipefs
> > > bug per-se.
> >
> > That depends on the API intention. If an init is intended after
> > SLAB free then yes if not then not. We should ask for the intention
> > of this API.
> >
> > > I can send a patch that works around this problem, but
> > > if there are plans to fix this in the debugobjects code, I won't
> > > bother...
> > >
> >
> > You mean other fix then calling INIT_DELAYED_WORK? is that so
> > bad that we need more code to avoid it?
> >
>
> I'm not opposed to a patch that sidesteps this problem, but I want to
> make sure we understand it so that we don't get bitten by it in other
> places. That's a good point. I hadn't considered whether memory
> poisoning is a factor. In the kernel I was testing:
>
> CONFIG_SLUB=y
> CONFIG_SLUB_DEBUG_ON=y
>
> ...just to be sure:
>
> # cat /sys/kernel/slab/rpc_inode_cache/poison
> 1
>
> Looking at the code...
>
> It looks like SLAB will call the ctor on every object when it's
> allocated, even if it was recycled from an existing slab. SLUB doesn't
> do that however -- as best I can tell it avoids poisoning objects when
> there is a ctor function, so they don't get reinitialized like they
> would with SLAB.
>
> Probably the best solution here is to eliminate the ctor function and
> just initialize the objects whenever they're allocated. Since these
> objects aren't frequently recycled then there's little benefit to
> keeping that around, IMO. I'll spin up a patch for that soon.
>
> Still, I wonder if there are other problems like this around. The slab
> allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
> but parts of the objects themselves (like the timer in the work object
> here) get initialized in other places and aren't necessarily
> reinitialized when they're recycled out of the slab...
>

On second thought...getting rid of the ctor function here might be
problematic. We have to call inode_init_once, etc...

Almost all of the inode slabs have one, so I've settled for just moving
the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
sent a patch to Trond and linux-nfs to do that. That will fix this
case, but I do wonder if there are other places in the kernel that have
similar problems with debugobject initialization.

--
Jeff Layton <[email protected]>

2012-01-24 09:59:13

by Boaz Harrosh

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On 01/24/2012 09:45 AM, Stanislaw Gruszka wrote:
> On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
>> On Sun, 22 Jan 2012 00:46:14 -0800
>> Stephen Boyd <[email protected]> wrote:
>>
>>> On 1/20/2012 10:56 AM, Jeff Layton wrote:
>>>> What I'm finding though is that if I stop and start nfsd multiple
>>>> times, then I eventually get a warning like this on start:
>>>>
>>>> [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
>>>> [ 5157.128742] Hardware name: Bochs
>>>> [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
>>>> [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
>>>> [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G W O 3.3.0-rc1+ #1
>>>> [ 5157.128742] Call Trace:
>>>> [ 5157.128742] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 5157.128742] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
>>>> [ 5157.128742] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
>>>> [ 5157.128742] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
>>>> [ 5157.128742] [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
>>>> [ 5157.128742] [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
>>>> [ 5157.128742] [<ffffffff81074676>] mod_timer+0xf6/0x450
>>>> [ 5157.128742] [<ffffffff810749e8>] add_timer+0x18/0x20
>>>> [ 5157.128742] [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
>>>> [ 5157.128742] [<ffffffff81084441>] queue_delayed_work+0x21/0x40
>>>> [ 5157.128742] [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
>>>> [ 5157.128742] [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
>>>> [ 5157.128742] [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
>>>> [ 5157.128742] [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
>>>> [ 5157.128742] [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
>>>> [ 5157.128742] [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
>>>> [ 5157.128742] [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
>>>> [ 5157.128742] [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
>>>> [ 5157.128742] [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
>>>> [ 5157.128742] [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
>>>> [ 5157.128742] [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
>>>> [ 5157.128742] [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
>>>> [ 5157.128742] [<ffffffff811b4fdd>] sys_write+0x4d/0x90
>>>> [ 5157.128742] [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
>>>
>>> This one is telling you that the timer you're about to run hasn't been
>>> registered with debug_objects. Most likely the work item hasn't been
>>> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
>>>
>>>>
>>>> ...or this on stop:
>>>>
>>>> [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
>>>> [ 5200.808205] Hardware name: Bochs
>>>> [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
>>>> [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
>>>> [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G W O 3.3.0-rc1+ #1
>>>> [ 5200.822409] Call Trace:
>>>> [ 5200.822726] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 5200.823509] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
>>>> [ 5200.824243] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
>>>> [ 5200.824935] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
>>>> [ 5200.825665] [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
>>>> [ 5200.826474] [<ffffffff810737b4>] del_timer+0x24/0x1b0
>>>> [ 5200.827129] [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
>>>> [ 5200.827829] [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
>>>> [ 5200.832905] [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
>>>> [ 5200.833716] [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
>>>> [ 5200.834558] [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
>>>> [ 5200.835434] [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
>>>> [ 5200.836271] [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
>>>> [ 5200.837045] [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
>>>> [ 5200.837778] [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>>>> [ 5200.838545] [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
>>>> [ 5200.839221] [<ffffffffa0102000>] ? 0xffffffffa0101fff
>>>> [ 5200.839820] [<ffffffff8108a747>] kthread+0xb7/0xc0
>>>> [ 5200.840431] [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
>>>> [ 5200.850883] [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
>>>> [ 5200.851654] [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
>>>> [ 5200.852418] [<ffffffff816a48b0>] ? gs_change+0x13/0x13
>>>
>>> This is similar. Now you're deleting a timer that debug_objects doesn't
>>> know about.
>>>
>>>>
>>>> The problem is that even after staring at this code for a while, I'm
>>>> still clueless as to what this is telling me. It doesn't happen every
>>>> time either, so maybe there's a race of some sort involved.
>>>>
>>>> It's possible that my patch is broken and doing something wrong, but
>>>> I'm starting not to think so. I'm not really using timers directly --
>>>> I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
>>>> that is manipulating the timer. The delayed_work (and hence the timer)
>>>> are part of an rpc_inode.
>>>>
>>>> Can anyone shed a little light on what this is complaining about?
>>>>
>>>
>>> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I
>>> would hope that work item debugging would say the same things and then
>>> we would know that the work item itself wasn't initialized properly.
>>
>> (adding linux-nfs to cc list...)
>>
>> Ok, I think I sort of see what's happening, but I'm not sure if it's a
>> bug in the debug objects code or something else. To answer the question
>> that I didn't before, the kernels I've been testing have this set:
>>
>> CONFIG_DEBUG_OBJECTS_WORK=y
>>
>> ...so workqueue debugging is on but I haven't seen any messages that
>> come from it AFAICT. The rpc_inode objects have their delayed_work
>> fields initialized via a slab "constructor". Here's the rpc_inode_cache
>> creation call:
>>
>> rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
>> sizeof(struct rpc_inode),
>> 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
>> SLAB_MEM_SPREAD),
>> init_once);
>>
>> ...and the init_once() routine does this:
>>
>> INIT_DELAYED_WORK(&rpci->queue_timeout,
>> rpc_timeout_upcall_queue);
>>
>> On the first use of an rpc_inode object, everything works fine. I think
>> the problem comes in when rpc_inode objects get recycled without the
>> INIT_DELAYED_WORK() getting called on it again.
>>
>> Before the object is freed it gets cancel_delayed_work_sync() called on
>> it, but that's apparently not enough to convince the debugobjects code
>> that it's already correctly initialized. As a Q&D check, the following
>> patch seems to stop the warnings. Is there some better way to do this
>> that doesn't require the reinitialization of the delayed work on each
>> inode allocation?
>
> If this is debug objects false positive, it should be rather fixed in
> debug object code itself, not workaround in nfs code, but I do not have
> any idea how to do this.
>
> Stanislaw
>

I don't think so. Look at what INIT_DELAYED_WORK does.
INIT_WORK - Set some start values, register with the objects-debugger if on.
init_timer -

I'd imagine that cancel_delayed_work_sync() would: 1st cancel some of those
registrations, specifically with the objects-debugger. 2nd will leave some
state undefined as left from the last round.

I think this patch is cardinal and you might have found a potential bug.

Also the cost of INIT_DELAYED_WORK is marginal compare to the extra safety.

Recomended-by: Boaz Harrosh <[email protected]>

>> --- a/net/sunrpc/rpc_pipe.c
>> +++ b/net/sunrpc/rpc_pipe.c
>> @@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
>> rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
>> if (!rpci)
>> return NULL;
>> +
>> + INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
>> return &rpci->vfs_inode;
>> }
>>
>> --
>> Jeff Layton <[email protected]>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2012-01-25 14:48:12

by Boaz Harrosh

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On 01/25/2012 04:05 PM, Thomas Gleixner wrote:
> On Tue, 24 Jan 2012, Jeff Layton wrote:
>>> Still, I wonder if there are other problems like this around. The slab
>>> allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
>>> but parts of the objects themselves (like the timer in the work object
>>> here) get initialized in other places and aren't necessarily
>>> reinitialized when they're recycled out of the slab...
>>>
>>
>> On second thought...getting rid of the ctor function here might be
>> problematic. We have to call inode_init_once, etc...
>>
>> Almost all of the inode slabs have one, so I've settled for just moving
>> the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
>> sent a patch to Trond and linux-nfs to do that. That will fix this
>> case, but I do wonder if there are other places in the kernel that have
>> similar problems with debugobject initialization.
>
> The problem is that debugobject requires that a newly allocated object
> is reinitialized and made available to the debugobjects code again
> simply because we remove it from the debugobjects core on
> kmem_cache_free().
>
> The real question is why the heck kmem_cache_alloc() does not call the
> ctor on each allocation and just expects the previously used and freed
> object to be in a consistent initialiazed state.
>

Perhaps it's some flags that is to do with the RCU delete thing.
You know for the lockless walks and stuff

> Thanks,
>
> tglx


2012-01-24 15:01:50

by Boaz Harrosh

[permalink] [raw]
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()

On 01/24/2012 02:36 PM, Jeff Layton wrote:
>
> No, I don't think the state would be undefined after
> cancel_delayed_work_sync. In principle you could requeue that work
> again if you like without needing to reinitialize it.
>
> I think this is a problem in the debugobjects code. It doesn't have
> any way to know that when the object is recycled out of the slab that
> the work is already initialized.
>

The only difference between your above example of requeue after
cancel_delayed_work_sync, and this here is the visit back to the
slab. Does the slab (Maybe in debug mode) stumps over some of the
record memory?

If the memory is constant what is then the difference between the two
cases?

> Certainly it's simple enough to reinitialize the work every time we
> allocate an inode here, but I don't think this is really a rpc_pipefs
> bug per-se.

That depends on the API intention. If an init is intended after
SLAB free then yes if not then not. We should ask for the intention
of this API.

> I can send a patch that works around this problem, but
> if there are plans to fix this in the debugobjects code, I won't
> bother...
>

You mean other fix then calling INIT_DELAYED_WORK? is that so
bad that we need more code to avoid it?

Thanks
Boaz