LinuxLists.cc - Re: 2.6.23-rc1-mm2

2007-08-03 11:00:46

Subject: Re: 2.6.23-rc1-mm2

Hi,

Am Wednesday 01 August 2007 08:09 schrieb Andrew Morton:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.
>6.23-rc1-mm2/
>
>
> - the git-block tree remains dropped due to disageement with the Vaio
>
> - git-e1000new was withdrawn by the authors
>
> - git-wireless is back. It is still a >3MB diff, and appears to compile.
>
> - Is anyone testing the kgdb code in here?

I still get some nfs related locking bug.

I applied

linux-2.6.23-001-fix_rpciod_down_race.dif
linux-2.6.23-003-fix_locking_regression.dif
linux-2.6.23-004-fix_stateid_regression.dif

=============================================
[ INFO: possible recursive locking detected ]
2.6.23-rc1-mm2 #3
---------------------------------------------
events/0/5 is trying to acquire lock:
(events){--..}, at: [<c012ed90>] flush_workqueue+0x0/0x70

but task is already holding lock:
(events){--..}, at: [<c012e5c4>] run_workqueue+0xd4/0x1e0

other info that might help us debug this:
2 locks held by events/0/5:
#0: (events){--..}, at: [<c012e5c4>] run_workqueue+0xd4/0x1e0
#1: ((nfs_automount_task).work){--..}, at: [<c012e5c4>]
run_workqueue+0xd4/0x1e0

stack backtrace:
[<c0104fda>] show_trace_log_lvl+0x1a/0x30
[<c0105c02>] show_trace+0x12/0x20
[<c0105d15>] dump_stack+0x15/0x20
[<c013ee42>] __lock_acquire+0xc22/0x1030
[<c013f2b1>] lock_acquire+0x61/0x80
[<c012edd9>] flush_workqueue+0x49/0x70
[<c012ee0d>] flush_scheduled_work+0xd/0x10
[<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs]
[<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs]
[<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs]
[<c017b38d>] deactivate_super+0x7d/0xa0
[<c018f94b>] mntput_no_expire+0x4b/0x80
[<c018fd94>] expire_mount_list+0xe4/0x140
[<c0191219>] mark_mounts_for_expiry+0x99/0xb0
[<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs]
[<c012e61b>] run_workqueue+0x12b/0x1e0
[<c012f05b>] worker_thread+0x9b/0x100
[<c0131c72>] kthread+0x42/0x70
[<c0104c0f>] kernel_thread_helper+0x7/0x18
=======================

thanks

Marc

--
"The enemy uses unauthorized weapons."
Lord Arthur Ponsonby, "Falsehood in Wartime: Propaganda Lies of the First
World War", 1928

2007-08-06 16:24:30

by Trond Myklebust

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

Attachments:

linux-2.6.23-005-dont_call_rpc_putcred_from_rcu.dif (2.08 kB)
(No filename) (315.00 B)
(No filename) (140.00 B)
Download all attachments

2007-08-07 12:09:30

by Marc Dietrich

[permalink] [raw]

Subject: Re: [NFS] 2.6.23-rc1-mm2

Am Monday 06 August 2007 18:24 schrieb Trond Myklebust:
> On Mon, 2007-08-06 at 13:05 +0200, Marc Dietrich wrote:
> > Hi,
> >
> > > (...)
> >
> > just booting into X is enough.
> >
> > I applied the patch, but now I get:
> >
> > =================================
> > [ INFO: inconsistent lock state ]
> > 2.6.23-rc1-mm2 #4
> > ---------------------------------
> > inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > (rpc_credcache_lock){-+..}, at: [<c01dc487>]
> > _atomic_dec_and_lock+0x17/0x60 {softirq-on-W} state was registered at:
> > [<c013e870>] __lock_acquire+0x650/0x1030
> > [<c013f2b1>] lock_acquire+0x61/0x80
> > [<c02db9ac>] _spin_lock+0x2c/0x40
> > [<c01dc487>] _atomic_dec_and_lock+0x17/0x60
> > [<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
> > [<dced56c1>] rpcauth_unbindcred+0x21/0x60 [sunrpc]
> > [<dced3fd4>] a0 [sunrpc]
> > [<dcecefe0>] rpc_call_sync+0x30/0x40 [sunrpc]
> > [<dcedc73b>] rpcb_register+0xdb/0x180 [sunrpc]
> > [<dced65b3>] svc_register+0x93/0x160 [sunrpc]
> > [<dced6ebe>] __svc_create+0x1ee/0x220 [sunrpc]
> > [<dced7053>] svc_create+0x13/0x20 [sunrpc]
> > [<dcf6d722>] nfs_callback_up+0x82/0x120 [nfs]
> > [<dcf48f36>] nfs_get_client+0x176/0x390 [nfs]
> > [<dcf49181>] nfs4_set_client+0x31/0x190 [nfs]
> > [<dcf49983>] nfs4_create_server+0x63/0x3b0 [nfs]
> > [<dcf52426>] nfs4_get_sb+0x346/0x5b0 [nfs]
> > [<c017b444>] vfs_kern_mount+0x94/0x110
> > [<c0190a62>] do_mount+0x1f2/0x7d0
> > [<c01910a6>] sys_mount+0x66/0xa0
> > [<c0104046>] syscall_call+0x7/0xb
> > [<ffffffff>] 0xffffffff
> > irq event stamp: 5277830
> > hardirqs last enabled at (5277830): [<c017530a>]
> > kmem_cache_free+0x8a/0xc0 hardirqs last disabled at (5277829):
> > [<c01752d2>] kmem_cache_free+0x52/0xc0 softirqs last enabled at
> > (5277798): [<c0124173>] __do_softirq+0xa3/0xc0 softirqs last disabled at
> > (5277817): [<c01241d7>] do_softirq+0x47/0x50
> >
> > other info that might help us debug this:
> > no locks held by swapper/0.
> >
> > stack backtrace:
> > [<c0104fda>] show_trace_log_lvl+0x1a/0x30
> > [<c0105c02>] show_trace+0x12/0x20
> > [<c0105d15>] dump_stack+0x15/0x20
> > [<c013ccc3>] print_usage_bug+0x153/0x160
> > [<c013d8b9>] mark_lock+0x449/0x620
> > [<c013e824>] __lock_acquire+0x604/0x1030
> > [<c013f2b1>] lock_acquire+0x61/0x80
> > [<c02db9ac>] _spin_lock+0x2c/0x40
> > [<c01dc487>] _atomic_dec_and_lock+0x17/0x60
> > [<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
> > [<dcf6bf83>] nfs_free_delegation_callback+0x13/0x20 [nfs]
> > [<c012f9ea>] __rcu_process_callbacks+0x6a/0x1c0
> > [<c012fb52>] rcu_process_callbacks+0x12/0x30
> > [<c0124218>] tasklet_action+0x38/0x80
> > [<c0124125>] __do_softirq+0x55/0xc0
> > [<c01241d7>] do_softirq+0x47/0x50
> > [<c0124605>] irq_exit+0x35/0x40
> > [<c0112463>] smp_apic_timer_interrupt+0x43/0x80
> > [<c0104a77>] apic_timer_interrupt+0x33/0x38
> > [<c02690df>] cpuidle_idle_call+0x6f/0x90
> > [<c01023c3>] cpu_idle+0x43/0x70
> > [<c02d8c27>] rest_init+0x47/0x50
> > [<c03bcb6a>] start_kernel+0x22a/0x2b0
> > [<00000000>] 0x0
> > =======================
>
> That is a different matter. I assume this patch should suffice to fix
> the above problem.
>
> Trond

yes - it does.

thanks.

Marc

--
"Our cause has a sacred nature."
Lord Arthur Ponsonby, "Falsehood in Wartime: Propaganda Lies of the First
World War", 1928

2007-08-07 21:09:58

by Trond Myklebust

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

Attachments:

linux-2.6.23-007-nfs_dont_call_flush_workqueue.dif (2.36 kB)
linux-2.6.23-008-sunrpc_dont_call_flush_workqueue.dif (1.37 kB)
(No filename) (315.00 B)
(No filename) (140.00 B)
Download all attachments

2007-08-07 21:38:01

by Oleg Nesterov

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On 08/07, Trond Myklebust wrote:
>
> On Fri, 2007-08-03 at 21:21 +0400, Oleg Nesterov wrote:
> > On 08/03, Trond Myklebust wrote:
> > > I'll have a look at this. I suspect that most if not all of our calls to
> > > run_workqueue()/flush_scheduled_work() can now be replaced by more
> > > targeted calls to cancel_work_sync() and cancel_delayed_work_sync().
> >
> > Yes, please, if possible.
>
> All the NFS and SUNRPC cases appear to be trivial. IOW: the only reason
> for the flush_workqueue()/flush_scheduled_work() calls was to ensure
> that the cancel_work()/cancel_delayed_work() calls preceding them have
> completed. Nevertheless I've split the conversion into two patches,
> since one touches only the NFS code, whereas the other touches the
> SUNRPC client and server code.
>
> The two patches have been tested, and appear to work...

Great!

> void
> nfs4_kill_renewd(struct nfs_client *clp)
> {
> down_read(&clp->cl_sem);
> - cancel_delayed_work(&clp->cl_renewd);
> + cancel_delayed_work_sync(&clp->cl_renewd);
> up_read(&clp->cl_sem);
> - flush_scheduled_work();
> }

this looks unsafe to me, the window is very small, but afaics this can
deadlock if called when nfs4_renew_state() has already started, but didn't
take ->cl_sem yet.

Can't we avoid taking clp->cl_sem here?

Btw, unless I missed something, the code without this patch looks incorrect
too: cancel_delayed_work() can fail if the timer expired, but the ->cl_renewd
didn't run yet. In that case nfs4_renew_state() can run and re-schedule itself
after flush_scheduled_work() returns.

Oleg.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-07 22:05:39

by Trond Myklebust

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Wed, 2007-08-08 at 01:37 +0400, Oleg Nesterov wrote:
> On 08/07, Trond Myklebust wrote:
> >
> > On Fri, 2007-08-03 at 21:21 +0400, Oleg Nesterov wrote:
> > > On 08/03, Trond Myklebust wrote:
> > > > I'll have a look at this. I suspect that most if not all of our calls to
> > > > run_workqueue()/flush_scheduled_work() can now be replaced by more
> > > > targeted calls to cancel_work_sync() and cancel_delayed_work_sync().
> > >
> > > Yes, please, if possible.
> >
> > All the NFS and SUNRPC cases appear to be trivial. IOW: the only reason
> > for the flush_workqueue()/flush_scheduled_work() calls was to ensure
> > that the cancel_work()/cancel_delayed_work() calls preceding them have
> > completed. Nevertheless I've split the conversion into two patches,
> > since one touches only the NFS code, whereas the other touches the
> > SUNRPC client and server code.
> >
> > The two patches have been tested, and appear to work...
>
> Great!
>
> > void
> > nfs4_kill_renewd(struct nfs_client *clp)
> > {
> > down_read(&clp->cl_sem);
> > - cancel_delayed_work(&clp->cl_renewd);
> > + cancel_delayed_work_sync(&clp->cl_renewd);
> > up_read(&clp->cl_sem);
> > - flush_scheduled_work();
> > }
>
> this looks unsafe to me, the window is very small, but afaics this can
> deadlock if called when nfs4_renew_state() has already started, but didn't
> take ->cl_sem yet.

Not really. We have removed the nfs_client from the public lists, and we
are guaranteed that there are no more active superblocks attached to it
so nothing can call the reclaimer routine (which is the only routine
that takes a write lock on clp->cl_sem).

> Can't we avoid taking clp->cl_sem here?

Yes, I believe that we can, for the same reasons as above: the race with
the reclaimer is impossible, hence the read lock on cl_sem is redundant.

> Btw, unless I missed something, the code without this patch looks incorrect
> too: cancel_delayed_work() can fail if the timer expired, but the ->cl_renewd
> didn't run yet. In that case nfs4_renew_state() can run and re-schedule itself
> after flush_scheduled_work() returns.

No, that should not be possible. Again, see above: there are no active
superblocks, so clp->cl_superblocks is empty.

Cheers
Trond

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-07 22:20:36

by Oleg Nesterov

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On 08/07, Trond Myklebust wrote:
>
> On Wed, 2007-08-08 at 01:37 +0400, Oleg Nesterov wrote:
> > On 08/07, Trond Myklebust wrote:
> > >
> > > On Fri, 2007-08-03 at 21:21 +0400, Oleg Nesterov wrote:
> > > > On 08/03, Trond Myklebust wrote:
> > > > > I'll have a look at this. I suspect that most if not all of our calls to
> > > > > run_workqueue()/flush_scheduled_work() can now be replaced by more
> > > > > targeted calls to cancel_work_sync() and cancel_delayed_work_sync().
> > > >
> > > > Yes, please, if possible.
> > >
> > > All the NFS and SUNRPC cases appear to be trivial. IOW: the only reason
> > > for the flush_workqueue()/flush_scheduled_work() calls was to ensure
> > > that the cancel_work()/cancel_delayed_work() calls preceding them have
> > > completed. Nevertheless I've split the conversion into two patches,
> > > since one touches only the NFS code, whereas the other touches the
> > > SUNRPC client and server code.
> > >
> > > The two patches have been tested, and appear to work...
> >
> > Great!
> >
> > > void
> > > nfs4_kill_renewd(struct nfs_client *clp)
> > > {
> > > down_read(&clp->cl_sem);
> > > - cancel_delayed_work(&clp->cl_renewd);
> > > + cancel_delayed_work_sync(&clp->cl_renewd);
> > > up_read(&clp->cl_sem);
> > > - flush_scheduled_work();
> > > }
> >
> > this looks unsafe to me, the window is very small, but afaics this can
> > deadlock if called when nfs4_renew_state() has already started, but didn't
> > take ->cl_sem yet.
>
> Not really. We have removed the nfs_client from the public lists, and we
> are guaranteed that there are no more active superblocks attached to it
> so nothing can call the reclaimer routine (which is the only routine
> that takes a write lock on clp->cl_sem).

Thanks for your explanation. Not that I was able to understand, nfs is a
black magic to me :)

But. nfs4_renew_state() checks list_empty(&clp->cl_superblocks) under
clp->cl_sem? So, if it is possible that clp->cl_renewd was scheduled
at the time when nfs4_kill_renewd(), we can deadlock, no? Because
nfs4_renew_state() needs clp->cl_sem to complete, but nfs4_kill_renewd()
holds this sem, and waits for nfs4_renew_state() completion.

> > Btw, unless I missed something, the code without this patch looks incorrect
> > too: cancel_delayed_work() can fail if the timer expired, but the ->cl_renewd
> > didn't run yet. In that case nfs4_renew_state() can run and re-schedule itself
> > after flush_scheduled_work() returns.
>
> No, that should not be possible. Again, see above: there are no active
> superblocks, so clp->cl_superblocks is empty.

Yes, thanks. I missed "goto out" in nfs4_renew_state().

Oleg.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-07 23:08:55

by Trond Myklebust

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Wed, 2007-08-08 at 02:20 +0400, Oleg Nesterov wrote:

> But. nfs4_renew_state() checks list_empty(&clp->cl_superblocks) under
> clp->cl_sem? So, if it is possible that clp->cl_renewd was scheduled
> at the time when nfs4_kill_renewd(), we can deadlock, no? Because
> nfs4_renew_state() needs clp->cl_sem to complete, but nfs4_kill_renewd()
> holds this sem, and waits for nfs4_renew_state() completion.

They both take read locks, which means that they can take them
simultaneously. AFAICS, the deadlock can only occur if something manages
to insert a request for a write lock after nfs4_kill_renewd() takes its
read lock, but before nfs4_renew_state() takes its read lock:

1) nfs4_kill_renewd() 2) nfs4_renew_state() 3) somebody else
------------------- ------------------ -------------
read lock
wait on (2) to complete
write lock <waits on (1)>
read lock <waits on (3),
because rw_semaphores
don't allow a read lock
request to jump a write
lock request>

however as I explained earlier, the only process that can take a write
lock is the reclaimer daemon, but we _know_ that cannot be running (for
one thing, the reference count on nfs_client is zero, for the other,
there are no superblocks).

Cheers
Trond

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-07 23:14:33

by Oleg Nesterov

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On 08/07, Trond Myklebust wrote:
>
> On Wed, 2007-08-08 at 02:20 +0400, Oleg Nesterov wrote:
>
> > But. nfs4_renew_state() checks list_empty(&clp->cl_superblocks) under
> > clp->cl_sem? So, if it is possible that clp->cl_renewd was scheduled
> > at the time when nfs4_kill_renewd(), we can deadlock, no? Because
> > nfs4_renew_state() needs clp->cl_sem to complete, but nfs4_kill_renewd()
> > holds this sem, and waits for nfs4_renew_state() completion.
>
> They both take read locks,

Aaaaaaaaaah. Please ignore me, thanks!

> which means that they can take them
> simultaneously. AFAICS, the deadlock can only occur if something manages
> to insert a request for a write lock after nfs4_kill_renewd() takes its
> read lock, but before nfs4_renew_state() takes its read lock:
>
> 1) nfs4_kill_renewd() 2) nfs4_renew_state() 3) somebody else
> ------------------- ------------------ -------------
> read lock
> wait on (2) to complete
> write lock <waits on (1)>
> read lock <waits on (3),
> because rw_semaphores
> don't allow a read lock
> request to jump a write
> lock request>
>
> however as I explained earlier, the only process that can take a write
> lock is the reclaimer daemon, but we _know_ that cannot be running (for
> one thing, the reference count on nfs_client is zero, for the other,
> there are no superblocks).

Oleg.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-03 16:39:40

by Andrew Morton

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Fri, 3 Aug 2007 13:00:46 +0200 Marc Dietrich <[email protected]> wrote:

>
> Hi,
>
> Am Wednesday 01 August 2007 08:09 schrieb Andrew Morton:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.
> >6.23-rc1-mm2/
> >
> >
> > - the git-block tree remains dropped due to disageement with the Vaio
> >
> > - git-e1000new was withdrawn by the authors
> >
> > - git-wireless is back. It is still a >3MB diff, and appears to compile.
> >
> > - Is anyone testing the kgdb code in here?
>
> I still get some nfs related locking bug.
>
> I applied
>
> linux-2.6.23-001-fix_rpciod_down_race.dif
> linux-2.6.23-003-fix_locking_regression.dif
> linux-2.6.23-004-fix_stateid_regression.dif
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.23-rc1-mm2 #3
> ---------------------------------------------
> events/0/5 is trying to acquire lock:
> (events){--..}, at: [<c012ed90>] flush_workqueue+0x0/0x70
>
> but task is already holding lock:
> (events){--..}, at: [<c012e5c4>] run_workqueue+0xd4/0x1e0
>
> other info that might help us debug this:
> 2 locks held by events/0/5:
> #0: (events){--..}, at: [<c012e5c4>] run_workqueue+0xd4/0x1e0
> #1: ((nfs_automount_task).work){--..}, at: [<c012e5c4>]
> run_workqueue+0xd4/0x1e0
>
> stack backtrace:
> [<c0104fda>] show_trace_log_lvl+0x1a/0x30
> [<c0105c02>] show_trace+0x12/0x20
> [<c0105d15>] dump_stack+0x15/0x20
> [<c013ee42>] __lock_acquire+0xc22/0x1030
> [<c013f2b1>] lock_acquire+0x61/0x80
> [<c012edd9>] flush_workqueue+0x49/0x70
> [<c012ee0d>] flush_scheduled_work+0xd/0x10
> [<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs]
> [<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs]
> [<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs]
> [<c017b38d>] deactivate_super+0x7d/0xa0
> [<c018f94b>] mntput_no_expire+0x4b/0x80
> [<c018fd94>] expire_mount_list+0xe4/0x140
> [<c0191219>] mark_mounts_for_expiry+0x99/0xb0
> [<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs]
> [<c012e61b>] run_workqueue+0x12b/0x1e0
> [<c012f05b>] worker_thread+0x9b/0x100
> [<c0131c72>] kthread+0x42/0x70
> [<c0104c0f>] kernel_thread_helper+0x7/0x18
> =======================
>

There is new debugging stuff in -mm: deadlockable usage of workqueue
primitives will now trigger lockdep warnings. See

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch

I am suspecting that running flush_scheduled_work() from within run_workqueue()
isn't good.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-03 17:04:05

by Trond Myklebust

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Fri, 2007-08-03 at 09:38 -0700, Andrew Morton wrote:
> On Fri, 3 Aug 2007 13:00:46 +0200 Marc Dietrich <[email protected]> wrote:
>
> >
> > Hi,
> >
> > Am Wednesday 01 August 2007 08:09 schrieb Andrew Morton:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.
> > >6.23-rc1-mm2/
> > >
> > >
> > > - the git-block tree remains dropped due to disageement with the Vaio
> > >
> > > - git-e1000new was withdrawn by the authors
> > >
> > > - git-wireless is back. It is still a >3MB diff, and appears to compile.
> > >
> > > - Is anyone testing the kgdb code in here?
> >
> > I still get some nfs related locking bug.
> >
> > I applied
> >
> > linux-2.6.23-001-fix_rpciod_down_race.dif
> > linux-2.6.23-003-fix_locking_regression.dif
> > linux-2.6.23-004-fix_stateid_regression.dif
> >
> > =============================================
> > [ INFO: possible recursive locking detected ]
> > 2.6.23-rc1-mm2 #3
> > ---------------------------------------------
> > events/0/5 is trying to acquire lock:
> > (events){--..}, at: [<c012ed90>] flush_workqueue+0x0/0x70
> >
> > but task is already holding lock:
> > (events){--..}, at: [<c012e5c4>] run_workqueue+0xd4/0x1e0
> >
> > other info that might help us debug this:
> > 2 locks held by events/0/5:
> > #0: (events){--..}, at: [<c012e5c4>] run_workqueue+0xd4/0x1e0
> > #1: ((nfs_automount_task).work){--..}, at: [<c012e5c4>]
> > run_workqueue+0xd4/0x1e0
> >
> > stack backtrace:
> > [<c0104fda>] show_trace_log_lvl+0x1a/0x30
> > [<c0105c02>] show_trace+0x12/0x20
> > [<c0105d15>] dump_stack+0x15/0x20
> > [<c013ee42>] __lock_acquire+0xc22/0x1030
> > [<c013f2b1>] lock_acquire+0x61/0x80
> > [<c012edd9>] flush_workqueue+0x49/0x70
> > [<c012ee0d>] flush_scheduled_work+0xd/0x10
> > [<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs]
> > [<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs]
> > [<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs]
> > [<c017b38d>] deactivate_super+0x7d/0xa0
> > [<c018f94b>] mntput_no_expire+0x4b/0x80
> > [<c018fd94>] expire_mount_list+0xe4/0x140
> > [<c0191219>] mark_mounts_for_expiry+0x99/0xb0
> > [<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs]
> > [<c012e61b>] run_workqueue+0x12b/0x1e0
> > [<c012f05b>] worker_thread+0x9b/0x100
> > [<c0131c72>] kthread+0x42/0x70
> > [<c0104c0f>] kernel_thread_helper+0x7/0x18
> > =======================
> >
>
> There is new debugging stuff in -mm: deadlockable usage of workqueue
> primitives will now trigger lockdep warnings. See
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch
>
> I am suspecting that running flush_scheduled_work() from within run_workqueue()
> isn't good.

I'll have a look at this. I suspect that most if not all of our calls to
run_workqueue()/flush_scheduled_work() can now be replaced by more
targeted calls to cancel_work_sync() and cancel_delayed_work_sync().

Trond

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-03 17:19:39

by Oleg Nesterov

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On 08/03, Trond Myklebust wrote:
>
> On Fri, 2007-08-03 at 09:38 -0700, Andrew Morton wrote:
> > > stack backtrace:
> > > [<c0104fda>] show_trace_log_lvl+0x1a/0x30
> > > [<c0105c02>] show_trace+0x12/0x20
> > > [<c0105d15>] dump_stack+0x15/0x20
> > > [<c013ee42>] __lock_acquire+0xc22/0x1030
> > > [<c013f2b1>] lock_acquire+0x61/0x80
> > > [<c012edd9>] flush_workqueue+0x49/0x70
> > > [<c012ee0d>] flush_scheduled_work+0xd/0x10
> > > [<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs]
> > > [<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs]
> > > [<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs]
> > > [<c017b38d>] deactivate_super+0x7d/0xa0
> > > [<c018f94b>] mntput_no_expire+0x4b/0x80
> > > [<c018fd94>] expire_mount_list+0xe4/0x140
> > > [<c0191219>] mark_mounts_for_expiry+0x99/0xb0
> > > [<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs]
> > > [<c012e61b>] run_workqueue+0x12b/0x1e0
> > > [<c012f05b>] worker_thread+0x9b/0x100
> > > [<c0131c72>] kthread+0x42/0x70
> > > [<c0104c0f>] kernel_thread_helper+0x7/0x18
> > > =======================
> > >
> >
> > There is new debugging stuff in -mm: deadlockable usage of workqueue
> > primitives will now trigger lockdep warnings. See
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch
> >
> > I am suspecting that running flush_scheduled_work() from within run_workqueue()
> > isn't good.
>
> I'll have a look at this. I suspect that most if not all of our calls to
> run_workqueue()/flush_scheduled_work() can now be replaced by more
> targeted calls to cancel_work_sync() and cancel_delayed_work_sync().

Yes, please, if possible.

To avoid a possible confusion: it is still OK if work->func() flushes
its own workqueue, so strictly speaking this trace is false positive,
but it would be very nice if we can get rid of this practice.

Oleg.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-06 08:07:41

by Johannes Berg

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Fri, 2007-08-03 at 21:21 +0400, Oleg Nesterov wrote:

> To avoid a possible confusion: it is still OK if work->func() flushes
> its own workqueue, so strictly speaking this trace is false positive,
> but it would be very nice if we can get rid of this practice.

I just had a thought: we could get rid of this warning by using a
read-lock here. That way, flushing from within a work function (which
would be seen as read-after-read recursive lock) won't trigger this
warning. Patch below. This would, however, also get rid of any warnings
for run_workqueue recursion. Which again we may or may not want, the
code inidicates that it should be allowed up to a depth of three.

However, the question whether we should allow flush_workqueue from
within a struct work is mainly an API policy issue; it doesn't hurt to
flush a workqueue from within a work, but it is probably nearer the
intent to use targeted cancel_work_sync() or such. OTOH, one could
imagine situations where multiple different work structs are on that
workqueue belonging to the same subsystem and then the general
flush_scheduled_work() call is the only way to guarantee nothing is on
scheduled at a given point... I don't feel qualified to make the
decision for or against allowing this use of the API at this point.

Marc, do you have an easy way to trigger this warning? Could you verify
that it goes away with the patch below applied?

johannes

---
kernel/workqueue.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- wireless-dev.orig/kernel/workqueue.c 2007-08-06 08:11:23.297846657 +0200
+++ wireless-dev/kernel/workqueue.c 2007-08-06 08:19:54.727846657 +0200
@@ -272,7 +272,7 @@ static void run_workqueue(struct cpu_wor

BUG_ON(get_wq_data(work) != cwq);
work_clear_pending(work);
- lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+ lock_acquire(&cwq->wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
f(work);
lock_release(&lockdep_map, 1, _THIS_IP_);
@@ -395,7 +395,7 @@ void fastcall flush_workqueue(struct wor
int cpu;

might_sleep();
- lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+ lock_acquire(&wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
lock_release(&wq->lockdep_map, 1, _THIS_IP_);
for_each_cpu_mask(cpu, *cpu_map)
flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
@@ -779,7 +779,7 @@ static void cleanup_workqueue_thread(str
if (cwq->thread == NULL)
return;

- lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+ lock_acquire(&cwq->wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);

flush_cpu_workqueue(cwq);

Attachments:

signature.asc (190.00 B)
This is a digitally signed message part (No filename) (315.00 B)
(No filename) (140.00 B)
Download all attachments

2007-08-06 10:53:42

by Oleg Nesterov

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On 08/06, Johannes Berg wrote:
>
> On Fri, 2007-08-03 at 21:21 +0400, Oleg Nesterov wrote:
>
> > To avoid a possible confusion: it is still OK if work->func() flushes
> > its own workqueue, so strictly speaking this trace is false positive,
> > but it would be very nice if we can get rid of this practice.
>
> However, the question whether we should allow flush_workqueue from
> within a struct work is mainly an API policy issue; it doesn't hurt to
> flush a workqueue from within a work,

I am not sure, but currently I hope we can forbid this eventually, so I
personally think it is good that your patch complains.

> --- wireless-dev.orig/kernel/workqueue.c 2007-08-06 08:11:23.297846657 +0200
> +++ wireless-dev/kernel/workqueue.c 2007-08-06 08:19:54.727846657 +0200
> @@ -272,7 +272,7 @@ static void run_workqueue(struct cpu_wor
>
> BUG_ON(get_wq_data(work) != cwq);
> work_clear_pending(work);
> - lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
> lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> f(work);
> lock_release(&lockdep_map, 1, _THIS_IP_);
> @@ -395,7 +395,7 @@ void fastcall flush_workqueue(struct wor
> int cpu;
>
> might_sleep();
> - lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_acquire(&wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
> lock_release(&wq->lockdep_map, 1, _THIS_IP_);
> for_each_cpu_mask(cpu, *cpu_map)
> flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
> @@ -779,7 +779,7 @@ static void cleanup_workqueue_thread(str
> if (cwq->thread == NULL)
> return;
>
> - lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
> lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);
>
> flush_cpu_workqueue(cwq);

But this makes ->lockdep_map meaningless? We always take wq->lockdep_map
for reading, now we can't detect deadlocks.

read_lock(A);
lock(B);

vs
lock(B);
read_lock(A);

is valid, kernel/lockdep.c should not complain.

No?

Oleg.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-06 10:57:17

by Johannes Berg

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Mon, 2007-08-06 at 14:53 +0400, Oleg Nesterov wrote:

> But this makes ->lockdep_map meaningless? We always take wq->lockdep_map
> for reading, now we can't detect deadlocks.
>
> read_lock(A);
> lock(B);
>
> vs
> lock(B);
> read_lock(A);
>
> is valid, kernel/lockdep.c should not complain.

Ah, hmm. Good point, I guess you can always have multiple read locks.
Then we'd have to make a new parameter or such to get rid of the
recursive locking try message. But if you want to deprecate the API
anyway then this is a good way to find it.

johannes

Attachments:

signature.asc (190.00 B)
This is a digitally signed message part (No filename) (315.00 B)
(No filename) (140.00 B)
Download all attachments

2007-08-06 11:06:22

by Marc Dietrich

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

Hi,

Am Monday 06 August 2007 08:24 schrieb Johannes Berg:
> On Fri, 2007-08-03 at 21:21 +0400, Oleg Nesterov wrote:
> > To avoid a possible confusion: it is still OK if work->func() flushes
> > its own workqueue, so strictly speaking this trace is false positive,
> > but it would be very nice if we can get rid of this practice.
>
> I just had a thought: we could get rid of this warning by using a
> read-lock here. That way, flushing from within a work function (which
> would be seen as read-after-read recursive lock) won't trigger this
> warning. Patch below. This would, however, also get rid of any warnings
> for run_workqueue recursion. Which again we may or may not want, the
> code inidicates that it should be allowed up to a depth of three.
>
> However, the question whether we should allow flush_workqueue from
> within a struct work is mainly an API policy issue; it doesn't hurt to
> flush a workqueue from within a work, but it is probably nearer the
> intent to use targeted cancel_work_sync() or such. OTOH, one could
> imagine situations where multiple different work structs are on that
> workqueue belonging to the same subsystem and then the general
> flush_scheduled_work() call is the only way to guarantee nothing is on
> scheduled at a given point... I don't feel qualified to make the
> decision for or against allowing this use of the API at this point.
>
> Marc, do you have an easy way to trigger this warning? Could you verify
> that it goes away with the patch below applied?

just booting into X is enough.

I applied the patch, but now I get:

=================================
[ INFO: inconsistent lock state ]
2.6.23-rc1-mm2 #4
---------------------------------
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(rpc_credcache_lock){-+..}, at: [<c01dc487>] _atomic_dec_and_lock+0x17/0x60
{softirq-on-W} state was registered at:
[<c013e870>] __lock_acquire+0x650/0x1030
[<c013f2b1>] lock_acquire+0x61/0x80
[<c02db9ac>] _spin_lock+0x2c/0x40
[<c01dc487>] _atomic_dec_and_lock+0x17/0x60
[<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
[<dced56c1>] rpcauth_unbindcred+0x21/0x60 [sunrpc]
[<dced3fd4>] a0 [sunrpc]
[<dcecefe0>] rpc_call_sync+0x30/0x40 [sunrpc]
[<dcedc73b>] rpcb_register+0xdb/0x180 [sunrpc]
[<dced65b3>] svc_register+0x93/0x160 [sunrpc]
[<dced6ebe>] __svc_create+0x1ee/0x220 [sunrpc]
[<dced7053>] svc_create+0x13/0x20 [sunrpc]
[<dcf6d722>] nfs_callback_up+0x82/0x120 [nfs]
[<dcf48f36>] nfs_get_client+0x176/0x390 [nfs]
[<dcf49181>] nfs4_set_client+0x31/0x190 [nfs]
[<dcf49983>] nfs4_create_server+0x63/0x3b0 [nfs]
[<dcf52426>] nfs4_get_sb+0x346/0x5b0 [nfs]
[<c017b444>] vfs_kern_mount+0x94/0x110
[<c0190a62>] do_mount+0x1f2/0x7d0
[<c01910a6>] sys_mount+0x66/0xa0
[<c0104046>] syscall_call+0x7/0xb
[<ffffffff>] 0xffffffff
irq event stamp: 5277830
hardirqs last enabled at (5277830): [<c017530a>] kmem_cache_free+0x8a/0xc0
hardirqs last disabled at (5277829): [<c01752d2>] kmem_cache_free+0x52/0xc0
softirqs last enabled at (5277798): [<c0124173>] __do_softirq+0xa3/0xc0
softirqs last disabled at (5277817): [<c01241d7>] do_softirq+0x47/0x50

other info that might help us debug this:
no locks held by swapper/0.

stack backtrace:
[<c0104fda>] show_trace_log_lvl+0x1a/0x30
[<c0105c02>] show_trace+0x12/0x20
[<c0105d15>] dump_stack+0x15/0x20
[<c013ccc3>] print_usage_bug+0x153/0x160
[<c013d8b9>] mark_lock+0x449/0x620
[<c013e824>] __lock_acquire+0x604/0x1030
[<c013f2b1>] lock_acquire+0x61/0x80
[<c02db9ac>] _spin_lock+0x2c/0x40
[<c01dc487>] _atomic_dec_and_lock+0x17/0x60
[<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
[<dcf6bf83>] nfs_free_delegation_callback+0x13/0x20 [nfs]
[<c012f9ea>] __rcu_process_callbacks+0x6a/0x1c0
[<c012fb52>] rcu_process_callbacks+0x12/0x30
[<c0124218>] tasklet_action+0x38/0x80
[<c0124125>] __do_softirq+0x55/0xc0
[<c01241d7>] do_softirq+0x47/0x50
[<c0124605>] irq_exit+0x35/0x40
[<c0112463>] smp_apic_timer_interrupt+0x43/0x80
[<c0104a77>] apic_timer_interrupt+0x33/0x38
[<c02690df>] cpuidle_idle_call+0x6f/0x90
[<c01023c3>] cpu_idle+0x43/0x70
[<c02d8c27>] rest_init+0x47/0x50
[<c03bcb6a>] start_kernel+0x22a/0x2b0
[<00000000>] 0x0
=======================

also, sometimes this kernel hangs because of nfs accessing processes remain in
D state.

Marc

> johannes
>
> ---
> kernel/workqueue.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- wireless-dev.orig/kernel/workqueue.c 2007-08-06 08:11:23.297846657
> +0200 +++ wireless-dev/kernel/workqueue.c 2007-08-06 08:19:54.727846657
> +0200 @@ -272,7 +272,7 @@ static void run_workqueue(struct cpu_wor
>
> BUG_ON(get_wq_data(work) != cwq);
> work_clear_pending(work);
> - lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
> lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> f(work);
> lock_release(&lockdep_map, 1, _THIS_IP_);
> @@ -395,7 +395,7 @@ void fastcall flush_workqueue(struct wor
> int cpu;
>
> might_sleep();
> - lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_acquire(&wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
> lock_release(&wq->lockdep_map, 1, _THIS_IP_);
> for_each_cpu_mask(cpu, *cpu_map)
> flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
> @@ -779,7 +779,7 @@ static void cleanup_workqueue_thread(str
> if (cwq->thread == NULL)
> return;
>
> - lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 1, 2, _THIS_IP_);
> lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);
>
> flush_cpu_workqueue(cwq);

--
"The enemy uses unauthorized weapons."
Lord Arthur Ponsonby, "Falsehood in Wartime: Propaganda Lies of the First
World War", 1928

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-08-06 11:12:44

by Johannes Berg

[permalink] [raw]

Subject: Re: 2.6.23-rc1-mm2

On Mon, 2007-08-06 at 13:05 +0200, Marc Dietrich wrote:

> I applied the patch, but now I get:
>
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.23-rc1-mm2 #4
> ---------------------------------
> inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> (rpc_credcache_lock){-+..}, at: [<c01dc487>] _atomic_dec_and_lock+0x17/0x60

Interesting, but doesn't seem related to this at all. As Oleg just
pointed out this basically disabled checking for workqueue stuff so this
should be looked into by somebody familiar with the NFS code.

johannes

Attachments:

signature.asc (190.00 B)
This is a digitally signed message part (No filename) (315.00 B)
(No filename) (140.00 B)
Download all attachments