2011-02-11 23:39:15

by David Teigland

[permalink] [raw]
Subject: [GIT PULL] dlm fix for 2.6.38

Linus,

Please pull this fix for a dlm regression from:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm.git for-linus

Thanks,
Dave

Author: David Teigland <[email protected]>
Date: Fri Feb 11 16:44:31 2011 -0600

dlm: use single thread workqueues

The recent commit to use cmwq for send and recv threads
dcce240ead802d42b1e45ad2fcb2ed4a399cb255 introduced problems,
apparently due to multiple workqueue threads. Single threads
make the problems go away, so return to that until we fully
understand the concurrency issues with multiple threads.

Signed-off-by: David Teigland <[email protected]>

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 9c64ae9..2d8c87b 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1468,15 +1468,13 @@ static void work_stop(void)

static int work_start(void)
{
- recv_workqueue = alloc_workqueue("dlm_recv", WQ_MEM_RECLAIM |
- WQ_HIGHPRI | WQ_FREEZEABLE, 0);
+ recv_workqueue = create_singlethread_workqueue("dlm_recv");
if (!recv_workqueue) {
log_print("can't start dlm_recv");
return -ENOMEM;
}

- send_workqueue = alloc_workqueue("dlm_send", WQ_MEM_RECLAIM |
- WQ_HIGHPRI | WQ_FREEZEABLE, 0);
+ send_workqueue = create_singlethread_workqueue("dlm_send");
if (!send_workqueue) {
log_print("can't start dlm_send");
destroy_workqueue(recv_workqueue);


2011-02-12 15:43:21

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [GIT PULL] dlm fix for 2.6.38

Hi,

On Fri, 2011-02-11 at 18:38 -0500, David Teigland wrote:
> Linus,
>
> Please pull this fix for a dlm regression from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm.git for-linus
>
> Thanks,
> Dave
>
> Author: David Teigland <[email protected]>
> Date: Fri Feb 11 16:44:31 2011 -0600
>
> dlm: use single thread workqueues
>
> The recent commit to use cmwq for send and recv threads
> dcce240ead802d42b1e45ad2fcb2ed4a399cb255 introduced problems,
> apparently due to multiple workqueue threads. Single threads
> make the problems go away, so return to that until we fully
> understand the concurrency issues with multiple threads.
>
> Signed-off-by: David Teigland <[email protected]>
>
> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index 9c64ae9..2d8c87b 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -1468,15 +1468,13 @@ static void work_stop(void)
>
> static int work_start(void)
> {
> - recv_workqueue = alloc_workqueue("dlm_recv", WQ_MEM_RECLAIM |
> - WQ_HIGHPRI | WQ_FREEZEABLE, 0);
> + recv_workqueue = create_singlethread_workqueue("dlm_recv");
> if (!recv_workqueue) {
> log_print("can't start dlm_recv");
> return -ENOMEM;
> }
>
> - send_workqueue = alloc_workqueue("dlm_send", WQ_MEM_RECLAIM |
> - WQ_HIGHPRI | WQ_FREEZEABLE, 0);
> + send_workqueue = create_singlethread_workqueue("dlm_send");
> if (!send_workqueue) {
> log_print("can't start dlm_send");
> destroy_workqueue(recv_workqueue);
>
>

What is the issue here? If there is a problem with the workqueues then
we should ask Tejun about it,

Steve.

2011-02-12 15:51:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [GIT PULL] dlm fix for 2.6.38

Hello,

On Sat, Feb 12, 2011 at 03:44:35PM +0000, Steven Whitehouse wrote:
> > diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> > index 9c64ae9..2d8c87b 100644
> > --- a/fs/dlm/lowcomms.c
> > +++ b/fs/dlm/lowcomms.c
> > @@ -1468,15 +1468,13 @@ static void work_stop(void)
> >
> > static int work_start(void)
> > {
> > - recv_workqueue = alloc_workqueue("dlm_recv", WQ_MEM_RECLAIM |
> > - WQ_HIGHPRI | WQ_FREEZEABLE, 0);
> > + recv_workqueue = create_singlethread_workqueue("dlm_recv");

recv_workqueue was multithread one even before the conversion. It
probably is best to leave this part alone.

> > - send_workqueue = alloc_workqueue("dlm_send", WQ_MEM_RECLAIM |
> > - WQ_HIGHPRI | WQ_FREEZEABLE, 0);
> > + send_workqueue = create_singlethread_workqueue("dlm_send");

send_workqueue was converted from ST to MT but the correct way at this
point would be,

alloc_ordered_workqueue("dlm_send",
WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_FREEZEABLE);

> > if (!send_workqueue) {
> > log_print("can't start dlm_send");
> > destroy_workqueue(recv_workqueue);
> >
> >
>
> What is the issue here? If there is a problem with the workqueues then
> we should ask Tejun about it,

Yeah, what kind of problem was it? There's only one work per
connection so reordering is not a problem. All the workqueue
operations use proper locking, so the conversion seemed safe to me.
What am I missing?

Thanks.

--
tejun

2011-02-14 15:39:13

by David Teigland

[permalink] [raw]
Subject: Re: [GIT PULL] dlm fix for 2.6.38

On Sat, Feb 12, 2011 at 04:51:00PM +0100, Tejun Heo wrote:
> On Sat, Feb 12, 2011 at 03:44:35PM +0000, Steven Whitehouse wrote:
> > What is the issue here? If there is a problem with the workqueues then
> > we should ask Tejun about it,
>
> Yeah, what kind of problem was it? There's only one work per
> connection so reordering is not a problem. All the workqueue
> operations use proper locking, so the conversion seemed safe to me.
> What am I missing?

find_lkb seems to be getting an actual, but wrong lkid, so it's returning
the wrong lkb in the receive routines. It happens fairly quickly with
multiple wq threads, but not at all with single. One suspect I'm going to
look at are the ls_stub and fields in the lockspace struct. I'm not
convinced extra send/recv threads give us that much benefit in practice,
so it's not my top priority at the moment.

Dave

2011-02-14 15:46:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [GIT PULL] dlm fix for 2.6.38

Hello,

On Mon, Feb 14, 2011 at 10:38:44AM -0500, David Teigland wrote:
> find_lkb seems to be getting an actual, but wrong lkid, so it's returning
> the wrong lkb in the receive routines. It happens fairly quickly with
> multiple wq threads, but not at all with single. One suspect I'm going to
> look at are the ls_stub and fields in the lockspace struct. I'm not
> convinced extra send/recv threads give us that much benefit in practice,
> so it's not my top priority at the moment.

Hmmm... okay. At any rate, please use alloc[_ordered]_workqueue()
interface. create[_singlethread]_workqueue() will be deprecated soon.

Thanks.

--
tejun

2011-02-14 16:21:24

by David Teigland

[permalink] [raw]
Subject: Re: [GIT PULL] dlm fix for 2.6.38

On Mon, Feb 14, 2011 at 04:46:01PM +0100, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 14, 2011 at 10:38:44AM -0500, David Teigland wrote:
> > find_lkb seems to be getting an actual, but wrong lkid, so it's returning
> > the wrong lkb in the receive routines. It happens fairly quickly with
> > multiple wq threads, but not at all with single. One suspect I'm going to
> > look at are the ls_stub and fields in the lockspace struct. I'm not
> > convinced extra send/recv threads give us that much benefit in practice,
> > so it's not my top priority at the moment.
>
> Hmmm... okay. At any rate, please use alloc[_ordered]_workqueue()
> interface. create[_singlethread]_workqueue() will be deprecated soon.

Oh, I'll switch to the other API in the patch set for the next merge.
Dave

2011-02-14 16:24:46

by Tejun Heo

[permalink] [raw]
Subject: Re: [GIT PULL] dlm fix for 2.6.38

On Mon, Feb 14, 2011 at 11:21:15AM -0500, David Teigland wrote:
> > Hmmm... okay. At any rate, please use alloc[_ordered]_workqueue()
> > interface. create[_singlethread]_workqueue() will be deprecated soon.
>
> Oh, I'll switch to the other API in the patch set for the next merge.
> Dave

Alright. Sounds good. Thanks.

--
tejun