2019-07-11 19:27:59

by Olga Kornievskaia

Subject: multipath patches

Hi Trond,

I see that you have nconnect patches in your testing branch (as well
as in your linux-next branch; I assume they are the same). There is
something wrong with that version: a mount hangs the machine.

[ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount.nfs:2624]

I don't have such problems with the patch series that Neil has posted.

Thank you.


2019-07-11 19:30:21

by Trond Myklebust

Subject: Re: multipath patches

On Thu, 2019-07-11 at 15:06 -0400, Olga Kornievskaia wrote:
> Hi Trond,
>
> I see that you have nconnect patches in your testing branch (as well
> as your linux-next and I assume they are the same). There is
> something wrong with that version. A mount hangs the machine.
>
> [ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> [mount.nfs:2624]
>
> I don't have such problems with the patch series that Neil has
> posted.
>
> Thank you.

How are the patchsets different? As far as I know, all I did was apply
the 3 patches that Neil added to my existing branch.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-07-11 20:35:32

by Olga Kornievskaia

Subject: Re: multipath patches

On Thu, Jul 11, 2019 at 3:29 PM Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2019-07-11 at 15:06 -0400, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > I see that you have nconnect patches in your testing branch (as well
> > as your linux-next and I assume they are the same). There is
> > something wrong with that version. A mount hangs the machine.
> >
> > [ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> > [mount.nfs:2624]
> >
> > I don't have such problems with the patch series that Neil has
> > posted.
> >
> > Thank you.
>
> How are the patchsets different? As far as I know, all I did was apply
> the 3 patches that Neil added to my existing branch.

I'm not sure. I had a problem with your "multipath" branch before, and
as I recall I went back and re-downloaded your posted patches.
That was when I was testing performance. So if you haven't touched
that branch and just reused it, I think it's the same problem.

In the current testing branch I don't see several of the patches that
Neil has posted to the mailing list, so I'm not sure what you mean by
having added 3 of his patches on top of yours. At most I can say you
may have added 2 of his (one that allows for v2 and v3, and another
that does state operations on a single connection). The sunrpc stats
patches that were posted are not there.

What I know is that if I revert your branch to
bf11fbdb20b385157b046ea7781f04d0c62554a3 (before the patches) and apply
Neil's patches, all is fine. I really don't want to debug a non-working
version when there is one that works.




2019-07-11 21:15:53

by Trond Myklebust

Subject: Re: multipath patches

On Thu, 2019-07-11 at 16:33 -0400, Olga Kornievskaia wrote:
> On Thu, Jul 11, 2019 at 3:29 PM Trond Myklebust <
> [email protected]> wrote:
> > On Thu, 2019-07-11 at 15:06 -0400, Olga Kornievskaia wrote:
> > > Hi Trond,
> > >
> > > I see that you have nconnect patches in your testing branch (as
> > > well
> > > as your linux-next and I assume they are the same). There is
> > > something wrong with that version. A mount hangs the machine.
> > >
> > > [ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> > > [mount.nfs:2624]
> > >
> > > I don't have such problems with the patch series that Neil has
> > > posted.
> > >
> > > Thank you.
> >
> > How are the patchsets different? As far as I know, all I did was
> > apply
> > the 3 patches that Neil added to my existing branch.
>
> I'm not sure. I had a problem with your "multipath" branch before and
> I recall what I did is went back and redownloaded your posted
> patches.
> That was when I was testing performance. So if you haven't touched
> that branch and just used it I think it's the same problem.
>
> In the current testing branch I don't see several patches that Neil
> has added (posted) to the mailing list. So I'm not sure what you mean
> you added 3 of his patches on top of yours. At most I can say maybe
> you added 2 of his (one that allows for v2 and v3 and another that
> does state operations on a single connection. There are no patches
> for
> sunrpc stats that were posted).
>
> What I know is that if I revert your branch to
> bf11fbdb20b385157b046ea7781f04d0c62554a3 before patches and apply
> Neils patches. All is fine. I really don't want to debug a non-
> working
> version when there is one that works.

Sure, but that is not really an option given the rules for how trees in
linux-next are supposed to work. They are considered to be more or less
stable.

Anyhow, I think I've found the bug. Neil had silently fixed it in one
of my patches, so I've added an incremental patch that does more or
less what he did.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-07-12 16:40:56

by Olga Kornievskaia

Subject: Re: multipath patches

On Thu, Jul 11, 2019 at 5:13 PM Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2019-07-11 at 16:33 -0400, Olga Kornievskaia wrote:
> > On Thu, Jul 11, 2019 at 3:29 PM Trond Myklebust <
> > [email protected]> wrote:
> > > On Thu, 2019-07-11 at 15:06 -0400, Olga Kornievskaia wrote:
> > > > Hi Trond,
> > > >
> > > > I see that you have nconnect patches in your testing branch (as
> > > > well
> > > > as your linux-next and I assume they are the same). There is
> > > > something wrong with that version. A mount hangs the machine.
> > > >
> > > > [ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> > > > [mount.nfs:2624]
> > > >
> > > > I don't have such problems with the patch series that Neil has
> > > > posted.
> > > >
> > > > Thank you.
> > >
> > > How are the patchsets different? As far as I know, all I did was
> > > apply
> > > the 3 patches that Neil added to my existing branch.
> >
> > I'm not sure. I had a problem with your "multipath" branch before and
> > I recall what I did is went back and redownloaded your posted
> > patches.
> > That was when I was testing performance. So if you haven't touched
> > that branch and just used it I think it's the same problem.
> >
> > In the current testing branch I don't see several patches that Neil
> > has added (posted) to the mailing list. So I'm not sure what you mean
> > you added 3 of his patches on top of yours. At most I can say maybe
> > you added 2 of his (one that allows for v2 and v3 and another that
> > does state operations on a single connection. There are no patches
> > for
> > sunrpc stats that were posted).
> >
> > What I know is that if I revert your branch to
> > bf11fbdb20b385157b046ea7781f04d0c62554a3 before patches and apply
> > Neils patches. All is fine. I really don't want to debug a non-
> > working
> > version when there is one that works.
>
> Sure, but that is not really an option given the rules for how trees in
> linux-next are supposed to work. They are considered to be more or less
> stable.
>
> Anyhow, I think I've found the bug. Neil had silently fixed it in one
> of my patches, so I've added an incremental patch that does more or
> less what he did.

I just pulled and I still have a problem with the nconnect mount.
The machine still hangs.

The stack trace isn't in NFS, but I'm betting it's somehow related:

[ 235.756747] general protection fault: 0000 [#1] SMP PTI
[ 235.765187] CPU: 0 PID: 2780 Comm: pool Tainted: G W 5.2.0-rc7+ #29
[ 235.768555] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 235.774368] RIP: 0010:kmem_cache_alloc_node_trace+0x10b/0x1e0
[ 235.777576] Code: 4d 89 e1 41 f6 44 24 0b 04 0f 84 5f ff ff ff 4c 89 e7 e8 08 b6 01 00 49 89 c1 e9 4f ff ff ff 41 8b 41 20 49 8b 39 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 36 ff ff
[ 235.786811] RSP: 0018:ffffbc7c4200fe58 EFLAGS: 00010246
[ 235.789778] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000002b7c
[ 235.793204] RDX: 0000000000002b7b RSI: 0000000000000dc0 RDI: 000000000002d96
[ 235.796182] RBP: 0000000000000dc0 R08: ffff9c7bfa82d960 R09: ffff9c7bcfc06d00
[ 235.799135] R10: ffff9c7bfddf0240 R11: 0000000000000001 R12: ffff9c7bcfc06d00
[ 235.802094] R13: 0000000000000000 R14: f000ff53f000ff53 R15: ffffffffbe2d4d71
[ 235.805072] FS: 00007fd7f1d48700(0000) GS:ffff9c7bfa800000(0000) knlGS:0000000000000000
[ 235.808430] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.810762] CR2: 00007fd7f0eb65a4 CR3: 0000000012046005 CR4: 00000000001606f0
[ 235.813662] Call Trace:
[ 235.814694] alloc_rt_sched_group+0xf1/0x250
[ 235.816439] sched_create_group+0x59/0x70
[ 235.818094] sched_autogroup_create_attach+0x3a/0x160
[ 235.820148] ksys_setsid+0xeb/0x100
[ 235.821645] __ia32_sys_setsid+0xa/0x10
[ 235.823216] do_syscall_64+0x55/0x1a0
[ 235.824710] entry_SYSCALL_64_after_hwframe+0x44/0xa9



2019-07-12 17:19:01

by Trond Myklebust

Subject: Re: multipath patches

On Fri, 2019-07-12 at 12:39 -0400, Olga Kornievskaia wrote:
> On Thu, Jul 11, 2019 at 5:13 PM Trond Myklebust <
> [email protected]> wrote:
> > On Thu, 2019-07-11 at 16:33 -0400, Olga Kornievskaia wrote:
> > > On Thu, Jul 11, 2019 at 3:29 PM Trond Myklebust <
> > > [email protected]> wrote:
> > > > On Thu, 2019-07-11 at 15:06 -0400, Olga Kornievskaia wrote:
> > > > > Hi Trond,
> > > > >
> > > > > I see that you have nconnect patches in your testing branch
> > > > > (as
> > > > > well
> > > > > as your linux-next and I assume they are the same). There is
> > > > > something wrong with that version. A mount hangs the machine.
> > > > >
> > > > > [ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for
> > > > > 23s!
> > > > > [mount.nfs:2624]
> > > > >
> > > > > I don't have such problems with the patch series that Neil
> > > > > has
> > > > > posted.
> > > > >
> > > > > Thank you.
> > > >
> > > > How are the patchsets different? As far as I know, all I did
> > > > was
> > > > apply
> > > > the 3 patches that Neil added to my existing branch.
> > >
> > > I'm not sure. I had a problem with your "multipath" branch before
> > > and
> > > I recall what I did is went back and redownloaded your posted
> > > patches.
> > > That was when I was testing performance. So if you haven't
> > > touched
> > > that branch and just used it I think it's the same problem.
> > >
> > > In the current testing branch I don't see several patches that
> > > Neil
> > > has added (posted) to the mailing list. So I'm not sure what you
> > > mean
> > > you added 3 of his patches on top of yours. At most I can say
> > > maybe
> > > you added 2 of his (one that allows for v2 and v3 and another
> > > that
> > > does state operations on a single connection. There are no
> > > patches
> > > for
> > > sunrpc stats that were posted).
> > >
> > > What I know is that if I revert your branch to
> > > bf11fbdb20b385157b046ea7781f04d0c62554a3 before patches and apply
> > > Neils patches. All is fine. I really don't want to debug a non-
> > > working
> > > version when there is one that works.
> >
> > Sure, but that is not really an option given the rules for how
> > trees in
> > linux-next are supposed to work. They are considered to be more or
> > less
> > stable.
> >
> > Anyhow, I think I've found the bug. Neil had silently fixed it in
> > one
> > of my patches, so I've added an incremental patch that does more or
> > less what he did.
>
> I just pulled and I still have a problem with the nconnect mount.
> Machine still hangs.
>
> Stack trace isn't in NFS but I'm betting it's somehow related
>
> [ 235.756747] general protection fault: 0000 [#1] SMP PTI
> [ 235.765187] CPU: 0 PID: 2780 Comm: pool Tainted: G W
> 5.2.0-rc7+ #29
> [ 235.768555] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [ 235.774368] RIP: 0010:kmem_cache_alloc_node_trace+0x10b/0x1e0
> [ 235.777576] Code: 4d 89 e1 41 f6 44 24 0b 04 0f 84 5f ff ff ff 4c
> 89 e7 e8 08 b6 01 00 49 89 c1 e9 4f ff ff ff 41 8b 41 20 49 8b 39 48
> 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84
> 36
> ff ff
> [ 235.786811] RSP: 0018:ffffbc7c4200fe58 EFLAGS: 00010246
> [ 235.789778] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 0000000000002b7c
> [ 235.793204] RDX: 0000000000002b7b RSI: 0000000000000dc0 RDI:
> 000000000002d96
> [ 235.796182] RBP: 0000000000000dc0 R08: ffff9c7bfa82d960 R09:
> ffff9c7bcfc06d00
> [ 235.799135] R10: ffff9c7bfddf0240 R11: 0000000000000001 R12:
> ffff9c7bcfc06d00
> [ 235.802094] R13: 0000000000000000 R14: f000ff53f000ff53 R15:
> ffffffffbe2d4d71
> [ 235.805072] FS: 00007fd7f1d48700(0000) GS:ffff9c7bfa800000(0000)
> knlGS:0000000000000000
> [ 235.808430] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 235.810762] CR2: 00007fd7f0eb65a4 CR3: 0000000012046005 CR4:
> 00000000001606f0
> [ 235.813662] Call Trace:
> [ 235.814694] alloc_rt_sched_group+0xf1/0x250
> [ 235.816439] sched_create_group+0x59/0x70
> [ 235.818094] sched_autogroup_create_attach+0x3a/0x160
> [ 235.820148] ksys_setsid+0xeb/0x100
> [ 235.821645] __ia32_sys_setsid+0xa/0x10
> [ 235.823216] do_syscall_64+0x55/0x1a0
> [ 235.824710] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>

Ah.. Missing xprt_get(). Fixed in the 'testing' branch now. I'll send
out a patch for review.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
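
The thread does not show the actual fix, so the following is only a rough illustration of the class of bug that a missing "get" causes in reference-counted code. It is a toy sketch, not the sunrpc implementation: every toy_* name is hypothetical, and only the name xprt_get() comes from the message above. The idea is that any code that stores its own pointer to a shared object must take its own reference first. Otherwise, the first owner's final "put" frees the object while the second pointer is still in use. Memory that is freed while still referenced can show up as a crash in an unrelated allocator path, which would be one plausible (but unconfirmed) reading of the trace quoted earlier.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted transport.
 * Hypothetical; this is NOT the kernel's struct rpc_xprt. */
struct toy_xprt {
    int refcount;   /* number of live references */
    int *freed;     /* caller-owned flag, set to 1 when the object is released */
};

/* Allocate an object with one reference owned by the caller. */
static struct toy_xprt *toy_xprt_alloc(int *freed_flag)
{
    struct toy_xprt *x = malloc(sizeof(*x));
    if (!x)
        return NULL;
    x->refcount = 1;
    x->freed = freed_flag;
    *freed_flag = 0;
    return x;
}

/* Take an extra reference before storing the pointer somewhere new. */
static struct toy_xprt *toy_xprt_get(struct toy_xprt *x)
{
    if (x)
        x->refcount++;
    return x;
}

/* Drop one reference; the last put releases the object. */
static void toy_xprt_put(struct toy_xprt *x)
{
    if (x && --x->refcount == 0) {
        *x->freed = 1;
        free(x);
    }
}

/* A second holder that keeps its own pointer to the transport. */
struct toy_holder {
    struct toy_xprt *xprt;
};

/* Correct attach: the holder takes its own reference.
 * Writing "h->xprt = x;" here instead would be the missing-get bug. */
static void toy_holder_attach(struct toy_holder *h, struct toy_xprt *x)
{
    h->xprt = toy_xprt_get(x);
}

/* Detach drops the holder's reference and clears the pointer. */
static void toy_holder_detach(struct toy_holder *h)
{
    toy_xprt_put(h->xprt);
    h->xprt = NULL;
}
```

With the get in place, the creator can drop its reference and the holder's pointer stays valid until toy_holder_detach() runs. Without it, the creator's put would free the object and the holder would be left with a dangling pointer.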


2019-07-12 18:03:51

by Olga Kornievskaia

Subject: Re: multipath patches

On Fri, Jul 12, 2019 at 1:18 PM Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2019-07-12 at 12:39 -0400, Olga Kornievskaia wrote:
> > On Thu, Jul 11, 2019 at 5:13 PM Trond Myklebust <
> > [email protected]> wrote:
> > > On Thu, 2019-07-11 at 16:33 -0400, Olga Kornievskaia wrote:
> > > > On Thu, Jul 11, 2019 at 3:29 PM Trond Myklebust <
> > > > [email protected]> wrote:
> > > > > On Thu, 2019-07-11 at 15:06 -0400, Olga Kornievskaia wrote:
> > > > > > Hi Trond,
> > > > > >
> > > > > > I see that you have nconnect patches in your testing branch
> > > > > > (as
> > > > > > well
> > > > > > as your linux-next and I assume they are the same). There is
> > > > > > something wrong with that version. A mount hangs the machine.
> > > > > >
> > > > > > [ 132.143379] watchdog: BUG: soft lockup - CPU#0 stuck for
> > > > > > 23s!
> > > > > > [mount.nfs:2624]
> > > > > >
> > > > > > I don't have such problems with the patch series that Neil
> > > > > > has
> > > > > > posted.
> > > > > >
> > > > > > Thank you.
> > > > >
> > > > > How are the patchsets different? As far as I know, all I did
> > > > > was
> > > > > apply
> > > > > the 3 patches that Neil added to my existing branch.
> > > >
> > > > I'm not sure. I had a problem with your "multipath" branch before
> > > > and
> > > > I recall what I did is went back and redownloaded your posted
> > > > patches.
> > > > That was when I was testing performance. So if you haven't
> > > > touched
> > > > that branch and just used it I think it's the same problem.
> > > >
> > > > In the current testing branch I don't see several patches that
> > > > Neil
> > > > has added (posted) to the mailing list. So I'm not sure what you
> > > > mean
> > > > you added 3 of his patches on top of yours. At most I can say
> > > > maybe
> > > > you added 2 of his (one that allows for v2 and v3 and another
> > > > that
> > > > does state operations on a single connection. There are no
> > > > patches
> > > > for
> > > > sunrpc stats that were posted).
> > > >
> > > > What I know is that if I revert your branch to
> > > > bf11fbdb20b385157b046ea7781f04d0c62554a3 before patches and apply
> > > > Neils patches. All is fine. I really don't want to debug a non-
> > > > working
> > > > version when there is one that works.
> > >
> > > Sure, but that is not really an option given the rules for how
> > > trees in
> > > linux-next are supposed to work. They are considered to be more or
> > > less
> > > stable.
> > >
> > > Anyhow, I think I've found the bug. Neil had silently fixed it in
> > > one
> > > of my patches, so I've added an incremental patch that does more or
> > > less what he did.
> >
> > I just pulled and I still have a problem with the nconnect mount.
> > Machine still hangs.
> >
> > Stack trace isn't in NFS but I'm betting it's somehow related
> >
> > [ 235.756747] general protection fault: 0000 [#1] SMP PTI
> > [ 235.765187] CPU: 0 PID: 2780 Comm: pool Tainted: G W
> > 5.2.0-rc7+ #29
> > [ 235.768555] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> > [ 235.774368] RIP: 0010:kmem_cache_alloc_node_trace+0x10b/0x1e0
> > [ 235.777576] Code: 4d 89 e1 41 f6 44 24 0b 04 0f 84 5f ff ff ff 4c
> > 89 e7 e8 08 b6 01 00 49 89 c1 e9 4f ff ff ff 41 8b 41 20 49 8b 39 48
> > 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84
> > 36
> > ff ff
> > [ 235.786811] RSP: 0018:ffffbc7c4200fe58 EFLAGS: 00010246
> > [ 235.789778] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > 0000000000002b7c
> > [ 235.793204] RDX: 0000000000002b7b RSI: 0000000000000dc0 RDI:
> > 000000000002d96
> > [ 235.796182] RBP: 0000000000000dc0 R08: ffff9c7bfa82d960 R09:
> > ffff9c7bcfc06d00
> > [ 235.799135] R10: ffff9c7bfddf0240 R11: 0000000000000001 R12:
> > ffff9c7bcfc06d00
> > [ 235.802094] R13: 0000000000000000 R14: f000ff53f000ff53 R15:
> > ffffffffbe2d4d71
> > [ 235.805072] FS: 00007fd7f1d48700(0000) GS:ffff9c7bfa800000(0000)
> > knlGS:0000000000000000
> > [ 235.808430] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 235.810762] CR2: 00007fd7f0eb65a4 CR3: 0000000012046005 CR4:
> > 00000000001606f0
> > [ 235.813662] Call Trace:
> > [ 235.814694] alloc_rt_sched_group+0xf1/0x250
> > [ 235.816439] sched_create_group+0x59/0x70
> > [ 235.818094] sched_autogroup_create_attach+0x3a/0x160
> > [ 235.820148] ksys_setsid+0xeb/0x100
> > [ 235.821645] __ia32_sys_setsid+0xa/0x10
> > [ 235.823216] do_syscall_64+0x55/0x1a0
> > [ 235.824710] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
>
> Ah.. Missing xprt_get(). Fixed in the 'testing' branch now. I'll send
> out a patch for review.

Hi Trond,

With the latest patch in the testing branch, I can mount.


2019-07-12 19:10:43

by Trond Myklebust

Subject: Re: multipath patches

On Fri, 2019-07-12 at 14:02 -0400, Olga Kornievskaia wrote:
>
> Hi Trond,
>
> With the latest patch in the testing branch, I can mount.

Excellent! Thanks for your patience and for testing.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]