2014-02-26 00:58:09

by Ben Hutchings

Subject: Oops in nfs41_assign_slot in Linux 3.13.4

Trond, Arthur seems to be hitting a similar bug to
<https://bugzilla.redhat.com/show_bug.cgi?id=1050206>, and it's still
occurring in 3.13.4 even though that has the two fixes you posted there.
The full bug report, with screenshots of the oopses, is at
<https://bugs.debian.org/734268>.

On Tue, 2014-02-25 at 21:45 +0100, Arthur de Jong wrote:
> Control: severity -1 critical
> Control: found -1 linux/3.12.6-2
> Control: found -1 linux/3.13.4-1
> Control: fixed -1 linux/3.11.10-1
>
> Raising severity to critical because it kills the whole system quite
> predictably.
>
> I can still reliably reproduce this with linux-image-3.13-1-amd64
> 3.13.4-1 (see photo) so I'm currently still stuck with a 3.11 kernel.
>
> If there is any information I can provide to help identify this bug,
> please let me know.
>
> The only thing that could be relevant is that I have NFS mounts from
> different NFS servers:
>
> /etc/fstab:
>
> server1:/home /home nfs rw,hard,intr,bg,rsize=4096,wsize=1024,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600,noatime,sec=sys,_netdev 0 0
> server2:/share /share nfs ro,soft,intr,bg,rsize=8192,nosuid,nodev,noexec,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600,sec=sys,_netdev 0 0
>
>
> server1 is Debian wheezy (kernel linux-image-3.2.0-4-amd64
> 3.2.51-1) /etc/exports:
>
> /home -rw,root_squash,nohide,async,no_subtree_check client1 client2(no_root_squash) client3
>
> I'm mostly seeing crashes on client2, my main workstation, but also on
> other machines. /home is an ext3 file system.
>
>
> server2 runs testing (kernel linux-image-3.12-1-686-pae
> 3.12.9-1) /etc/exports:
>
> /share *.local.domain(ro,all_squash,crossmnt,sync,no_subtree_check)
>
> Under /share are /share/disk1 and /share/disk2 ext3 file systems.
>
>
> On the client again, from /proc/mounts (running a 3.11 kernel again):
>
> server1:/home /home nfs4 rw,noatime,vers=4.0,rsize=4096,wsize=1024,namlen=255,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.12.4,local_lock=none,addr=192.168.12.1 0 0
> server2:/share /share nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.0,rsize=8192,wsize=131072,namlen=255,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.12.4,local_lock=none,addr=192.168.12.9 0 0
> server2:/share/disk2 /share/disk2 nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.0,rsize=8192,wsize=131072,namlen=255,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.12.4,local_lock=none,addr=192.168.12.9 0 0
> server2:/share/disk1 /share/disk1 nfs4 ro,nosuid,nodev,noexec,relatime,vers=4.0,rsize=8192,wsize=131072,namlen=255,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.12.4,local_lock=none,addr=192.168.12.9 0 0
>
> I have some freedom to change things around to test in this network so
> let me know which things to try.
>
> Thanks,
>

--
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.
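Arthur's /proc/mounts listing shows that all four client mounts use vers=4.0. A small helper can pull the same information out of /proc/mounts on each affected client. This is a sketch using glibc's <mntent.h> (getmntent, hasmntopt); the function name count_mounts_with is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan a mounts-format stream (e.g. /proc/mounts)
 * and print every entry whose filesystem type is `type` and whose
 * option list contains `opt`. Returns the number of matching entries. */
size_t count_mounts_with(FILE *fp, const char *type, const char *opt)
{
	struct mntent *m;
	size_t n = 0;

	while ((m = getmntent(fp)) != NULL) {
		if (strcmp(m->mnt_type, type) == 0 &&
		    hasmntopt(m, opt) != NULL) {
			printf("%s on %s (%s)\n", m->mnt_fsname, m->mnt_dir, opt);
			n++;
		}
	}
	return n;
}
```

On a live client, open the stream with setmntent("/proc/mounts", "r"), call count_mounts_with(fp, "nfs4", "vers=4.0"), and close it with endmntent(fp).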



2014-02-26 19:29:53

by Trond Myklebust

Subject: Re: Oops in nfs41_assign_slot in Linux 3.13.4

Hi Ben,

On Wed, 2014-02-26 at 00:58 +0000, Ben Hutchings wrote:
> Trond, Arthur seems to be hitting a similar bug to
> <https://bugzilla.redhat.com/show_bug.cgi?id=1050206>, and it's still
> occurring in 3.13.4 even though that has the two fixes you posted there.
> The full bug report, with screenshots of the oopses, is at
> <https://bugs.debian.org/734268>.
>

I believe I've found another corruptor of that same list. Do Arthur's
tests perhaps touch on file locking? If so, then the following patch may
help...

Cheers
Trond
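Trond asks whether Arthur's workload uses file locking. The following is a minimal reproducer sketch built on one assumption that this thread does not confirm: repeatedly opening a file on an affected NFSv4.0 mount, taking and dropping a POSIX lock, and closing the file is enough to make the client release lock owners. The helper names lock_cycle and set_whole_file_lock are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Apply a whole-file POSIX lock of the given type to fd.
 * l_start = 0 and l_len = 0 cover the entire file. */
static int set_whole_file_lock(int fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	/* Wait for a write lock if needed; unlocking never blocks. */
	return fcntl(fd, type == F_UNLCK ? F_SETLK : F_SETLKW, &fl);
}

/* Hypothetical helper: open `path`, take and drop a write lock, then
 * close it, `iterations` times. Returns 0 on success, or -1 with errno
 * set on the first failure. */
int lock_cycle(const char *path, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		int fd = open(path, O_RDWR | O_CREAT, 0644);

		if (fd < 0)
			return -1;
		if (set_whole_file_lock(fd, F_WRLCK) < 0 ||
		    set_whole_file_lock(fd, F_UNLCK) < 0) {
			int saved = errno;

			close(fd);
			errno = saved;
			return -1;
		}
		if (close(fd) < 0)
			return -1;
	}
	return 0;
}
```

To test the assumption, run it in a loop against a file under the /home mount on client2 and check whether the oops reproduces.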

8<----------------------------------------------------------------------
From 3db0ebd8e7e67d9ee96f623e7d1dcdc35fccea7f Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Wed, 26 Feb 2014 11:19:14 -0800
Subject: [PATCH] NFSv4: Fix another nfs4_sequence corruptor

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: [email protected] # 3.12+
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4proc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 2da6a698b8f7..d3b829f7c509 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5895,6 +5895,7 @@ static int nfs4_release_lockowner(struct nfs_server *server, struct nfs4_lock_st
data->args.lock_owner.s_dev = server->s_dev;

msg.rpc_argp = &data->args;
+ msg.rpc_resp = &data->res;
rpc_call_async(server->client, &msg, 0, &nfs4_release_lockowner_ops, data);
return 0;
}
--
1.8.5.3


--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
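The commit message says the fix is to point the rpc_message reply at the sequence result. The user-space model below illustrates that idea. It assumes, as the commit message implies, that code which sees only the rpc_message can reach the sequence result only through msg.rpc_resp. All model_ names are hypothetical, and the structures are far simpler than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; names echo the patch, but the
 * layouts and behaviour are NOT the kernel's. */
struct model_seq_res {
	int slot;			/* which session slot the call used */
};

struct model_release_lockowner_res {
	struct model_seq_res seq_res;	/* initial member */
};

struct model_rpc_message {
	void *rpc_argp;			/* call arguments */
	void *rpc_resp;			/* reply buffer; NULL if never set */
};

/* Stand-in for generic code that sees only the rpc_message: its only
 * route to the sequence result is rpc_resp. A pointer to a structure,
 * suitably converted, points to its initial member, so this yields
 * &res->seq_res when rpc_resp = &res. */
static struct model_seq_res *
model_seq_res_of(const struct model_rpc_message *msg)
{
	return msg->rpc_resp;
}
```

In the model, the message without rpc_resp gives the generic code nothing to work with. Setting rpc_resp, as the one-line change does, makes the sequence result reachable.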



2014-03-01 19:57:00

by Trond Myklebust

Subject: Re: Oops in nfs41_assign_slot in Linux 3.13.4

On Sat, 2014-03-01 at 13:46 -0600, Trond Myklebust wrote:
> On Wed, 2014-02-26 at 11:29 -0800, Trond Myklebust wrote:
> > Hi Ben,
> >
> > On Wed, 2014-02-26 at 00:58 +0000, Ben Hutchings wrote:
> > > Trond, Arthur seems to be hitting a similar bug to
> > > <https://bugzilla.redhat.com/show_bug.cgi?id=1050206>, and it's still
> > > occurring in 3.13.4 even though that has the two fixes you posted there.
> > > The full bug report, with screenshots of the oopses, is at
> > > <https://bugs.debian.org/734268>.
> > >
> >
> > I believe I've found another corruptor of that same list. Do Arthur's
> > tests perhaps touch on file locking? If so, then the following patch may
> > help...
>
> Now that Connectathon is over, here is a patch that actually compiles.
>

Sigh... Third time lucky...
8<---------------------------------------------------------------------
From b7e63a1079b266866a732cf699d8c4d61391bbda Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Wed, 26 Feb 2014 11:19:14 -0800
Subject: [PATCH v3] NFSv4: Fix another nfs4_sequence corruptor

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: [email protected] # 3.12+
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4proc.c | 10 +++++-----
include/linux/nfs_xdr.h | 5 +++++
2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 2da6a698b8f7..44e088dc357c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5828,8 +5828,7 @@ struct nfs_release_lockowner_data {
struct nfs4_lock_state *lsp;
struct nfs_server *server;
struct nfs_release_lockowner_args args;
- struct nfs4_sequence_args seq_args;
- struct nfs4_sequence_res seq_res;
+ struct nfs_release_lockowner_res res;
unsigned long timestamp;
};

@@ -5837,7 +5836,7 @@ static void nfs4_release_lockowner_prepare(struct rpc_task *task, void *calldata
{
struct nfs_release_lockowner_data *data = calldata;
nfs40_setup_sequence(data->server,
- &data->seq_args, &data->seq_res, task);
+ &data->args.seq_args, &data->res.seq_res, task);
data->timestamp = jiffies;
}

@@ -5846,7 +5845,7 @@ static void nfs4_release_lockowner_done(struct rpc_task *task, void *calldata)
struct nfs_release_lockowner_data *data = calldata;
struct nfs_server *server = data->server;

- nfs40_sequence_done(task, &data->seq_res);
+ nfs40_sequence_done(task, &data->res.seq_res);

switch (task->tk_status) {
case 0:
@@ -5887,7 +5886,6 @@ static int nfs4_release_lockowner(struct nfs_server *server, struct nfs4_lock_st
data = kmalloc(sizeof(*data), GFP_NOFS);
if (!data)
return -ENOMEM;
- nfs4_init_sequence(&data->seq_args, &data->seq_res, 0);
data->lsp = lsp;
data->server = server;
data->args.lock_owner.clientid = server->nfs_client->cl_clientid;
@@ -5895,6 +5893,8 @@ static int nfs4_release_lockowner(struct nfs_server *server, struct nfs4_lock_st
data->args.lock_owner.s_dev = server->s_dev;

msg.rpc_argp = &data->args;
+ msg.rpc_resp = &data->res;
+ nfs4_init_sequence(&data->args.seq_args, &data->res.seq_res, 0);
rpc_call_async(server->client, &msg, 0, &nfs4_release_lockowner_ops, data);
return 0;
}
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index b2fb167b2e6d..5624e4e2763c 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -467,9 +467,14 @@ struct nfs_lockt_res {
};

struct nfs_release_lockowner_args {
+ struct nfs4_sequence_args seq_args;
struct nfs_lowner lock_owner;
};

+struct nfs_release_lockowner_res {
+ struct nfs4_sequence_res seq_res;
+};
+
struct nfs4_delegreturnargs {
struct nfs4_sequence_args seq_args;
const struct nfs_fh *fhandle;
--
1.8.5.3


--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
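The v3 revision differs from v2 in one respect. It also switches nfs4_release_lockowner_prepare and nfs4_release_lockowner_done to the new data->args.seq_args and data->res.seq_res members. v2 removed seq_args and seq_res from struct nfs_release_lockowner_data but left those callbacks referring to them.

The simplified model below shows the resulting shape. All model_ names are hypothetical, and only the member order follows the patch. The static assertions encode this sketch's assumption that the sequence members come first, as in nfs4_delegreturnargs; they are not a documented kernel requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified models of the structures the patch reshapes. */
struct model_seq_args { int cache_this; };
struct model_seq_res  { int slot; };

struct model_release_lockowner_args {
	struct model_seq_args seq_args;	/* sequence args first */
	unsigned long long clientid;	/* stand-in for lock_owner */
};

struct model_release_lockowner_res {
	struct model_seq_res seq_res;	/* sequence res first */
};

struct model_release_lockowner_data {
	struct model_release_lockowner_args args;
	struct model_release_lockowner_res res;
};

/* Layout assumed by this sketch, checked at compile time. */
_Static_assert(offsetof(struct model_release_lockowner_args, seq_args) == 0,
	       "seq_args is the first member");
_Static_assert(offsetof(struct model_release_lockowner_res, seq_res) == 0,
	       "seq_res is the first member");

/* Stand-ins for init, ->rpc_call_prepare and ->rpc_call_done. After v3
 * all three touch data->res.seq_res, the same object that
 * msg.rpc_resp = &data->res makes reachable. */
static void model_init(struct model_release_lockowner_data *d)
{
	memset(d, 0, sizeof(*d));
	d->res.seq_res.slot = -1;	/* no slot assigned yet */
}

static void model_prepare(struct model_release_lockowner_data *d, int slot)
{
	d->res.seq_res.slot = slot;	/* slot claimed for this call */
}

static int model_done(struct model_release_lockowner_data *d)
{
	int slot = d->res.seq_res.slot;

	d->res.seq_res.slot = -1;	/* slot handed back */
	return slot;
}
```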



2014-03-01 19:46:59

by Trond Myklebust

Subject: Re: Oops in nfs41_assign_slot in Linux 3.13.4

On Wed, 2014-02-26 at 11:29 -0800, Trond Myklebust wrote:
> Hi Ben,
>
> On Wed, 2014-02-26 at 00:58 +0000, Ben Hutchings wrote:
> > Trond, Arthur seems to be hitting a similar bug to
> > <https://bugzilla.redhat.com/show_bug.cgi?id=1050206>, and it's still
> > occurring in 3.13.4 even though that has the two fixes you posted there.
> > The full bug report, with screenshots of the oopses, is at
> > <https://bugs.debian.org/734268>.
> >
>
> I believe I've found another corruptor of that same list. Do Arthur's
> tests perhaps touch on file locking? If so, then the following patch may
> help...

Now that Connectathon is over, here is a patch that actually compiles.

Apologies...
Trond

8<---------------------------------------------------------------------
From 97c7b4c6dc6caefa8df19301575aecc826d4ac6e Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Wed, 26 Feb 2014 11:19:14 -0800
Subject: [PATCH v2] NFSv4: Fix another nfs4_sequence corruptor

nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: [email protected] # 3.12+
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4proc.c | 6 +++---
include/linux/nfs_xdr.h | 5 +++++
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 2da6a698b8f7..ceb2836fd6ba 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5828,8 +5828,7 @@ struct nfs_release_lockowner_data {
struct nfs4_lock_state *lsp;
struct nfs_server *server;
struct nfs_release_lockowner_args args;
- struct nfs4_sequence_args seq_args;
- struct nfs4_sequence_res seq_res;
+ struct nfs_release_lockowner_res res;
unsigned long timestamp;
};

@@ -5887,7 +5886,6 @@ static int nfs4_release_lockowner(struct nfs_server *server, struct nfs4_lock_st
data = kmalloc(sizeof(*data), GFP_NOFS);
if (!data)
return -ENOMEM;
- nfs4_init_sequence(&data->seq_args, &data->seq_res, 0);
data->lsp = lsp;
data->server = server;
data->args.lock_owner.clientid = server->nfs_client->cl_clientid;
@@ -5895,6 +5893,8 @@ static int nfs4_release_lockowner(struct nfs_server *server, struct nfs4_lock_st
data->args.lock_owner.s_dev = server->s_dev;

msg.rpc_argp = &data->args;
+ msg.rpc_resp = &data->res;
+ nfs4_init_sequence(&data->args.seq_args, &data->res.seq_res, 0);
rpc_call_async(server->client, &msg, 0, &nfs4_release_lockowner_ops, data);
return 0;
}
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index b2fb167b2e6d..5624e4e2763c 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -467,9 +467,14 @@ struct nfs_lockt_res {
};

struct nfs_release_lockowner_args {
+ struct nfs4_sequence_args seq_args;
struct nfs_lowner lock_owner;
};

+struct nfs_release_lockowner_res {
+ struct nfs4_sequence_res seq_res;
+};
+
struct nfs4_delegreturnargs {
struct nfs4_sequence_args seq_args;
const struct nfs_fh *fhandle;
--
1.8.5.3


--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]