2010-06-16 08:20:42

by Joschi Brauchle

[permalink] [raw]
Subject: Syslog flooded: "NFSD: preprocess_seqid_op: magic stateid!"

Hi everyone!
I?m brand-new on the list and I hope this is the right place to post my questions.
Please bear with me if there is some information missing,?

I am facing the following (first) problem on our primary NFSv4 fileserver (called pollux) running SLES 10SP3
(Linux pollux 2.6.16.60-0.62.1-smp #1 SMP Mon Apr 12 18:53:46 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux),
the syslog is flooded with the following message:

Jun? 3 15:29:24 pollux kernel: NFSD: preprocess_seqid_op: magic stateid!

I have increased the loglevel of the nfs daemon and get the following line right before the line above:

Jun? 3 15:29:24 pollux kernel: NFSD: preprocess_seqid_op: seqid=208 stateid = (00000000/00000000/00000000/00000000)

We are running NFSv4 with Kerberos security, nfs-utils is the 1.0.7 Novell package for SLES10.
I have to mention that everything is operating as it should. It is just annoying that our syslog receives about 13000 "magic stateid" lines a day.

Our "/etc/exports" looks like this:
# NFSv4 entries (with Kerberos and GSS Security):
/export???????????????? gss/krb5(rw,fsid=0,no_all_squash,async,no_subtree_check)
# NFSv3 entries:
/export/opt???????????? *(rw,no_all_squash,async,no_subtree_check)

I have tried to find anything helpful on google. The only things I found are indications that it might be related to the "nfsnobody" userid. Actually, we have a couple of nfs-client-machines (no user logged in) which access the /export nfsv4 share from a root-cronjob using their "nfs/f.q.d.n.@realm" Kerberos service creds, which are mapped then to the "nobody" user on the server. I have the feeling that the "magic stateid" lines started appearing since we installed those cronjobs on the clients.

Does anyone have some knowledge about this "magic stateid" message?
Any suggestions in debugging this issue?
Unfortunately, Novell support was not able to help us on this issue so far, thus I?m trying here ?

Second problem:
We recently downgraded from a patched nfs-utils 1.0.9 (patched to accept the ?-h? option for different hostname in Kerberos service creds in a high availability environment) to the original Novell RPM providing nfs-utils 1.0.7 (now using the undocumented ?-c? switch to read the accepted creds from keytab file). (Note: the first problem has already occurred before this downgrade)

Since this change, we are experiencing massive lags (1-10 sec freezes, irregularly, but a couple times an hour) on our client machines, which are mounting their homes via NFSv4. There seem to be no related messages or warnings on the server nor the clients.

We come to the conclusion that 1.0.9 was the more stable version. The resulting question is: Which nfs-utils version would be best suited for SLES10SP3 (see kernel version above). Can we go straight to the latest 1.2.2 available or should we stick with 1.0.xx version?

Thanks so much for you time!

Institute for Communications Engineering (LNT)
Technische Universitaet Muenchen (TUM)
80290 Munich, Germany

Tel (work): +49 89 289-23474
Fax (work): +49 89 289-23490
E-mail: [email protected]
Web: http://www.lnt.ei.tum.de/



2010-06-17 02:31:08

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Syslog flooded: "NFSD: preprocess_seqid_op: magic stateid!"

On Wed, Jun 16, 2010 at 10:20:15AM +0200, Joschi Brauchle wrote:
> Hi everyone!
> I=E2=80=99m brand-new on the list and I hope this is the right place =
to post my questions.
> Please bear with me if there is some information missing,=E2=80=A6
>=20
> I am facing the following (first) problem on our primary NFSv4 filese=
rver (called pollux) running SLES 10SP3=20
> (Linux pollux 2.6.16.60-0.62.1-smp #1 SMP Mon Apr 12 18:53:46 UTC 201=
0 x86_64 x86_64 x86_64 GNU/Linux),=20
> the syslog is flooded with the following message:
>=20
> Jun=C2=A0 3 15:29:24 pollux kernel: NFSD: preprocess_seqid_op: magic =
stateid!

This was fixed by the commit below.

--b.

commit 2fdada03b3876ab9f84ede160f187e888cafefb4
Author: J. Bruce Fields <[email protected]>
Date: Fri Jul 27 16:10:37 2007 -0400

knfsd: demote some printk()s to dprintk()s
=20
To quote a recent mail from Andrew Morton:
=20
Look: if there's a way in which an unprivileged user can trigger
a printk we fix it, end of story.
=20
OK. I assume that goes double for printk()s that might be triggere=
d by
random hosts on the internet. So, disable some printk()s that look=
like
they could be triggered by malfunctioning or malicious clients. Fo=
r
now, just downgrade them to dprintk()s.
=20
Signed-off-by: "J. Bruce Fields" <[email protected]>
Acked-by: Neil Brown <[email protected]>

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 29b7e63..18ead17 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -238,12 +238,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_c=
ompound_state *cstate,
break;
case NFS4_OPEN_CLAIM_DELEGATE_PREV:
open->op_stateowner->so_confirmed =3D 1;
- printk("NFSD: unsupported OPEN claim type %d\n",
+ dprintk("NFSD: unsupported OPEN claim type %d\n",
open->op_claim_type);
status =3D nfserr_notsupp;
goto out;
default:
- printk("NFSD: Invalid OPEN claim type %d\n",
+ dprintk("NFSD: Invalid OPEN claim type %d\n",
open->op_claim_type);
status =3D nfserr_inval;
goto out;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 578d809..2b20eb8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -509,7 +509,7 @@ check_name(struct xdr_netobj name) {
if (name.len =3D=3D 0)=20
return 0;
if (name.len > NFS4_OPAQUE_LIMIT) {
- printk("NFSD: check_name: name too long(%d)!\n", name.len);
+ dprintk("NFSD: check_name: name too long(%d)!\n", name.len);
return 0;
}
return 1;
@@ -1742,7 +1742,7 @@ out:
if (open->op_claim_type =3D=3D NFS4_OPEN_CLAIM_PREVIOUS
&& flag =3D=3D NFS4_OPEN_DELEGATE_NONE
&& open->op_delegate_type !=3D NFS4_OPEN_DELEGATE_NONE)
- printk("NFSD: WARNING: refusing delegation reclaim\n");
+ dprintk("NFSD: WARNING: refusing delegation reclaim\n");
open->op_delegate_type =3D flag;
}
=20
@@ -2151,7 +2151,7 @@ nfs4_preprocess_seqid_op(struct svc_fh *current_f=
h, u32 seqid, stateid_t *statei
*sopp =3D NULL;
=20
if (ZERO_STATEID(stateid) || ONE_STATEID(stateid)) {
- printk("NFSD: preprocess_seqid_op: magic stateid!\n");
+ dprintk("NFSD: preprocess_seqid_op: magic stateid!\n");
return nfserr_bad_stateid;
}
=20
@@ -2202,7 +2202,7 @@ nfs4_preprocess_seqid_op(struct svc_fh *current_f=
h, u32 seqid, stateid_t *statei
}
=20
if ((flags & CHECK_FH) && nfs4_check_fh(current_fh, stp)) {
- printk("NFSD: preprocess_seqid_op: fh-stateid mismatch!\n");
+ dprintk("NFSD: preprocess_seqid_op: fh-stateid mismatch!\n");
return nfserr_bad_stateid;
}
=20
@@ -2218,22 +2218,22 @@ nfs4_preprocess_seqid_op(struct svc_fh *current=
_fh, u32 seqid, stateid_t *statei
goto check_replay;
=20
if (sop->so_confirmed && flags & CONFIRM) {
- printk("NFSD: preprocess_seqid_op: expected"
+ dprintk("NFSD: preprocess_seqid_op: expected"
" unconfirmed stateowner!\n");
return nfserr_bad_stateid;
}
if (!sop->so_confirmed && !(flags & CONFIRM)) {
- printk("NFSD: preprocess_seqid_op: stateowner not"
+ dprintk("NFSD: preprocess_seqid_op: stateowner not"
" confirmed yet!\n");
return nfserr_bad_stateid;
}
if (stateid->si_generation > stp->st_stateid.si_generation) {
- printk("NFSD: preprocess_seqid_op: future stateid?!\n");
+ dprintk("NFSD: preprocess_seqid_op: future stateid?!\n");
return nfserr_bad_stateid;
}
=20
if (stateid->si_generation < stp->st_stateid.si_generation) {
- printk("NFSD: preprocess_seqid_op: old stateid!\n");
+ dprintk("NFSD: preprocess_seqid_op: old stateid!\n");
return nfserr_old_stateid;
}
renew_client(sop->so_client);
@@ -2245,7 +2245,7 @@ check_replay:
/* indicate replay to calling function */
return nfserr_replay_me;
}
- printk("NFSD: preprocess_seqid_op: bad seqid (expected %d, got %d)\n"=
,
+ dprintk("NFSD: preprocess_seqid_op: bad seqid (expected %d, got %d)\n=
",
sop->so_seqid, seqid);
*sopp =3D NULL;
return nfserr_bad_seqid;
@@ -2858,7 +2858,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_=
compound_state *cstate,
file_lock.fl_type =3D F_WRLCK;
break;
default:
- printk("NFSD: nfs4_lockt: bad lock type!\n");
+ dprintk("NFSD: nfs4_lockt: bad lock type!\n");
status =3D nfserr_inval;
goto out;
}

2010-06-17 02:32:57

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Syslog flooded: "NFSD: preprocess_seqid_op: magic stateid!"

On Wed, Jun 16, 2010 at 10:20:15AM +0200, Joschi Brauchle wrote:
> Second problem:
> We recently downgraded from a patched nfs-utils 1.0.9 (patched to acc=
ept the =E2=80=9C-h=E2=80=9D option for different hostname in Kerberos =
service creds in a high availability environment) to the original Novel=
l RPM providing nfs-utils 1.0.7 (now using the undocumented =E2=80=9C-c=
=E2=80=9D switch to read the accepted creds from keytab file). (Note: t=
he first problem has already occurred before this downgrade)
>=20
> Since this change, we are experiencing massive lags (1-10 sec freezes=
, irregularly, but a couple times an hour) on our client machines, whic=
h are mounting their homes via NFSv4. There seem to be no related messa=
ges or warnings on the server nor the clients.
>=20
> We come to the conclusion that 1.0.9 was the more stable version. The=
resulting question is: Which nfs-utils version would be best suited fo=
r SLES10SP3 (see kernel version above). Can we go straight to the lates=
t 1.2.2 available or should we stick with 1.0.xx version?

No idea. We'd have to figure out the cause of the lags.

--b.