2019-03-14 21:12:12

by Scott Mayhew

[permalink] [raw]
Subject: [pynfs PATCH 0/4] nfs4.1: add a bunch of reboot tests

(Note: These patches go on top of "nfs4.1: clean up the session and
client created in Environment.init()" from Jeff Layton sent on March
14th).

These patches add the following reboot tests:

REBT2x: Server reboot with N clients reclaiming opens.
REBT3x: Server reboot with N clients reclaming opens. Server reboots
again after half the clients have reclaimed.
REBT4x: Server reboot with N clients reclaiming opens. Half the
clients attempt to reclaim twice.
(where x in {a, b, c}, with 'a' using 10 clients, 'b' using 100 clients,
and 'c' testing 1000 clients)
REBT5: Server reboot where the client starts reclaiming shortly before
the end of grace.

Note all of these tests require the use of a helper script or manual
intervention to restart the server.

Scott Mayhew (4):
nfs4.1: add some reboot tests
nfs4.1: add some more reboot tests
nfs4.1: still more reboot tests
nfs4.1: test delayed reclaim following a server reboot

nfs4.1/server41tests/st_reboot.py | 284 +++++++++++++++++++++++++++++-
1 file changed, 278 insertions(+), 6 deletions(-)

--
2.17.2



2019-03-14 21:12:12

by Scott Mayhew

[permalink] [raw]
Subject: [pynfs PATCH 2/4] nfs4.1: add some more reboot tests

REBT3a, REBT3b, and REBT3c test recovery with multiple clients following
a server reboot, where the server reboots again during the grace period
while clients are still reclaiming.

Signed-off-by: Scott Mayhew <[email protected]>
---
nfs4.1/server41tests/st_reboot.py | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/nfs4.1/server41tests/st_reboot.py b/nfs4.1/server41tests/st_reboot.py
index 0216127..8bce9ec 100644
--- a/nfs4.1/server41tests/st_reboot.py
+++ b/nfs4.1/server41tests/st_reboot.py
@@ -149,7 +149,7 @@ def doTestAllClientsNoGrace(t, env, states):
log.warn("server took approximately %d seconds to lift grace "
"after all clients reclaimed" % lift_time)

-def doTestRebootWithNClients(t, env, n=10):
+def doTestRebootWithNClients(t, env, n=10, double_reboot=False):
boot_time = int(time.time())
lease_time = 90
states = []
@@ -166,6 +166,11 @@ def doTestRebootWithNClients(t, env, n=10):
boot_time = _waitForReboot(env)

try:
+ if double_reboot:
+ for i in range(n/2):
+ lease_time = doTestOneClientGrace(t, env, states[i])
+ boot_time = _waitForReboot(env)
+
for i in range(n):
lease_time = doTestOneClientGrace(t, env, states[i])

@@ -210,3 +215,27 @@ def testRebootWithManyManyManyClients(t, env):
CODE: REBT2c
"""
return doTestRebootWithNClients(t, env, 1000)
+
+def testDoubleRebootWithManyClients(t, env):
+ """Double reboot with many clients
+
+ FLAGS: reboot
+ CODE: REBT3a
+ """
+ return doTestRebootWithNClients(t, env, double_reboot=True)
+
+def testDoubleRebootWithManyManyClients(t, env):
+ """Double reboot with many many clients
+
+ FLAGS: reboot
+ CODE: REBT3b
+ """
+ return doTestRebootWithNClients(t, env, 100, True)
+
+def testDoubleRebootWithManyManyManyClients(t, env):
+ """Double reboot with many many many clients
+
+ FLAGS: reboot
+ CODE: REBT3c
+ """
+ return doTestRebootWithNClients(t, env, 1000, True)
--
2.17.2


2019-03-14 21:12:13

by Scott Mayhew

[permalink] [raw]
Subject: [pynfs PATCH 3/4] nfs4.1: still more reboot tests

REBT4a, REBT4b, and REBT4c test recovery with multiple clients following
a server reboot, where half of the clients attempt to reclaim twice.

Signed-off-by: Scott Mayhew <[email protected]>
---
nfs4.1/server41tests/st_reboot.py | 78 ++++++++++++++++++++++++-------
1 file changed, 61 insertions(+), 17 deletions(-)

diff --git a/nfs4.1/server41tests/st_reboot.py b/nfs4.1/server41tests/st_reboot.py
index 8bce9ec..06913f8 100644
--- a/nfs4.1/server41tests/st_reboot.py
+++ b/nfs4.1/server41tests/st_reboot.py
@@ -38,10 +38,13 @@ def create_session(c, cred=None, flags=0):
123, sec)], cred)
return res

-def reclaim_complete(sess):
+def reclaim_complete(sess, dup=False):
rc_op = op.reclaim_complete(FALSE)
res = sess.compound([rc_op])
- check(res, msg="reclaim_complete")
+ if not dup:
+ check(res, msg="reclaim_complete")
+ else:
+ check(res, NFS4ERR_COMPLETE_ALREADY, msg="Duplicate reclaim_complete")

#####################################################

@@ -84,11 +87,12 @@ class State(object):
self.sess = sess
self.fh = fh

-def doTestOneClientGrace(t, env, state):
- res = state.sess.compound([])
- check(res, NFS4ERR_BADSESSION, "Bare sequence after reboot")
- res = create_session(state.c)
- check(res, NFS4ERR_STALE_CLIENTID, "Reclaim using old clientid")
+def doTestOneClientGrace(t, env, state, dup=False):
+ if not dup:
+ res = state.sess.compound([])
+ check(res, NFS4ERR_BADSESSION, "Bare sequence after reboot")
+ res = create_session(state.c)
+ check(res, NFS4ERR_STALE_CLIENTID, "Reclaim using old clientid")
c = env.c1.new_client(state.name)
state.c = c
sess = c.create_session()
@@ -99,11 +103,15 @@ def doTestOneClientGrace(t, env, state):
access=OPEN4_SHARE_ACCESS_BOTH,
deny=OPEN4_SHARE_DENY_NONE,
deleg_type=OPEN_DELEGATE_NONE)
- check(res, msg="Reclaim using newly created clientid")
- fh = res.resarray[-1].object
- stateid = res.resarray[-2].stateid
- reclaim_complete(sess)
- close_file(sess, fh, stateid=stateid)
+ if not dup:
+ check(res, msg="Reclaim using newly created clientid")
+ fh = res.resarray[-1].object
+ stateid = res.resarray[-2].stateid
+ else:
+ check(res, NFS4ERR_NO_GRACE, msg="Duplicate reclaim")
+ reclaim_complete(sess, dup)
+ if not dup:
+ close_file(sess, fh, stateid=stateid)
res = open_file(sess, state.owner, claim_type=CLAIM_NULL,
access=OPEN4_SHARE_ACCESS_BOTH,
deny=OPEN4_SHARE_DENY_NONE,
@@ -149,7 +157,11 @@ def doTestAllClientsNoGrace(t, env, states):
log.warn("server took approximately %d seconds to lift grace "
"after all clients reclaimed" % lift_time)

-def doTestRebootWithNClients(t, env, n=10, double_reboot=False):
+def doTestRebootWithNClients(t, env, n=10, double_reboot=False,
+ double_reclaim=False):
+ if double_reboot and double_reclaim:
+ raise RuntimeError("double_reboot and double_reclaim cannot both be true")
+
boot_time = int(time.time())
lease_time = 90
states = []
@@ -166,13 +178,21 @@ def doTestRebootWithNClients(t, env, n=10, double_reboot=False):
boot_time = _waitForReboot(env)

try:
- if double_reboot:
+ if double_reboot or double_reclaim:
for i in range(n/2):
lease_time = doTestOneClientGrace(t, env, states[i])
- boot_time = _waitForReboot(env)

- for i in range(n):
- lease_time = doTestOneClientGrace(t, env, states[i])
+ if double_reboot:
+ boot_time = _waitForReboot(env)
+
+ if double_reclaim:
+ for i in range(n/2):
+ lease_time = doTestOneClientGrace(t, env, states[i], True)
+ for i in range(n/2, n):
+ lease_time = doTestOneClientGrace(t, env, states[i])
+ else:
+ for i in range(n):
+ lease_time = doTestOneClientGrace(t, env, states[i])

# At this point, all clients should have recovered except for 'block'.
# Recover that one now.
@@ -239,3 +259,27 @@ def testDoubleRebootWithManyManyManyClients(t, env):
CODE: REBT3c
"""
return doTestRebootWithNClients(t, env, 1000, True)
+
+def testRebootWithManyClientsDoubleReclaim(t, env):
+ """Reboot with many clients where half try to reclaim twice
+
+ FLAGS: reboot
+ CODE: REBT4a
+ """
+ return doTestRebootWithNClients(t, env, double_reclaim=True)
+
+def testRebootWithManyManyClientsDoubleReclaim(t, env):
+ """Reboot with many many clients where half try to reclaim twice
+
+ FLAGS: reboot
+ CODE: REBT4b
+ """
+ return doTestRebootWithNClients(t, env, 100, double_reclaim=True)
+
+def testRebootWithManyManyManyClientsDoubleReclaim(t, env):
+ """Reboot with many many many clients where half try to reclaim twice
+
+ FLAGS: reboot
+ CODE: REBT4c
+ """
+ return doTestRebootWithNClients(t, env, 1000, double_reclaim=True)
--
2.17.2


2019-03-14 21:12:13

by Scott Mayhew

[permalink] [raw]
Subject: [pynfs PATCH 4/4] nfs4.1: test delayed reclaim following a server reboot

REBT5 tests a server reboot where the client begins reclaim shortly
before the end of the grace period.

Signed-off-by: Scott Mayhew <[email protected]>
---
nfs4.1/server41tests/st_reboot.py | 56 +++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)

diff --git a/nfs4.1/server41tests/st_reboot.py b/nfs4.1/server41tests/st_reboot.py
index 06913f8..b77e2cf 100644
--- a/nfs4.1/server41tests/st_reboot.py
+++ b/nfs4.1/server41tests/st_reboot.py
@@ -283,3 +283,59 @@ def testRebootWithManyManyManyClientsDoubleReclaim(t, env):
CODE: REBT4c
"""
return doTestRebootWithNClients(t, env, 1000, double_reclaim=True)
+
+def testRebootWithLateReclaim(t, env):
+ """Reboot with client that starts reclaim near end of grace
+
+ FLAGS: reboot
+ CODE: REBT5
+ """
+ boot_time = int(time.time())
+ lease_time = 90
+ fh = []
+ stateid = []
+ name = "%s_client" % env.testname(t)
+ owner = "owner_%s" % name
+ c = env.c1.new_client(name)
+ sess = c.create_session()
+ reclaim_complete(sess)
+ N = 42
+ for i in range(N):
+ path = sess.c.homedir + ["%s_file_%i" % (owner, i)]
+ tmpfh, tmpstateid = create_confirm(sess, owner, path)
+ fh.append(tmpfh)
+ lease_time = _getleasetime(sess)
+ boot_time = _waitForReboot(env)
+ try:
+ sleep_time = lease_time - 5
+ env.sleep(sleep_time, "Delaying start of reclaim")
+ res = sess.compound([])
+ check(res, NFS4ERR_BADSESSION, "Bare sequence after reboot")
+ res = create_session(c)
+ check(res, NFS4ERR_STALE_CLIENTID, "Reclaim using old clientid")
+ c = env.c1.new_client(name)
+ sess = c.create_session()
+ lease_time = _getleasetime(sess)
+ # Reclaim open files, with a short delay between each open reclaim.
+ # This should put us at the end of the original grace period. The
+ # server might keep extending the grace period by 1 second (up to
+ # an additional lease period in total) as long as we keep reclaming.
+ for i in range(N):
+ res = open_file(sess, owner, path=fh[i], claim_type=CLAIM_PREVIOUS,
+ access=OPEN4_SHARE_ACCESS_BOTH,
+ deny=OPEN4_SHARE_DENY_NONE,
+ deleg_type=OPEN_DELEGATE_NONE)
+ check(res, msg="Reclaim using newly created clientid")
+ tmpstateid = res.resarray[-2].stateid
+ stateid.append(tmpstateid)
+ time.sleep(0.25)
+ reclaim_complete(sess)
+ for i in range(N):
+ close_file(sess, fh[i], stateid[i])
+ except:
+ grace_end_time = boot_time + lease_time + 5
+ now = int(time.time())
+ if now < grace_end_time:
+ sleep_time = grace_end_time - now
+ env.sleep(sleep_time, "Waiting for grace period to end")
+ raise
--
2.17.2


2019-03-14 21:12:13

by Scott Mayhew

[permalink] [raw]
Subject: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

REBT2a, REBT2b, and REBT2c test recovery with multiple clients following
a server reboot.

Signed-off-by: Scott Mayhew <[email protected]>
---
nfs4.1/server41tests/st_reboot.py | 151 +++++++++++++++++++++++++++++-
1 file changed, 147 insertions(+), 4 deletions(-)

diff --git a/nfs4.1/server41tests/st_reboot.py b/nfs4.1/server41tests/st_reboot.py
index b852ded..0216127 100644
--- a/nfs4.1/server41tests/st_reboot.py
+++ b/nfs4.1/server41tests/st_reboot.py
@@ -1,25 +1,33 @@
from xdrdef.nfs4_const import *
from xdrdef.nfs4_type import *
-from .environment import check, fail, create_file, open_file, create_confirm
+from .environment import check, fail, create_file, open_file, create_confirm, close_file
import sys
import os
-import nfs4lib
+import time
+import logging
import nfs_ops
op = nfs_ops.NFS4ops()
from rpc import RPCTimeout

# NOTE - reboot tests are NOT part of the standard test suite

+log = logging.getLogger("test.env")
+
def _getleasetime(sess):
res = sess.compound([op.putrootfh(), op.getattr(1 << FATTR4_LEASE_TIME)])
return res.resarray[-1].obj_attributes[FATTR4_LEASE_TIME]

-def _waitForReboot(c, sess, env):
+def _waitForReboot(env):
env.serverhelper("reboot")
# Wait until the server is back up.
# The following blocks until it gets a response,
# which happens when the server comes back up.
env.c1.c1 = env.c1.connect(env.c1.server_address)
+ # Go ahead and whack the cached session and client ids now
+ # to avoid errors in Environment.finish().
+ env.c1.sessions = {}
+ env.c1.clients = {}
+ return int(time.time())

def create_session(c, cred=None, flags=0):
"""Send a simple CREATE_SESSION"""
@@ -51,7 +59,7 @@ def testRebootValid(t, env):
reclaim_complete(sess)
fh, stateid = create_confirm(sess, owner)
sleeptime = 5 + _getleasetime(sess)
- _waitForReboot(c, sess, env)
+ _waitForReboot(env)
try:
res = create_session(c)
check(res, NFS4ERR_STALE_CLIENTID, "Reclaim using old clientid")
@@ -67,3 +75,138 @@ def testRebootValid(t, env):
reclaim_complete(sess)
finally:
env.sleep(sleeptime, "Waiting for grace period to end")
+
+class State(object):
+ def __init__(self, name, owner, c, sess, fh):
+ self.name = name
+ self.owner = owner
+ self.c = c
+ self.sess = sess
+ self.fh = fh
+
+def doTestOneClientGrace(t, env, state):
+ res = state.sess.compound([])
+ check(res, NFS4ERR_BADSESSION, "Bare sequence after reboot")
+ res = create_session(state.c)
+ check(res, NFS4ERR_STALE_CLIENTID, "Reclaim using old clientid")
+ c = env.c1.new_client(state.name)
+ state.c = c
+ sess = c.create_session()
+ state.sess = sess
+ lease_time = _getleasetime(sess)
+ res = open_file(sess, state.owner, path=state.fh,
+ claim_type=CLAIM_PREVIOUS,
+ access=OPEN4_SHARE_ACCESS_BOTH,
+ deny=OPEN4_SHARE_DENY_NONE,
+ deleg_type=OPEN_DELEGATE_NONE)
+ check(res, msg="Reclaim using newly created clientid")
+ fh = res.resarray[-1].object
+ stateid = res.resarray[-2].stateid
+ reclaim_complete(sess)
+ close_file(sess, fh, stateid=stateid)
+ res = open_file(sess, state.owner, claim_type=CLAIM_NULL,
+ access=OPEN4_SHARE_ACCESS_BOTH,
+ deny=OPEN4_SHARE_DENY_NONE,
+ deleg_type=OPEN_DELEGATE_NONE)
+ check(res, NFS4ERR_GRACE, "New open during grace")
+ return lease_time
+
+def doTestOneClientNoGrace(t, env, state):
+ res = open_file(state.sess, state.owner, claim_type=CLAIM_NULL,
+ access=OPEN4_SHARE_ACCESS_BOTH,
+ deny=OPEN4_SHARE_DENY_NONE,
+ deleg_type=OPEN_DELEGATE_NONE)
+ if (res.status == NFS4ERR_GRACE):
+ return res
+ check(res, msg="New open after all clients done reclaiming")
+ fh = res.resarray[-1].object
+ stateid = res.resarray[-2].stateid
+ close_file(state.sess, fh, stateid=stateid)
+ return res
+
+# The server may have lifted the grace period early, but it's not obligated.
+# Keep looping until all the clients have done a normal open. If the server
+# didn't lift the grace period early we don't want to fail the test, but we
+# do want to log a message.
+def doTestAllClientsNoGrace(t, env, states):
+ all_done = False
+ warn_grace = False
+ start_time = int(time.time())
+ ok_time = 0
+ while not all_done:
+ all_done = True
+ for state in states:
+ res = doTestOneClientNoGrace(t, env, state)
+ if res.status == NFS4ERR_GRACE:
+ warn_grace = True
+ all_done = False
+ elif not ok_time:
+ ok_time = int(time.time())
+ if not all_done:
+ time.sleep(1)
+ if warn_grace:
+ lift_time = ok_time - start_time
+ log.warn("server took approximately %d seconds to lift grace "
+ "after all clients reclaimed" % lift_time)
+
+def doTestRebootWithNClients(t, env, n=10):
+ boot_time = int(time.time())
+ lease_time = 90
+ states = []
+ block = env.c1.new_client_session("%s_block" % env.testname(t))
+ for i in range(n):
+ name = "%s_client_%i" % (env.testname(t), i)
+ owner = "owner_%s" % name
+ c = env.c1.new_client(name)
+ sess = c.create_session()
+ reclaim_complete(sess)
+ fh, stateid = create_confirm(sess, owner)
+ states.append(State(name, owner, c, sess, fh))
+ lease_time = _getleasetime(sess)
+ boot_time = _waitForReboot(env)
+
+ try:
+ for i in range(n):
+ lease_time = doTestOneClientGrace(t, env, states[i])
+
+ # At this point, all clients should have recovered except for 'block'.
+ # Recover that one now.
+ block = env.c1.new_client_session("%s_block" % env.testname(t))
+
+ # The server may have lifted the grace period early. Test it.
+ doTestAllClientsNoGrace(t, env, states)
+
+ # If the test went normally, then the grace period should have already
+ # ended. If we got some unexpected error, then wait a bit for the server
+ # to expire the clients so cleanup can proceed more smoothly.
+ except:
+ grace_end_time = boot_time + lease_time + 5
+ now = int(time.time())
+ if now < grace_end_time:
+ sleep_time = grace_end_time - now
+ env.sleep(sleep_time, "Waiting for grace period to end")
+ raise
+
+def testRebootWithManyClients(t, env):
+ """Reboot with many clients
+
+ FLAGS: reboot
+ CODE: REBT2a
+ """
+ return doTestRebootWithNClients(t, env)
+
+def testRebootWithManyManyClients(t, env):
+ """Reboot with many many clients
+
+ FLAGS: reboot
+ CODE: REBT2b
+ """
+ return doTestRebootWithNClients(t, env, 100)
+
+def testRebootWithManyManyManyClients(t, env):
+ """Reboot with many many many clients
+
+ FLAGS: reboot
+ CODE: REBT2c
+ """
+ return doTestRebootWithNClients(t, env, 1000)
--
2.17.2


2019-03-14 21:48:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 0/4] nfs4.1: add a bunch of reboot tests

On Thu, Mar 14, 2019 at 05:12:06PM -0400, Scott Mayhew wrote:
> (Note: These patches go on top of "nfs4.1: clean up the session and
> client created in Environment.init()" from Jeff Layton sent on March
> 14th).

That was yours, wasn't it? Applied, anyway.

> These patches add the following reboot tests:

Those look like good ideas, I've applied the patches.

The one thing I notice on a quick skim is that REBT5 depends on Linux
knfsd-specific behavior. That's worth noting in a comment.

(Also maybe we should add a flag to such tests. "knfsd"? We've got
some tests flagged "ganesha".)

--b.

>
> REBT2x: Server reboot with N clients reclaiming opens.
> REBT3x: Server reboot with N clients reclaming opens. Server reboots
> again after half the clients have reclaimed.
> REBT4x: Server reboot with N clients reclaiming opens. Half the
> clients attempt to reclaim twice.
> (where x in {a, b, c}, with 'a' using 10 clients, 'b' using 100 clients,
> and 'c' testing 1000 clients)
> REBT5: Server reboot where the client starts reclaiming shortly before
> the end of grace.
>
> Note all of these tests require the use of a helper script or manual
> intervention to restart the server.
>
> Scott Mayhew (4):
> nfs4.1: add some reboot tests
> nfs4.1: add some more reboot tests
> nfs4.1: still more reboot tests
> nfs4.1: test delayed reclaim following a server reboot
>
> nfs4.1/server41tests/st_reboot.py | 284 +++++++++++++++++++++++++++++-
> 1 file changed, 278 insertions(+), 6 deletions(-)
>
> --
> 2.17.2

2019-03-14 23:18:35

by Scott Mayhew

[permalink] [raw]
Subject: Re: [pynfs PATCH 0/4] nfs4.1: add a bunch of reboot tests

On Thu, 14 Mar 2019, J. Bruce Fields wrote:

> On Thu, Mar 14, 2019 at 05:12:06PM -0400, Scott Mayhew wrote:
> > (Note: These patches go on top of "nfs4.1: clean up the session and
> > client created in Environment.init()" from Jeff Layton sent on March
> > 14th).
>
> That was yours, wasn't it? Applied, anyway.

No, my patch modified Environment.finish(). Jeff's modifies
Environment.init(). You'll need his patch too or these tests won't
work.

-Scott

>
> > These patches add the following reboot tests:
>
> Those look like good ideas, I've applied the patches.
>
> The one thing I notice on a quick skim is that REBT5 depends on Linux
> knfsd-specific behavior. That's worth noting in a comment.
>
> (Also maybe we should add a flag to such tests. "knfsd"? We've got
> some tests flagged "ganesha".)
>
> --b.
>
> >
> > REBT2x: Server reboot with N clients reclaiming opens.
> > REBT3x: Server reboot with N clients reclaming opens. Server reboots
> > again after half the clients have reclaimed.
> > REBT4x: Server reboot with N clients reclaiming opens. Half the
> > clients attempt to reclaim twice.
> > (where x in {a, b, c}, with 'a' using 10 clients, 'b' using 100 clients,
> > and 'c' testing 1000 clients)
> > REBT5: Server reboot where the client starts reclaiming shortly before
> > the end of grace.
> >
> > Note all of these tests require the use of a helper script or manual
> > intervention to restart the server.
> >
> > Scott Mayhew (4):
> > nfs4.1: add some reboot tests
> > nfs4.1: add some more reboot tests
> > nfs4.1: still more reboot tests
> > nfs4.1: test delayed reclaim following a server reboot
> >
> > nfs4.1/server41tests/st_reboot.py | 284 +++++++++++++++++++++++++++++-
> > 1 file changed, 278 insertions(+), 6 deletions(-)
> >
> > --
> > 2.17.2

2019-03-15 01:00:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 0/4] nfs4.1: add a bunch of reboot tests

On Thu, Mar 14, 2019 at 07:18:33PM -0400, Scott Mayhew wrote:
> On Thu, 14 Mar 2019, J. Bruce Fields wrote:
>
> > On Thu, Mar 14, 2019 at 05:12:06PM -0400, Scott Mayhew wrote:
> > > (Note: These patches go on top of "nfs4.1: clean up the session and
> > > client created in Environment.init()" from Jeff Layton sent on March
> > > 14th).
> >
> > That was yours, wasn't it? Applied, anyway.
>
> No, my patch modified Environment.finish(). Jeff's modifies
> Environment.init(). You'll need his patch too or these tests won't
> work.

Whoops, sorry.

I don't see that mail from Jeff anywhere. Would someone mind resending?

--b.

>
> -Scott
>
> >
> > > These patches add the following reboot tests:
> >
> > Those look like good ideas, I've applied the patches.
> >
> > The one thing I notice on a quick skim is that REBT5 depends on Linux
> > knfsd-specific behavior. That's worth noting in a comment.
> >
> > (Also maybe we should add a flag to such tests. "knfsd"? We've got
> > some tests flagged "ganesha".)
> >
> > --b.
> >
> > >
> > > REBT2x: Server reboot with N clients reclaiming opens.
> > > REBT3x: Server reboot with N clients reclaming opens. Server reboots
> > > again after half the clients have reclaimed.
> > > REBT4x: Server reboot with N clients reclaiming opens. Half the
> > > clients attempt to reclaim twice.
> > > (where x in {a, b, c}, with 'a' using 10 clients, 'b' using 100 clients,
> > > and 'c' testing 1000 clients)
> > > REBT5: Server reboot where the client starts reclaiming shortly before
> > > the end of grace.
> > >
> > > Note all of these tests require the use of a helper script or manual
> > > intervention to restart the server.
> > >
> > > Scott Mayhew (4):
> > > nfs4.1: add some reboot tests
> > > nfs4.1: add some more reboot tests
> > > nfs4.1: still more reboot tests
> > > nfs4.1: test delayed reclaim following a server reboot
> > >
> > > nfs4.1/server41tests/st_reboot.py | 284 +++++++++++++++++++++++++++++-
> > > 1 file changed, 278 insertions(+), 6 deletions(-)
> > >
> > > --
> > > 2.17.2

2019-03-15 01:03:29

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 0/4] nfs4.1: add a bunch of reboot tests

On Thu, Mar 14, 2019 at 09:00:36PM -0400, J. Bruce Fields wrote:
> On Thu, Mar 14, 2019 at 07:18:33PM -0400, Scott Mayhew wrote:
> > On Thu, 14 Mar 2019, J. Bruce Fields wrote:
> >
> > > On Thu, Mar 14, 2019 at 05:12:06PM -0400, Scott Mayhew wrote:
> > > > (Note: These patches go on top of "nfs4.1: clean up the session and
> > > > client created in Environment.init()" from Jeff Layton sent on March
> > > > 14th).
> > >
> > > That was yours, wasn't it? Applied, anyway.
> >
> > No, my patch modified Environment.finish(). Jeff's modifies
> > Environment.init(). You'll need his patch too or these tests won't
> > work.
>
> Whoops, sorry.
>
> I don't see that mail from Jeff anywhere. Would someone mind resending?

Never mind, I found it.--b.

>
> --b.
>
> >
> > -Scott
> >
> > >
> > > > These patches add the following reboot tests:
> > >
> > > Those look like good ideas, I've applied the patches.
> > >
> > > The one thing I notice on a quick skim is that REBT5 depends on Linux
> > > knfsd-specific behavior. That's worth noting in a comment.
> > >
> > > (Also maybe we should add a flag to such tests. "knfsd"? We've got
> > > some tests flagged "ganesha".)
> > >
> > > --b.
> > >
> > > >
> > > > REBT2x: Server reboot with N clients reclaiming opens.
> > > > REBT3x: Server reboot with N clients reclaming opens. Server reboots
> > > > again after half the clients have reclaimed.
> > > > REBT4x: Server reboot with N clients reclaiming opens. Half the
> > > > clients attempt to reclaim twice.
> > > > (where x in {a, b, c}, with 'a' using 10 clients, 'b' using 100 clients,
> > > > and 'c' testing 1000 clients)
> > > > REBT5: Server reboot where the client starts reclaiming shortly before
> > > > the end of grace.
> > > >
> > > > Note all of these tests require the use of a helper script or manual
> > > > intervention to restart the server.
> > > >
> > > > Scott Mayhew (4):
> > > > nfs4.1: add some reboot tests
> > > > nfs4.1: add some more reboot tests
> > > > nfs4.1: still more reboot tests
> > > > nfs4.1: test delayed reclaim following a server reboot
> > > >
> > > > nfs4.1/server41tests/st_reboot.py | 284 +++++++++++++++++++++++++++++-
> > > > 1 file changed, 278 insertions(+), 6 deletions(-)
> > > >
> > > > --
> > > > 2.17.2

2019-03-15 19:43:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> +def doTestRebootWithNClients(t, env, n=10):
> + boot_time = int(time.time())
> + lease_time = 90

Looks like these two variables aren't used till they're set again a
little further down, so I'll delete these two lines.

--b.

> + states = []
> + block = env.c1.new_client_session("%s_block" % env.testname(t))
> + for i in range(n):
> + name = "%s_client_%i" % (env.testname(t), i)
> + owner = "owner_%s" % name
> + c = env.c1.new_client(name)
> + sess = c.create_session()
> + reclaim_complete(sess)
> + fh, stateid = create_confirm(sess, owner)
> + states.append(State(name, owner, c, sess, fh))
> + lease_time = _getleasetime(sess)
> + boot_time = _waitForReboot(env)

2019-03-15 19:52:36

by Scott Mayhew

[permalink] [raw]
Subject: Re: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

On Fri, 15 Mar 2019, J. Bruce Fields wrote:

> On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> > +def doTestRebootWithNClients(t, env, n=10):
> > + boot_time = int(time.time())
> > + lease_time = 90
>
> Looks like these two variables aren't used till they're set again a
> little further down, so I'll delete these two lines.

The intention there was to have some default values for the exception
handler if the test were to barf right away for some reason.
>
> --b.
>
> > + states = []
> > + block = env.c1.new_client_session("%s_block" % env.testname(t))
> > + for i in range(n):
> > + name = "%s_client_%i" % (env.testname(t), i)
> > + owner = "owner_%s" % name
> > + c = env.c1.new_client(name)
> > + sess = c.create_session()
> > + reclaim_complete(sess)
> > + fh, stateid = create_confirm(sess, owner)
> > + states.append(State(name, owner, c, sess, fh))
> > + lease_time = _getleasetime(sess)
> > + boot_time = _waitForReboot(env)

2019-03-15 20:49:01

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> +def testRebootWithManyManyManyClients(t, env):
> + """Reboot with many many many clients
> +
> + FLAGS: reboot
> + CODE: REBT2c
> + """
> + return doTestRebootWithNClients(t, env, 1000)

My test server uses a 15 second lease time, mainly just to speed up
tests. That's not enough for pynfs to send out reclaims for 1000
clients.

So I'm wondering whether that's a reasonable test or not.

On the one hand, we should be able to handle 1000 clients, and a 15
second lease is probably unrealistically short. And maybe we could
choose more patient behavior for the server (currently it will wait at
most 2 grace periods while reclaims continue to arrive).

On the other hand, real clients will send their reclaims simultaneously
rather than one at a time. And from a trace it looks like most of the
time's spent waiting for pynfs to send the next request rather than
waiting for replies. So this is a bit unusual.

I'm inclined to drop the "many many many clients" tests. It's easy
enough for someone doing reboot testing to patch the tests if they need
to.

By the way, the longest round trip time I see is the RECLAIM_COMPLETE.
I assume that's doing a commit to disk. It looks like there's nothing
on the server to prevent processing RECLAIM_COMPLETEs in parallel so as
long as that's true I suppose we're OK.

--b.

2019-03-15 20:50:21

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

On Fri, Mar 15, 2019 at 03:52:33PM -0400, Scott Mayhew wrote:
> On Fri, 15 Mar 2019, J. Bruce Fields wrote:
>
> > On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> > > +def doTestRebootWithNClients(t, env, n=10):
> > > + boot_time = int(time.time())
> > > + lease_time = 90
> >
> > Looks like these two variables aren't used till they're set again a
> > little further down, so I'll delete these two lines.
>
> The intention there was to have some default values for the exception
> handler if the test were to barf right away for some reason.

Oh, I see, OK.

--b.

> >
> > --b.
> >
> > > + states = []
> > > + block = env.c1.new_client_session("%s_block" % env.testname(t))
> > > + for i in range(n):
> > > + name = "%s_client_%i" % (env.testname(t), i)
> > > + owner = "owner_%s" % name
> > > + c = env.c1.new_client(name)
> > > + sess = c.create_session()
> > > + reclaim_complete(sess)
> > > + fh, stateid = create_confirm(sess, owner)
> > > + states.append(State(name, owner, c, sess, fh))
> > > + lease_time = _getleasetime(sess)
> > > + boot_time = _waitForReboot(env)

2019-03-18 14:30:23

by Frank Filz

[permalink] [raw]
Subject: RE: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

> On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> > +def testRebootWithManyManyManyClients(t, env):
> > + """Reboot with many many many clients
> > +
> > + FLAGS: reboot
> > + CODE: REBT2c
> > + """
> > + return doTestRebootWithNClients(t, env, 1000)
>
> My test server uses a 15 second lease time, mainly just to speed up tests.
That's
> not enough for pynfs to send out reclaims for 1000 clients.
>
> So I'm wondering whether that's a reasonable test or not.
>
> On the one hand, we should be able to handle 1000 clients, and a 15 second
> lease is probably unrealistically short. And maybe we could choose more
patient
> behavior for the server (currently it will wait at most 2 grace periods
while
> reclaims continue to arrive).
>
> On the other hand, real clients will send their reclaims simultaneously
rather
> than one at a time. And from a trace it looks like most of the time's
spent
> waiting for pynfs to send the next request rather than waiting for
replies. So this
> is a bit unusual.
>
> I'm inclined to drop the "many many many clients" tests. It's easy enough
for
> someone doing reboot testing to patch the tests if they need to.
>
> By the way, the longest round trip time I see is the RECLAIM_COMPLETE.
> I assume that's doing a commit to disk. It looks like there's nothing on
the
> server to prevent processing RECLAIM_COMPLETEs in parallel so as long as
> that's true I suppose we're OK.

How about having the many many many clients tests under a different flag so
they are still available but easy to pick or not pick?

Considering that CID5 with the huge number of client-ids it creates but
doesn't clean up (so they all eventually expire) has caught bugs in Ganesha,
I like the idea of messy big tests being available for QE to run...

Frank


2019-03-18 14:57:30

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

On Mon, Mar 18, 2019 at 07:30:20AM -0700, Frank Filz wrote:
> > On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> > > +def testRebootWithManyManyManyClients(t, env):
> > > + """Reboot with many many many clients
> > > +
> > > + FLAGS: reboot
> > > + CODE: REBT2c
> > > + """
> > > + return doTestRebootWithNClients(t, env, 1000)
> >
> > My test server uses a 15 second lease time, mainly just to speed up tests.
> That's
> > not enough for pynfs to send out reclaims for 1000 clients.
> >
> > So I'm wondering whether that's a reasonable test or not.
> >
> > On the one hand, we should be able to handle 1000 clients, and a 15 second
> > lease is probably unrealistically short. And maybe we could choose more
> patient
> > behavior for the server (currently it will wait at most 2 grace periods
> while
> > reclaims continue to arrive).
> >
> > On the other hand, real clients will send their reclaims simultaneously
> rather
> > than one at a time. And from a trace it looks like most of the time's
> spent
> > waiting for pynfs to send the next request rather than waiting for
> replies. So this
> > is a bit unusual.
> >
> > I'm inclined to drop the "many many many clients" tests. It's easy enough
> for
> > someone doing reboot testing to patch the tests if they need to.
> >
> > By the way, the longest round trip time I see is the RECLAIM_COMPLETE.
> > I assume that's doing a commit to disk. It looks like there's nothing on
> the
> > server to prevent processing RECLAIM_COMPLETEs in parallel so as long as
> > that's true I suppose we're OK.
>
> How about having the many many many clients tests under a different flag so
> they are still available but easy to pick or not pick?

That might be OK.

Or it might also be possible to make the test a little smarter; e.g., if
reclaims start to fail with NOGRACE after a lease period, keep going and
maybe have the test WARN instead of failing.

--b.

> Considering that CID5 with the huge number of client-ids it creates but
> doesn't clean up (so they all eventually expire) has caught bugs in Ganesha,
> I like the idea of messy big tests being available for QE to run...