2023-06-14 20:14:52

by Olga Kornievskaia

[permalink] [raw]
Subject: Handling of BADSESSON error

Hi Trond,

I'm looking for advice on how to handle the problem that when
BADSESSION is received (on an interrupted slot) and we don't increment
the seqid for that slot. The client releases the slot and it's
possible for another thread to use it before the session is frozen.
Here are the (unfiltered sequential) tracepoints showing the problem.
Follow slot_nr=0 and seq_nr=7673

kworker/u2:26-541 [000] ..... 869.508658: nfs4_sequence_done:
error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
kworker/u2:26-541 [000] ..... 869.508661: nfs4_write:
error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
layoutstateid=0:0x00000000
kworker/u2:1-3198 [000] ..... 869.508898: nfs4_xdr_status:
task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
operation=53
kworker/u2:1-3198 [000] ..... 869.508905: nfs4_sequence_done:
error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
dt-3684 [000] ..... 869.508918: nfs4_set_lock:
error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
lockstateid=0:0x7bd5c66f

*** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
gets released without incrementing the seq#. The next tracepoint shows
the use of the slot again by another lock call ***

kworker/u2:1-3198 [000] ..... 869.508928:
nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
highest_used_slotid=1
kworker/u2:29-549 [000] ..... 869.509746: nfs4_sequence_done:
error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
dt-3672 [000] ..... 869.509770: nfs4_set_lock:
error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
kworker/u2:26-541 [000] ..... 869.509814:
nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
highest_used_slotid=0
kworker/u2:26-541 [000] ..... 869.509857:
nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
highest_used_slotid=1

** finally the state manager gets to run? But only after 3 "NEW" use
of slots are done **

172.28.68.180-m-3751 [000] ..... 869.510267: nfs4_state_mgr:
hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
kworker/u2:29-549 [000] ..... 869.510977: nfs4_xdr_status:
task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
operation=53
kworker/u2:29-549 [000] ..... 869.510983: nfs4_sequence_done:
error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
kworker/u2:29-549 [000] ..... 869.510985: nfs4_write:
error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
offset=146432 count=13312 res=0 stateid=1:0xcf065b31
layoutstateid=0:0x00000000
kworker/u2:26-541 [000] ..... 869.511318: nfs4_sequence_done:
error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
dt-3669 [000] ..... 869.511337: nfs4_set_lock:
error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
172.28.68.180-m-3751 [000] ..... 869.511918:
nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
172.28.68.180-m-3751 [000] ..... 869.513347:
nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180

To prevent reuse of the same slot/seqid for when we receive
BADSESSION, can we perhaps set slot->seq_done? Then, when
nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
increment seq_nr then. Slot re-use would be prevented.

Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
nfs41_sequence_process() for BADSESSION so that nothing else can get
the slot when it's released?

Or some other way or preventing slots being (re)used after receiving
BADSESSION on that slot. The problem if re-using (interrupted) slots
is that they get cached reply from the server and those operations
"think" operation succeeded and they have wrong/invalid stateids for
instance.

Here's the sequence of events. First of all this is a session trunking
scenario where one of the servers leaves the group.
NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
request populates its session cache. But the reply never reaches the
client. Connection gets reset.
NFS OP is resent using slot=0 seq=0 to server 2 which just left the
trunking group. It replies with BADSESSION
(session is not frozen on the client yet) new NFS OP uses slot=0 seq=0
and sends it to server 1. Server 1 responds out of the session cache.
Client destroys the session
Client uses stateid returned from the new OP which is really invalid
for the operation. Server fails the operation. Application failure
occurs.

Thank you..


2023-06-14 20:37:59

by Rick Macklem

[permalink] [raw]
Subject: Re: Handling of BADSESSON error

On Wed, Jun 14, 2023 at 12:58 PM Olga Kornievskaia <[email protected]> wrote:
>
>
> Hi Trond,
>
> I'm looking for advice on how to handle the problem that when
> BADSESSION is received (on an interrupted slot) and we don't increment
> the seqid for that slot. The client releases the slot and it's
> possible for another thread to use it before the session is frozen.
> Here are the (unfiltered sequential) tracepoints showing the problem.
> Follow slot_nr=0 and seq_nr=7673
>
> kworker/u2:26-541 [000] ..... 869.508658: nfs4_sequence_done:
> error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
> highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> kworker/u2:26-541 [000] ..... 869.508661: nfs4_write:
> error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
> offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
> layoutstateid=0:0x00000000
> kworker/u2:1-3198 [000] ..... 869.508898: nfs4_xdr_status:
> task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
> operation=53
> kworker/u2:1-3198 [000] ..... 869.508905: nfs4_sequence_done:
> error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
> highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> dt-3684 [000] ..... 869.508918: nfs4_set_lock:
> error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
> fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
> lockstateid=0:0x7bd5c66f
>
> *** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
> gets released without incrementing the seq#. The next tracepoint shows
> the use of the slot again by another lock call ***
>
> kworker/u2:1-3198 [000] ..... 869.508928:
> nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
> highest_used_slotid=1
> kworker/u2:29-549 [000] ..... 869.509746: nfs4_sequence_done:
> error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
> highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> dt-3672 [000] ..... 869.509770: nfs4_set_lock:
> error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
> fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
> kworker/u2:26-541 [000] ..... 869.509814:
> nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
> highest_used_slotid=0
> kworker/u2:26-541 [000] ..... 869.509857:
> nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
> highest_used_slotid=1
>
> ** finally the state manager gets to run? But only after 3 "NEW" use
> of slots are done **
>
> 172.28.68.180-m-3751 [000] ..... 869.510267: nfs4_state_mgr:
> hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
> kworker/u2:29-549 [000] ..... 869.510977: nfs4_xdr_status:
> task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
> operation=53
> kworker/u2:29-549 [000] ..... 869.510983: nfs4_sequence_done:
> error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
> highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> kworker/u2:29-549 [000] ..... 869.510985: nfs4_write:
> error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
> offset=146432 count=13312 res=0 stateid=1:0xcf065b31
> layoutstateid=0:0x00000000
> kworker/u2:26-541 [000] ..... 869.511318: nfs4_sequence_done:
> error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
> highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> dt-3669 [000] ..... 869.511337: nfs4_set_lock:
> error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
> fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
> 172.28.68.180-m-3751 [000] ..... 869.511918:
> nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
> 172.28.68.180-m-3751 [000] ..... 869.513347:
> nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180
>
> To prevent reuse of the same slot/seqid for when we receive
> BADSESSION, can we perhaps set slot->seq_done? Then, when
> nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
> increment seq_nr then. Slot re-use would be prevented.
>
> Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
> nfs41_sequence_process() for BADSESSION so that nothing else can get
> the slot when it's released?
>
> Or some other way or preventing slots being (re)used after receiving
> BADSESSION on that slot. The problem if re-using (interrupted) slots
> is that they get cached reply from the server and those operations
> "think" operation succeeded and they have wrong/invalid stateids for
> instance.
>
> Here's the sequence of events. First of all this is a session trunking
> scenario where one of the servers leaves the group.
> NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
> request populates its session cache. But the reply never reaches the
> client. Connection gets reset.
> NFS OP is resent using slot=0 seq=0 to server 2 which just left the
> trunking group. It replies with BADSESSION
> (session is not frozen on the client yet) new NFS OP uses slot=0 seq=0
> and sends it to server 1. Server 1 responds out of the session cache.
To me, this sounds like a broken NFSv4.1/4.2 server. Once a session is bad,
I do not think there should ever be a reply that was cached in that bad session.
Put another way, the server should not leave the "trunking group' (whatever that
means?) without making the session bad for all trunks. I do not think
a session should
ever work on one server and not on another one.

Having said the above, I have no opinion w.r.t. how such a case should
be handled.
(Except to tell the NFS server vendor that their server is broken.)

Just mho, rick

> Client destroys the session
> Client uses stateid returned from the new OP which is really invalid
> for the operation. Server fails the operation. Application failure
> occurs.
>
> Thank you..

2023-06-14 21:01:07

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: Handling of BADSESSON error

On Wed, Jun 14, 2023 at 4:24 PM Rick Macklem <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 12:58 PM Olga Kornievskaia <[email protected]> wrote:
> >
> >
> > Hi Trond,
> >
> > I'm looking for advice on how to handle the problem that when
> > BADSESSION is received (on an interrupted slot) and we don't increment
> > the seqid for that slot. The client releases the slot and it's
> > possible for another thread to use it before the session is frozen.
> > Here are the (unfiltered sequential) tracepoints showing the problem.
> > Follow slot_nr=0 and seq_nr=7673
> >
> > kworker/u2:26-541 [000] ..... 869.508658: nfs4_sequence_done:
> > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
> > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > kworker/u2:26-541 [000] ..... 869.508661: nfs4_write:
> > error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
> > offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
> > layoutstateid=0:0x00000000
> > kworker/u2:1-3198 [000] ..... 869.508898: nfs4_xdr_status:
> > task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
> > operation=53
> > kworker/u2:1-3198 [000] ..... 869.508905: nfs4_sequence_done:
> > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
> > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > dt-3684 [000] ..... 869.508918: nfs4_set_lock:
> > error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
> > fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
> > lockstateid=0:0x7bd5c66f
> >
> > *** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
> > gets released without incrementing the seq#. The next tracepoint shows
> > the use of the slot again by another lock call ***
> >
> > kworker/u2:1-3198 [000] ..... 869.508928:
> > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
> > highest_used_slotid=1
> > kworker/u2:29-549 [000] ..... 869.509746: nfs4_sequence_done:
> > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
> > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > dt-3672 [000] ..... 869.509770: nfs4_set_lock:
> > error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
> > fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
> > kworker/u2:26-541 [000] ..... 869.509814:
> > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
> > highest_used_slotid=0
> > kworker/u2:26-541 [000] ..... 869.509857:
> > nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
> > highest_used_slotid=1
> >
> > ** finally the state manager gets to run? But only after 3 "NEW" use
> > of slots are done **
> >
> > 172.28.68.180-m-3751 [000] ..... 869.510267: nfs4_state_mgr:
> > hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
> > kworker/u2:29-549 [000] ..... 869.510977: nfs4_xdr_status:
> > task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
> > operation=53
> > kworker/u2:29-549 [000] ..... 869.510983: nfs4_sequence_done:
> > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
> > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > kworker/u2:29-549 [000] ..... 869.510985: nfs4_write:
> > error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
> > offset=146432 count=13312 res=0 stateid=1:0xcf065b31
> > layoutstateid=0:0x00000000
> > kworker/u2:26-541 [000] ..... 869.511318: nfs4_sequence_done:
> > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
> > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > dt-3669 [000] ..... 869.511337: nfs4_set_lock:
> > error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
> > fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
> > 172.28.68.180-m-3751 [000] ..... 869.511918:
> > nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
> > 172.28.68.180-m-3751 [000] ..... 869.513347:
> > nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180
> >
> > To prevent reuse of the same slot/seqid for when we receive
> > BADSESSION, can we perhaps set slot->seq_done? Then, when
> > nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
> > increment seq_nr then. Slot re-use would be prevented.
> >
> > Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
> > nfs41_sequence_process() for BADSESSION so that nothing else can get
> > the slot when it's released?
> >
> > Or some other way or preventing slots being (re)used after receiving
> > BADSESSION on that slot. The problem if re-using (interrupted) slots
> > is that they get cached reply from the server and those operations
> > "think" operation succeeded and they have wrong/invalid stateids for
> > instance.
> >
> > Here's the sequence of events. First of all this is a session trunking
> > scenario where one of the servers leaves the group.
> > NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
> > request populates its session cache. But the reply never reaches the
> > client. Connection gets reset.
> > NFS OP is resent using slot=0 seq=0 to server 2 which just left the
> > trunking group. It replies with BADSESSION
> > (session is not frozen on the client yet) new NFS OP uses slot=0 seq=0
> > and sends it to server 1. Server 1 responds out of the session cache.
> To me, this sounds like a broken NFSv4.1/4.2 server. Once a session is bad,
> I do not think there should ever be a reply that was cached in that bad session.
> Put another way, the server should not leave the "trunking group' (whatever that
> means?) without making the session bad for all trunks. I do not think
> a session should
> ever work on one server and not on another one.

The spec allows for server to leave the group and session to be still valid.

From section 2.10.13.1.4
"If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the
session no longer exists on any of the server network addresses for
which the client has connections associated with that session ID. It
is possible the session is still alive and available on other network
addresses. "

Linux server throws away the session on getting the BADSESSION but the
problem is that it doesn't happen instantly. So some new requests can
sneak before the session gets killed. Thus I'm advocating that slotid
still happens on BADSESSION or I'm suggesting that we freeze the
session table on BADSESSION which we currently don't do -- which
allows new requests to go.

>
> Having said the above, I have no opinion w.r.t. how such a case should
> be handled.
> (Except to tell the NFS server vendor that their server is broken.)
>
> Just mho, rick
>
> > Client destroys the session
> > Client uses stateid returned from the new OP which is really invalid
> > for the operation. Server fails the operation. Application failure
> > occurs.
> >
> > Thank you..

2023-06-14 21:01:40

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: Handling of BADSESSON error

On Wed, Jun 14, 2023 at 4:43 PM Olga Kornievskaia <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 4:24 PM Rick Macklem <[email protected]> wrote:
> >
> > On Wed, Jun 14, 2023 at 12:58 PM Olga Kornievskaia <[email protected]> wrote:
> > >
> > >
> > > Hi Trond,
> > >
> > > I'm looking for advice on how to handle the problem that when
> > > BADSESSION is received (on an interrupted slot) and we don't increment
> > > the seqid for that slot. The client releases the slot and it's
> > > possible for another thread to use it before the session is frozen.
> > > Here are the (unfiltered sequential) tracepoints showing the problem.
> > > Follow slot_nr=0 and seq_nr=7673
> > >
> > > kworker/u2:26-541 [000] ..... 869.508658: nfs4_sequence_done:
> > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
> > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > kworker/u2:26-541 [000] ..... 869.508661: nfs4_write:
> > > error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
> > > offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
> > > layoutstateid=0:0x00000000
> > > kworker/u2:1-3198 [000] ..... 869.508898: nfs4_xdr_status:
> > > task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
> > > operation=53
> > > kworker/u2:1-3198 [000] ..... 869.508905: nfs4_sequence_done:
> > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
> > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > dt-3684 [000] ..... 869.508918: nfs4_set_lock:
> > > error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
> > > fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
> > > lockstateid=0:0x7bd5c66f
> > >
> > > *** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
> > > gets released without incrementing the seq#. The next tracepoint shows
> > > the use of the slot again by another lock call ***
> > >
> > > kworker/u2:1-3198 [000] ..... 869.508928:
> > > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
> > > highest_used_slotid=1
> > > kworker/u2:29-549 [000] ..... 869.509746: nfs4_sequence_done:
> > > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
> > > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > > dt-3672 [000] ..... 869.509770: nfs4_set_lock:
> > > error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
> > > fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
> > > kworker/u2:26-541 [000] ..... 869.509814:
> > > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
> > > highest_used_slotid=0
> > > kworker/u2:26-541 [000] ..... 869.509857:
> > > nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
> > > highest_used_slotid=1
> > >
> > > ** finally the state manager gets to run? But only after 3 "NEW" use
> > > of slots are done **
> > >
> > > 172.28.68.180-m-3751 [000] ..... 869.510267: nfs4_state_mgr:
> > > hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
> > > kworker/u2:29-549 [000] ..... 869.510977: nfs4_xdr_status:
> > > task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
> > > operation=53
> > > kworker/u2:29-549 [000] ..... 869.510983: nfs4_sequence_done:
> > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
> > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > kworker/u2:29-549 [000] ..... 869.510985: nfs4_write:
> > > error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
> > > offset=146432 count=13312 res=0 stateid=1:0xcf065b31
> > > layoutstateid=0:0x00000000
> > > kworker/u2:26-541 [000] ..... 869.511318: nfs4_sequence_done:
> > > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
> > > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > > dt-3669 [000] ..... 869.511337: nfs4_set_lock:
> > > error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
> > > fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
> > > 172.28.68.180-m-3751 [000] ..... 869.511918:
> > > nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
> > > 172.28.68.180-m-3751 [000] ..... 869.513347:
> > > nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180
> > >
> > > To prevent reuse of the same slot/seqid for when we receive
> > > BADSESSION, can we perhaps set slot->seq_done? Then, when
> > > nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
> > > increment seq_nr then. Slot re-use would be prevented.
> > >
> > > Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
> > > nfs41_sequence_process() for BADSESSION so that nothing else can get
> > > the slot when it's released?
> > >
> > > Or some other way or preventing slots being (re)used after receiving
> > > BADSESSION on that slot. The problem if re-using (interrupted) slots
> > > is that they get cached reply from the server and those operations
> > > "think" operation succeeded and they have wrong/invalid stateids for
> > > instance.
> > >
> > > Here's the sequence of events. First of all this is a session trunking
> > > scenario where one of the servers leaves the group.
> > > NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
> > > request populates its session cache. But the reply never reaches the
> > > client. Connection gets reset.
> > > NFS OP is resent using slot=0 seq=0 to server 2 which just left the
> > > trunking group. It replies with BADSESSION
> > > (session is not frozen on the client yet) new NFS OP uses slot=0 seq=0
> > > and sends it to server 1. Server 1 responds out of the session cache.
> > To me, this sounds like a broken NFSv4.1/4.2 server. Once a session is bad,
> > I do not think there should ever be a reply that was cached in that bad session.
> > Put another way, the server should not leave the "trunking group' (whatever that
> > means?) without making the session bad for all trunks. I do not think
> > a session should
> > ever work on one server and not on another one.
>
> The spec allows for server to leave the group and session to be still valid.
>
> From section 2.10.13.1.4
> "If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the
> session no longer exists on any of the server network addresses for
> which the client has connections associated with that session ID. It
> is possible the session is still alive and available on other network
> addresses. "
>
> Linux server

I meant to say Linux client

> throws away the session on getting the BADSESSION but the
> problem is that it doesn't happen instantly. So some new requests can
> sneak before the session gets killed. Thus I'm advocating that slotid
> still happens on BADSESSION or I'm suggesting that we freeze the
> session table on BADSESSION which we currently don't do -- which
> allows new requests to go.
>
> >
> > Having said the above, I have no opinion w.r.t. how such a case should
> > be handled.
> > (Except to tell the NFS server vendor that their server is broken.)
> >
> > Just mho, rick
> >
> > > Client destroys the session
> > > Client uses stateid returned from the new OP which is really invalid
> > > for the operation. Server fails the operation. Application failure
> > > occurs.
> > >
> > > Thank you..

2023-06-14 21:21:20

by Rick Macklem

[permalink] [raw]
Subject: Re: Handling of BADSESSON error

On Wed, Jun 14, 2023 at 1:43 PM Olga Kornievskaia <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 4:24 PM Rick Macklem <[email protected]> wrote:
> >
> > On Wed, Jun 14, 2023 at 12:58 PM Olga Kornievskaia <[email protected]> wrote:
> > >
> > >
> > > Hi Trond,
> > >
> > > I'm looking for advice on how to handle the problem that when
> > > BADSESSION is received (on an interrupted slot) and we don't increment
> > > the seqid for that slot. The client releases the slot and it's
> > > possible for another thread to use it before the session is frozen.
> > > Here are the (unfiltered sequential) tracepoints showing the problem.
> > > Follow slot_nr=0 and seq_nr=7673
> > >
> > > kworker/u2:26-541 [000] ..... 869.508658: nfs4_sequence_done:
> > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
> > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > kworker/u2:26-541 [000] ..... 869.508661: nfs4_write:
> > > error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
> > > offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
> > > layoutstateid=0:0x00000000
> > > kworker/u2:1-3198 [000] ..... 869.508898: nfs4_xdr_status:
> > > task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
> > > operation=53
> > > kworker/u2:1-3198 [000] ..... 869.508905: nfs4_sequence_done:
> > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
> > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > dt-3684 [000] ..... 869.508918: nfs4_set_lock:
> > > error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
> > > fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
> > > lockstateid=0:0x7bd5c66f
> > >
> > > *** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
> > > gets released without incrementing the seq#. The next tracepoint shows
> > > the use of the slot again by another lock call ***
> > >
> > > kworker/u2:1-3198 [000] ..... 869.508928:
> > > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
> > > highest_used_slotid=1
> > > kworker/u2:29-549 [000] ..... 869.509746: nfs4_sequence_done:
> > > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
> > > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > > dt-3672 [000] ..... 869.509770: nfs4_set_lock:
> > > error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
> > > fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
> > > kworker/u2:26-541 [000] ..... 869.509814:
> > > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
> > > highest_used_slotid=0
> > > kworker/u2:26-541 [000] ..... 869.509857:
> > > nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
> > > highest_used_slotid=1
> > >
> > > ** finally the state manager gets to run? But only after 3 "NEW" use
> > > of slots are done **
> > >
> > > 172.28.68.180-m-3751 [000] ..... 869.510267: nfs4_state_mgr:
> > > hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
> > > kworker/u2:29-549 [000] ..... 869.510977: nfs4_xdr_status:
> > > task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
> > > operation=53
> > > kworker/u2:29-549 [000] ..... 869.510983: nfs4_sequence_done:
> > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
> > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > kworker/u2:29-549 [000] ..... 869.510985: nfs4_write:
> > > error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
> > > offset=146432 count=13312 res=0 stateid=1:0xcf065b31
> > > layoutstateid=0:0x00000000
> > > kworker/u2:26-541 [000] ..... 869.511318: nfs4_sequence_done:
> > > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
> > > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > > dt-3669 [000] ..... 869.511337: nfs4_set_lock:
> > > error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
> > > fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
> > > 172.28.68.180-m-3751 [000] ..... 869.511918:
> > > nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
> > > 172.28.68.180-m-3751 [000] ..... 869.513347:
> > > nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180
> > >
> > > To prevent reuse of the same slot/seqid for when we receive
> > > BADSESSION, can we perhaps set slot->seq_done? Then, when
> > > nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
> > > increment seq_nr then. Slot re-use would be prevented.
> > >
> > > Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
> > > nfs41_sequence_process() for BADSESSION so that nothing else can get
> > > the slot when it's released?
> > >
> > > Or some other way or preventing slots being (re)used after receiving
> > > BADSESSION on that slot. The problem if re-using (interrupted) slots
> > > is that they get cached reply from the server and those operations
> > > "think" operation succeeded and they have wrong/invalid stateids for
> > > instance.
> > >
> > > Here's the sequence of events. First of all this is a session trunking
> > > scenario where one of the servers leaves the group.
> > > NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
> > > request populates its session cache. But the reply never reaches the
> > > client. Connection gets reset.
> > > NFS OP is resent using slot=0 seq=0 to server 2 which just left the
> > > trunking group. It replies with BADSESSION
> > > (session is not frozen on the client yet) new NFS OP uses slot=0 seq=0
> > > and sends it to server 1. Server 1 responds out of the session cache.
> > To me, this sounds like a broken NFSv4.1/4.2 server. Once a session is bad,
> > I do not think there should ever be a reply that was cached in that bad session.
> > Put another way, the server should not leave the "trunking group' (whatever that
> > means?) without making the session bad for all trunks. I do not think
> > a session should
> > ever work on one server and not on another one.
>
> The spec allows for server to leave the group and session to be still valid.
>
> From section 2.10.13.1.4
> "If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the
> session no longer exists on any of the server network addresses for
> which the client has connections associated with that session ID. It
> is possible the session is still alive and available on other network
> addresses. "
>
Yep. I just noticed that myself. (It's item 3. in 2.10.13.1.4.)
I suppose a client should not use a session at all once it receives a BADSESSION
for it, or it can go through the suggested algorithm in 2.10.13.1.4, sending a
Sequence by itself on each connection, etc...

A bother, but I guess it has no choice. I'm glad the FreeBSD client mostly
avoids trunking, rick

> Linux server throws away the session on getting the BADSESSION but the
> problem is that it doesn't happen instantly. So some new requests can
> sneak before the session gets killed. Thus I'm advocating that slotid
> still happens on BADSESSION or I'm suggesting that we freeze the
> session table on BADSESSION which we currently don't do -- which
> allows new requests to go.
>
> >
> > Having said the above, I have no opinion w.r.t. how such a case should
> > be handled.
> > (Except to tell the NFS server vendor that their server is broken.)
> >
> > Just mho, rick
> >
> > > Client destroys the session
> > > Client uses stateid returned from the new OP which is really invalid
> > > for the operation. Server fails the operation. Application failure
> > > occurs.
> > >
> > > Thank you..

2023-06-14 21:43:45

by Rick Macklem

[permalink] [raw]
Subject: Re: Handling of BADSESSON error

On Wed, Jun 14, 2023 at 1:44 PM Olga Kornievskaia <[email protected]> wrote:
>
> On Wed, Jun 14, 2023 at 4:43 PM Olga Kornievskaia <[email protected]> wrote:
> >
> > On Wed, Jun 14, 2023 at 4:24 PM Rick Macklem <[email protected]> wrote:
> > >
> > > On Wed, Jun 14, 2023 at 12:58 PM Olga Kornievskaia <[email protected]> wrote:
> > > >
> > > >
> > > > Hi Trond,
> > > >
> > > > I'm looking for advice on how to handle the problem that when
> > > > BADSESSION is received (on an interrupted slot) and we don't increment
> > > > the seqid for that slot. The client releases the slot and it's
> > > > possible for another thread to use it before the session is frozen.
> > > > Here are the (unfiltered sequential) tracepoints showing the problem.
> > > > Follow slot_nr=0 and seq_nr=7673
> > > >
> > > > kworker/u2:26-541 [000] ..... 869.508658: nfs4_sequence_done:
> > > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
> > > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > > kworker/u2:26-541 [000] ..... 869.508661: nfs4_write:
> > > > error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
> > > > offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
> > > > layoutstateid=0:0x00000000
> > > > kworker/u2:1-3198 [000] ..... 869.508898: nfs4_xdr_status:
> > > > task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
> > > > operation=53
> > > > kworker/u2:1-3198 [000] ..... 869.508905: nfs4_sequence_done:
> > > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
> > > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > > dt-3684 [000] ..... 869.508918: nfs4_set_lock:
> > > > error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
> > > > fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
> > > > lockstateid=0:0x7bd5c66f
> > > >
> > > > *** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
> > > > gets released without incrementing the seq#. The next tracepoint shows
> > > > the use of the slot again by another lock call ***
> > > >
> > > > kworker/u2:1-3198 [000] ..... 869.508928:
> > > > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
> > > > highest_used_slotid=1
> > > > kworker/u2:29-549 [000] ..... 869.509746: nfs4_sequence_done:
> > > > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
> > > > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > > > dt-3672 [000] ..... 869.509770: nfs4_set_lock:
> > > > error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
> > > > fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
> > > > kworker/u2:26-541 [000] ..... 869.509814:
> > > > nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
> > > > highest_used_slotid=0
> > > > kworker/u2:26-541 [000] ..... 869.509857:
> > > > nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
> > > > highest_used_slotid=1
> > > >
> > > > ** finally the state manager gets to run? But only after 3 "NEW" use
> > > > of slots are done **
> > > >
> > > > 172.28.68.180-m-3751 [000] ..... 869.510267: nfs4_state_mgr:
> > > > hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
> > > > kworker/u2:29-549 [000] ..... 869.510977: nfs4_xdr_status:
> > > > task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
> > > > operation=53
> > > > kworker/u2:29-549 [000] ..... 869.510983: nfs4_sequence_done:
> > > > error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
> > > > highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
> > > > kworker/u2:29-549 [000] ..... 869.510985: nfs4_write:
> > > > error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
> > > > offset=146432 count=13312 res=0 stateid=1:0xcf065b31
> > > > layoutstateid=0:0x00000000
> > > > kworker/u2:26-541 [000] ..... 869.511318: nfs4_sequence_done:
> > > > error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
> > > > highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
> > > > dt-3669 [000] ..... 869.511337: nfs4_set_lock:
> > > > error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
> > > > fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
> > > > 172.28.68.180-m-3751 [000] ..... 869.511918:
> > > > nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
> > > > 172.28.68.180-m-3751 [000] ..... 869.513347:
> > > > nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180
> > > >
> > > > To prevent reuse of the same slot/seqid for when we receive
> > > > BADSESSION, can we perhaps set slot->seq_done? Then, when
> > > > nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
> > > > increment seq_nr then. Slot re-use would be prevented.
> > > >
> > > > Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
> > > > nfs41_sequence_process() for BADSESSION so that nothing else can get
> > > > the slot when it's released?
> > > >
> > > > Or some other way or preventing slots being (re)used after receiving
> > > > BADSESSION on that slot. The problem if re-using (interrupted) slots
> > > > is that they get cached reply from the server and those operations
> > > > "think" operation succeeded and they have wrong/invalid stateids for
> > > > instance.
Yep. On thinking about it a little more, I think you are correct. The
client either
needs to avoid using that slot again (the FreeBSD client marks slots "bad" when
an RPC is interrupted without having received/processed a reply) or the client
needs to stop using the session for regular RPCs when a BADSESSION reply is
received.

What it should do after that is not so obvious to me.
There is the suggested algorithm in 2.10.13.1.4, but it does seem complex and
it only describes the cases where a Sequence by itself either succeeds or fails
on the other connections. (It does not describe what to do if some succeed and
others fail, if I am reading it correctly.)
--> It might be easier to just CREATE_SESSION a new session and avoid using
the old one any more?
This sounds like one of your suggested ways to handle it.

Good luck with whatever you choose and sorry about the noise w.r.t. a broken
server (although I'd personally prefer to consider such a server as broken, the
RFC allows it, so;-). rick

> > > >
> > > > Here's the sequence of events. First of all this is a session trunking
> > > > scenario where one of the servers leaves the group.
> > > > NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
> > > > request populates its session cache. But the reply never reaches the
> > > > client. Connection gets reset.
> > > > NFS OP is resent using slot=0 seq=0 to server 2 which just left the
> > > > trunking group. It replies with BADSESSION
> > > > (session is not frozen on the client yet) new NFS OP uses slot=0 seq=0
> > > > and sends it to server 1. Server 1 responds out of the session cache.
> > > To me, this sounds like a broken NFSv4.1/4.2 server. Once a session is bad,
> > > I do not think there should ever be a reply that was cached in that bad session.
> > > Put another way, the server should not leave the "trunking group' (whatever that
> > > means?) without making the session bad for all trunks. I do not think
> > > a session should
> > > ever work on one server and not on another one.
> >
> > The spec allows for server to leave the group and session to be still valid.
> >
> > From section 2.10.13.1.4
> > "If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the
> > session no longer exists on any of the server network addresses for
> > which the client has connections associated with that session ID. It
> > is possible the session is still alive and available on other network
> > addresses. "
> >
> > Linux server
>
> I meant to say Linux client
>
> > throws away the session on getting the BADSESSION but the
> > problem is that it doesn't happen instantly. So some new requests can
> > sneak before the session gets killed. Thus I'm advocating that slotid
> > still happens on BADSESSION or I'm suggesting that we freeze the
> > session table on BADSESSION which we currently don't do -- which
> > allows new requests to go.
> >
> > >
> > > Having said the above, I have no opinion w.r.t. how such a case should
> > > be handled.
> > > (Except to tell the NFS server vendor that their server is broken.)
> > >
> > > Just mho, rick
> > >
> > > > Client destroys the session
> > > > Client uses stateid returned from the new OP which is really invalid
> > > > for the operation. Server fails the operation. Application failure
> > > > occurs.
> > > >
> > > > Thank you..

2023-06-14 21:48:39

by Trond Myklebust

[permalink] [raw]
Subject: Re: Handling of BADSESSON error

On Wed, 2023-06-14 at 15:58 -0400, Olga Kornievskaia wrote:
> Hi Trond,
>
> I'm looking for advice on how to handle the problem that when
> BADSESSION is received (on an interrupted slot) and we don't
> increment
> the seqid for that slot. The client releases the slot and it's
> possible for another thread to use it before the session is frozen.
> Here are the (unfiltered sequential) tracepoints showing the problem.
> Follow slot_nr=0 and seq_nr=7673
>
>    kworker/u2:26-541     [000] .....   869.508658:
> nfs4_sequence_done:
> error=-10052 (BADSESSION) session=0x90caa481 slot_nr=4 seq_nr=4259
> highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
>    kworker/u2:26-541     [000] .....   869.508661: nfs4_write:
> error=-10052 (BADSESSION) fileid=00:3b:111 fhandle=0x59c8ccff
> offset=2304664 count=7992 res=0 stateid=1:0x3f4f04cd
> layoutstateid=0:0x00000000
>     kworker/u2:1-3198    [000] .....   869.508898: nfs4_xdr_status:
> task:0000a2ae@00000011 xid=0x5d0f6dda error=-10052 (BADSESSION)
> operation=53
>     kworker/u2:1-3198    [000] .....   869.508905:
> nfs4_sequence_done:
> error=-10052 (BADSESSION) session=0x90caa481 slot_nr=0 seq_nr=7673
> highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
>               dt-3684    [000] .....   869.508918: nfs4_set_lock:
> error=-10052 (BADSESSION) cmd=SETLK:WRLCK range=1603340:1834535
> fileid=00:3b:109 fhandle=0x7c6bc6b4 stateid=1:0x8f5f1fe4
> lockstateid=0:0x7bd5c66f
>
> *** this is use of slot_nr=0 seq_nr=7673 that gets BADSESSION. Slot
> gets released without incrementing the seq#. The next tracepoint
> shows
> the use of the slot again by another lock call ***
>
>     kworker/u2:1-3198    [000] .....   869.508928:
> nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7673
> highest_used_slotid=1
>    kworker/u2:29-549     [000] .....   869.509746:
> nfs4_sequence_done:
> error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7673
> highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
>               dt-3672    [000] .....   869.509770: nfs4_set_lock:
> error=0 (OK) cmd=SETLK:WRLCK range=146432:159743 fileid=00:3b:129
> fhandle=0x50fa2dd4 stateid=1:0xcf065b31 lockstateid=1:0x5c571804
>    kworker/u2:26-541     [000] .....   869.509814:
> nfs4_setup_sequence: session=0x90caa481 slot_nr=0 seq_nr=7674
> highest_used_slotid=0
>    kworker/u2:26-541     [000] .....   869.509857:
> nfs4_setup_sequence: session=0x90caa481 slot_nr=1 seq_nr=7805
> highest_used_slotid=1
>
> ** finally the state manager gets to run? But only after 3 "NEW" use
> of slots are done **
>
>  172.28.68.180-m-3751    [000] .....   869.510267: nfs4_state_mgr:
> hostname=172.28.68.180 clp state=MANAGER_RUNNING|CHECK_LEASE|0xc040
>    kworker/u2:29-549     [000] .....   869.510977: nfs4_xdr_status:
> task:0000a2c8@00000011 xid=0x5e0f6dda error=-10052 (BADSESSION)
> operation=53
>    kworker/u2:29-549     [000] .....   869.510983:
> nfs4_sequence_done:
> error=-10052 (BADSESSION) session=0x90caa481 slot_nr=1 seq_nr=7805
> highest_slotid=0 target_highest_slotid=0 status_flags=0x0 ()
>    kworker/u2:29-549     [000] .....   869.510985: nfs4_write:
> error=-10052 (BADSESSION) fileid=00:3b:129 fhandle=0x50fa2dd4
> offset=146432 count=13312 res=0 stateid=1:0xcf065b31
> layoutstateid=0:0x00000000
>    kworker/u2:26-541     [000] .....   869.511318:
> nfs4_sequence_done:
> error=0 (OK) session=0x90caa481 slot_nr=0 seq_nr=7674
> highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
>               dt-3669    [000] .....   869.511337: nfs4_set_lock:
> error=0 (OK) cmd=SETLK:WRLCK range=2462720:2469375 fileid=00:3b:138
> fhandle=0xe30d8cf3 stateid=1:0xe2787aa1 lockstateid=1:0x216421fe
>  172.28.68.180-m-3751    [000] .....   869.511918:
> nfs4_destroy_session: error=0 (OK) dstaddr=172.28.68.180
>  172.28.68.180-m-3751    [000] .....   869.513347:
> nfs4_create_session: error=0 (OK) dstaddr=172.28.68.180
>
> To prevent reuse of the same slot/seqid for when we receive
> BADSESSION, can we perhaps set slot->seq_done? Then, when
> nfs41_sequence_process() calls nfs41_sequence_free_slot(), it'd
> increment seq_nr then. Slot re-use would be prevented.
>
> Or, perhaps we set the NFS4_SLOT_TBL_DRAINING bit right in
> nfs41_sequence_process() for BADSESSION so that nothing else can get
> the slot when it's released?
>
> Or some other way or preventing slots being (re)used after receiving
> BADSESSION on that slot. The problem if re-using (interrupted) slots
> is that they get cached reply from the server and those operations
> "think" operation succeeded and they have wrong/invalid stateids for
> instance.
>
> Here's the sequence of events. First of all this is a session
> trunking
> scenario where one of the servers leaves the group.
> NFS OP uses slot=0 seq=0 sends it to server 1. Server 1 processes the
> request populates its session cache. But the reply never reaches the
> client. Connection gets reset.
> NFS OP is resent using slot=0 seq=0 to server 2 which just left the
> trunking group. It replies with BADSESSION
> (session is not frozen on the client yet) new NFS OP uses slot=0
> seq=0
> and sends it to server 1. Server 1 responds out of the session cache.
> Client destroys the session
> Client uses stateid returned from the new OP which is really invalid
> for the operation. Server fails the operation. Application failure
> occurs.
>
> Thank you..

I suggest just adding a call along the lines of

set_bit(NFS4_SLOT_TBL_DRAINING, &session->fc_slot_table.slot_tbl_state);

immediately before the call to nfs4_schedule_session_recovery() in
nfs41_sequence_process(). That ought to be race-free because we should
still be holding the slot. It won't try to do any of the other fancy
stuff in nfs4_drain_slot_tbl(). All that will happen is that
nfs4_setup_sequence() will stop allocating new unprivileged slots.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]