Hi,
These are 3 patches against linux-2.4.18 with NFS-ALL applied.
The first one implements nlmsvc_proc_granted_res such that when a server
receive that callback it deallocates the corresponding block, using the
nlmsvc_grant_reply function.
The second adds a check of the address in addition to the cookie value, when
looking at a callback or a reply from the client, such that even if 2 clients
have the same cookie the blocks aren't mischoosen. This patch is againts the
previous one, because it modifies nlmsvc_proc_granted_res. Maybe it should also
be against linux-2.4.18 with NFS-ALL applied ?
The third concerns the client side, it adds a check that the client is really
requesting a blocking lock instead of relaying on the server not to be broken.
Thanks
Pat
Hi,
I follow up on my patchs, because there are some more issues.
First issue:
This is related with the patch implementing nlmsvc_proc_granted_res (if the
patch isn't applied, the server never gets the status from the clients
in response to a granted_msg).
With the patch applied, the server get the client status. If the status is bad,
the lock is removed at the server side. I think it is the right thing (I didn't
do it, it is the granted_reply function behaviour; in the specification, I
didn't saw anything about what the server should do upon receiving a bad
status).
The client send a bad reply when it get a granted_msg with a lock it didn't
requested. I also think it is fine. However this leads to the following
problem:
suppose the server send a grant_msg with a lock. The client now has the lock
granted, it sends a granted_res with a good status. Now suppose the server
never receive this callback. The server will resend a grant_msg with the lock.
The client isn't waiting for a granted_res anymore and thus responds with a
granted_res with a bad status. If the server gets this granted_res it will
deallocate the lock. At that point there is no lockfrom the server point of
view and a lock from the client point of view.
Is it a problem ?
If yes I think that a possible solution would be that when a client gets a
granted_msg for a lock he thinks he didn't request, he looks at all the locks
on the system, and if he finds this one, he returns a good status. I think this
isn't a violation of the standard, because it is arguable that the procedure
didn't failed when the client has the lock allready granted (the code for this
is allready quasi existing because it is roughly the same than what happens
when the server just rebooted).
If it is needed, is it a acceptable solution ?
Second issue:
This is unrelated with the patch.
In case of a client becoming unreachable at a bad time, it may happen that
a granted lock is never released. Here is the scenario:
The client ask for a blocking lock and the server responds that the lock is
allready taken. After some time the client becomes unreachable (but doesn't
crash and reboot, that case is handled by statd) for whatever reason. Then
the lock becomes available, the server locks the file, and send a granted_msg.
However, as the client is unreachable the server will keep on resending
granted_msg (see nlmsvc_grant_blocked). This means that the file will never
be lockable anymore.
I have a simple idea for a solution to this problem, however it raises other
issues. A possibility would be to have a timer (5 minutes, or so...) such
that when it is exceeded, the server stops sending granted_msg and unlocks
the lock (calls delete_block(...,1)). However, if the client is waiting for
that lock and is blocking, it will block forever. It is not a problem in the
linux implementation, as the client does busy waiting. (if the server could
send a status in the granted_msg call, it would have been easier, but I
cannot see such a thing).
What do you think about that ?
Pat
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
>>>>> " " == Dumas Patrice <[email protected]> writes:
> suppose the server send a grant_msg with a lock. The client now
> has the lock granted, it sends a granted_res with a good
> status. Now suppose the server never receive this callback. The
> server will resend a grant_msg with the lock. The client isn't
> waiting for a granted_res anymore and thus responds with a
> granted_res with a bad status. If the server gets this
> granted_res it will deallocate the lock. At that point there is
> no lockfrom the server point of view and a lock from the client
> point of view.
Just add a call to posix_test_lock() in nlmclnt_grant().
> The client ask for a blocking lock and the server responds that
> the lock is allready taken. After some time the client becomes
> unreachable (but doesn't crash and reboot, that case is handled
> by statd) for whatever reason. Then the lock becomes available,
> the server locks the file, and send a granted_msg. However, as
> the client is unreachable the server will keep on resending
> granted_msg (see nlmsvc_grant_blocked). This means that the
> file will never be lockable anymore.
> I have a simple idea for a solution to this problem, however it
> raises other issues. A possibility would be to have a timer (5
> minutes, or so...) such that when it is exceeded, the server
> stops sending granted_msg and unlocks the lock (calls
> delete_block(...,1)). However, if the client is waiting for
> that lock and is blocking, it will block forever. It is not a
> problem in the linux implementation, as the client does busy
> waiting. (if the server could send a status in the granted_msg
> call, it would have been easier, but I cannot see such a
> thing).
The client occasionally wakes up and resends the NLM_LOCK call, so it
will not wait for ever. However what say the server misses the
client's GRANTED_RES reply?
The above problems are exactly why we should aim to use NLM_GRANTED
instead of NLM_GRANTED_MSG at some point in the future.
Cheers,
Trond
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
Hi,
> Just add a call to posix_test_lock() in nlmclnt_grant().
Ok.
> The client occasionally wakes up and resends the NLM_LOCK call, so it
> will not wait for ever. However what say the server misses the
> client's GRANTED_RES reply?
In that case the server resend (after 60 * HZ which seems to be one minute) a
GRANTED_MSG.
> The above problems are exactly why we should aim to use NLM_GRANTED
> instead of NLM_GRANTED_MSG at some point in the future.
I don't think it would solve that problem. The client could also never reply
to the GRANTED call, and we are back with the same problem. The client rpc
reply to GRANTED and the client rpc call to GRANTED_RES play exactly the same
role, and leads to the same problem when they are lost (and should lead to the
same behaviour at the server side, in my opinion, maybe with different timers).
To clarify things, as they are, and what I propose, I submit some possible
behaviours for GRANTED_MSG implementation and the for GRANTED implementations.
I suppose that the rpc call are async in both cases. There is a note at the end
for sync calls.
resend means: retry a call
couldn't send means: rpc async call timed out
unlock means: unlock the lock and forget the block
GRANTED_MSG calls:
1. actual code
* The server couldn't send the GRANTED_MSG: resend after 10 s
* The server send succesfully: resend after 60 s
2. lazy server
* The server couldn't send the GRANTED_MSG: unlock
* The server send succesfully: unlock
3. my proposal: each time the server resends, it increases a counter
* The server couldn't send the GRANTED_MSG : resend after 10 s
* The server send succesfully: resend after 60 s
* the counter exceeded a limit: unlock
The GRANTED equivalents are:
1.
* The server couldn't send the GRANTED: resend after 60 s
2.
* The server couldn't send the GRANTED: unlock
3. each time the server resends, it increases a counter
* The server couldn't send the GRANTED: resend after 60 s
* the counter exceeded a limit: unlock
Solution 1 may lead to server retaining a lock.
Solution 2 and 3 may lead to client thinking it has a lock or blocking
depending whether or not he allready send GRANTED_RES/GRANTED reply or not. The
blocking case cannot happen with current linux lockd client implementation.
Using rpc sync call is the same as 1 if there is no timeout, and the same as 2
if there is one. According to a comment in the code there may be other
problems:
* - we don't want to use a synchronous RPC thread, otherwise
* we might find ourselves hanging on a dead portmapper.
Nevertheless, I am working on a GRANTED call implementation, but more for
standard conformance and completeness than to resolve that issue. My proposal
here is
If the initial call is LOCK, try a GRANTED call, and it it times out, revert to
GRANTED_MSG (needed because:
* - Some lockd implementations (e.g. HP) don't react to
* RPC_GRANTED calls; they seem to insist on RPC_GRANTED_MSG calls.
)
If the initial call is LOCK_MSG, no change.
Pat
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
Hi,
> Just add a call to posix_test_lock() in nlmclnt_grant().
I don't think it will work because posix_test_lock() will find that the locks
have the same owner and thus return 0 as if the lock wasn't found. A workaround
would be to set the fl_owner of the file_lock from the granted_msg to a value
which will cause locks_same_owner to fail, and after that try
nlm_compare_locks with the resulting lock.
Am I right ? If so could setting fl_owner to NULL break something ?
Pat
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs