2009-11-09 09:18:26

by Mi Jinlong

Subject: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.

Hi Trond et al.,

I found a bug while testing NFSv3 file locking, as follows:

Step1: ClientA and ClientB open the same NFS file;
Step2: ClientA takes a write lock on the file, which succeeds;
Step3: Cut off the network between ClientA and Server;
Step4: ClientB can never acquire the write lock, even when the
network partition lasts longer than NLM_HOST_EXPIRE.
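Steps 2 and 4 can be exercised with a small helper like the one below. This is only a minimal sketch using POSIX fcntl() byte-range locks; the function names try_write_lock() and release_lock() are hypothetical and are not part of the original test setup.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take an exclusive (write) lock on the whole file without blocking.
 * Returns 0 on success, -1 on failure with errno set by fcntl().
 */
int try_write_lock(int fd)
{
	struct flock fl = {
		.l_type   = F_WRLCK,	/* exclusive lock */
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,		/* 0 means "up to end of file" */
	};

	assert(fd >= 0);
	return fcntl(fd, F_SETLK, &fl);
}

/* Release any lock this process holds on the whole file. */
int release_lock(int fd)
{
	struct flock fl = {
		.l_type   = F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,
	};

	assert(fd >= 0);
	return fcntl(fd, F_SETLK, &fl);
}
```

On an NFSv3 mount, ClientA would call try_write_lock() for Step2 and ClientB would call it for Step4. In the scenario described here, ClientB's call would be expected to keep returning -1 while ClientA is partitioned.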

As far as I know, with NFSv4, Step4 succeeds after LEASE_TIME.

Should this be fixed for NFSv3?

The attached patch makes this case work, but I am not sure it is the right approach.

Signed-off-by: Mi Jinlong <[email protected]>
---
fs/lockd/host.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index 4600c20..c964327 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -550,8 +550,8 @@ nlm_gc_hosts(void)

for (chain = nlm_hosts; chain < nlm_hosts + NLM_HOST_NRHASH; ++chain) {
hlist_for_each_entry_safe(host, pos, next, chain, h_hash) {
- if (atomic_read(&host->h_count) || host->h_inuse
- || time_before(jiffies, host->h_expires)) {
+ if (time_before(jiffies, host->h_expires)
+ && (atomic_read(&host->h_count) || host->h_inuse)) {
dprintk("nlm_gc_hosts skipping %s (cnt %d use %d exp %ld)\n",
host->h_name, atomic_read(&host->h_count),
host->h_inuse, host->h_expires);
@@ -560,6 +560,7 @@ nlm_gc_hosts(void)
dprintk("lockd: delete host %s\n", host->h_name);
hlist_del_init(&host->h_hash);

+ nlmsvc_free_host_resources(host);
nlm_destroy_host(host);
nrhosts--;
}
---
thanks,
Mi Jinlong
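
The change to the skip condition in nlm_gc_hosts() can be compared in isolation. The sketch below models both conditions as plain boolean functions; the function names are hypothetical, and "expired" stands for !time_before(jiffies, host->h_expires). It is an illustration of the logic only, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: skip the host if it is referenced, in use, or not yet expired. */
bool gc_skips_host_old(int h_count, bool h_inuse, bool expired)
{
	assert(h_count >= 0);
	return h_count != 0 || h_inuse || !expired;
}

/* After the patch: skip the host only if it is not yet expired AND (referenced or in use). */
bool gc_skips_host_new(int h_count, bool h_inuse, bool expired)
{
	assert(h_count >= 0);
	return !expired && (h_count != 0 || h_inuse);
}
```

Under this model, an expired host that still holds locks (h_inuse set) is skipped by the old condition but collected by the new one, which is the intended effect. The new condition also stops skipping a host that is idle but not yet expired, which the old condition would have kept.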



2009-11-09 13:16:47

by Trond Myklebust

Subject: Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.

On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
> Hi Trond et all
>
> There is a bug, when i test NFSv3 file's lock as followed:
>
> Step1: ClientA and ClientB open a same nfs file;
> Step2: ClientA locks file with write lock, it's ok;
> Step3: Cut off the network between ClientA and Server;
> Step4: ClientB can not acquire for write lock successful forever, even though
> the network partition larger than NLM_HOST_EXPIRE.
>
> As i know, If use NFSv4, step4 can success after LEASE_TIME.
>
> Is it necessary to fix NFSv3 ?
>
> The attached patch can make this case OK, but i am not sure it's good.

Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
based, so the above scenario is truly an unfixable one.

The problem with applying your patch is, in essence, that we risk
breaking another scenario where a client grabs a lock, and then holds it
for a while.
The reason this breaks is that there is no equivalent in the NLM
protocol of the NFSv4 RENEW operation to tell the server that "This
client is still alive and wants you to keep its state".
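
To illustrate the difference, here is a minimal sketch of lease-style bookkeeping. The struct and function names are hypothetical and this is not how the kernel's NFSv4 server is implemented; it only shows why a renewal signal is needed before a server can safely expire state by time.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client state kept by a lease-based server. */
struct client_state {
	time_t last_renew;	/* last time the client proved it was alive */
};

/* A renewal (the role RENEW plays in NFSv4) pushes the expiry forward. */
void lease_renew(struct client_state *c, time_t now)
{
	c->last_renew = now;
}

/* State may be discarded only once a full lease period passes without renewal. */
bool lease_expired(const struct client_state *c, time_t now, time_t lease_secs)
{
	assert(lease_secs > 0);
	return now - c->last_renew > lease_secs;
}
```

With NLM there is nothing that plays the role of lease_renew(), so a timestamp like last_renew would never advance while a healthy client simply holds its lock. Any time-based expiry would therefore also drop the locks of clients that are still alive.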

Cheers,
Trond



2009-11-10 09:37:25

by Mi Jinlong

Subject: Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.

Hi Trond

Trond Myklebust wrote:
> On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
>> Hi Trond et all
>>
>> There is a bug, when i test NFSv3 file's lock as followed:
>>
>> Step1: ClientA and ClientB open a same nfs file;
>> Step2: ClientA locks file with write lock, it's ok;
>> Step3: Cut off the network between ClientA and Server;
>> Step4: ClientB can not acquire for write lock successful forever, even though
>> the network partition larger than NLM_HOST_EXPIRE.
>>
>> As i know, If use NFSv4, step4 can success after LEASE_TIME.
>>
>> Is it necessary to fix NFSv3 ?
>>
>> The attached patch can make this case OK, but i am not sure it's good.
>
> Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
> based, so the above scenario is truly an unfixable one.
>
> The problem with applying your patch is, in essence, that we risk
> breaking another scenario where a client grabs a lock, and then holds it
> for a while.
> The reason this breaks is that there is no equivalent in the NLM
> protocol of the NFSv4 RENEW operation to tell the server that "This
> client is still alive and wants you to keep its state".

Thanks for your answer!

This bug seems serious. Shouldn't we fix it?

thanks,
Mi Jinlong


2009-11-10 12:35:52

by Trond Myklebust

Subject: Re: [RFC][PATCH] client cannot get lock after other client got lock occur network partition.

On Tue, 2009-11-10 at 17:38 +0800, Mi Jinlong wrote:
> Hi Trond
>
> Trond Myklebust wrote:
> > On Mon, 2009-11-09 at 17:19 +0800, Mi Jinlong wrote:
> >> Hi Trond et all
> >>
> >> There is a bug, when i test NFSv3 file's lock as followed:
> >>
> >> Step1: ClientA and ClientB open a same nfs file;
> >> Step2: ClientA locks file with write lock, it's ok;
> >> Step3: Cut off the network between ClientA and Server;
> >> Step4: ClientB can not acquire for write lock successful forever, even though
> >> the network partition larger than NLM_HOST_EXPIRE.
> >>
> >> As i know, If use NFSv4, step4 can success after LEASE_TIME.
> >>
> >> Is it necessary to fix NFSv3 ?
> >>
> >> The attached patch can make this case OK, but i am not sure it's good.
> >
> > Unfortunately, NLM (the NFSv2 and v3 locking protocol) is not lease
> > based, so the above scenario is truly an unfixable one.
> >
> > The problem with applying your patch is, in essence, that we risk
> > breaking another scenario where a client grabs a lock, and then holds it
> > for a while.
> > The reason this breaks is that there is no equivalent in the NLM
> > protocol of the NFSv4 RENEW operation to tell the server that "This
> > client is still alive and wants you to keep its state".
>
> Thanks for your answer!
>
> This bug seems serious, shouldn't we fix it?

Unless you can think of a fix which works with the current NLM protocol,
I'd suggest simply encouraging people to move to a protocol with lease
based locks: i.e. NFSv4...

Cheers
Trond