2010-10-07 13:10:59

by Fred Isaman

[permalink] [raw]
Subject: [PATCH 0/4] pnfs-submit: misc fixes in CB_LAYOUTRECALL and layout stateid code

Preparing for second submission wave, here are some misc fixes to the
LAYOUTRETURN and callback code, that all apply to the top of Benny's
pnfs-submit branch

Fred



2010-10-08 14:16:33

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On 2010-10-08 10:13, P.B.Shelley wrote:
> On Thu, Oct 7, 2010 at 10:06 PM, Benny Halevy <[email protected]> wrote:
>> On 2010-10-07 10:01, Fred Isaman wrote:
>>> On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
>>>> On 2010-10-06 16:35, Fred Isaman wrote:
>>>>> Right now, when we set the stateid, we blindly overwrite the current
>>>>> one, allowing the seqid to incorrectly roll backward.
>>>>>
>>>>> Signed-off-by: Fred Isaman <[email protected]>
>>>>> ---
>>>>> fs/nfs/pnfs.c | 38 ++++++++++++++++++++++++++++++++------
>>>>> 1 files changed, 32 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>>> index 39bce9b..555955b 100644
>>>>> --- a/fs/nfs/pnfs.c
>>>>> +++ b/fs/nfs/pnfs.c
>>>>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>>> }
>>>>> }
>>>>>
>>>>> +/* update lo->stateid with new if is more recent
>>>>> + *
>>>>> + * lo->stateid could be the open stateid, in which case we just use what given.
>>>>> + */
>>>>> static void
>>>>> pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>>>>> - const nfs4_stateid *stateid)
>>>>> + const nfs4_stateid *new)
>>>>> {
>>>>> - /* TODO - should enforce that embedded seqid, in the case
>>>>> - * that the two stateid.others are equal, only increases.
>>>>> - * Complicated by wrap-around.
>>>>> - */
>>>>> + nfs4_stateid *old = &lo->stateid;
>>>>> + bool overwrite = false;
>>>>> +
>>>>> write_seqlock(&lo->seqlock);
>>>>> - memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>>>>> + if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>>>>> + memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))

here we check the 'other' part.

>>>>> + overwrite = true;
>>>>> + else {
>>>>> + u32 oldseq, newseq, limit;
>>>>> +
>>>>> + oldseq = be32_to_cpu(old->stateid.seqid);
>>>>> + newseq = be32_to_cpu(new->stateid.seqid);
>>>>> + /* There are no good bounds on window size, so just
>>>>> + * use a ridiculously large window of 2^31.
>>>>> + */
>>>>> + limit = oldseq + (1 << 31);
>>>>> + if (oldseq < limit) {
>>>>> + /* The easy, non-wraparound case */
>>>>> + if (oldseq < newseq && newseq < limit)
>>>>> + overwrite = true;
>>>>> + } else {
>>>>> + /* Near wraparound edge */
>>>>> + if (oldseq < newseq || newseq < limit)
>>>>> + overwrite = true;
>>>>> + }
>>>>
>>>> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>>>>
>>>
>>> Why yes it would. I'll send a new version of this patch shortly.
>>>
>>
>> No need :)
>> I'll just change this as follows:
>>
>> + else {
>> + u32 oldseq, newseq, limit;
>> +
>> + oldseq = be32_to_cpu(old->stateid.seqid);
>> + newseq = be32_to_cpu(new->stateid.seqid);
>> + if ((int)(newseq - oldseq) > 0)
>> + overwrite = true;
> Do we also need to verify the other field of the stateid? Will there
> be situations that server change the other field and reset the seqid?

We do :)
see block above.

Benny

>
> Thanks,
> Shelley
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-10-08 15:52:15

by Fred Isaman

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On Fri, Oct 8, 2010 at 11:36 AM, P.B.Shelley <[email protected]> wrote:
> On Fri, Oct 8, 2010 at 10:35 PM, Fred Isaman <[email protected]> wrote:
>> On Fri, Oct 8, 2010 at 10:13 AM, P.B.Shelley <[email protected]> wrote:
>>> On Thu, Oct 7, 2010 at 10:06 PM, Benny Halevy <[email protected]> wrote:
>>>> On 2010-10-07 10:01, Fred Isaman wrote:
>>>>> On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
>>>>>> On 2010-10-06 16:35, Fred Isaman wrote:
>>>>>>> Right now, when we set the stateid, we blindly overwrite the current
>>>>>>> one, allowing the seqid to incorrectly roll backward.
>>>>>>>
>>>>>>> Signed-off-by: Fred Isaman <[email protected]>
>>>>>>> ---
>>>>>>> ?fs/nfs/pnfs.c | ? 38 ++++++++++++++++++++++++++++++++------
>>>>>>> ?1 files changed, 32 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>>>>> index 39bce9b..555955b 100644
>>>>>>> --- a/fs/nfs/pnfs.c
>>>>>>> +++ b/fs/nfs/pnfs.c
>>>>>>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>>>>> ? ? ? }
>>>>>>> ?}
>>>>>>>
>>>>>>> +/* update lo->stateid with new if is more recent
>>>>>>> + *
>>>>>>> + * lo->stateid could be the open stateid, in which case we just use what given.
>>>>>>> + */
>>>>>>> ?static void
>>>>>>> ?pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>>>>>>> - ? ? ? ? ? ? ? ? ? ? const nfs4_stateid *stateid)
>>>>>>> + ? ? ? ? ? ? ? ? ? ? const nfs4_stateid *new)
>>>>>>> ?{
>>>>>>> - ? ? /* TODO - should enforce that embedded seqid, in the case
>>>>>>> - ? ? ?* that the two stateid.others are equal, ?only increases.
>>>>>>> - ? ? ?* Complicated by wrap-around.
>>>>>>> - ? ? ?*/
>>>>>>> + ? ? nfs4_stateid *old = &lo->stateid;
>>>>>>> + ? ? bool overwrite = false;
>>>>>>> +
>>>>>>> ? ? ? write_seqlock(&lo->seqlock);
>>>>>>> - ? ? memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>>>>>>> + ? ? if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>>>>>>> + ? ? ? ? memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
>>>>>>> + ? ? ? ? ? ? overwrite = true;
>>>>>>> + ? ? else {
>>>>>>> + ? ? ? ? ? ? u32 oldseq, newseq, limit;
>>>>>>> +
>>>>>>> + ? ? ? ? ? ? oldseq = be32_to_cpu(old->stateid.seqid);
>>>>>>> + ? ? ? ? ? ? newseq = be32_to_cpu(new->stateid.seqid);
>>>>>>> + ? ? ? ? ? ? /* There are no good bounds on window size, so just
>>>>>>> + ? ? ? ? ? ? ?* use a ridiculously large window of 2^31.
>>>>>>> + ? ? ? ? ? ? ?*/
>>>>>>> + ? ? ? ? ? ? limit = oldseq + (1 << 31);
>>>>>>> + ? ? ? ? ? ? if (oldseq < limit) {
>>>>>>> + ? ? ? ? ? ? ? ? ? ? /* The easy, non-wraparound case */
>>>>>>> + ? ? ? ? ? ? ? ? ? ? if (oldseq < newseq && newseq < limit)
>>>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>>>>>>> + ? ? ? ? ? ? } else {
>>>>>>> + ? ? ? ? ? ? ? ? ? ? /* Near wraparound edge */
>>>>>>> + ? ? ? ? ? ? ? ? ? ? if (oldseq < newseq || newseq < limit)
>>>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>>>>>>> + ? ? ? ? ? ? }
>>>>>>
>>>>>> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>>>>>>
>>>>>
>>>>> Why yes it would. ?I'll send a new version of this patch shortly.
>>>>>
>>>>
>>>> No need :)
>>>> I'll just change this as follows:
>>>>
>>>> + ? ? ? else {
>>>> + ? ? ? ? ? ? ? u32 oldseq, newseq, limit;
>>>> +
>>>> + ? ? ? ? ? ? ? oldseq = be32_to_cpu(old->stateid.seqid);
>>>> + ? ? ? ? ? ? ? newseq = be32_to_cpu(new->stateid.seqid);
>>>> + ? ? ? ? ? ? ? if ((int)(newseq - oldseq) > 0)
>>>> + ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>>> Do we also need to verify the other field of the stateid? Will there
>>> be situations that server change the other field and reset the seqid?
>>
>> The server is going to use the "other" we sent, except in the case we
>> sent an open stateid. ?The only potential for trouble I see is if a
>> LAYOUTGET reply gets lost in the network for a long time and is
>> received after the layout stateid has been reset for some reason.
>> However, that implies an error elsewhere (which may well exist at the
>> moment...careful stateid handling is next on the agenda), as we should
>> have been waiting for that lseg to arrive before continuing.
> Oops, I missed the other field comparing code. Thank you for pointing
> it out, Benny.
>
> Can client choose to send an open stateid for LATYOUTGET request, even
> if client has a layout stateid for the file?
>

No (see 12.5.2). But the spec seems to allow sending multiple
LAYOUTGETs with the open stateid until it processes the first reply
which includes a proper layout stateid.

Fred


> --
> Thanks,
> Shelley
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2010-10-07 13:11:00

by Fred Isaman

[permalink] [raw]
Subject: [PATCH 4/4] pnfs_submit: Fix clp refcounting in layout recalls

We were issing a put, and doing pointless zero checking on clp for
which we hold a reference.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/callback_proc.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 5a996b5..6b560ce 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -304,8 +304,7 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,

init_completion(&data.started);
__module_get(THIS_MODULE);
- if (!atomic_inc_not_zero(&clp->cl_count))
- goto out_put_no_client;
+ atomic_inc(&clp->cl_count);

t = kthread_run(pnfs_recall_layout, &data, "%s", "pnfs_recall_layout");
if (IS_ERR(t)) {
@@ -320,7 +319,6 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
return data.result;
out_module_put:
nfs_put_client(clp);
-out_put_no_client:
clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
module_put(THIS_MODULE);
return status;
@@ -643,7 +641,7 @@ __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy)
status = cpu_to_be32(NFS4ERR_INVAL);
if (!validate_bitmap_values((const unsigned long *)
&args->craa_type_mask))
- return status;
+ goto out_put;

status = cpu_to_be32(NFS4_OK);
if (test_bit(RCA4_TYPE_MASK_RDATA_DLG, (const unsigned long *)
@@ -659,6 +657,8 @@ __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy)

if (flags)
nfs_expire_all_delegation_types(clp, flags);
+out_put:
+ nfs_put_client(clp);
out:
dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
return status;
--
1.7.2.1


2010-10-08 14:35:40

by Fred Isaman

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On Fri, Oct 8, 2010 at 10:13 AM, P.B.Shelley <[email protected]> wrote:
> On Thu, Oct 7, 2010 at 10:06 PM, Benny Halevy <[email protected]> wrote:
>> On 2010-10-07 10:01, Fred Isaman wrote:
>>> On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
>>>> On 2010-10-06 16:35, Fred Isaman wrote:
>>>>> Right now, when we set the stateid, we blindly overwrite the current
>>>>> one, allowing the seqid to incorrectly roll backward.
>>>>>
>>>>> Signed-off-by: Fred Isaman <[email protected]>
>>>>> ---
>>>>> ?fs/nfs/pnfs.c | ? 38 ++++++++++++++++++++++++++++++++------
>>>>> ?1 files changed, 32 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>>> index 39bce9b..555955b 100644
>>>>> --- a/fs/nfs/pnfs.c
>>>>> +++ b/fs/nfs/pnfs.c
>>>>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>>> ? ? ? }
>>>>> ?}
>>>>>
>>>>> +/* update lo->stateid with new if is more recent
>>>>> + *
>>>>> + * lo->stateid could be the open stateid, in which case we just use what given.
>>>>> + */
>>>>> ?static void
>>>>> ?pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>>>>> - ? ? ? ? ? ? ? ? ? ? const nfs4_stateid *stateid)
>>>>> + ? ? ? ? ? ? ? ? ? ? const nfs4_stateid *new)
>>>>> ?{
>>>>> - ? ? /* TODO - should enforce that embedded seqid, in the case
>>>>> - ? ? ?* that the two stateid.others are equal, ?only increases.
>>>>> - ? ? ?* Complicated by wrap-around.
>>>>> - ? ? ?*/
>>>>> + ? ? nfs4_stateid *old = &lo->stateid;
>>>>> + ? ? bool overwrite = false;
>>>>> +
>>>>> ? ? ? write_seqlock(&lo->seqlock);
>>>>> - ? ? memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>>>>> + ? ? if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>>>>> + ? ? ? ? memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
>>>>> + ? ? ? ? ? ? overwrite = true;
>>>>> + ? ? else {
>>>>> + ? ? ? ? ? ? u32 oldseq, newseq, limit;
>>>>> +
>>>>> + ? ? ? ? ? ? oldseq = be32_to_cpu(old->stateid.seqid);
>>>>> + ? ? ? ? ? ? newseq = be32_to_cpu(new->stateid.seqid);
>>>>> + ? ? ? ? ? ? /* There are no good bounds on window size, so just
>>>>> + ? ? ? ? ? ? ?* use a ridiculously large window of 2^31.
>>>>> + ? ? ? ? ? ? ?*/
>>>>> + ? ? ? ? ? ? limit = oldseq + (1 << 31);
>>>>> + ? ? ? ? ? ? if (oldseq < limit) {
>>>>> + ? ? ? ? ? ? ? ? ? ? /* The easy, non-wraparound case */
>>>>> + ? ? ? ? ? ? ? ? ? ? if (oldseq < newseq && newseq < limit)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>>>>> + ? ? ? ? ? ? } else {
>>>>> + ? ? ? ? ? ? ? ? ? ? /* Near wraparound edge */
>>>>> + ? ? ? ? ? ? ? ? ? ? if (oldseq < newseq || newseq < limit)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>>>>> + ? ? ? ? ? ? }
>>>>
>>>> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>>>>
>>>
>>> Why yes it would. ?I'll send a new version of this patch shortly.
>>>
>>
>> No need :)
>> I'll just change this as follows:
>>
>> + ? ? ? else {
>> + ? ? ? ? ? ? ? u32 oldseq, newseq, limit;
>> +
>> + ? ? ? ? ? ? ? oldseq = be32_to_cpu(old->stateid.seqid);
>> + ? ? ? ? ? ? ? newseq = be32_to_cpu(new->stateid.seqid);
>> + ? ? ? ? ? ? ? if ((int)(newseq - oldseq) > 0)
>> + ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
> Do we also need to verify the other field of the stateid? Will there
> be situations that server change the other field and reset the seqid?

The server is going to use the "other" we sent, except in the case we
sent an open stateid. The only potential for trouble I see is if a
LAYOUTGET reply gets lost in the network for a long time and is
received after the layout stateid has been reset for some reason.
However, that implies an error elsewhere (which may well exist at the
moment...careful stateid handling is next on the agenda), as we should
have been waiting for that lseg to arrive before continuing.

Fred

>
> Thanks,
> Shelley
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2010-10-07 14:06:36

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On 2010-10-07 10:01, Fred Isaman wrote:
> On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
>> On 2010-10-06 16:35, Fred Isaman wrote:
>>> Right now, when we set the stateid, we blindly overwrite the current
>>> one, allowing the seqid to incorrectly roll backward.
>>>
>>> Signed-off-by: Fred Isaman <[email protected]>
>>> ---
>>> fs/nfs/pnfs.c | 38 ++++++++++++++++++++++++++++++++------
>>> 1 files changed, 32 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 39bce9b..555955b 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>> }
>>> }
>>>
>>> +/* update lo->stateid with new if is more recent
>>> + *
>>> + * lo->stateid could be the open stateid, in which case we just use what given.
>>> + */
>>> static void
>>> pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>>> - const nfs4_stateid *stateid)
>>> + const nfs4_stateid *new)
>>> {
>>> - /* TODO - should enforce that embedded seqid, in the case
>>> - * that the two stateid.others are equal, only increases.
>>> - * Complicated by wrap-around.
>>> - */
>>> + nfs4_stateid *old = &lo->stateid;
>>> + bool overwrite = false;
>>> +
>>> write_seqlock(&lo->seqlock);
>>> - memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>>> + if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>>> + memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
>>> + overwrite = true;
>>> + else {
>>> + u32 oldseq, newseq, limit;
>>> +
>>> + oldseq = be32_to_cpu(old->stateid.seqid);
>>> + newseq = be32_to_cpu(new->stateid.seqid);
>>> + /* There are no good bounds on window size, so just
>>> + * use a ridiculously large window of 2^31.
>>> + */
>>> + limit = oldseq + (1 << 31);
>>> + if (oldseq < limit) {
>>> + /* The easy, non-wraparound case */
>>> + if (oldseq < newseq && newseq < limit)
>>> + overwrite = true;
>>> + } else {
>>> + /* Near wraparound edge */
>>> + if (oldseq < newseq || newseq < limit)
>>> + overwrite = true;
>>> + }
>>
>> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>>
>
> Why yes it would. I'll send a new version of this patch shortly.
>

No need :)
I'll just change this as follows:

+ else {
+ u32 oldseq, newseq, limit;
+
+ oldseq = be32_to_cpu(old->stateid.seqid);
+ newseq = be32_to_cpu(new->stateid.seqid);
+ if ((int)(newseq - oldseq) > 0)
+ overwrite = true;
+ }

Benny

> Fred
>
>> Benny
>>
>>> + }
>>> + if (overwrite)
>>> + memcpy(&old->stateid, &new->stateid, sizeof(new->stateid));
>>> write_sequnlock(&lo->seqlock);
>>> }
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>

2010-10-07 13:11:00

by Fred Isaman

[permalink] [raw]
Subject: [PATCH 3/4] pnfs_submit: simplify nfs4_callback_layoutrecall

The code in the two branches was identical, so remove the if statement.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/callback_proc.c | 29 ++++++++++-------------------
1 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index fa0ca0c..5a996b5 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -370,25 +370,16 @@ __be32 nfs4_callback_layoutrecall(struct cb_layoutrecallargs *args,
/* the callback must come from the MDS personality */
if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
goto loop;
- if (args->cbl_recall_type == RETURN_FILE) {
- inode = nfs_layoutrecall_find_inode(clp, args);
- if (inode != NULL) {
- status = pnfs_async_return_layout(clp, inode,
- args);
- if (status)
- res = cpu_to_be32(NFS4ERR_DELAY);
- iput(inode);
- }
- } else { /* _ALL or _FSID */
- /* we need the inode to get the nfs_server struct */
- inode = nfs_layoutrecall_find_inode(clp, args);
- if (!inode)
- goto loop;
- status = pnfs_async_return_layout(clp, inode, args);
- if (status)
- res = cpu_to_be32(NFS4ERR_DELAY);
- iput(inode);
- }
+ /* In the _ALL or _FSID case, we need the inode to get
+ * the nfs_server struct.
+ */
+ inode = nfs_layoutrecall_find_inode(clp, args);
+ if (!inode)
+ goto loop;
+ status = pnfs_async_return_layout(clp, inode, args);
+ if (status)
+ res = cpu_to_be32(NFS4ERR_DELAY);
+ iput(inode);
loop:
clp = nfs_find_client_next(prev);
nfs_put_client(prev);
--
1.7.2.1


2010-10-07 13:11:00

by Fred Isaman

[permalink] [raw]
Subject: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

Right now, when we set the stateid, we blindly overwrite the current
one, allowing the seqid to incorrectly roll backward.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 38 ++++++++++++++++++++++++++++++++------
1 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 39bce9b..555955b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
}
}

+/* update lo->stateid with new if is more recent
+ *
+ * lo->stateid could be the open stateid, in which case we just use what given.
+ */
static void
pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
- const nfs4_stateid *stateid)
+ const nfs4_stateid *new)
{
- /* TODO - should enforce that embedded seqid, in the case
- * that the two stateid.others are equal, only increases.
- * Complicated by wrap-around.
- */
+ nfs4_stateid *old = &lo->stateid;
+ bool overwrite = false;
+
write_seqlock(&lo->seqlock);
- memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
+ if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
+ memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
+ overwrite = true;
+ else {
+ u32 oldseq, newseq, limit;
+
+ oldseq = be32_to_cpu(old->stateid.seqid);
+ newseq = be32_to_cpu(new->stateid.seqid);
+ /* There are no good bounds on window size, so just
+ * use a ridiculously large window of 2^31.
+ */
+ limit = oldseq + (1 << 31);
+ if (oldseq < limit) {
+ /* The easy, non-wraparound case */
+ if (oldseq < newseq && newseq < limit)
+ overwrite = true;
+ } else {
+ /* Near wraparound edge */
+ if (oldseq < newseq || newseq < limit)
+ overwrite = true;
+ }
+ }
+ if (overwrite)
+ memcpy(&old->stateid, &new->stateid, sizeof(new->stateid));
write_sequnlock(&lo->seqlock);
}

--
1.7.2.1


2010-10-07 15:32:39

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 0/4] pnfs-submit: misc fixes in CB_LAYOUTRECALL and layout stateid code

Merged all 4 patches (with my fix to patch 2/4)
And pushed to the local Bakeathon server:
git://192.168.12.138/bhalevy/linux-pnfs.git

Benny

Thanks!
On 2010-10-06 16:35, Fred Isaman wrote:
> Preparing for second submission wave, here are some misc fixes to the
> LAYOUTRETURN and callback code, that all apply to the top of Benny's
> pnfs-submit branch
>
> Fred
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-10-07 14:01:24

by Fred Isaman

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
> On 2010-10-06 16:35, Fred Isaman wrote:
>> Right now, when we set the stateid, we blindly overwrite the current
>> one, allowing the seqid to incorrectly roll backward.
>>
>> Signed-off-by: Fred Isaman <[email protected]>
>> ---
>> ?fs/nfs/pnfs.c | ? 38 ++++++++++++++++++++++++++++++++------
>> ?1 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 39bce9b..555955b 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>> ? ? ? }
>> ?}
>>
>> +/* update lo->stateid with new if is more recent
>> + *
>> + * lo->stateid could be the open stateid, in which case we just use what given.
>> + */
>> ?static void
>> ?pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>> - ? ? ? ? ? ? ? ? ? ? const nfs4_stateid *stateid)
>> + ? ? ? ? ? ? ? ? ? ? const nfs4_stateid *new)
>> ?{
>> - ? ? /* TODO - should enforce that embedded seqid, in the case
>> - ? ? ?* that the two stateid.others are equal, ?only increases.
>> - ? ? ?* Complicated by wrap-around.
>> - ? ? ?*/
>> + ? ? nfs4_stateid *old = &lo->stateid;
>> + ? ? bool overwrite = false;
>> +
>> ? ? ? write_seqlock(&lo->seqlock);
>> - ? ? memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>> + ? ? if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>> + ? ? ? ? memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
>> + ? ? ? ? ? ? overwrite = true;
>> + ? ? else {
>> + ? ? ? ? ? ? u32 oldseq, newseq, limit;
>> +
>> + ? ? ? ? ? ? oldseq = be32_to_cpu(old->stateid.seqid);
>> + ? ? ? ? ? ? newseq = be32_to_cpu(new->stateid.seqid);
>> + ? ? ? ? ? ? /* There are no good bounds on window size, so just
>> + ? ? ? ? ? ? ?* use a ridiculously large window of 2^31.
>> + ? ? ? ? ? ? ?*/
>> + ? ? ? ? ? ? limit = oldseq + (1 << 31);
>> + ? ? ? ? ? ? if (oldseq < limit) {
>> + ? ? ? ? ? ? ? ? ? ? /* The easy, non-wraparound case */
>> + ? ? ? ? ? ? ? ? ? ? if (oldseq < newseq && newseq < limit)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>> + ? ? ? ? ? ? } else {
>> + ? ? ? ? ? ? ? ? ? ? /* Near wraparound edge */
>> + ? ? ? ? ? ? ? ? ? ? if (oldseq < newseq || newseq < limit)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? overwrite = true;
>> + ? ? ? ? ? ? }
>
> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>

Why yes it would. I'll send a new version of this patch shortly.

Fred

> Benny
>
>> + ? ? }
>> + ? ? if (overwrite)
>> + ? ? ? ? ? ? memcpy(&old->stateid, &new->stateid, sizeof(new->stateid));
>> ? ? ? write_sequnlock(&lo->seqlock);
>> ?}
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2010-10-08 14:13:56

by P.B.Shelley

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On Thu, Oct 7, 2010 at 10:06 PM, Benny Halevy <[email protected]> wrote:
> On 2010-10-07 10:01, Fred Isaman wrote:
>> On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
>>> On 2010-10-06 16:35, Fred Isaman wrote:
>>>> Right now, when we set the stateid, we blindly overwrite the current
>>>> one, allowing the seqid to incorrectly roll backward.
>>>>
>>>> Signed-off-by: Fred Isaman <[email protected]>
>>>> ---
>>>>  fs/nfs/pnfs.c |   38 ++++++++++++++++++++++++++++++++------
>>>>  1 files changed, 32 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>> index 39bce9b..555955b 100644
>>>> --- a/fs/nfs/pnfs.c
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>>       }
>>>>  }
>>>>
>>>> +/* update lo->stateid with new if is more recent
>>>> + *
>>>> + * lo->stateid could be the open stateid, in which case we just use what given.
>>>> + */
>>>>  static void
>>>>  pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>>>> -                     const nfs4_stateid *stateid)
>>>> +                     const nfs4_stateid *new)
>>>>  {
>>>> -     /* TODO - should enforce that embedded seqid, in the case
>>>> -      * that the two stateid.others are equal,  only increases.
>>>> -      * Complicated by wrap-around.
>>>> -      */
>>>> +     nfs4_stateid *old = &lo->stateid;
>>>> +     bool overwrite = false;
>>>> +
>>>>       write_seqlock(&lo->seqlock);
>>>> -     memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>>>> +     if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>>>> +         memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
>>>> +             overwrite = true;
>>>> +     else {
>>>> +             u32 oldseq, newseq, limit;
>>>> +
>>>> +             oldseq = be32_to_cpu(old->stateid.seqid);
>>>> +             newseq = be32_to_cpu(new->stateid.seqid);
>>>> +             /* There are no good bounds on window size, so just
>>>> +              * use a ridiculously large window of 2^31.
>>>> +              */
>>>> +             limit = oldseq + (1 << 31);
>>>> +             if (oldseq < limit) {
>>>> +                     /* The easy, non-wraparound case */
>>>> +                     if (oldseq < newseq && newseq < limit)
>>>> +                             overwrite = true;
>>>> +             } else {
>>>> +                     /* Near wraparound edge */
>>>> +                     if (oldseq < newseq || newseq < limit)
>>>> +                             overwrite = true;
>>>> +             }
>>>
>>> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>>>
>>
>> Why yes it would.  I'll send a new version of this patch shortly.
>>
>
> No need :)
> I'll just change this as follows:
>
> +       else {
> +               u32 oldseq, newseq, limit;
> +
> +               oldseq = be32_to_cpu(old->stateid.seqid);
> +               newseq = be32_to_cpu(new->stateid.seqid);
> +               if ((int)(newseq - oldseq) > 0)
> +                       overwrite = true;
Do we also need to verify the other field of the stateid? Will there
be situations that server change the other field and reset the seqid?

Thanks,
Shelley

2010-10-08 15:36:26

by P.B.Shelley

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On Fri, Oct 8, 2010 at 10:35 PM, Fred Isaman <[email protected]> wrote:
> On Fri, Oct 8, 2010 at 10:13 AM, P.B.Shelley <[email protected]> wrote:
>> On Thu, Oct 7, 2010 at 10:06 PM, Benny Halevy <[email protected]> wrote:
>>> On 2010-10-07 10:01, Fred Isaman wrote:
>>>> On Thu, Oct 7, 2010 at 9:34 AM, Benny Halevy <[email protected]> wrote:
>>>>> On 2010-10-06 16:35, Fred Isaman wrote:
>>>>>> Right now, when we set the stateid, we blindly overwrite the current
>>>>>> one, allowing the seqid to incorrectly roll backward.
>>>>>>
>>>>>> Signed-off-by: Fred Isaman <[email protected]>
>>>>>> ---
>>>>>>  fs/nfs/pnfs.c |   38 ++++++++++++++++++++++++++++++++------
>>>>>>  1 files changed, 32 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>>>> index 39bce9b..555955b 100644
>>>>>> --- a/fs/nfs/pnfs.c
>>>>>> +++ b/fs/nfs/pnfs.c
>>>>>> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
>>>>>>       }
>>>>>>  }
>>>>>>
>>>>>> +/* update lo->stateid with new if is more recent
>>>>>> + *
>>>>>> + * lo->stateid could be the open stateid, in which case we just use what given.
>>>>>> + */
>>>>>>  static void
>>>>>>  pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
>>>>>> -                     const nfs4_stateid *stateid)
>>>>>> +                     const nfs4_stateid *new)
>>>>>>  {
>>>>>> -     /* TODO - should enforce that embedded seqid, in the case
>>>>>> -      * that the two stateid.others are equal,  only increases.
>>>>>> -      * Complicated by wrap-around.
>>>>>> -      */
>>>>>> +     nfs4_stateid *old = &lo->stateid;
>>>>>> +     bool overwrite = false;
>>>>>> +
>>>>>>       write_seqlock(&lo->seqlock);
>>>>>> -     memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
>>>>>> +     if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
>>>>>> +         memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
>>>>>> +             overwrite = true;
>>>>>> +     else {
>>>>>> +             u32 oldseq, newseq, limit;
>>>>>> +
>>>>>> +             oldseq = be32_to_cpu(old->stateid.seqid);
>>>>>> +             newseq = be32_to_cpu(new->stateid.seqid);
>>>>>> +             /* There are no good bounds on window size, so just
>>>>>> +              * use a ridiculously large window of 2^31.
>>>>>> +              */
>>>>>> +             limit = oldseq + (1 << 31);
>>>>>> +             if (oldseq < limit) {
>>>>>> +                     /* The easy, non-wraparound case */
>>>>>> +                     if (oldseq < newseq && newseq < limit)
>>>>>> +                             overwrite = true;
>>>>>> +             } else {
>>>>>> +                     /* Near wraparound edge */
>>>>>> +                     if (oldseq < newseq || newseq < limit)
>>>>>> +                             overwrite = true;
>>>>>> +             }
>>>>>
>>>>> Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?
>>>>>
>>>>
>>>> Why yes it would.  I'll send a new version of this patch shortly.
>>>>
>>>
>>> No need :)
>>> I'll just change this as follows:
>>>
>>> +       else {
>>> +               u32 oldseq, newseq, limit;
>>> +
>>> +               oldseq = be32_to_cpu(old->stateid.seqid);
>>> +               newseq = be32_to_cpu(new->stateid.seqid);
>>> +               if ((int)(newseq - oldseq) > 0)
>>> +                       overwrite = true;
>> Do we also need to verify the other field of the stateid? Will there
>> be situations that server change the other field and reset the seqid?
>
> The server is going to use the "other" we sent, except in the case we
> sent an open stateid.  The only potential for trouble I see is if a
> LAYOUTGET reply gets lost in the network for a long time and is
> received after the layout stateid has been reset for some reason.
> However, that implies an error elsewhere (which may well exist at the
> moment...careful stateid handling is next on the agenda), as we should
> have been waiting for that lseg to arrive before continuing.
Oops, I missed the other field comparing code. Thank you for pointing
it out, Benny.

Can client choose to send an open stateid for LATYOUTGET request, even
if client has a layout stateid for the file?

--
Thanks,
Shelley

2010-10-07 13:11:00

by Fred Isaman

[permalink] [raw]
Subject: [PATCH 1/4] pnfs_submit: Remove nonsensical check

Only a buggy server would return a zero iomode with roc set.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4b33cde..39bce9b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -935,12 +935,8 @@ pnfs_layout_process(struct nfs4_layoutget *lgp)
/* FI: This needs to be re-examined. At lo level,
* all it needs is a bit indicating whether any of
* the lsegs in the list have the flags set.
- *
- * The IOMODE_ANY line just seems nonsensical.
*/
lo->roc_iomode |= res->range.iomode;
- if (!lo->roc_iomode)
- lo->roc_iomode = IOMODE_ANY;
}

/* Done processing layoutget. Set the layout stateid */
--
1.7.2.1


2010-10-07 13:35:02

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 2/4] pnfs_submit: Only update stateid if it is more recent than current

On 2010-10-06 16:35, Fred Isaman wrote:
> Right now, when we set the stateid, we blindly overwrite the current
> one, allowing the seqid to incorrectly roll backward.
>
> Signed-off-by: Fred Isaman <[email protected]>
> ---
> fs/nfs/pnfs.c | 38 ++++++++++++++++++++++++++++++++------
> 1 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 39bce9b..555955b 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -459,16 +459,42 @@ pnfs_destroy_all_layouts(struct nfs_client *clp)
> }
> }
>
> +/* update lo->stateid with new if is more recent
> + *
> + * lo->stateid could be the open stateid, in which case we just use what given.
> + */
> static void
> pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
> - const nfs4_stateid *stateid)
> + const nfs4_stateid *new)
> {
> - /* TODO - should enforce that embedded seqid, in the case
> - * that the two stateid.others are equal, only increases.
> - * Complicated by wrap-around.
> - */
> + nfs4_stateid *old = &lo->stateid;
> + bool overwrite = false;
> +
> write_seqlock(&lo->seqlock);
> - memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
> + if (!test_bit(NFS_LAYOUT_STATEID_SET, &lo->state) ||
> + memcmp(old->stateid.other, new->stateid.other, sizeof(new->stateid.other)))
> + overwrite = true;
> + else {
> + u32 oldseq, newseq, limit;
> +
> + oldseq = be32_to_cpu(old->stateid.seqid);
> + newseq = be32_to_cpu(new->stateid.seqid);
> + /* There are no good bounds on window size, so just
> + * use a ridiculously large window of 2^31.
> + */
> + limit = oldseq + (1 << 31);
> + if (oldseq < limit) {
> + /* The easy, non-wraparound case */
> + if (oldseq < newseq && newseq < limit)
> + overwrite = true;
> + } else {
> + /* Near wraparound edge */
> + if (oldseq < newseq || newseq < limit)
> + overwrite = true;
> + }

Wouldn't it be simpler to just look at (int32_t)(newseq - oldseq)?

Benny

> + }
> + if (overwrite)
> + memcpy(&old->stateid, &new->stateid, sizeof(new->stateid));
> write_sequnlock(&lo->seqlock);
> }
>