2010-05-20 10:30:27

by Fred Isaman

Subject: [PATCH 00/22] LAYOUTGET invocation

(My apologies for the 4 patches that went out about 1/2 hour ago. Please ignore those.)

This patch series limits LAYOUTGET invocation to the beginning of the IO paths.

It is intended for the pnfs_submit branch, without reversion in a post_submit branch.

Patches 1-4 revert direct IO. Its commit path is already broken, and this series breaks it further. The problem is that the direct IO code redefines data->wb_req and data->pages, so it can only work with the pnfs code if we don't look at those fields.

Patches 5-8 do some code cleanup in preparation for the real work.

Patches 9-19 implement the change. NOTE that patch 19 changes the calling convention of the layout drivers' commit calls. There is no longer a universal lseg for the commit; instead each nfs_page has an lseg attached, with NULL meaning to go through the MDS.

Patches 20-22 rework the filelayout commit function, and then do some code cleanup this enables.



The basic idea of these patches is as follows:

We attempt to grab an lseg (possibly invoking LAYOUTGET) early in the IO. If we succeed, we refcount and stash it, using it through the rest of the IO. If we fail, we revert to straight nfs, even if the area becomes covered by a layout due to other IO.

The tricky, though hopefully anomalous, case is when we start without the layout, but have it at this particular stage of the IO. We ignore this for the moment at write_pages, which will cause block and object to issue CB_LAYOUTRECALL. At commit, it is tricky to handle, but since block doesn't use commit, and file needs to handle complicated splitting anyway, I just push all complicated decisions of splitting commit between nfs (for IO started without layout) and pnfs to the driver.

Fred
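
A minimal userspace sketch of the idea described in the cover letter above, for readers following the thread. It is not code from the series: struct lseg, struct io_request, try_get_lseg(), io_begin(), io_choose_path() and io_end() are hypothetical stand-ins, and a plain int counter stands in for the kernel's kref. It only illustrates that the lseg is looked up once at the start of the IO, and that a NULL lseg means the request goes through the MDS for the rest of that IO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a layout segment; refcount replaces struct kref. */
struct lseg {
	int refcount;
};

/* Hypothetical per-IO state: the lseg is stashed here once, at IO start. */
struct io_request {
	struct lseg *lseg;	/* NULL means: go through the MDS */
};

enum io_path { IO_PATH_MDS, IO_PATH_PNFS };

/*
 * Stand-in for the layout lookup (which may involve LAYOUTGET in the real
 * code). "available" models whatever layout exists at this moment, or NULL.
 */
static struct lseg *try_get_lseg(struct lseg *available)
{
	if (available)
		available->refcount++;	/* reference held for the whole IO */
	return available;
}

/* Called once, at the beginning of the IO path. */
static void io_begin(struct io_request *req, struct lseg *available)
{
	req->lseg = try_get_lseg(available);
}

/* Later stages never look the layout up again; they only test the stash. */
static enum io_path io_choose_path(const struct io_request *req)
{
	return req->lseg ? IO_PATH_PNFS : IO_PATH_MDS;
}

/* Drop the reference taken in io_begin(), if any. */
static void io_end(struct io_request *req)
{
	if (req->lseg) {
		assert(req->lseg->refcount > 0);
		req->lseg->refcount--;
		req->lseg = NULL;
	}
}
```

With this shape, io_begin(&req, NULL) leaves req.lseg NULL, so io_choose_path() keeps returning IO_PATH_MDS even if a layout shows up later; only a new IO would look it up again.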



2010-05-26 17:58:58

by Fred Isaman

Subject: Re: [PATCH 00/22] LAYOUTGET invocation


On May 26, 2010, at 1:39 PM, Dean Hildebrand wrote:

> Try to remember that this isn't some new feature that we are disabling, or a new way of doing things, this is a primary I/O path. We MUST fix this with the B-list code submission, so why go through the hassle of searching through old patches and tags to find it.
>


To be clear, I am not disabling directIO itself, only directIO's use of pnfs. DirectIO will still be functional, but will use the MDS.

Fred

> If you want to talk about a *REAL* solution, then we need to figure out who broke O_DIRECT and reject their patches until they fix it. You can't submit patches that break a primary I/O path. But again, since we are focused on A-list items, ifdef'ing the code out for now in the B-list branch seems like a reasonable compromise.
>
> Dean
>
> Boaz Harrosh wrote:
>> On 05/25/2010 11:14 PM, Dean Hildebrand wrote:
>>
>>>
>>>> I can send some post_submit patches with the code ifdef'ed out if people would be content with that.
>>>>
>>> Thanks for the background. I would be much happier if you sent patches with the code ifdef'd out, added with the comment in the code regarding which patches you believe introduced the problem.
>>>
>>> Dean
>>>
>>>> Fred
>>>>
>>
>> I disagree. Source code is not a version management system. We have git
>> for that. The code is never lost it is there for eternity in the git
>> tree. We could ask Benny to tag the last branch that had broken directIO
>> as LAST_directIO_VERSION for easy random access at future time.
>>
>> If in the future someone smart wants to forward port the code and fix it
>> then the *right* way to do it is by manual octopus merge at the point of
>> branch.
>> Never, Never uncomment out code that was sitting collecting dust.
>> Manual octopus merge I mean using the two diffs from the two sides of the
>> branch, and replaying one on the other. For instance if at one patch
>> a function was moved, then redo the move of the current function again, not
>> leave the old code as it was before. Let the merge point out the points of
>> friction. Because you see, with commented code, there is never a merge
>> conflict it will always uncomment.
>>
>> And anyway the Kernel people will never accept code in comments. There
>> are out-of-tree gits to do that. So I don't even think it is an option.
>> The pnfs branches are patches that should eventually go upstream. Or
>> are currently the only option for the testing of upstream code.
>>
>> Boaz
>>


2010-05-26 18:53:20

by Dean

Subject: Re: [PATCH 00/22] LAYOUTGET invocation



Boaz Harrosh wrote:
> On 05/26/2010 08:39 PM, Dean Hildebrand wrote:
>
>> Try to remember that this isn't some new feature that we are disabling,
>> or a new way of doing things, this is a primary I/O path. We MUST fix
>> this with the B-list code submission, so why go through the hassle of
>> searching through old patches and tags to find it.
>>
>
> No, this is no hassle. Uncommenting old code and hoping for the best
> is the hassle. (insert here the explanation from previous mail)
>
To answer Fred's statement, I understand that nfs O_DIRECT will still
work, but pNFS must support O_DIRECT in the B-list. O_DIRECT is not a
weird unused flag, it is very common. Also, I wouldn't be hoping for
the best, I would actually fix it....
>
>> If you want to talk about a *REAL* solution, then we need to figure out
>> who broke O_DIRECT and reject their patches until they fix it. You
>> can't submit patches that break a primary I/O path. But again, since we
>> are focused on A-list items, ifdef'ing the code out for now in the
>> B-list branch seems like a reasonable compromise.
>>
>>
>
> No "ifdef'ing the code out" is never a "reasonable compromise" if you want
> then keep it in a separate clean patch, with "BROKEN:" at subject line and
> committed in a branch that is above the pnfs-all-latest. So it can be rebased
> from time to time, but not included in the regular test scenario, until fixed.
>
> (Which is BTW what it means the keep it out-of-tree, these git games are done
> routinely, every day)
>

My point is that someone's patches broke O_DIRECT, and this is ONLY
acceptable because it is in the B-list. So temporarily having code in
the B-list that is #ifdef'd out really doesn't seem like the worst idea
in the world. Either way is fine with me, as long as the code isn't simply
removed (as the original patches would have done) and is easy to add back
in so that we can figure out what went wrong and fix it up.
Dean
> Boaz
>
>
>> Dean
>>
>> Boaz Harrosh wrote:
>>
>>> On 05/25/2010 11:14 PM, Dean Hildebrand wrote:
>>>
>>>
>>>>
>>>>
>>>>> I can send some post_submit patches with the code ifdef'ed out if people would be content with that.
>>>>>
>>>>>
>>>>>
>>>> Thanks for the background. I would be much happier if you sent patches
>>>> with the code ifdef'd out, added with the comment in the code regarding
>>>> which patches you believe introduced the problem.
>>>>
>>>> Dean
>>>>
>>>>
>>>>> Fred
>>>>>
>>>>>
>>> I disagree. Source code is not a version management system. We have git
>>> for that. The code is never lost it is there for eternity in the git
>>> tree. We could ask Benny to tag the last branch that had broken directIO
>>> as LAST_directIO_VERSION for easy random access at future time.
>>>
>>> If in the future someone smart wants to forward port the code and fix it
>>> then the *right* way to do it is by manual octopus merge at the point of
>>> branch.
>>> Never, Never uncomment out code that was sitting collecting dust.
>>> Manual octopus merge I mean using the two diffs from the two sides of the
>>> branch, and replaying one on the other. For instance if at one patch
>>> a function was moved, then redo the move of the current function again, not
>>> leave the old code as it was before. Let the merge point out the points of
>>> friction. Because you see, with commented code, there is never a merge
>>> conflict it will always uncomment.
>>>
>>> And anyway the Kernel people will never accept code in comments. There
>>> are out-of-tree gits to do that. So I don't even think it is an option.
>>> The pnfs branches are patches that should eventually go upstream. Or
>>> are currently the only option for the testing of upstream code.
>>>
>>> Boaz
>>>
>>>
>
>

2010-05-25 20:14:01

by Dean

Subject: Re: [PATCH 00/22] LAYOUTGET invocation



Fred Isaman wrote:
> On May 25, 2010, at 2:27 PM, Dean Hildebrand wrote:
>
>
>> Fred Isaman wrote:
>>
>>> (My apologies for the 4 patches that went out about 1/2 hour ago. Please ignore those.)
>>>
>>> This patch series limits LAYOUTGET invocation to the beginning of the IO paths.
>>>
>>> It is intended for the pnfs_submit branch, without reversion in a post_submit branch.
>>>
>>> Patches 1-4 revert direct IO. Commit is already broken, and this series breaks them further. The problem is that the direct IO redefines data->wb_req and data->pages, so that it can only work with the pnfs code if we don't look at those fields.
>>>
>>>
>> Can you give some history on this? Is it crashing? Has this problem been around for a long time or is a new set of patches causing the problem? Does this affect pNFS O_DIRECT or all O_DIRECT code?
>>
>>
>
> Any use of pnfs_commit from the directIO code will cause a crash.
>
> In my current tree, the breakage is introduced with pnfs_commit in patch 3ca1c1136 (use of the variables first and last), but because of all the rewriting I can't tell how long it has been around, though I suspect since the beginning, given the reliance in filelayout_commit (patch a43d8107) on wb_pages. If the directIO avoided commit, then it is not obviously broken.
>
>
>
>> I don't think revert is the right way to go about this. Removing support for O_DIRECT because changes to the non-O_DIRECT path break it would not fly in the mainline, and so I don't see why it would fly here. At the minimum, since O_DIRECT is a B-list feature, I could see it being commented/ifdef'd out for the time being, but completely removing the patches is extremely invasive considering this is a b-list development branch.
>>
>>
>
> The problem is the redefinition of the data->wb_req and wb_pages fields. For directIO to work, these either have to be marked as completely off limits to the pnfs code and the filelayout commit code in particular rewritten, or the directIO (not just the pnfs directIO) needs to be substantially rewritten.
>
> I can send some post_submit patches with the code ifdef'ed out if people would be content with that.
>

Thanks for the background. I would be much happier if you sent patches
with the code ifdef'd out, along with a comment in the code noting
which patches you believe introduced the problem.

Dean
> Fred
>
>
>
>> Dean
>>
>>
>>> Patches 5-8 do some code cleanup in preparation for the real work.
>>>
>>> Patches 9-19 implement the change. NOTE that patch 19 changes the calling convention of the layout drivers commit calls. There is no longer a universal lseg for the commit, instead each nfs_page has an lseg attached, with NULL meaning to go through the MDS.
>>>
>>> Patches 20-22 rework the filelayout commit function, and then do some code cleanup this enables.
>>>
>>>
>>>
>>> The basic idea of these patches is as follows:
>>>
>>> We attempt to grab a lseg (possibly invoking LAYOUTGET) early in the IO. If we succeed, we refcount and stash it, using it through the rest of the io. If we fail, we revert to straight nfs, even if the area becomes covered by a layout due to other io.
>>>
>>> The tricky, though hopefully anomalous, case is when we start without the layout, but have it at this particular stage of the IO. We ignore this for the moment at write_pages, which will cause block and object to issue CB_LAYOUTRECALL. At commit, it is tricky to handle, but since block doesn't use commit, and file needs to handle complicated splitting anyway, I just push all complicated decisions of splitting commit between nfs (for IO started without layout) and pnfs to the driver.
>>>
>>> Fred
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>
>

2010-05-26 17:47:53

by Dean

Subject: Re: [PATCH 00/22] LAYOUTGET invocation

Try to remember that this isn't some new feature that we are disabling,
or a new way of doing things, this is a primary I/O path. We MUST fix
this with the B-list code submission, so why go through the hassle of
searching through old patches and tags to find it?

If you want to talk about a *REAL* solution, then we need to figure out
who broke O_DIRECT and reject their patches until they fix it. You
can't submit patches that break a primary I/O path. But again, since we
are focused on A-list items, ifdef'ing the code out for now in the
B-list branch seems like a reasonable compromise.

Dean

Boaz Harrosh wrote:
> On 05/25/2010 11:14 PM, Dean Hildebrand wrote:
>
>>
>>> I can send some post_submit patches with the code ifdef'ed out if people would be content with that.
>>>
>>>
>> Thanks for the background. I would be much happier if you sent patches
>> with the code ifdef'd out, added with the comment in the code regarding
>> which patches you believe introduced the problem.
>>
>> Dean
>>
>>> Fred
>>>
>
> I disagree. Source code is not a version management system. We have git
> for that. The code is never lost it is there for eternity in the git
> tree. We could ask Benny to tag the last branch that had broken directIO
> as LAST_directIO_VERSION for easy random access at future time.
>
> If in the future someone smart wants to forward port the code and fix it
> then the *right* way to do it is by manual octopus merge at the point of
> branch.
> Never, Never uncomment out code that was sitting collecting dust.
> Manual octopus merge I mean using the two diffs from the two sides of the
> branch, and replaying one on the other. For instance if at one patch
> a function was moved, then redo the move of the current function again, not
> leave the old code as it was before. Let the merge point out the points of
> friction. Because you see, with commented code, there is never a merge
> conflict it will always uncomment.
>
> And anyway the Kernel people will never accept code in comments. There
> are out-of-tree gits to do that. So I don't even think it is an option.
> The pnfs branches are patches that should eventually go upstream. Or
> are currently the only option for the testing of upstream code.
>
> Boaz
>

2010-05-25 18:26:54

by Dean

Subject: Re: [PATCH 00/22] LAYOUTGET invocation



Fred Isaman wrote:
> (My apologies for the 4 patches that went out about 1/2 hour ago. Please ignore those.)
>
> This patch series limits LAYOUTGET invocation to the beginning of the IO paths.
>
> It is intended for the pnfs_submit branch, without reversion in a post_submit branch.
>
> Patches 1-4 revert direct IO. Commit is already broken, and this series breaks them further. The problem is that the direct IO redefines data->wb_req and data->pages, so that it can only work with the pnfs code if we don't look at those fields.
>

Can you give some history on this? Is it crashing? Has this problem
been around for a long time or is a new set of patches causing the
problem? Does this affect pNFS O_DIRECT or all O_DIRECT code?

I don't think revert is the right way to go about this. Removing
support for O_DIRECT because changes to the non-O_DIRECT path break it
would not fly in the mainline, and so I don't see why it would fly
here. At the minimum, since O_DIRECT is a B-list feature, I could see
it being commented/ifdef'd out for the time being, but completely
removing the patches is extremely invasive considering this is a b-list
development branch.

Dean

> Patches 5-8 do some code cleanup in preparation for the real work.
>
> Patches 9-19 implement the change. NOTE that patch 19 changes the calling convention of the layout drivers commit calls. There is no longer a universal lseg for the commit, instead each nfs_page has an lseg attached, with NULL meaning to go through the MDS.
>
> Patches 20-22 rework the filelayout commit function, and then do some code cleanup this enables.
>
>
>
> The basic idea of these patches is as follows:
>
> We attempt to grab a lseg (possibly invoking LAYOUTGET) early in the IO. If we succeed, we refcount and stash it, using it through the rest of the io. If we fail, we revert to straight nfs, even if the area becomes covered by a layout due to other io.
>
> The tricky, though hopefully anomalous, case is when we start without the layout, but have it at this particular stage of the IO. We ignore this for the moment at write_pages, which will cause block and object to issue CB_LAYOUTRECALL. At commit, it is tricky to handle, but since block doesn't use commit, and file needs to handle complicated splitting anyway, I just push all complicated decisions of splitting commit between nfs (for IO started without layout) and pnfs to the driver.
>
> Fred
>
>

2010-05-26 18:13:13

by Boaz Harrosh

Subject: Re: [PATCH 00/22] LAYOUTGET invocation

On 05/26/2010 08:39 PM, Dean Hildebrand wrote:
> Try to remember that this isn't some new feature that we are disabling,
> or a new way of doing things, this is a primary I/O path. We MUST fix
> this with the B-list code submission, so why go through the hassle of
> searching through old patches and tags to find it.

No, this is no hassle. Uncommenting old code and hoping for the best
is the hassle. (insert here the explanation from previous mail)

>
> If you want to talk about a *REAL* solution, then we need to figure out
> who broke O_DIRECT and reject their patches until they fix it. You
> can't submit patches that break a primary I/O path. But again, since we
> are focused on A-list items, ifdef'ing the code out for now in the
> B-list branch seems like a reasonable compromise.
>

No, "ifdef'ing the code out" is never a "reasonable compromise". If you want,
keep it in a separate clean patch, with "BROKEN:" in the subject line,
committed in a branch that sits on top of pnfs-all-latest. That way it can be
rebased from time to time, but is not included in the regular test scenario
until fixed.

(Which is, BTW, what it means to keep it out-of-tree. These git games are done
routinely, every day.)

Boaz

> Dean
>
> Boaz Harrosh wrote:
>> On 05/25/2010 11:14 PM, Dean Hildebrand wrote:
>>
>>>
>>>> I can send some post_submit patches with the code ifdef'ed out if people would be content with that.
>>>>
>>>>
>>> Thanks for the background. I would be much happier if you sent patches
>>> with the code ifdef'd out, added with the comment in the code regarding
>>> which patches you believe introduced the problem.
>>>
>>> Dean
>>>
>>>> Fred
>>>>
>>
>> I disagree. Source code is not a version management system. We have git
>> for that. The code is never lost it is there for eternity in the git
>> tree. We could ask Benny to tag the last branch that had broken directIO
>> as LAST_directIO_VERSION for easy random access at future time.
>>
>> If in the future someone smart wants to forward port the code and fix it
>> then the *right* way to do it is by manual octopus merge at the point of
>> branch.
>> Never, Never uncomment out code that was sitting collecting dust.
>> Manual octopus merge I mean using the two diffs from the two sides of the
>> branch, and replaying one on the other. For instance if at one patch
>> a function was moved, then redo the move of the current function again, not
>> leave the old code as it was before. Let the merge point out the points of
>> friction. Because you see, with commented code, there is never a merge
>> conflict it will always uncomment.
>>
>> And anyway the Kernel people will never accept code in comments. There
>> are out-of-tree gits to do that. So I don't even think it is an option.
>> The pnfs branches are patches that should eventually go upstream. Or
>> are currently the only option for the testing of upstream code.
>>
>> Boaz
>>


2010-05-25 19:03:34

by Fred Isaman

Subject: Re: [PATCH 00/22] LAYOUTGET invocation


On May 25, 2010, at 2:27 PM, Dean Hildebrand wrote:

>
>
> Fred Isaman wrote:
>> (My apologies for the 4 patches that went out about 1/2 hour ago. Please ignore those.)
>>
>> This patch series limits LAYOUTGET invocation to the beginning of the IO paths.
>>
>> It is intended for the pnfs_submit branch, without reversion in a post_submit branch.
>>
>> Patches 1-4 revert direct IO. Commit is already broken, and this series breaks them further. The problem is that the direct IO redefines data->wb_req and data->pages, so that it can only work with the pnfs code if we don't look at those fields.
>>
>
> Can you give some history on this? Is it crashing? Has this problem been around for a long time or is a new set of patches causing the problem? Does this affect pNFS O_DIRECT or all O_DIRECT code?
>

Any use of pnfs_commit from the directIO code will cause a crash.

In my current tree, the breakage is introduced with pnfs_commit in patch 3ca1c1136 (use of the variables first and last), but because of all the rewriting I can't tell how long it has been around, though I suspect since the beginning, given the reliance in filelayout_commit (patch a43d8107) on wb_pages. If the directIO avoided commit, then it is not obviously broken.


> I don't think revert is the right way to go about this. Removing support for O_DIRECT because changes to the non-O_DIRECT path break it would not fly in the mainline, and so I don't see why it would fly here. At the minimum, since O_DIRECT is a B-list feature, I could see it being commented/ifdef'd out for the time being, but completely removing the patches is extremely invasive considering this is a b-list development branch.
>

The problem is the redefinition of the data->wb_req and wb_pages fields. For directIO to work, these either have to be marked as completely off limits to the pnfs code and the filelayout commit code in particular rewritten, or the directIO (not just the pnfs directIO) needs to be substantially rewritten.

I can send some post_submit patches with the code ifdef'ed out if people would be content with that.

Fred


> Dean
>
>> Patches 5-8 do some code cleanup in preparation for the real work.
>>
>> Patches 9-19 implement the change. NOTE that patch 19 changes the calling convention of the layout drivers commit calls. There is no longer a universal lseg for the commit, instead each nfs_page has an lseg attached, with NULL meaning to go through the MDS.
>>
>> Patches 20-22 rework the filelayout commit function, and then do some code cleanup this enables.
>>
>>
>>
>> The basic idea of these patches is as follows:
>>
>> We attempt to grab a lseg (possibly invoking LAYOUTGET) early in the IO. If we succeed, we refcount and stash it, using it through the rest of the io. If we fail, we revert to straight nfs, even if the area becomes covered by a layout due to other io.
>>
>> The tricky, though hopefully anomalous, case is when we start without the layout, but have it at this particular stage of the IO. We ignore this for the moment at write_pages, which will cause block and object to issue CB_LAYOUTRECALL. At commit, it is tricky to handle, but since block doesn't use commit, and file needs to handle complicated splitting anyway, I just push all complicated decisions of splitting commit between nfs (for IO started without layout) and pnfs to the driver.
>>
>> Fred
>>
>>


2010-05-26 08:43:11

by Boaz Harrosh

Subject: Re: [PATCH 00/22] LAYOUTGET invocation

On 05/25/2010 11:14 PM, Dean Hildebrand wrote:
>
>
>>
>> I can send some post_submit patches with the code ifdef'ed out if people would be content with that.
>>
>
> Thanks for the background. I would be much happier if you sent patches
> with the code ifdef'd out, added with the comment in the code regarding
> which patches you believe introduced the problem.
>
> Dean
>> Fred

I disagree. Source code is not a version management system. We have git
for that. The code is never lost; it is there for eternity in the git
tree. We could ask Benny to tag the last branch that had broken directIO
as LAST_directIO_VERSION for easy random access at a future time.

If in the future someone smart wants to forward port the code and fix it,
then the *right* way to do it is by manual octopus merge at the point of
branch.
Never, never uncomment code that was sitting collecting dust.
By manual octopus merge I mean using the two diffs from the two sides of the
branch, and replaying one on the other. For instance, if in one patch
a function was moved, then redo the move on the current function, rather than
leaving the old code as it was before. Let the merge point out the points of
friction. Because you see, with commented-out code there is never a merge
conflict; it will always uncomment cleanly.

And anyway the Kernel people will never accept code in comments. There
are out-of-tree gits to do that. So I don't even think it is an option.
The pnfs branches are patches that should eventually go upstream. Or
are currently the only option for the testing of upstream code.

Boaz

2010-05-20 10:30:37

by Fred Isaman

Subject: [PATCH 08/22] pnfs: track the number of outstanding commits

Commit 71d0a6112a3 "NFS: Fix an unstable write data integrity race"
adds locking which is incompatible with the current file layout commit code,
which splits the commit into several RPCs cloned from the original.
Add a reference count so the layout driver unlocks exactly once, after the
last cloned commit finishes.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 3 +++
fs/nfs/write.c | 19 ++++++++++++++++---
include/linux/nfs_xdr.h | 2 ++
3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c96dd0e..789706e 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -518,6 +518,9 @@ filelayout_clone_write_data(struct nfs_write_data *old)
new = nfs_commitdata_alloc();
if (!new)
goto out;
+ kref_init(&new->refcount);
+ new->parent = old;
+ kref_get(&old->refcount);
new->inode = old->inode;
new->cred = old->cred;
new->args.offset = 0;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a4c95a0..937da85 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1369,7 +1369,8 @@ static int nfs_commit_rpcsetup(struct list_head *head,
data->res.fattr = &data->fattr;
data->res.verf = &data->verf;
nfs_fattr_init(&data->fattr);
-
+ kref_init(&data->refcount);
+ data->parent = NULL;
data->args.context = first->wb_context; /* used by commit done */

return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
@@ -1421,6 +1422,19 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
return;
}

+static inline void nfs_commit_cleanup(struct kref *kref)
+{
+ struct nfs_write_data *data;
+
+ data = container_of(kref, struct nfs_write_data, refcount);
+ /* Clear lock only when all cloned commits are finished */
+ if (data->parent)
+ kref_put(&data->parent->refcount, nfs_commit_cleanup);
+ else
+ nfs_commit_clear_lock(NFS_I(data->inode));
+ nfs_commitdata_release(data);
+}
+
static void nfs_commit_release(void *calldata)
{
struct nfs_write_data *data = calldata;
@@ -1458,8 +1472,7 @@ static void nfs_commit_release(void *calldata)
next:
nfs_clear_page_tag_locked(req);
}
- nfs_commit_clear_lock(NFS_I(data->inode));
- nfs_commitdata_release(calldata);
+ kref_put(&data->refcount, nfs_commit_cleanup);
}

static const struct rpc_call_ops nfs_commit_ops = {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index dee1c8c..864eac1 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1005,6 +1005,8 @@ struct nfs_read_data {
};

struct nfs_write_data {
+ struct kref refcount; /* For pnfs commit splitting */
+ struct nfs_write_data *parent; /* For pnfs commit splitting */
int flags;
struct rpc_task task;
struct inode *inode;
--
1.6.6.1
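
The counting scheme in the patch above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct commit_data, commit_alloc(), commit_put(), demo_split_commit() and the unlock_calls counter are hypothetical names, a plain int replaces struct kref, and incrementing unlock_calls stands in for nfs_commit_clear_lock(). What it shows is that each clone pins its parent, so the "unlock" happens only when the original and every clone have been released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for nfs_write_data; refcount replaces struct kref. */
struct commit_data {
	int refcount;
	struct commit_data *parent;	/* NULL for the original commit */
};

/* Counts how often the (simulated) commit lock is cleared. */
static int unlock_calls;

static void commit_put(struct commit_data *d);

/* Runs when the last reference to d is dropped. */
static void commit_cleanup(struct commit_data *d)
{
	if (d->parent)
		commit_put(d->parent);	/* a clone releases its pin on the parent */
	else
		unlock_calls++;		/* original: clear the lock exactly here */
	free(d);
}

static void commit_put(struct commit_data *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		commit_cleanup(d);
}

/* Allocate an original (parent == NULL) or a clone that pins its parent. */
static struct commit_data *commit_alloc(struct commit_data *parent)
{
	struct commit_data *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->parent = parent;
	if (parent)
		parent->refcount++;
	return d;
}

/* Split one commit into nclones RPCs, finish them all, report unlock count. */
static int demo_split_commit(int nclones)
{
	struct commit_data *orig = commit_alloc(NULL);
	struct commit_data **clones = calloc((size_t)nclones + 1, sizeof(*clones));
	int i;

	if (!orig || !clones) {
		free(orig);
		free(clones);
		return -1;
	}
	unlock_calls = 0;
	for (i = 0; i < nclones; i++)
		clones[i] = commit_alloc(orig);
	commit_put(orig);		/* original released while clones may be pending */
	for (i = 0; i < nclones; i++)
		if (clones[i])
			commit_put(clones[i]);
	free(clones);
	return unlock_calls;
}
```

In this model the release order does not matter: whichever of the original or its clones finishes last triggers the single unlock.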


2010-05-20 10:30:38

by Fred Isaman

Subject: [PATCH 10/22] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions

These will be used in the generic code. They are set up so that they compile
away to nothing if CONFIG_NFS_V4_1 is not set.

This requires kref_put to be called under the lock. See rule 3 of
Documentation/kref.txt.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 48 ++++++++++++++++++++++++++++++++++--------------
fs/nfs/pnfs.h | 44 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 77 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 055f040..d1693a4 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -405,7 +405,25 @@ destroy_lseg(struct kref *kref)
PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
}

-static inline void
+static void
+put_lseg_locked(struct pnfs_layout_segment *lseg)
+{
+ bool do_wake_up;
+ struct nfs_inode *nfsi;
+
+ if (!lseg)
+ return;
+
+ dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+ atomic_read(&lseg->kref.refcount), lseg->valid);
+ do_wake_up = !lseg->valid;
+ nfsi = PNFS_NFS_INODE(lseg->layout);
+ kref_put(&lseg->kref, destroy_lseg);
+ if (do_wake_up)
+ wake_up(&nfsi->lo_waitq);
+}
+
+void
put_lseg(struct pnfs_layout_segment *lseg)
{
bool do_wake_up;
@@ -418,7 +436,9 @@ put_lseg(struct pnfs_layout_segment *lseg)
atomic_read(&lseg->kref.refcount), lseg->valid);
do_wake_up = !lseg->valid;
nfsi = PNFS_NFS_INODE(lseg->layout);
+ spin_lock(&nfsi->lo_lock);
kref_put(&lseg->kref, destroy_lseg);
+ spin_unlock(&nfsi->lo_lock);
if (do_wake_up)
wake_up(&nfsi->lo_waitq);
}
@@ -640,7 +660,7 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
lseg, lseg->range.iomode, lseg->range.offset,
lseg->range.length);
list_del(&lseg->fi_list);
- put_lseg(lseg);
+ put_lseg_locked(lseg);
}

dprintk("%s:Return\n", __func__);
@@ -1000,7 +1020,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
(lseg->valid || !only_valid)) {
ret = lseg;
if (take_ref)
- kref_get(&ret->kref);
+ get_lseg(ret);
break;
}
if (cmp_layout(range, &lseg->range) > 0)
@@ -1033,7 +1053,7 @@ void drain_layoutreturns(struct pnfs_layout_type *lo)
* returned to the caller.
*/
int
-pnfs_update_layout(struct inode *ino,
+_pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
u64 count,
loff_t pos,
@@ -1062,9 +1082,9 @@ pnfs_update_layout(struct inode *ino,
/* Check to see if the layout for the given range already exists */
lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
if (lseg && !lseg->valid) {
- spin_unlock(&nfsi->lo_lock);
if (take_ref)
- put_lseg(lseg);
+ put_lseg_locked(lseg);
+ spin_unlock(&nfsi->lo_lock);
for (;;) {
prepare_to_wait(&nfsi->lo_waitq, &__wait,
TASK_KILLABLE);
@@ -1075,7 +1095,7 @@ pnfs_update_layout(struct inode *ino,
dprintk("%s: invalid lseg %p ref %d\n", __func__,
lseg, atomic_read(&lseg->kref.refcount)-1);
if (take_ref)
- put_lseg(lseg);
+ put_lseg_locked(lseg);
if (signal_pending(current)) {
lseg = NULL;
result = -ERESTARTSYS;
@@ -1262,7 +1282,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
init_lseg(lo, lseg);
lseg->range = res->lseg;
if (lgp->lsegpp) {
- kref_get(&lseg->kref);
+ get_lseg(lseg);
*lgp->lsegpp = lseg;
}

@@ -1380,7 +1400,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
readahead_range(inode, pages, &loff, &count);

if (count > 0) {
- status = pnfs_update_layout(inode, ctx, count,
+ status = _pnfs_update_layout(inode, ctx, count,
loff, IOMODE_READ, NULL);
dprintk("%s virt update returned %d\n", __func__, status);
if (status != 0)
@@ -1438,7 +1458,7 @@ pnfs_update_layout_commit(struct inode *inode,
if (start == 0 && count == 0)
count = NFS4_MAX_UINT64;

- status = pnfs_update_layout(inode, nfs_page->wb_context,
+ status = _pnfs_update_layout(inode, nfs_page->wb_context,
count,
start,
IOMODE_RW,
@@ -1538,7 +1558,7 @@ pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
goto out;

/* Retrieve and set layout if not allready cached */
- status = pnfs_update_layout(inode,
+ status = _pnfs_update_layout(inode,
context,
count,
*pos,
@@ -1580,7 +1600,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
args->offset);

/* Retrieve and set layout if not allready cached */
- status = pnfs_update_layout(inode,
+ status = _pnfs_update_layout(inode,
args->context,
args->count,
args->offset,
@@ -1681,7 +1701,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
args->offset);

/* Retrieve and set layout if not allready cached */
- status = pnfs_update_layout(inode,
+ status = _pnfs_update_layout(inode,
args->context,
args->count,
args->offset,
@@ -1845,7 +1865,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
new one. If it was recalled we better commit the data first
before returning it, otherwise the data needs to be rewritten,
either with a new layout or to the MDS */
- result = pnfs_update_layout(data->inode,
+ result = _pnfs_update_layout(data->inode,
NULL,
count,
first->wb_offset,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 5e9b06b..242abf5 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -31,7 +31,8 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
/* pnfs.c */
extern const nfs4_stateid zero_stateid;

-int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+void put_lseg(struct pnfs_layout_segment *lseg);
+int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
u64 count, loff_t pos, enum pnfs_iomode access_type,
struct pnfs_layout_segment **lsegpp);

@@ -80,6 +81,12 @@ static inline int lo_fail_bit(u32 iomode)
NFS_INO_RW_LAYOUT_FAILED : NFS_INO_RO_LAYOUT_FAILED;
}

+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+ if (lseg)
+ kref_get(&lseg->kref);
+}
+
/* Return true if a layout driver is being used for this mountpoint */
static inline int pnfs_enabled_sb(struct nfs_server *nfss)
{
@@ -169,6 +176,23 @@ static inline int pnfs_return_layout(struct inode *ino,
return 0;
}

+static inline int pnfs_update_layout(struct inode *ino,
+ struct nfs_open_context *ctx,
+ u64 count, loff_t pos, enum pnfs_iomode access_type,
+ struct pnfs_layout_segment **lsegpp)
+{
+ struct nfs_server *nfss = NFS_SERVER(ino);
+
+ if (pnfs_enabled_sb(nfss))
+ return _pnfs_update_layout(ino, ctx, count, pos,
+ access_type, lsegpp);
+ else {
+ if (lsegpp)
+ *lsegpp = NULL;
+ return 0;
+ }
+}
+
static inline int pnfs_get_write_status(struct nfs_write_data *data)
{
return data->pdata.pnfs_error;
@@ -189,6 +213,24 @@ static inline int pnfs_use_rpc(struct nfs_server *nfss)

#else /* CONFIG_NFS_V4_1 */

+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline void put_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline int
+pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+ u64 count, loff_t pos, enum pnfs_iomode access_type,
+ struct pnfs_layout_segment **lsegpp)
+{
+ if (lsegpp)
+ *lsegpp = NULL;
+ return 0;
+}
+
static inline enum pnfs_try_status
pnfs_try_to_read_data(struct nfs_read_data *data,
const struct rpc_call_ops *call_ops)
--
1.6.6.1


2010-05-20 10:30:37

by Fred Isaman

Subject: [PATCH 09/22] pnfs_submit: mandate basic io path operations for layout drivers

Mandate read_pagelist, write_pagelist, and commit. This will help
avoid needless checks in the io path.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 8dbf740..055f040 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -259,6 +259,14 @@ pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
return NULL;
}

+ if (!io_ops->read_pagelist || !io_ops->write_pagelist ||
+ !io_ops->commit) {
+ printk(KERN_ERR "%s Layout driver must provide "
+ "read_pagelist, write_pagelist, and commit.\n",
+ __func__);
+ return NULL;
+ }
+
pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
if (pnfs_mod != NULL) {
dprintk("%s Registering id:%u name:%s\n",
--
1.6.6.1
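
The registration check added in patch 09 can be sketched in isolation as follows. This is a minimal userspace illustration, not the kernel code: `io_ops_sketch`, `io_ops_complete`, `register_driver_sketch`, and `dummy_op` are hypothetical names, and the function-pointer signatures are simplified placeholders for the real layout driver operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the layout driver I/O operations table. */
struct io_ops_sketch {
	int (*read_pagelist)(void);
	int (*write_pagelist)(void);
	int (*commit)(void);
};

/* Returns 1 only if all three mandated operations are present. */
static int io_ops_complete(const struct io_ops_sketch *ops)
{
	return ops && ops->read_pagelist && ops->write_pagelist && ops->commit;
}

/* Refuse registration up front so the I/O path never has to re-check. */
static int register_driver_sketch(const char *name,
				  const struct io_ops_sketch *ops)
{
	assert(name != NULL);
	if (!io_ops_complete(ops)) {
		fprintf(stderr, "%s: layout driver must provide "
			"read_pagelist, write_pagelist, and commit\n", name);
		return -1;
	}
	return 0;
}

/* Placeholder operation used only to populate the table. */
static int dummy_op(void)
{
	return 0;
}
```

Rejecting an incomplete driver at registration time is what lets later patches in the series drop the per-call `PNFS_EXISTS_LDIO_OP` style checks from the read and write paths.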


2010-05-20 10:30:39

by Fred Isaman

Subject: [PATCH 11/22] pnfs_submit: stash and refcount lseg in read path

Note we are not using it yet, but refcounting should be accurate.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pagelist.c | 11 +++++++++--
fs/nfs/pnfs.c | 4 +++-
fs/nfs/read.c | 9 +++++++--
fs/nfs/write.c | 2 +-
include/linux/nfs_page.h | 5 ++++-
5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index bfc9da7..92fedbb 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -20,6 +20,7 @@
#include <linux/nfs_mount.h>

#include "internal.h"
+#include "pnfs.h"

static struct kmem_cache *nfs_page_cachep;

@@ -56,7 +57,8 @@ nfs_page_free(struct nfs_page *p)
struct nfs_page *
nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
struct page *page,
- unsigned int offset, unsigned int count)
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;

@@ -86,6 +88,8 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
req->wb_bytes = count;
req->wb_context = get_nfs_open_context(ctx);
kref_init(&req->wb_kref);
+ req->wb_lseg = lseg;
+ get_lseg(lseg);
return req;
}

@@ -156,9 +160,12 @@ void nfs_clear_request(struct nfs_page *req)
put_nfs_open_context(ctx);
req->wb_context = NULL;
}
+ if (req->wb_lseg != NULL) {
+ put_lseg(req->wb_lseg);
+ req->wb_lseg = NULL;
+ }
}

-
/**
* nfs_release_request - Release the count on an NFS read/write request
* @req: request to release
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d1693a4..ddc4578 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1392,6 +1392,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
pgio->pg_iswrite = 0;
pgio->pg_boundary = 0;
pgio->pg_test = NULL;
+ pgio->pg_lseg = NULL;

if (!pnfs_enabled_sb(nfss))
return;
@@ -1401,7 +1402,8 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,

if (count > 0) {
status = _pnfs_update_layout(inode, ctx, count,
- loff, IOMODE_READ, NULL);
+ loff, IOMODE_READ,
+ &pgio->pg_lseg);
dprintk("%s virt update returned %d\n", __func__, status);
if (status != 0)
return;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 2670d2e..6473795 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -121,11 +121,14 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
LIST_HEAD(one_request);
struct nfs_page *new;
unsigned int len;
+ struct pnfs_layout_segment *lseg;

len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);
- new = nfs_create_request(ctx, inode, page, 0, len);
+ pnfs_update_layout(inode, ctx, NFS4_MAX_UINT64, 0, IOMODE_READ, &lseg);
+ new = nfs_create_request(ctx, inode, page, 0, len, lseg);
+ put_lseg(lseg);
if (IS_ERR(new)) {
unlock_page(page);
return PTR_ERR(new);
@@ -606,7 +609,8 @@ readpage_async_filler(void *data, struct page *page)
if (len == 0)
return nfs_return_empty_page(page);

- new = nfs_create_request(desc->ctx, inode, page, 0, len);
+ new = nfs_create_request(desc->ctx, inode, page, 0, len,
+ desc->pgio->pg_lseg);
if (IS_ERR(new))
goto out_error;

@@ -673,6 +677,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);

nfs_pageio_complete(&pgio);
+ put_lseg(pgio.pg_lseg);
npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
nfs_add_stats(inode, NFSIOS_READPAGES, npages);
read_complete:
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 937da85..523ceb4 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -653,7 +653,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
req = nfs_try_to_update_request(inode, page, offset, bytes);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, inode, page, offset, bytes);
+ req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
if (IS_ERR(req))
goto out;
error = nfs_inode_add_request(inode, req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d04ebb2..18a455c 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -48,6 +48,7 @@ struct nfs_page {
struct kref wb_kref; /* reference count */
unsigned long wb_flags;
struct nfs_writeverf wb_verf; /* Commit cookie */
+ struct pnfs_layout_segment *wb_lseg; /* Pnfs layout info */
};

struct nfs_pageio_descriptor {
@@ -61,6 +62,7 @@ struct nfs_pageio_descriptor {
int (*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
int pg_ioflags;
int pg_error;
+ struct pnfs_layout_segment *pg_lseg;
#ifdef CONFIG_NFS_V4_1
int pg_iswrite;
int pg_boundary;
@@ -74,7 +76,8 @@ extern struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
struct inode *inode,
struct page *page,
unsigned int offset,
- unsigned int count);
+ unsigned int count,
+ struct pnfs_layout_segment *lseg);
extern void nfs_clear_request(struct nfs_page *req);
extern void nfs_release_request(struct nfs_page *req);

--
1.6.6.1
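
The reference handoff in patch 11 (the caller obtains an lseg, the request takes its own reference, the caller drops its reference) can be sketched as below. This is a userspace model under stated assumptions: a plain integer stands in for `kref`, all `*_sketch` names are hypothetical, and the put helper is assumed to tolerate NULL the same way `get_lseg` does in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout segment with a plain integer refcount. */
struct lseg_sketch {
	int refcount;
	int freed;
};

/* NULL means "go through the MDS", so both helpers accept it. */
static void get_lseg_sketch(struct lseg_sketch *lseg)
{
	if (lseg)
		lseg->refcount++;
}

static void put_lseg_sketch(struct lseg_sketch *lseg)
{
	if (!lseg)
		return;
	assert(lseg->refcount > 0);
	if (--lseg->refcount == 0)
		lseg->freed = 1;	/* the kernel would free the segment here */
}

/* Hypothetical request that stashes the lseg for the rest of the I/O. */
struct req_sketch {
	struct lseg_sketch *lseg;
};

/* Mirrors nfs_create_request: store the pointer and take a reference. */
static void req_init_sketch(struct req_sketch *req, struct lseg_sketch *lseg)
{
	req->lseg = lseg;
	get_lseg_sketch(lseg);
}

/* Mirrors nfs_clear_request: drop the reference and clear the pointer. */
static void req_clear_sketch(struct req_sketch *req)
{
	if (req->lseg != NULL) {
		put_lseg_sketch(req->lseg);
		req->lseg = NULL;
	}
}
```

Because each request holds its own reference, the caller can release the one it obtained from the layout lookup immediately after creating the request, as `nfs_readpage_async` does in the patch.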


2010-05-20 10:30:40

by Fred Isaman

Subject: [PATCH 12/22] pnfs_submit: read path changeover

Change readpages path to only call LAYOUTGET once.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pagelist.c | 2 ++
fs/nfs/pnfs.c | 37 +++++++------------------------------
fs/nfs/pnfs.h | 25 ++++++++++++++++---------
3 files changed, 25 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 92fedbb..19ffdc5 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -259,6 +259,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
return 0;
if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
return 0;
+ if (req->wb_lseg != prev->wb_lseg)
+ return 0;
#ifdef CONFIG_NFS_V4_1
if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
return 0;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index ddc4578..07a8c33 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1690,7 +1690,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
{
struct nfs_readargs *args = &rdata->args;
struct inode *inode = rdata->inode;
- int numpages, status, pgcount, temp;
+ int numpages, pgcount, temp;
struct nfs_server *nfss = NFS_SERVER(inode);
struct nfs_inode *nfsi = NFS_I(inode);
struct pnfs_layout_segment *lseg;
@@ -1702,19 +1702,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
args->count,
args->offset);

- /* Retrieve and set layout if not allready cached */
- status = _pnfs_update_layout(inode,
- args->context,
- args->count,
- args->offset,
- IOMODE_READ,
- &lseg);
- if (status) {
- dprintk("%s: Updating layout failed (%d), retry with NFS \n",
- __func__, status);
- trypnfs = PNFS_NOT_ATTEMPTED;
- goto out;
- }
+ lseg = rdata->req->wb_lseg;
+ get_lseg(lseg);

/* Determine number of pages. */
pgcount = args->pgbase + args->count;
@@ -1741,7 +1730,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
rdata->pdata.lseg = NULL;
put_lseg(lseg);
}
- out:
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
}
@@ -1750,21 +1738,10 @@ enum pnfs_try_status
_pnfs_try_to_read_data(struct nfs_read_data *data,
const struct rpc_call_ops *call_ops)
{
- struct inode *ino = data->inode;
- struct nfs_server *nfss = NFS_SERVER(ino);
-
- dprintk("--> %s\n", __func__);
- /* Only create an rpc request if utilizing NFSv4 I/O */
- if (!pnfs_enabled_sb(nfss) ||
- !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
- dprintk("<-- %s: not using pnfs\n", __func__);
- return PNFS_NOT_ATTEMPTED;
- } else {
- dprintk("%s: Utilizing pNFS I/O\n", __func__);
- data->pdata.call_ops = call_ops;
- data->pdata.pnfs_error = 0;
- return pnfs_readpages(data);
- }
+ dprintk("%s: Utilizing pNFS I/O\n", __func__);
+ data->pdata.call_ops = call_ops;
+ data->pdata.pnfs_error = 0;
+ return pnfs_readpages(data);
}

enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 242abf5..5201973 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -93,22 +93,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
return nfss->pnfs_curr_ld != NULL;
}

+static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
+{
+ struct nfs_page *req;
+
+ list_for_each_entry(req, head, wb_list) {
+ put_lseg(req->wb_lseg);
+ req->wb_lseg = NULL;
+ }
+}
+
static inline enum pnfs_try_status
pnfs_try_to_read_data(struct nfs_read_data *data,
const struct rpc_call_ops *call_ops)
{
- struct inode *inode = data->inode;
- struct nfs_server *nfss = NFS_SERVER(inode);
enum pnfs_try_status ret;

- /* FIXME: read_pagelist should probably be mandated */
- if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
- ret = _pnfs_try_to_read_data(data, call_ops);
- else
- ret = PNFS_NOT_ATTEMPTED;
-
+ if (!data->req->wb_lseg)
+ return PNFS_NOT_ATTEMPTED;
+ ret = _pnfs_try_to_read_data(data, call_ops);
if (ret == PNFS_ATTEMPTED)
- nfs_inc_stats(inode, NFSIOS_PNFS_READ);
+ nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
+ else
+ _pnfs_clear_lseg_from_pages(&data->pages);
return ret;
}

--
1.6.6.1
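
The coalescing rule that patch 12 adds to `nfs_can_coalesce_requests` can be illustrated with a small userspace sketch. Only the two checks visible in the hunk are modeled; the other conditions in the real function are omitted, and every name and the page size constant here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for illustration */

struct lseg_sketch {
	int id;
};

/* Hypothetical request carrying the lseg chosen at the start of the I/O. */
struct page_req_sketch {
	unsigned int pgbase;
	unsigned int bytes;
	struct lseg_sketch *lseg;	/* NULL means I/O goes through the MDS */
};

/* Returns 1 if req may be merged into the same I/O as prev, else 0. */
static int can_coalesce_sketch(const struct page_req_sketch *prev,
			       const struct page_req_sketch *req)
{
	assert(prev != NULL && req != NULL);
	/* prev must end exactly on a page boundary */
	if (prev->pgbase + prev->bytes != SKETCH_PAGE_SIZE)
		return 0;
	/* requests bound to different segments (or pNFS vs MDS) never mix */
	if (req->lseg != prev->lseg)
		return 0;
	return 1;
}
```

Comparing the pointers directly means a request that started without a layout (NULL) is never merged with one that holds a segment, which is what allows `pnfs_try_to_read_data` to decide the path from the first request's `wb_lseg` alone.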


2010-05-20 10:30:43

by Fred Isaman

Subject: [PATCH 15/22] pnfs_submit: remove pnfs_file_operations

pnfs_writepages is useful, but not necessary, for determining size
parameters for LAYOUTGET.

Also, the pnfs_file_operations were getting out of sync with
nfs_file_operations (see commits e1ebfd33be068 and bf40d3435caf49369).

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 24 ------------------------
fs/nfs/nfs4proc.c | 1 -
fs/nfs/pnfs.c | 35 -----------------------------------
fs/nfs/pnfs.h | 1 -
include/linux/nfs_fs.h | 3 ---
5 files changed, 0 insertions(+), 64 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1479289..80e7dc2 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -82,30 +82,6 @@ const struct file_operations nfs_file_operations = {
.setlease = nfs_setlease,
};

-#ifdef CONFIG_NFS_V4_1
-const struct file_operations pnfs_file_operations = {
- .llseek = nfs_file_llseek,
- .read = do_sync_read,
- .write = pnfs_file_write,
- .aio_read = nfs_file_read,
- .aio_write = nfs_file_write,
-#ifdef CONFIG_MMU
- .mmap = nfs_file_mmap,
-#else
- .mmap = generic_file_mmap,
-#endif
- .open = nfs_file_open,
- .flush = nfs_file_flush,
- .release = nfs_file_release,
- .fsync = nfs_file_fsync,
- .lock = nfs_lock,
- .flock = nfs_flock,
- .splice_read = nfs_file_splice_read,
- .check_flags = nfs_check_flags,
- .setlease = nfs_setlease,
-};
-#endif /* CONFIG_NFS_V4_1 */
-
const struct inode_operations nfs_file_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8b375a7..9786391 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5935,7 +5935,6 @@ pnfs_v4_clientops_init(void)
struct nfs_rpc_ops *p = (struct nfs_rpc_ops *)&pnfs_v4_clientops;

memcpy(p, &nfs_v4_clientops, sizeof(*p));
- p->file_ops = &pnfs_file_operations;
p->setattr = pnfs4_proc_setattr;
p->read_done = pnfs4_read_done;
p->write_setup = pnfs4_proc_write_setup;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 07a8c33..a542601 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1539,41 +1539,6 @@ pnfs_writeback_done(struct nfs_write_data *data)
}

/*
- * Obtain a layout for the the write range, and call do_sync_write.
- *
- * Unlike the read path which can wait until page coalescing
- * (pnfs_pageio_init_read) to get a layout, the write path discards the
- * request range to form the address_mapping - so we get a layout in
- * the file operations write method.
- *
- * If pnfs_update_layout fails, pages will be coalesced for MDS I/O.
- */
-ssize_t
-pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
- loff_t *pos)
-{
- struct inode *inode = filp->f_dentry->d_inode;
- struct nfs_open_context *context = filp->private_data;
- int status;
-
- if (!pnfs_enabled_sb(NFS_SERVER(inode)))
- goto out;
-
- /* Retrieve and set layout if not allready cached */
- status = _pnfs_update_layout(inode,
- context,
- count,
- *pos,
- IOMODE_RW,
- NULL);
- if (status)
- dprintk("%s: Unable to get a layout for %Zu@%llu iomode %d)\n",
- __func__, count, *pos, IOMODE_RW);
-out:
- return do_sync_write(filp, buf, count, pos);
-}
-
-/*
* Call the appropriate parallel I/O subsystem write function.
* If no I/O device driver exists, or one does match the returned
* fstype, then return a positive status for regular NFS processing.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 5201973..bec3c49 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -59,7 +59,6 @@ void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
struct nfs_open_context *, struct list_head *);
void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
void pnfs_update_layout_commit(struct inode *, struct list_head *, pgoff_t, unsigned int);
-ssize_t pnfs_file_write(struct file *, const char __user *, size_t, loff_t *);
void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
void pnfs_layout_release(struct pnfs_layout_type *, atomic_t *,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 9d41821..16acf96 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -402,9 +402,6 @@ extern const struct inode_operations nfs3_file_inode_operations;
#endif /* CONFIG_NFS_V3 */
extern const struct file_operations nfs_file_operations;
extern const struct address_space_operations nfs_file_aops;
-#ifdef CONFIG_NFS_V4_1
-extern const struct file_operations pnfs_file_operations;
-#endif /* CONFIG_NFS_V4_1 */

static inline struct nfs_open_context *nfs_file_open_context(struct file *filp)
{
--
1.6.6.1


2010-05-20 10:30:44

by Fred Isaman

Subject: [PATCH 17/22] pnfs_submit: remove pnfs_writepages LAYOUTGET invocation

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 37 +++++++------------------------------
fs/nfs/pnfs.h | 15 ++++++---------
2 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5e2dad8..5ccd406 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1515,7 +1515,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
{
struct nfs_writeargs *args = &wdata->args;
struct inode *inode = wdata->inode;
- int numpages, status;
+ int numpages;
enum pnfs_try_status trypnfs;
struct nfs_server *nfss = NFS_SERVER(inode);
struct nfs_inode *nfsi = NFS_I(inode);
@@ -1527,19 +1527,8 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
args->count,
args->offset);

- /* Retrieve and set layout if not allready cached */
- status = _pnfs_update_layout(inode,
- args->context,
- args->count,
- args->offset,
- IOMODE_RW,
- &lseg);
- if (status) {
- dprintk("%s: Updating layout failed (%d), retry with NFS \n",
- __func__, status);
- trypnfs = PNFS_NOT_ATTEMPTED; /* retry with nfs I/O */
- goto out;
- }
+ lseg = wdata->req->wb_lseg;
+ get_lseg(lseg);

/* Determine number of pages
*/
@@ -1567,7 +1556,6 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
wdata->pdata.lseg = NULL;
put_lseg(lseg);
}
-out:
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
}
@@ -1674,22 +1662,11 @@ enum pnfs_try_status
_pnfs_try_to_write_data(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops, int how)
{
- struct inode *ino = data->inode;
- struct nfs_server *nfss = NFS_SERVER(ino);
-
dprintk("--> %s\n", __func__);
- /* Only create an rpc request if utilizing NFSv4 I/O */
- if (!pnfs_enabled_sb(nfss) ||
- !nfss->pnfs_curr_ld->ld_io_ops->write_pagelist) {
- dprintk("<-- %s: not using pnfs\n", __func__);
- return PNFS_NOT_ATTEMPTED;
- } else {
- dprintk("%s: Utilizing pNFS I/O\n", __func__);
- data->pdata.call_ops = call_ops;
- data->pdata.pnfs_error = 0;
- data->pdata.how = how;
- return pnfs_writepages(data, how);
- }
+ data->pdata.call_ops = call_ops;
+ data->pdata.pnfs_error = 0;
+ data->pdata.how = how;
+ return pnfs_writepages(data, how);
}

enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 2b01dc7..d1a4f42 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -122,18 +122,15 @@ pnfs_try_to_write_data(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops,
int how)
{
- struct inode *inode = data->inode;
- struct nfs_server *nfss = NFS_SERVER(inode);
enum pnfs_try_status ret;

- /* FIXME: write_pagelist should probably be mandated */
- if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
- ret = _pnfs_try_to_write_data(data, call_ops, how);
- else
- ret = PNFS_NOT_ATTEMPTED;
-
+ if (!data->req->wb_lseg)
+ return PNFS_NOT_ATTEMPTED;
+ ret = _pnfs_try_to_write_data(data, call_ops, how);
if (ret == PNFS_ATTEMPTED)
- nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
+ nfs_inc_stats(data->inode, NFSIOS_PNFS_WRITE);
+ else
+ _pnfs_clear_lseg_from_pages(&data->pages);
return ret;
}

--
1.6.6.1


2010-05-20 10:30:42

by Fred Isaman

Subject: [PATCH 14/22] pnfs_submit: stash and refcount lseg in write path

Store the lseg in each nfs_page. Note this necessitates adding checks
for compatibility with the lsegs of pre-existing nfs_pages.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 14 +++++++++-----
fs/nfs/write.c | 30 ++++++++++++++++++------------
include/linux/nfs_fs.h | 8 ++++++--
3 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 03a1b3b..1479289 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -416,7 +416,9 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
file->f_path.dentry->d_name.name,
mapping->host->i_ino, len, (long long) pos);

- pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
+ pnfs_update_layout(mapping->host,
+ nfs_file_open_context(file),
+ NFS4_MAX_UINT64, 0, IOMODE_RW,
(struct pnfs_layout_segment **) fsdata);
start:
/*
@@ -435,7 +437,7 @@ start:
}
*pagep = page;

- ret = nfs_flush_incompatible(file, page);
+ ret = nfs_flush_incompatible(file, page, *fsdata);
if (ret) {
unlock_page(page);
page_cache_release(page);
@@ -487,7 +489,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
zero_user_segment(page, pglen, PAGE_CACHE_SIZE);
}

- status = nfs_updatepage(file, page, offset, copied);
+ status = nfs_updatepage(file, page, offset, copied, fsdata);

unlock_page(page);
page_cache_release(page);
@@ -594,6 +596,8 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
/* make sure the cache has finished storing the page */
nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);

+ /* XXX Do we want to call pnfs_update_layout here? */
+
lock_page(page);
mapping = page->mapping;
if (mapping != dentry->d_inode->i_mapping)
@@ -604,11 +608,11 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
if (pagelen == 0)
goto out_unlock;

- ret = nfs_flush_incompatible(filp, page);
+ ret = nfs_flush_incompatible(filp, page, NULL);
if (ret != 0)
goto out_unlock;

- ret = nfs_updatepage(filp, page, 0, pagelen);
+ ret = nfs_updatepage(filp, page, 0, pagelen, NULL);
out_unlock:
if (!ret)
return VM_FAULT_LOCKED;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 523ceb4..34a571b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -570,7 +570,8 @@ static inline int nfs_scan_commit(struct inode *inode, struct list_head *dst, pg
static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
struct page *page,
unsigned int offset,
- unsigned int bytes)
+ unsigned int bytes,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;
unsigned int rqend;
@@ -595,8 +596,8 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
* Note: nfs_flush_incompatible() will already
* have flushed out requests having wrong owners.
*/
- if (offset > rqend
- || end < req->wb_offset)
+ if (offset > rqend || end < req->wb_offset ||
+ req->wb_lseg != lseg)
goto out_flushme;

if (nfs_set_page_tag_locked(req))
@@ -644,16 +645,17 @@ out_err:
* already called nfs_flush_incompatible() if necessary.
*/
static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
- struct page *page, unsigned int offset, unsigned int bytes)
+ struct page *page, unsigned int offset, unsigned int bytes,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = page->mapping->host;
struct nfs_page *req;
int error;

- req = nfs_try_to_update_request(inode, page, offset, bytes);
+ req = nfs_try_to_update_request(inode, page, offset, bytes, lseg);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
+ req = nfs_create_request(ctx, inode, page, offset, bytes, lseg);
if (IS_ERR(req))
goto out;
error = nfs_inode_add_request(inode, req);
@@ -666,11 +668,12 @@ out:
}

static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
- unsigned int offset, unsigned int count)
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_page *req;

- req = nfs_setup_write_request(ctx, page, offset, count);
+ req = nfs_setup_write_request(ctx, page, offset, count, lseg);
if (IS_ERR(req))
return PTR_ERR(req);
nfs_mark_request_dirty(req);
@@ -682,7 +685,8 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
return 0;
}

-int nfs_flush_incompatible(struct file *file, struct page *page)
+int nfs_flush_incompatible(struct file *file, struct page *page,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_open_context *ctx = nfs_file_open_context(file);
struct nfs_page *req;
@@ -699,7 +703,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
req = nfs_page_find_request(page);
if (req == NULL)
return 0;
- do_flush = req->wb_page != page || req->wb_context != ctx;
+ do_flush = req->wb_page != page || req->wb_context != ctx ||
+ req->wb_lseg != lseg;
nfs_release_request(req);
if (!do_flush)
return 0;
@@ -726,7 +731,8 @@ static int nfs_write_pageuptodate(struct page *page, struct inode *inode)
* things with a page scheduled for an RPC call (e.g. invalidate it).
*/
int nfs_updatepage(struct file *file, struct page *page,
- unsigned int offset, unsigned int count)
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg)
{
struct nfs_open_context *ctx = nfs_file_open_context(file);
struct inode *inode = page->mapping->host;
@@ -751,7 +757,7 @@ int nfs_updatepage(struct file *file, struct page *page,
offset = 0;
}

- status = nfs_writepage_setup(ctx, page, offset, count);
+ status = nfs_writepage_setup(ctx, page, offset, count, lseg);
if (status < 0)
nfs_set_pageerror(page);

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 98a8dc0..9d41821 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -503,8 +503,12 @@ extern void nfs_unblock_sillyrename(struct dentry *dentry);
extern int nfs_congestion_kb;
extern int nfs_writepage(struct page *page, struct writeback_control *wbc);
extern int nfs_writepages(struct address_space *, struct writeback_control *);
-extern int nfs_flush_incompatible(struct file *file, struct page *page);
-extern int nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
+struct pnfs_layout_segment;
+extern int nfs_flush_incompatible(struct file *file, struct page *page,
+ struct pnfs_layout_segment *lseg);
+extern int nfs_updatepage(struct file *, struct page *,
+ unsigned int offset, unsigned int count,
+ struct pnfs_layout_segment *lseg);
extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);

/*
--
1.6.6.1


2010-05-20 10:30:41

by Fred Isaman

Subject: [PATCH 13/22] pnfs_submit: use fsdata to pass lseg

Preparing for LAYOUTGET invocation in nfs_write_begin to be the
only invocation in the write path.

The stashed lseg isn't used at all yet, but it should be properly referenced and dereferenced.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/file.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 3ec9abb..03a1b3b 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -416,6 +416,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
file->f_path.dentry->d_name.name,
mapping->host->i_ino, len, (long long) pos);

+ pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
+ (struct pnfs_layout_segment **) fsdata);
start:
/*
* Prevent starvation issues if someone is doing a consistency
@@ -424,11 +426,13 @@ start:
ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
nfs_wait_bit_killable, TASK_KILLABLE);
if (ret)
- return ret;
+ goto out;

page = grab_cache_page_write_begin(mapping, index, flags);
- if (!page)
- return -ENOMEM;
+ if (!page) {
+ ret = -ENOMEM;
+ goto out;
+ }
*pagep = page;

ret = nfs_flush_incompatible(file, page);
@@ -443,6 +447,11 @@ start:
if (!ret)
goto start;
}
+ out:
+ if (ret) {
+ put_lseg(*fsdata);
+ *fsdata = NULL;
+ }
return ret;
}

@@ -482,6 +491,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,

unlock_page(page);
page_cache_release(page);
+ put_lseg(fsdata);

if (status < 0)
return status;
--
1.6.6.1
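
The fsdata handoff between `nfs_write_begin` and `nfs_write_end` in patch 13 can be modeled as below. This is a userspace sketch under assumptions: the `*_sketch` names, the integer refcount, the `fail` flag that simulates an error path, and the error constant are all hypothetical stand-ins for the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ENOMEM 12	/* stand-in for the kernel's ENOMEM */

struct lseg_sketch {
	int refcount;
};

/* NULL-tolerant put, matching how the patch calls put_lseg on fsdata. */
static void put_lseg_sketch(struct lseg_sketch *lseg)
{
	if (lseg) {
		assert(lseg->refcount > 0);
		lseg->refcount--;
	}
}

/*
 * Models write_begin: take a reference on the layout (if any) and stash
 * it in *fsdata. On failure, write_end will not run, so the reference
 * must be dropped here and the stash cleared.
 */
static int write_begin_sketch(struct lseg_sketch *layout, int fail,
			      void **fsdata)
{
	int ret = fail ? -SKETCH_ENOMEM : 0;

	if (layout)
		layout->refcount++;
	*fsdata = layout;

	if (ret) {
		put_lseg_sketch(*fsdata);
		*fsdata = NULL;
	}
	return ret;
}

/* Models write_end: release the reference taken in write_begin. */
static void write_end_sketch(void *fsdata)
{
	put_lseg_sketch(fsdata);
}
```

The single `out:` label in the patch serves the same purpose as the `if (ret)` block here: every early return from write_begin funnels through one place that releases the stashed reference.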


2010-05-20 10:30:44

by Fred Isaman

Subject: [PATCH 16/22] pnfs_submit: remove pnfs_update_layout_commit

This seems completely extraneous. Also note this was being
called from within a spinlock.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 39 ---------------------------------------
fs/nfs/pnfs.h | 1 -
fs/nfs/write.c | 8 +-------
3 files changed, 1 insertions(+), 47 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index a542601..5e2dad8 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1429,45 +1429,6 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
pnfs_set_pg_test(inode, pgio);
}

-/*
- * Get a layoutout for COMMIT
- */
-void
-pnfs_update_layout_commit(struct inode *inode,
- struct list_head *head,
- pgoff_t idx_start,
- unsigned int npages)
-{
- struct nfs_server *nfss = NFS_SERVER(inode);
- struct nfs_page *nfs_page = nfs_list_entry(head->next);
- u64 count;
- loff_t start;
- int status;
-
- dprintk("--> %s inode %p layout range: %Zd@%llu\n", __func__, inode,
- (size_t)(npages * PAGE_CACHE_SIZE),
- (u64)((u64)idx_start << PAGE_CACHE_SHIFT));
-
- if (!pnfs_enabled_sb(nfss))
- return;
-
- /* COMMIT indicates the whole file with offset = count = 0
- * whereas layout segments indicate whole file with offset = 0,
- * count = NFS4_MAX_UINT64.
- */
- count = (size_t)npages * PAGE_CACHE_SIZE;
- start = (loff_t)idx_start << PAGE_CACHE_SHIFT;
- if (start == 0 && count == 0)
- count = NFS4_MAX_UINT64;
-
- status = _pnfs_update_layout(inode, nfs_page->wb_context,
- count,
- start,
- IOMODE_RW,
- NULL);
- dprintk("%s virt update status %d\n", __func__, status);
-}
-
static int
pnfs_call_done(struct pnfs_call_data *pdata, struct rpc_task *task, void *data)
{
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index bec3c49..2b01dc7 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -58,7 +58,6 @@ enum pnfs_try_status _pnfs_try_to_commit(struct nfs_write_data *,
void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
struct nfs_open_context *, struct list_head *);
void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
-void pnfs_update_layout_commit(struct inode *, struct list_head *, pgoff_t, unsigned int);
void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
void pnfs_layout_release(struct pnfs_layout_type *, atomic_t *,
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 34a571b..6d4fe10 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -538,14 +538,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
if (ret > 0)
nfsi->ncommit -= ret;
- if (nfs_need_commit(NFS_I(inode))) {
+ if (nfs_need_commit(NFS_I(inode)))
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-#ifdef CONFIG_NFS_V4_1
- /* FIXME: change pnfs_update_layout_commit to derive
- idx_start from head of list and pass ret rather than npages */
- pnfs_update_layout_commit(inode, dst, idx_start, npages);
-#endif /* CONFIG_NFS_V4_1 */
- }
return ret;
}
#else
--
1.6.6.1
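
A note on the helper removed above: its comment describes two different "whole file" conventions, and the translation between them is easy to get wrong. The following is a minimal userspace sketch of that translation only, not the kernel code. The names `sketch_range`, `sketch_commit_to_layout_range`, `SKETCH_PAGE_SHIFT` and `SKETCH_MAX_UINT64` are hypothetical stand-ins, and a 4 KiB page size is assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define SKETCH_MAX_UINT64 UINT64_MAX  /* stands in for NFS4_MAX_UINT64 */

/* Byte range in the form a layout segment lookup would expect. */
struct sketch_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * COMMIT describes "the whole file" as offset == 0 and count == 0.
 * A layout range describes it as offset == 0 and length == all ones.
 * Convert a page-based commit range into the layout convention.
 */
static struct sketch_range
sketch_commit_to_layout_range(uint64_t idx_start, unsigned int npages)
{
	struct sketch_range r;

	r.offset = idx_start << SKETCH_PAGE_SHIFT;
	r.length = (uint64_t)npages << SKETCH_PAGE_SHIFT;
	if (r.offset == 0 && r.length == 0)
		r.length = SKETCH_MAX_UINT64;
	return r;
}
```

With this series the translation is no longer needed at commit time, since each page carries the lseg it was written with.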


2010-05-20 10:30:49

by Fred Isaman

Subject: [PATCH 22/22] pnfs_submit: remove unnecessary pnfs_fl_call_data field commit_through_mds

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4proc.c | 8 ++++----
include/linux/nfs_xdr.h | 1 -
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9786391..d62d008 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3338,17 +3338,17 @@ static void nfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_messa

#if defined(CONFIG_NFS_V4_1)
/*
- * pNFS doew not send a getattr to Data Serfers on commit.
+ * pNFS does not send a getattr to Data Servers on commit.
*/
static void
pnfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_message *msg)
{
struct nfs_server *server = NFS_SERVER(data->inode);

- dprintk("--> %s ds_nfs_client %p commit_through_mds %d\n", __func__,
- data->fldata.ds_nfs_client, data->fldata.commit_through_mds);
+ dprintk("--> %s ds_nfs_client %p\n", __func__,
+ data->fldata.ds_nfs_client);

- if (!data->fldata.ds_nfs_client || data->fldata.commit_through_mds)
+ if (!data->fldata.ds_nfs_client)
return nfs4_proc_commit_setup(data, msg);

data->args.bitmask = server->attr_bitmask;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 27d811b..134e33f 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -977,7 +977,6 @@ struct pnfs_call_data {
struct pnfs_fl_call_data {
struct nfs_client *ds_nfs_client;
__u64 orig_offset;
- int commit_through_mds;
};
#endif /* CONFIG_NFS_V4_1 */

--
1.6.6.1


2010-05-20 10:30:45

by Fred Isaman

Subject: [PATCH 18/22] pnfs: export some commit error handling for use by layout drivers

There exists code to deal with a memory error during commit before the
RPC has been sent. Separate this out and export it for later use by the
filelayout driver.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/internal.h | 1 +
fs/nfs/write.c | 29 ++++++++++++++++++-----------
2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 92f3231..49c4ea8 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -290,6 +290,7 @@ extern int pnfs_initiate_commit(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops,
int how);
extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
+extern void nfs_mark_list_commit(struct list_head *head);
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
struct page *, struct page *);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 6d4fe10..2302133 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1377,6 +1377,23 @@ static int nfs_commit_rpcsetup(struct list_head *head,
how);
}

+/* Handle memory error during commit */
+void nfs_mark_list_commit(struct list_head *head)
+{
+ struct nfs_page *req;
+
+ while (!list_empty(head)) {
+ req = nfs_list_entry(head->next);
+ nfs_list_remove_request(req);
+ nfs_mark_request_commit(req);
+ dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+ dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+ BDI_RECLAIMABLE);
+ nfs_clear_page_tag_locked(req);
+ }
+}
+EXPORT_SYMBOL(nfs_mark_list_commit);
+
/*
* Commit dirty pages
*/
@@ -1384,25 +1401,15 @@ static int
nfs_commit_list(struct inode *inode, struct list_head *head, int how)
{
struct nfs_write_data *data;
- struct nfs_page *req;

data = nfs_commitdata_alloc();
-
if (!data)
goto out_bad;

/* Set up the argument struct */
return nfs_commit_rpcsetup(head, data, how);
out_bad:
- while (!list_empty(head)) {
- req = nfs_list_entry(head->next);
- nfs_list_remove_request(req);
- nfs_mark_request_commit(req);
- dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
- dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
- BDI_RECLAIMABLE);
- nfs_clear_page_tag_locked(req);
- }
+ nfs_mark_list_commit(head);
nfs_commit_clear_lock(NFS_I(inode));
return -ENOMEM;
}
--
1.6.6.1
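
The helper factored out in this patch walks a list of requests and puts each one back into the "needs commit" state. As a rough illustration of that pattern, here is a userspace sketch using a plain singly linked list. It does not use the kernel list API or the NFS accounting calls; `sketch_req`, `sketch_state` and `sketch_mark_list_commit` are hypothetical names chosen for this example.

```c
#include <stddef.h>

enum sketch_state { SKETCH_IN_FLIGHT, SKETCH_NEEDS_COMMIT };

/* Simplified stand-in for struct nfs_page. */
struct sketch_req {
	enum sketch_state state;
	int locked;               /* stands in for the page tag lock */
	struct sketch_req *next;
};

/*
 * Drain the list: unlink every request, mark it as still needing a
 * commit, and unlock it. Returns the number of requests handled.
 * On return *head is NULL, so calling it again is a harmless no-op.
 */
static int sketch_mark_list_commit(struct sketch_req **head)
{
	int n = 0;

	while (*head) {
		struct sketch_req *req = *head;

		*head = req->next;
		req->next = NULL;
		req->state = SKETCH_NEEDS_COMMIT;
		req->locked = 0;
		n++;
	}
	return n;
}
```

Because draining an empty list does nothing, a caller can apply the helper to more than one list without first checking which of them holds the requests.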


2010-05-20 10:30:48

by Fred Isaman

Subject: [PATCH 21/22] pnfs_submit: remove unnecessary pnfs_fl_call_data field pnfs_client

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 9 +++------
include/linux/nfs_xdr.h | 1 -
2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 6edecc7..d0a7262 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -216,7 +216,6 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,

/* just try the first data server for the index..*/
data->fldata.ds_nfs_client = ds->ds_clp;
- data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Now get the file offset on the dserver
@@ -230,7 +229,7 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
data->fldata.orig_offset = offset;

/* Perform an asynchronous read */
- nfs_initiate_read(data, data->fldata.pnfs_client,
+ nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
&filelayout_read_call_ops);

data->pdata.pnfs_error = 0;
@@ -269,7 +268,6 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);

data->fldata.ds_nfs_client = ds->ds_clp;
- data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Get the file offset on the dserver. Set the write offset to
@@ -281,7 +279,7 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
/* Perform an asynchronous write The offset will be reset in the
* call_ops->rpc_call_done() routine
*/
- nfs_initiate_write(data, data->fldata.pnfs_client,
+ nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
&filelayout_write_call_ops, sync);

data->pdata.pnfs_error = 0;
@@ -572,7 +570,7 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
struct nfs4_pnfs_ds *ds;

dprintk("%s data %p pnfs_client %p sync %d\n",
- __func__, data, data->fldata.pnfs_client, sync);
+ __func__, data, data->fldata.ds_nfs_client->cl_rpcclient, sync);

/* Alloc room for both in one go */
ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
@@ -643,7 +641,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
continue;
}
clnt = ds->ds_clp->cl_rpcclient;
- dsdata->fldata.pnfs_client = clnt;
dsdata->fldata.ds_nfs_client = ds->ds_clp;
dsdata->args.fh = \
nfs4_fl_select_ds_fh(LSEG_LD_DATA(req->wb_lseg),
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 864eac1..27d811b 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -975,7 +975,6 @@ struct pnfs_call_data {

/* files layout-type specific data for read, write, and commit */
struct pnfs_fl_call_data {
- struct rpc_clnt *pnfs_client; /* Holds pNFS device across async calls */
struct nfs_client *ds_nfs_client;
__u64 orig_offset;
int commit_through_mds;
--
1.6.6.1


2010-05-20 10:30:48

by Fred Isaman

Subject: [PATCH 20/22] pnfs_submit: filelayout: rewrite filelayout_commit to use new API

In the process, give it a much-needed rewrite.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 192 ++++++++++++++++++++++++++---------------------
fs/nfs/write.c | 9 ++
2 files changed, 115 insertions(+), 86 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 789706e..6edecc7 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -530,8 +530,7 @@ filelayout_clone_write_data(struct nfs_write_data *old)
nfs_fattr_init(&new->fattr);
new->res.verf = &new->verf;
new->args.context = get_nfs_open_context(old->args.context);
- new->pdata.lseg = old->pdata.lseg;
- kref_get(&new->pdata.lseg->kref);
+ new->pdata.lseg = NULL;
new->pdata.call_ops = old->pdata.call_ops;
new->pdata.how = old->pdata.how;
out:
@@ -559,103 +558,124 @@ enum pnfs_try_status
filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
struct nfs_write_data *data)
{
- struct nfs4_filelayout_segment *nfslay;
- struct nfs_write_data *dsdata = NULL;
+ LIST_HEAD(head);
+ struct nfs_page *req;
+ loff_t file_offset = 0;
+ u16 idx, i;
+ struct list_head **ds_page_list = NULL;
+ u16 *indices_used;
+ int num_indices_seen = 0;
+ const struct rpc_call_ops *call_ops;
+ struct rpc_clnt *clnt;
+ struct nfs_write_data **clone_list = NULL;
+ struct nfs_write_data *dsdata;
struct nfs4_pnfs_ds *ds;
- struct nfs_page *req, *reqt;
- struct list_head *pos, *tmp, head, head2;
- loff_t file_offset, comp_offset;
- enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
- u32 idx1, idx2;

- nfslay = LSEG_LD_DATA(data->pdata.lseg);
-
- dprintk("%s data %p pnfs_client %p nfslay %p sync %d\n",
- __func__, data, data->fldata.pnfs_client, nfslay, sync);
-
- data->fldata.commit_through_mds = nfslay->commit_through_mds;
- if (nfslay->commit_through_mds) {
- dprintk("%s data %p commit through mds\n", __func__, data);
- return PNFS_NOT_ATTEMPTED;
- }
-
- INIT_LIST_HEAD(&head);
- INIT_LIST_HEAD(&head2);
- list_add(&head, &data->pages);
- list_del_init(&data->pages);
-
- /* COMMIT to each Data Server */
- while (!list_empty(&head)) {
- req = nfs_list_entry(head.next);
-
- file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
-
- /* Get dserver for the current page */
- idx1 = nfs4_fl_calc_ds_index(data->pdata.lseg, file_offset);
- ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx1);
- if (!ds) {
- data->pdata.pnfs_error = -EIO;
- goto err_rewind;
+ dprintk("%s data %p pnfs_client %p sync %d\n",
+ __func__, data, data->fldata.pnfs_client, sync);
+
+ /* Alloc room for both in one go */
+ ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
+ (sizeof(u16) + sizeof(struct list_head *)),
+ GFP_KERNEL);
+ if (!ds_page_list)
+ goto mem_error;
+ indices_used = (u16 *) (ds_page_list + NFS4_PNFS_MAX_MULTI_CNT + 1);
+
+ /* Sort pages based on which ds to send to.
+ * MDS is given index equal to NFS4_PNFS_MAX_MULTI_CNT.
+ * Note we are assuming there is only a single lseg in play.
+ * When that is not true, we could first sort on lseg, then
+ * sort within each as we do here.
+ */
+ while (!list_empty(&data->pages)) {
+ req = nfs_list_entry(data->pages.next);
+ nfs_list_remove_request(req);
+ if (!req->wb_lseg ||
+ ((struct nfs4_filelayout_segment *)
+ LSEG_LD_DATA(req->wb_lseg))->commit_through_mds)
+ idx = NFS4_PNFS_MAX_MULTI_CNT;
+ else {
+ file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
+ idx = nfs4_fl_calc_ds_index(req->wb_lseg, file_offset);
}
-
- /* Gather all pages going to the current data server by
- * comparing their indices.
- * XXX: This recalculates the indices unecessarily.
- * One idea would be to calc the index for every page
- * and then compare if they are the same. */
- list_for_each_safe(pos, tmp, &head) {
- reqt = nfs_list_entry(pos);
- comp_offset = (loff_t)reqt->wb_index << PAGE_CACHE_SHIFT;
- idx2 = nfs4_fl_calc_ds_index(data->pdata.lseg,
- comp_offset);
- if (idx1 == idx2) {
- nfs_list_remove_request(reqt);
- nfs_list_add_request(reqt, &head2);
- }
+ if (ds_page_list[idx]) {
+ /* Already seen this idx */
+ list_add(&req->wb_list, ds_page_list[idx]);
+ } else {
+ /* New idx not seen so far */
+ list_add_tail(&req->wb_list, &head);
+ indices_used[num_indices_seen++] = idx;
}
-
- if (!list_empty(&head)) {
- dsdata = filelayout_clone_write_data(data);
- if (!dsdata) {
- /* return pages back to head */
- list_splice(&head2, &head);
- INIT_LIST_HEAD(&head2);
- data->pdata.pnfs_error = -ENOMEM;
- goto err_rewind;
- }
+ ds_page_list[idx] = &req->wb_list;
+ }
+ /* Once created, clone must be released via call_op */
+ clone_list = kzalloc(num_indices_seen *
+ sizeof(struct nfs_write_data *), GFP_KERNEL);
+ if (!clone_list)
+ goto mem_error;
+ for (i = 0; i < num_indices_seen - 1; i++) {
+ clone_list[i] = filelayout_clone_write_data(data);
+ if (!clone_list[i])
+ goto mem_error;
+ }
+ clone_list[i] = data;
+ /* Now send off the RPCs to each ds. Note that it is important
+ * that any RPC to the MDS be sent last (or at least after all
+ * clones have been made.)
+ */
+ for (i = 0; i < num_indices_seen; i++) {
+ dsdata = clone_list[i];
+ idx = indices_used[i];
+ list_cut_position(&dsdata->pages, &head, ds_page_list[idx]);
+ if (idx == NFS4_PNFS_MAX_MULTI_CNT) {
+ call_ops = data->pdata.call_ops;
+ clnt = NFS_CLIENT(dsdata->inode);
+ ds = NULL;
} else {
- dsdata = data;
+ call_ops = &filelayout_commit_call_ops;
+ req = nfs_list_entry(dsdata->pages.next);
+ ds = nfs4_fl_prepare_ds(req->wb_lseg, idx);
+ if (!ds) {
+ /* Trigger retry of this chunk through MDS */
+ dsdata->task.tk_status = -EIO;
+ data->pdata.call_ops->rpc_release(dsdata);
+ continue;
+ }
+ clnt = ds->ds_clp->cl_rpcclient;
+ dsdata->fldata.pnfs_client = clnt;
+ dsdata->fldata.ds_nfs_client = ds->ds_clp;
+ dsdata->args.fh = \
+ nfs4_fl_select_ds_fh(LSEG_LD_DATA(req->wb_lseg),
+ idx);
}
-
- list_add(&dsdata->pages, &head2);
- list_del_init(&head2);
-
- dsdata->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
- dsdata->fldata.ds_nfs_client = ds->ds_clp;
- dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);
-
dprintk("%s: Initiating commit: %llu USE DS:\n",
__func__, file_offset);
print_ds(ds);

/* Send COMMIT to data server */
- nfs_initiate_commit(dsdata, dsdata->fldata.pnfs_client,
- &filelayout_commit_call_ops, sync);
+ nfs_initiate_commit(dsdata, clnt, call_ops, sync);
}
+ kfree(clone_list);
+ kfree(ds_page_list);
+ data->pdata.pnfs_error = 0;
+ return PNFS_ATTEMPTED;

-out:
- if (data->pdata.pnfs_error)
- printk(KERN_ERR "%s: ERROR %d\n", __func__,
- data->pdata.pnfs_error);
-
- /* XXX should we send COMMIT to MDS e.g. not free data and return 1 ? */
- return trypnfs;
-err_rewind:
- /* put remaining pages back onto the original data->pages */
- list_add(&data->pages, &head);
- list_del_init(&head);
- trypnfs = PNFS_NOT_ATTEMPTED;
- goto out;
+ mem_error:
+ if (clone_list) {
+ for (i = 0; i < num_indices_seen - 1; i++) {
+ if (!clone_list[i])
+ break;
+ data->pdata.call_ops->rpc_release(clone_list[i]);
+ }
+ kfree(clone_list);
+ }
+ kfree(ds_page_list);
+ /* One of these will be empty, but doesn't hurt to do both */
+ nfs_mark_list_commit(&head);
+ nfs_mark_list_commit(&data->pages);
+ data->pdata.call_ops->rpc_release(data);
+ return PNFS_ATTEMPTED;
}

/* Return the stripesize for the specified file.
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 28e4907..48aa4a9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1461,6 +1461,15 @@ static void nfs_commit_release(void *calldata)
req->wb_bytes,
(long long)req_offset(req));
if (status < 0) {
+ if (req->wb_lseg) {
+ struct pnfs_layout_segment *lseg = req->wb_lseg;
+
+ req->wb_lseg = NULL;
+ put_lseg(lseg);
+ dprintk(" retry through MDS\n");
+ nfs_mark_request_dirty(req);
+ goto next;
+ }
nfs_context_set_write_error(req->wb_context, status);
nfs_inode_remove_request(req);
dprintk(", error = %d\n", status);
--
1.6.6.1
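
The rewritten filelayout_commit first sorts pages by destination, with the MDS treated as one extra destination whose index is NFS4_PNFS_MAX_MULTI_CNT. The sketch below shows only that bucketing idea in userspace C, tracking counts rather than real lists. Everything in it is an assumption made for illustration: the `sketch_*` names, the small `SKETCH_MAX_DS` limit, and in particular the round-robin stripe mapping, which stands in for nfs4_fl_calc_ds_index.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DS  4              /* stands in for NFS4_PNFS_MAX_MULTI_CNT */
#define SKETCH_MDS_IDX SKETCH_MAX_DS  /* the MDS gets the extra, last bucket */

struct sketch_req {
	int has_lseg;            /* 0: page was written through the MDS */
	int commit_through_mds;  /* layout asks for commits to go to the MDS */
	uint64_t page_index;
};

struct sketch_buckets {
	size_t count[SKETCH_MAX_DS + 1];           /* requests per destination */
	uint16_t indices_used[SKETCH_MAX_DS + 1];  /* destinations, first-seen order */
	int num_indices_seen;
};

/* Hypothetical mapping: one page per stripe unit, round-robin over num_ds. */
static uint16_t sketch_ds_index(const struct sketch_req *req, uint16_t num_ds)
{
	return (uint16_t)(req->page_index % num_ds);
}

/* Bucket requests by destination; num_ds must be 1..SKETCH_MAX_DS. */
static void sketch_sort_by_ds(const struct sketch_req *reqs, size_t n,
			      uint16_t num_ds, struct sketch_buckets *out)
{
	size_t i;

	memset(out, 0, sizeof(*out));
	for (i = 0; i < n; i++) {
		uint16_t idx;

		if (!reqs[i].has_lseg || reqs[i].commit_through_mds)
			idx = SKETCH_MDS_IDX;
		else
			idx = sketch_ds_index(&reqs[i], num_ds);

		/* Record each destination once, the first time it is seen. */
		if (out->count[idx] == 0)
			out->indices_used[out->num_indices_seen++] = idx;
		out->count[idx]++;
	}
}
```

The number of distinct destinations (`num_indices_seen`) is what determines how many COMMIT RPCs are needed, and therefore how many clones of the original write data the patch allocates (one fewer than that, since the original is reused for the last RPC).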


2010-05-20 10:30:47

by Fred Isaman

Subject: [PATCH 19/22] pnfs_submit: API change: remove pnfs_commit layoutget invocation

WARNING - this is an API change.

The layout driver's commit operation no longer takes an lseg.
This is because each nfs_page may or may not have an associated lseg.
It is the layout driver's task to send commits to the appropriate place.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/internal.h | 2 +-
fs/nfs/pagelist.c | 5 ++-
fs/nfs/pnfs.c | 79 ++++++++-------------------------------------
fs/nfs/pnfs.h | 21 +++++-------
fs/nfs/write.c | 23 +++++++------
include/linux/nfs_page.h | 3 +-
6 files changed, 43 insertions(+), 90 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 49c4ea8..f149452 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -288,7 +288,7 @@ extern int nfs_initiate_commit(struct nfs_write_data *data,
extern int pnfs_initiate_commit(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how);
+ int how, int pnfs);
extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
extern void nfs_mark_list_commit(struct list_head *head);
#ifdef CONFIG_MIGRATION
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 19ffdc5..52f6f6a 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -386,6 +386,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
* @idx_start: lower bound of page->index to scan
* @npages: idx_start + npages sets the upper bound to scan.
* @tag: tag to scan for
+ * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
*
* Moves elements from one of the inode request lists.
* If the number of requests is set to 0, the entire address_space
@@ -395,7 +396,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
*/
int nfs_scan_list(struct nfs_inode *nfsi,
struct list_head *dst, pgoff_t idx_start,
- unsigned int npages, int tag)
+ unsigned int npages, int tag, int *use_pnfs)
{
struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
struct nfs_page *req;
@@ -426,6 +427,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
radix_tree_tag_clear(&nfsi->nfs_page_tree,
req->wb_index, tag);
nfs_list_add_request(req, dst);
+ if (req->wb_lseg)
+ *use_pnfs = 1;
res++;
if (res == INT_MAX)
goto out;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5ccd406..4c3480b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1673,19 +1673,11 @@ enum pnfs_try_status
_pnfs_try_to_commit(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops, int how)
{
- struct inode *inode = data->inode;
-
- if (!pnfs_enabled_sb(NFS_SERVER(inode))) {
- dprintk("%s: Not using pNFS I/O\n", __func__);
- return PNFS_NOT_ATTEMPTED;
- } else {
- /* data->call_ops and data->how set in nfs_commit_rpcsetup */
- dprintk("%s: Utilizing pNFS I/O\n", __func__);
- data->pdata.call_ops = call_ops;
- data->pdata.pnfs_error = 0;
- data->pdata.how = how;
- return pnfs_commit(data, how);
- }
+ dprintk("%s: Utilizing pNFS I/O\n", __func__);
+ data->pdata.call_ops = call_ops;
+ data->pdata.pnfs_error = 0;
+ data->pdata.how = how;
+ return pnfs_commit(data, how);
}

/* pNFS Commit callback function for all layout drivers */
@@ -1706,76 +1698,33 @@ pnfs_commit_done(struct nfs_write_data *data)
_pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
true);
pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
- pdata->call_ops, pdata->how);
+ pdata->call_ops, pdata->how, 1);
}
}

static enum pnfs_try_status
pnfs_commit(struct nfs_write_data *data, int sync)
{
- int result;
struct nfs_inode *nfsi = NFS_I(data->inode);
struct nfs_server *nfss = NFS_SERVER(data->inode);
- struct pnfs_layout_segment *lseg;
- struct nfs_page *first, *last, *p;
- int npages;
enum pnfs_try_status trypnfs;
- u64 count;

dprintk("%s: Begin\n", __func__);

- /* If the layout driver doesn't define its own commit function
- * use standard NFSv4 commit
- */
- first = last = nfs_list_entry(data->pages.next);
- npages = 0;
- list_for_each_entry(p, &data->pages, wb_list) {
- last = p;
- npages++;
- }
- /* COMMIT indicates the whole file with offset = count = 0
- * whereas layout segments indicate whole file with offset = 0,
- * count = NFS4_MAX_UINT64.
+ /* We need to account for possibility that
+ * each nfs_page can point to a different lseg (or be NULL).
+ * For the immediate case of whole-file-only layouts, we at
+ * least know there can be only a single lseg.
+ * We still have to account for the possibility of some being NULL.
+ * This will be done by passing the buck to the layout driver.
*/
- count = ((npages - 1) << PAGE_CACHE_SHIFT) + first->wb_bytes +
- (first != last) ? last->wb_bytes : 0;
- if (first->wb_offset == 0 && count == 0)
- count = NFS4_MAX_UINT64;
-
- /* FIXME: we really ought to keep the layout segment that we used
- to write the page around for committing it and never ask for a
- new one. If it was recalled we better commit the data first
- before returning it, otherwise the data needs to be rewritten,
- either with a new layout or to the MDS */
- result = _pnfs_update_layout(data->inode,
- NULL,
- count,
- first->wb_offset,
- IOMODE_RW,
- &lseg);
- /* If no layout have been retrieved,
- * use standard NFSv4 commit
- */
- if (result) {
- dprintk("%s: Updating layout failed (%d), retry with NFS \n",
- __func__, result);
- trypnfs = PNFS_NOT_ATTEMPTED;
- goto out;
- }
-
- dprintk("%s: Calling layout driver commit\n", __func__);
+ data->pdata.lseg = NULL;
if (!pnfs_use_rpc(nfss))
data->pdata.pnfsflags |= PNFS_NO_RPC;
- data->pdata.lseg = lseg;
trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
sync, data);
- if (trypnfs == PNFS_NOT_ATTEMPTED) {
+ if (trypnfs == PNFS_NOT_ATTEMPTED)
data->pdata.pnfsflags &= ~PNFS_NO_RPC;
- data->pdata.lseg = NULL;
- put_lseg(lseg);
- }
-
-out:
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
return trypnfs;
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d1a4f42..78defc3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -139,21 +139,18 @@ pnfs_try_to_commit(struct nfs_write_data *data,
const struct rpc_call_ops *call_ops,
int how)
{
- struct inode *inode = data->inode;
- struct nfs_server *nfss = NFS_SERVER(inode);
enum pnfs_try_status ret;

- /* Note that we check for "write_pagelist" and not for "commit"
- since if async writes were done and pages weren't marked as stable
- the commit method MUST be defined by the LD */
- /* FIXME: write_pagelist should probably be mandated */
- if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
- ret = _pnfs_try_to_commit(data, call_ops, how);
- else
- ret = PNFS_NOT_ATTEMPTED;
-
+ /* Unlike in pnfs_try_to_write_data and pnfs_try_to_read_data,
+ * we have no guarantee that all nfs_pages point to the same
+ * lseg. However, if we reach here, we are guaranteed that at
+ * least one points to some lseg.
+ */
+ ret = _pnfs_try_to_commit(data, call_ops, how);
if (ret == PNFS_ATTEMPTED)
- nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
+ nfs_inc_stats(data->inode, NFSIOS_PNFS_COMMIT);
+ else
+ _pnfs_clear_lseg_from_pages(&data->pages);
return ret;
}

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2302133..28e4907 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -527,7 +527,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
* The requests are *not* checked to ensure that they form a contiguous set.
*/
static int
-nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
+nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
{
struct nfs_inode *nfsi = NFS_I(inode);
int ret;
@@ -535,7 +535,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
if (!nfs_need_commit(nfsi))
return 0;

- ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
+ ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
+ use_pnfs);
if (ret > 0)
nfsi->ncommit -= ret;
if (nfs_need_commit(NFS_I(inode)))
@@ -1334,9 +1335,10 @@ EXPORT_SYMBOL(nfs_initiate_commit);
int pnfs_initiate_commit(struct nfs_write_data *data,
struct rpc_clnt *clnt,
const struct rpc_call_ops *call_ops,
- int how)
+ int how, int pnfs)
{
- if (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED)
+ if (pnfs &&
+ (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
return pnfs_get_write_status(data);

return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
@@ -1347,7 +1349,7 @@ int pnfs_initiate_commit(struct nfs_write_data *data,
*/
static int nfs_commit_rpcsetup(struct list_head *head,
struct nfs_write_data *data,
- int how)
+ int how, int pnfs)
{
struct nfs_page *first = nfs_list_entry(head->next);
struct inode *inode = first->wb_context->path.dentry->d_inode;
@@ -1374,7 +1376,7 @@ static int nfs_commit_rpcsetup(struct list_head *head,
data->args.context = first->wb_context; /* used by commit done */

return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
- how);
+ how, pnfs);
}

/* Handle memory error during commit */
@@ -1398,7 +1400,7 @@ EXPORT_SYMBOL(nfs_mark_list_commit);
* Commit dirty pages
*/
static int
-nfs_commit_list(struct inode *inode, struct list_head *head, int how)
+nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
{
struct nfs_write_data *data;

@@ -1407,7 +1409,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
goto out_bad;

/* Set up the argument struct */
- return nfs_commit_rpcsetup(head, data, how);
+ return nfs_commit_rpcsetup(head, data, how, pnfs);
out_bad:
nfs_mark_list_commit(head);
nfs_commit_clear_lock(NFS_I(inode));
@@ -1495,14 +1497,15 @@ static int nfs_commit_inode(struct inode *inode, int how)
LIST_HEAD(head);
int may_wait = how & FLUSH_SYNC;
int res = 0;
+ int use_pnfs = 0;

if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
goto out;
spin_lock(&inode->i_lock);
- res = nfs_scan_commit(inode, &head, 0, 0);
+ res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
spin_unlock(&inode->i_lock);
if (res) {
- int error = nfs_commit_list(inode, &head, how);
+ int error = nfs_commit_list(inode, &head, how, use_pnfs);
if (error < 0)
return error;
if (may_wait)
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 18a455c..06e5157 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -83,7 +83,8 @@ extern void nfs_release_request(struct nfs_page *req);


extern int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
- pgoff_t idx_start, unsigned int npages, int tag);
+ pgoff_t idx_start, unsigned int npages, int tag,
+ int *use_pnfs);
extern void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
struct inode *inode,
int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
--
1.6.6.1
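
The decision introduced by this patch can be summarized as: while collecting pages for commit, note whether any of them still holds an lseg, and hand the whole list to the layout driver only in that case. A minimal userspace sketch of that decision follows. The `sketch_*` types and functions are hypothetical and use a plain linked list instead of the radix tree scan in nfs_scan_list.

```c
#include <stddef.h>

struct sketch_lseg {
	int refcount;
};

/* Simplified stand-in for struct nfs_page. */
struct sketch_page {
	struct sketch_lseg *lseg;   /* NULL means: commit through the MDS */
	struct sketch_page *next;
};

enum sketch_path { SKETCH_PATH_MDS, SKETCH_PATH_LAYOUT_DRIVER };

/*
 * Count the pages on the list. *use_pnfs is only ever set, never
 * cleared, so the caller must initialize it to 0 beforehand.
 */
static int sketch_scan(const struct sketch_page *head, int *use_pnfs)
{
	const struct sketch_page *p;
	int res = 0;

	for (p = head; p; p = p->next) {
		if (p->lseg)
			*use_pnfs = 1;
		res++;
	}
	return res;
}

/* A single page with an lseg is enough to involve the layout driver. */
static enum sketch_path
sketch_choose_commit_path(const struct sketch_page *head)
{
	int use_pnfs = 0;

	sketch_scan(head, &use_pnfs);
	return use_pnfs ? SKETCH_PATH_LAYOUT_DRIVER : SKETCH_PATH_MDS;
}
```

Note that a mixed list still goes to the layout driver, which is why the driver must be prepared to route the lseg-less pages to the MDS itself.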


2010-05-20 10:30:30

by Fred Isaman

Subject: [PATCH 03/22] Revert "pnfs: Enable O_DIRECT read path."

This reverts commit fe1dbd120b6a94bbacec205d0a4ae40d36e314b5.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/direct.c | 26 +-------------------------
1 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1148214..3ef9b0c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -56,7 +56,6 @@

#include "internal.h"
#include "iostat.h"
-#include "pnfs.h"

#define NFSDBG_FACILITY NFSDBG_VFS

@@ -329,17 +328,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
unsigned int pgbase;
int result;
ssize_t started = 0;
- size_t pnfs_stripe_rem = count;
- enum pnfs_try_status trypnfs;
-
- /* pnfs_stripe_rem will be set to the remaining bytes in
- * the first stripe_unit (which for standard nfs is count)
- */
- pnfs_direct_init_io(inode, ctx, count, pos, 0, &rsize,
- &pnfs_stripe_rem);
-
- dprintk("%s: pos %llu count %Zu wsize %Zu\n",
- __func__, pos, count, rsize);

do {
struct nfs_read_data *data;
@@ -347,12 +335,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,

pgbase = user_addr & ~PAGE_MASK;
bytes = min(rsize,count);
-#if defined(CONFIG_NFS_V4_1)
- if (pnfs_enabled_sb(NFS_SERVER(inode))) {
- bytes = min(bytes, pnfs_stripe_rem);
- pnfs_stripe_rem = rsize;
- }
-#endif /* CONFIG_NFS_V4_1 */

result = -ENOMEM;
data = nfs_readdata_alloc(nfs_page_array_len(pgbase, bytes));
@@ -393,14 +375,8 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
data->res.eof = 0;
data->res.count = bytes;

- trypnfs = pnfs_try_to_read_data(data, &nfs_read_direct_ops);
- if (trypnfs == PNFS_ATTEMPTED) {
- result = pnfs_get_read_status(data);
- if (result)
- break;
- } else if (nfs_direct_read_execute(data, &task_setup_data, &msg)) {
+ if (nfs_direct_read_execute(data, &task_setup_data, &msg))
break;
- }

started += bytes;
user_addr += bytes;
--
1.6.6.1


2010-05-20 10:30:28

by Fred Isaman

Subject: [PATCH 01/22] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT"

This reverts commit 05277f5f5236462a11e7a20ebe9009449f8a463d.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/direct.c | 10 ----------
1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e111e9f..02e5918 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -191,22 +191,12 @@ static ssize_t nfs_direct_wait(struct nfs_direct_req *dreq)
{
ssize_t result = -EIOCBQUEUED;

- if (!pnfs_use_rpc(NFS_SERVER(dreq->inode))) {
- /* FIXME: Right now non-rpc layout types must perform
- * syncronous direct i/o.
- * New pNFS callback to wait on outstanding requests?
- */
- result = 0;
- goto set_result;
- }
-
/* Async requests don't wait here */
if (dreq->iocb)
goto out;

result = wait_for_completion_killable(&dreq->completion);

-set_result:
if (!result)
result = dreq->error;
if (!result)
--
1.6.6.1


2010-05-20 10:30:29

by Fred Isaman

Subject: [PATCH 02/22] Revert "pnfs: Enable O_DIRECT write path."

This reverts commit 2faf680af973895bdfe19f2254b59dc1a153dd82.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/direct.c | 41 +----------------------------------------
1 files changed, 1 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 02e5918..1148214 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -505,7 +505,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
.workqueue = nfsiod_workqueue,
.flags = RPC_TASK_ASYNC,
};
- enum pnfs_try_status trypnfs;

dreq->count = 0;
get_dreq(dreq);
@@ -529,11 +528,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
* Reuse data->task; data->args should not have changed
* since the original request was sent.
*/
- trypnfs = pnfs_try_to_write_data(data, &nfs_write_direct_ops,
- NFS_FILE_SYNC);
- if (trypnfs == PNFS_ATTEMPTED)
- continue;
-
nfs_direct_write_execute(data, &task_setup_data, &msg);
}

@@ -616,7 +610,6 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
.workqueue = nfsiod_workqueue,
.flags = RPC_TASK_ASYNC,
};
- enum pnfs_try_status trypnfs;

data->inode = dreq->inode;
data->cred = msg.rpc_cred;
@@ -630,11 +623,6 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
data->res.verf = &data->verf;
nfs_fattr_init(&data->fattr);

- trypnfs = pnfs_try_to_commit(data, &nfs_commit_direct_ops,
- RPC_TASK_ASYNC);
- if (trypnfs == PNFS_ATTEMPTED)
- return;
-
nfs_direct_commit_execute(dreq, data, &task_setup_data, &msg);
}

@@ -683,9 +671,6 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
{
struct nfs_write_data *data = calldata;

- dprintk("%s: verf: %d stable %d\n", __func__,
- data->res.verf->committed, data->args.stable);
-
if (nfs_writeback_done(task, data) != 0)
return;
}
@@ -799,17 +784,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
unsigned int pgbase;
int result;
ssize_t started = 0;
- size_t pnfs_stripe_rem = count;
- enum pnfs_try_status trypnfs;
-
- /* pnfs_stripe_rem will be set to the remaining bytes in
- * the first stripe_unit (which for standard nfs is count)
- */
- pnfs_direct_init_io(inode, ctx, count, pos, 1,
- &wsize, &pnfs_stripe_rem);
-
- dprintk("%s: pos %llu count %Zu wsize %Zu\n",
- __func__, pos, count, wsize);

do {
struct nfs_write_data *data;
@@ -818,12 +792,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
pgbase = user_addr & ~PAGE_MASK;
bytes = min(wsize,count);

-#if defined(CONFIG_NFS_V4_1)
- if (pnfs_enabled_sb(NFS_SERVER(inode))) {
- bytes = min(bytes, pnfs_stripe_rem);
- pnfs_stripe_rem = wsize;
- }
-#endif /* CONFIG_NFS_V4_1 */
result = -ENOMEM;
data = nfs_writedata_alloc(nfs_page_array_len(pgbase, bytes));
if (unlikely(!data))
@@ -867,15 +835,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
data->res.verf = &data->verf;
nfs_fattr_init(&data->fattr);

- trypnfs = pnfs_try_to_write_data(data, &nfs_write_direct_ops,
- sync);
- if (trypnfs == PNFS_ATTEMPTED) {
- result = pnfs_get_write_status(data);
- if (result)
- break;
- } else if (nfs_direct_write_execute(data, &task_setup_data, &msg)) {
+ if (nfs_direct_write_execute(data, &task_setup_data, &msg))
break;
- }

started += bytes;
user_addr += bytes;
--
1.6.6.1


2010-05-20 10:30:30

by Fred Isaman

Subject: [PATCH 04/22] Revert "pnfs: Add function to set up O_DIRECT I/O"

This reverts commit 4bc73cd4118b5d5b710c28c83a750bf4e02e8269.

Conflicts:

fs/nfs/pnfs.c
fs/nfs/pnfs.h

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/pnfs.c | 31 -------------------------------
fs/nfs/pnfs.h | 25 -------------------------
2 files changed, 0 insertions(+), 56 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 20285bc..8dbf740 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1399,37 +1399,6 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
pnfs_set_pg_test(inode, pgio);
}

-/* Retrieve I/O parameters for O_DIRECT.
- * Out Args:
- * iosize - min of boundary and (rsize or wsize)
- * remaining - # bytes remaining in the current stripe unit
- */
-void
-_pnfs_direct_init_io(struct inode *inode, struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite, size_t *iosize,
- size_t *remaining)
-{
- struct nfs_server *nfss = NFS_SERVER(inode);
- u32 boundary;
- unsigned int rwsize;
-
- if (count <= 0 ||
- pnfs_update_layout(inode, ctx, count, loff, IOMODE_READ, NULL))
- return;
-
- if (iswrite)
- rwsize = nfss->wsize;
- else
- rwsize = nfss->rsize;
-
- boundary = pnfs_getboundary(inode);
-
- *iosize = min(rwsize, boundary);
- *remaining = boundary - (do_div(loff, boundary));
-
- dprintk("%s Rem %Zu iosize %Zu\n", __func__, *remaining, *iosize);
-}
-
/*
* Get a layoutout for COMMIT
*/
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 8edca30..5e9b06b 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -66,9 +66,6 @@ void pnfs_layout_release(struct pnfs_layout_type *, atomic_t *,
void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
const nfs4_stateid *stateid);
void pnfs_destroy_layout(struct nfs_inode *);
-void _pnfs_direct_init_io(struct inode *inode, struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite,
- size_t *rwsize, size_t *remaining);

#define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld && \
(srv)->pnfs_curr_ld->ld_io_ops && \
@@ -182,20 +179,6 @@ static inline int pnfs_get_read_status(struct nfs_read_data *data)
return data->pdata.pnfs_error;
}

-static inline void pnfs_direct_init_io(struct inode *inode,
- struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite,
- size_t *iosize, size_t *remaining)
-{
- struct nfs_server *nfss = NFS_SERVER(inode);
-
- if (pnfs_enabled_sb(nfss))
- return _pnfs_direct_init_io(inode, ctx, count, loff, iswrite,
- iosize, remaining);
-
- return;
-}
-
static inline int pnfs_use_rpc(struct nfs_server *nfss)
{
if (pnfs_enabled_sb(nfss))
@@ -241,14 +224,6 @@ static inline int pnfs_get_read_status(struct nfs_read_data *data)
return 0;
}

-/* Set num of remaining bytes, which is everything */
-static inline void pnfs_direct_init_io(struct inode *inode,
- struct nfs_open_context *ctx,
- size_t count, loff_t loff, int iswrite,
- size_t *iosize, size_t *remaining)
-{
-}
-
static inline int pnfs_use_rpc(struct nfs_server *nfss)
{
return 1;
--
1.6.6.1
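
For reference, the arithmetic performed by the removed _pnfs_direct_init_io() can be sketched in userspace C as below. This is only an illustration, not the kernel code: the function name direct_io_params() is hypothetical, plain integer types stand in for the kernel ones, the % operator replaces do_div(), and the boundary is assumed to be nonzero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace sketch of the removed helper's arithmetic.
 *   iosize    - min of the stripe boundary and rsize/wsize
 *   remaining - bytes left in the stripe unit that contains loff
 * Assumes boundary != 0; the kernel code used do_div() for the remainder.
 */
static void direct_io_params(uint64_t loff, size_t rwsize, uint32_t boundary,
			     size_t *iosize, size_t *remaining)
{
	*iosize = rwsize < boundary ? rwsize : boundary;
	*remaining = boundary - (size_t)(loff % boundary);
}
```

With loff = 10000, rwsize = 32768 and boundary = 8192, this gives iosize = 8192 and remaining = 6384, so the first direct IO request would stop at the end of the current stripe unit.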


2010-05-20 10:30:32

by Fred Isaman

Subject: [PATCH 06/22] pnfs: filelayout: remove some dead code from filelayout_commit

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 10 ++--------
1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 2d7f634..3f31c32 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -562,7 +562,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
struct nfs_page *req, *reqt;
struct list_head *pos, *tmp, head, head2;
loff_t file_offset, comp_offset;
- size_t stripesz, cbytes;
enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
u32 idx1, idx2;

@@ -577,9 +576,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
return PNFS_NOT_ATTEMPTED;
}

- stripesz = filelayout_get_stripesize(layoutid);
- dprintk("%s stripesize %Zd\n", __func__, stripesz);
-
INIT_LIST_HEAD(&head);
INIT_LIST_HEAD(&head2);
list_add(&head, &data->pages);
@@ -587,7 +583,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,

/* COMMIT to each Data Server */
while (!list_empty(&head)) {
- cbytes = 0;
req = nfs_list_entry(head.next);

file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
@@ -613,7 +608,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
if (idx1 == idx2) {
nfs_list_remove_request(reqt);
nfs_list_add_request(reqt, &head2);
- cbytes += reqt->wb_bytes;
}
}

@@ -637,8 +631,8 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
dsdata->fldata.ds_nfs_client = ds->ds_clp;
dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);

- dprintk("%s: Initiating commit: %Zu@%llu USE DS:\n",
- __func__, cbytes, file_offset);
+ dprintk("%s: Initiating commit: %llu USE DS:\n",
+ __func__, file_offset);
print_ds(ds);

/* Send COMMIT to data server */
--
1.6.6.1


2010-05-20 10:30:31

by Fred Isaman

Subject: [PATCH 05/22] pnfs: filelayout: clean and breakup nfs4_pnfs_dserver_get

Split nfs4_pnfs_dserver_get() into two functions, nfs4_fl_calc_ds_index() and
nfs4_fl_prepare_ds(), and move file handle selection into the new inline
helper nfs4_fl_select_ds_fh(). This cleans up the code a bit and prepares for
a more extensive rewrite of filelayout_commit().

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 75 ++++++++++++----------------------
fs/nfs/nfs4filelayout.h | 33 +++++++--------
fs/nfs/nfs4filelayoutdev.c | 95 +++++++++++++++----------------------------
3 files changed, 75 insertions(+), 128 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 200bbc2..2d7f634 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -196,8 +196,8 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
{
struct inode *inode = PNFS_INODE(layoutid);
struct nfs4_filelayout_segment *flseg;
- struct nfs4_pnfs_dserver dserver;
- int status;
+ struct nfs4_pnfs_ds *ds;
+ u32 idx;

dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
__func__, inode->i_ino, nr_pages, pgbase, count, offset);
@@ -205,23 +205,19 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
flseg = LSEG_LD_DATA(data->pdata.lseg);

/* Retrieve the correct rpc_client for the byte range */
- status = nfs4_pnfs_dserver_get(data->pdata.lseg,
- offset,
- count,
- &dserver);
- if (status) {
- printk(KERN_ERR "%s: dserver get failed status %d use MDS\n",
- __func__, status);
+ idx = nfs4_fl_calc_ds_index(data->pdata.lseg, offset);
+ ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx);
+ if (!ds) {
+ printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
return PNFS_NOT_ATTEMPTED;
}
-
dprintk("%s USE DS:ip %x %s\n", __func__,
- htonl(dserver.ds->ds_ip_addr), dserver.ds->r_addr);
+ htonl(ds->ds_ip_addr), ds->r_addr);

/* just try the first data server for the index..*/
- data->fldata.pnfs_client = dserver.ds->ds_clp->cl_rpcclient;
- data->fldata.ds_nfs_client = dserver.ds->ds_clp;
- data->args.fh = dserver.fh;
+ data->fldata.ds_nfs_client = ds->ds_clp;
+ data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
+ data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Now get the file offset on the dserver
* Set the read offset to this offset, and
@@ -255,32 +251,26 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
{
struct inode *inode = PNFS_INODE(layoutid);
struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(data->pdata.lseg);
- struct nfs4_pnfs_dserver dserver;
- int status;
+ struct nfs4_pnfs_ds *ds;
+ u32 idx;

dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu sync %d\n",
__func__, inode->i_ino, nr_pages, pgbase, count, offset, sync);

/* Retrieve the correct rpc_client for the byte range */
- status = nfs4_pnfs_dserver_get(data->pdata.lseg,
- offset,
- count,
- &dserver);
-
- if (status) {
- printk(KERN_ERR "%s: dserver get failed status %d use MDS\n",
- __func__, status);
+ idx = nfs4_fl_calc_ds_index(data->pdata.lseg, offset);
+ ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx);
+ if (!ds) {
+ printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
return PNFS_NOT_ATTEMPTED;
}
-
dprintk("%s ino %lu %Zu@%llu DS:%x:%hu %s\n",
__func__, inode->i_ino, count, offset,
- htonl(dserver.ds->ds_ip_addr), ntohs(dserver.ds->ds_port),
- dserver.ds->r_addr);
+ htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);

- data->fldata.pnfs_client = dserver.ds->ds_clp->cl_rpcclient;
- data->fldata.ds_nfs_client = dserver.ds->ds_clp;
- data->args.fh = dserver.fh;
+ data->fldata.ds_nfs_client = ds->ds_clp;
+ data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
+ data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);

/* Get the file offset on the dserver. Set the write offset to
* this offset and save the original offset.
@@ -568,15 +558,12 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
{
struct nfs4_filelayout_segment *nfslay;
struct nfs_write_data *dsdata = NULL;
- struct nfs4_pnfs_dserver dserver;
struct nfs4_pnfs_ds *ds;
struct nfs_page *req, *reqt;
struct list_head *pos, *tmp, head, head2;
loff_t file_offset, comp_offset;
size_t stripesz, cbytes;
- int status;
enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
- struct nfs4_file_layout_dsaddr *dsaddr;
u32 idx1, idx2;

nfslay = LSEG_LD_DATA(data->pdata.lseg);
@@ -593,9 +580,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
stripesz = filelayout_get_stripesize(layoutid);
dprintk("%s stripesize %Zd\n", __func__, stripesz);

- dsaddr = container_of(data->pdata.lseg->deviceid,
- struct nfs4_file_layout_dsaddr, deviceid);
-
INIT_LIST_HEAD(&head);
INIT_LIST_HEAD(&head2);
list_add(&head, &data->pages);
@@ -609,19 +593,13 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;

/* Get dserver for the current page */
- status = nfs4_pnfs_dserver_get(data->pdata.lseg,
- file_offset,
- req->wb_bytes,
- &dserver);
- if (status) {
+ idx1 = nfs4_fl_calc_ds_index(data->pdata.lseg, file_offset);
+ ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx1);
+ if (!ds) {
data->pdata.pnfs_error = -EIO;
goto err_rewind;
}

- /* Get its index */
- idx1 = filelayout_dserver_get_index(file_offset, dsaddr,
- nfslay);
-
/* Gather all pages going to the current data server by
* comparing their indices.
* XXX: This recalculates the indices unecessarily.
@@ -630,8 +608,8 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
list_for_each_safe(pos, tmp, &head) {
reqt = nfs_list_entry(pos);
comp_offset = (loff_t)reqt->wb_index << PAGE_CACHE_SHIFT;
- idx2 = filelayout_dserver_get_index(comp_offset,
- dsaddr, nfslay);
+ idx2 = nfs4_fl_calc_ds_index(data->pdata.lseg,
+ comp_offset);
if (idx1 == idx2) {
nfs_list_remove_request(reqt);
nfs_list_add_request(reqt, &head2);
@@ -655,10 +633,9 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
list_add(&dsdata->pages, &head2);
list_del_init(&head2);

- ds = dserver.ds;
dsdata->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
dsdata->fldata.ds_nfs_client = ds->ds_clp;
- dsdata->args.fh = dserver.fh;
+ dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);

dprintk("%s: Initiating commit: %Zu@%llu USE DS:\n",
__func__, cbytes, file_offset);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index fbf307c..3697926 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -26,6 +26,9 @@

#define FILE_MT(inode) ((struct filelayout_mount_type *) \
(NFS_SERVER(inode)->pnfs_mountid->mountid))
+#define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
+ struct nfs4_file_layout_dsaddr, \
+ deviceid))

enum stripetype4 {
STRIPE_SPARSE = 1,
@@ -55,16 +58,6 @@ struct nfs4_pnfs_dev_hlist {
struct hlist_head dev_list[NFS4_PNFS_DEV_HASH_SIZE];
};

-/*
- * Used for I/O, Maps a stripe index to a layout file handle and a
- * multipath data server.
- */
-
-struct nfs4_pnfs_dserver {
- struct nfs_fh *fh;
- struct nfs4_pnfs_ds *ds;
-};
-
struct nfs4_filelayout_segment {
u32 stripe_type;
u32 commit_through_mds;
@@ -87,18 +80,24 @@ struct filelayout_mount_type {
struct super_block *fl_sb;
};

+static inline struct nfs_fh *
+nfs4_fl_select_ds_fh(struct nfs4_filelayout_segment *flseg, u32 idx)
+{
+ /* FRED - what about case == 0??? */
+ if (flseg->num_fh == 1)
+ return &flseg->fh_array[0];
+ else
+ return &flseg->fh_array[idx];
+}
+
extern struct pnfs_client_operations *pnfs_callback_ops;

extern void nfs4_fl_free_deviceid_callback(struct kref *);
extern void print_ds(struct nfs4_pnfs_ds *ds);
char *deviceid_fmt(const struct pnfs_deviceid *dev_id);
-int nfs4_pnfs_dserver_get(struct pnfs_layout_segment *lseg,
- loff_t offset,
- size_t count,
- struct nfs4_pnfs_dserver *dserver);
-u32 filelayout_dserver_get_index(loff_t offset,
- struct nfs4_file_layout_dsaddr *di,
- struct nfs4_filelayout_segment *layout);
+u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset);
+struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
+ u32 ds_idx);
extern struct nfs4_file_layout_dsaddr *
nfs4_pnfs_device_item_find(struct nfs_client *, struct pnfs_deviceid *dev_id);
struct nfs4_file_layout_dsaddr *
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index b04c9d9..cd39a6a 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -554,90 +554,61 @@ nfs4_pnfs_device_item_find(struct nfs_client *clp, struct pnfs_deviceid *id)
container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
}

-/* Want res = ((offset / layout->stripe_unit) % dsaddr->stripe_count)
+/* Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
* Then: ((res + fsi) % dsaddr->stripe_count)
*/
-u32
-filelayout_dserver_get_index(loff_t offset,
- struct nfs4_file_layout_dsaddr *dsaddr,
- struct nfs4_filelayout_segment *layout)
+static inline u32
+_nfs4_fl_calc_j_index(loff_t offset,
+ struct nfs4_file_layout_dsaddr *dsaddr,
+ struct nfs4_filelayout_segment *layout)
{
- u64 tmp, tmp2;
+ u64 tmp;

- tmp = offset;
+ tmp = offset - layout->pattern_offset;
do_div(tmp, layout->stripe_unit);
- tmp2 = do_div(tmp, dsaddr->stripe_count) + layout->first_stripe_index;
- return do_div(tmp2, dsaddr->stripe_count);
+ tmp += layout->first_stripe_index;
+ return do_div(tmp, dsaddr->stripe_count);
}

-/* Retrieve the rpc client for a specified byte range
- * in 'inode' by filling in the contents of 'dserver'.
- */
-int
-nfs4_pnfs_dserver_get(struct pnfs_layout_segment *lseg,
- loff_t offset,
- size_t count,
- struct nfs4_pnfs_dserver *dserver)
+u32
+nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset)
{
- struct nfs4_filelayout_segment *layout = LSEG_LD_DATA(lseg);
- struct inode *inode = PNFS_INODE(lseg->layout);
- struct nfs_server *mds_srv = NFS_SERVER(inode);
+ struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
struct nfs4_file_layout_dsaddr *dsaddr;
- u64 tmp, tmp2;
- u32 stripe_idx, end_idx, ds_idx;
-
- if (!layout)
- return 1;
-
- dsaddr = container_of(lseg->deviceid, struct nfs4_file_layout_dsaddr,
- deviceid);
-
- stripe_idx = filelayout_dserver_get_index(offset, dsaddr, layout);
-
- /* For debugging, ensure entire requested range is in this dserver */
- tmp = offset + count - 1;
- do_div(tmp, layout->stripe_unit);
- tmp2 = do_div(tmp, dsaddr->stripe_count) + layout->first_stripe_index;
- end_idx = do_div(tmp2, dsaddr->stripe_count);
+ u32 j;

- dprintk("%s: offset=%Lu, count=%Zu, si=%u, dsi=%u, "
- "stripe_count=%u, stripe_unit=%u first_stripe_index %u\n",
- __func__,
- offset, count, stripe_idx, end_idx, dsaddr->stripe_count,
- layout->stripe_unit, layout->first_stripe_index);
+ dsaddr = FILE_DSADDR(lseg);
+ j = _nfs4_fl_calc_j_index(offset, dsaddr, flseg);
+ return dsaddr->stripe_indices[j];
+}

- BUG_ON(end_idx != stripe_idx);
- BUG_ON(stripe_idx >= dsaddr->stripe_count);
+struct nfs4_pnfs_ds *
+nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+ struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+ struct nfs4_file_layout_dsaddr *dsaddr;

- ds_idx = dsaddr->stripe_indices[stripe_idx];
+ dsaddr = FILE_DSADDR(lseg);
if (dsaddr->ds_list[ds_idx] == NULL) {
- printk(KERN_ERR "%s: No data server for device id (%s)!! \n",
- __func__, deviceid_fmt(&layout->dev_id));
- return 1;
+ printk(KERN_ERR "%s: No data server for device id (%s)!!\n",
+ __func__, deviceid_fmt(&flseg->dev_id));
+ return NULL;
}

if (!dsaddr->ds_list[ds_idx]->ds_clp) {
int err;

- err = nfs4_pnfs_ds_create(mds_srv, dsaddr->ds_list[ds_idx]);
+ err = nfs4_pnfs_ds_create(PNFS_NFS_SERVER(lseg->layout),
+ dsaddr->ds_list[ds_idx]);
if (err) {
printk(KERN_ERR "%s nfs4_pnfs_ds_create error %d\n",
__func__, err);
- return 1;
+ return NULL;
}
}
- dserver->ds = dsaddr->ds_list[ds_idx];
+ dprintk("%s: dev_id=%s, ds_idx=%u\n",
+ __func__, deviceid_fmt(&flseg->dev_id), ds_idx);

- if (layout->num_fh == 1)
- dserver->fh = &layout->fh_array[0];
- else
- dserver->fh = &layout->fh_array[ds_idx];
-
- dprintk("%s: dev_id=%s, ip:port=%s, ds_idx=%u stripe_idx=%u, "
- "offset=%llu, count=%Zu\n",
- __func__, deviceid_fmt(&layout->dev_id),
- dserver->ds->r_addr,
- ds_idx, stripe_idx, offset, count);
-
- return 0;
+ return dsaddr->ds_list[ds_idx];
}
+
--
1.6.6.1
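
The index calculation introduced by this patch can be tried outside the kernel with a sketch like the one below. It is a minimal illustration under stated assumptions: struct fl_segment and struct fl_dsaddr are hypothetical stand-ins that carry only the fields the calculation reads, plain / and % replace do_div(), and stripe_unit and stripe_count are assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields read from nfs4_filelayout_segment. */
struct fl_segment {
	uint64_t pattern_offset;
	uint32_t stripe_unit;
	uint32_t first_stripe_index;
};

/* Hypothetical stand-in for the fields read from nfs4_file_layout_dsaddr. */
struct fl_dsaddr {
	uint32_t stripe_count;
	const uint8_t *stripe_indices;	/* maps stripe index j to a DS index */
};

/* j = ((offset - pattern_offset) / stripe_unit + first_stripe_index)
 *     % stripe_count
 */
static uint32_t fl_calc_j_index(uint64_t offset, const struct fl_dsaddr *dsaddr,
				const struct fl_segment *seg)
{
	uint64_t tmp = offset - seg->pattern_offset;

	tmp /= seg->stripe_unit;
	tmp += seg->first_stripe_index;
	return (uint32_t)(tmp % dsaddr->stripe_count);
}

/* Translate the stripe index into a data server index via the device map. */
static uint32_t fl_calc_ds_index(uint64_t offset, const struct fl_dsaddr *dsaddr,
				 const struct fl_segment *seg)
{
	return dsaddr->stripe_indices[fl_calc_j_index(offset, dsaddr, seg)];
}
```

With stripe_unit = 4096, first_stripe_index = 1, stripe_count = 3 and a stripe_indices map of {2, 0, 1}, offset 0 gives j = 1 and data server index 0. Note that the value returned is the data server index rather than the stripe index, so two pages compare equal in filelayout_commit() whenever they map to the same data server.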


2010-05-20 10:30:36

by Fred Isaman

Subject: [PATCH 07/22] pnfs: remove PNFS_LAYOUTGET_ON_OPEN

It is not used anywhere.

Signed-off-by: Fred Isaman <[email protected]>
---
fs/nfs/nfs4filelayout.c | 3 +--
include/linux/nfs4_pnfs.h | 14 --------------
2 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 3f31c32..c96dd0e 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -710,8 +710,7 @@ struct layoutdriver_io_operations filelayout_io_operations = {
};

struct layoutdriver_policy_operations filelayout_policy_operations = {
- .flags = PNFS_USE_RPC_CODE |
- PNFS_LAYOUTGET_ON_OPEN,
+ .flags = PNFS_USE_RPC_CODE,
.get_stripesize = filelayout_get_stripesize,
.pg_test = filelayout_pg_test,
};
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 4d47b48..0feb5b7 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -165,11 +165,6 @@ enum layoutdriver_policy_flags {
/* Should the NFS req. gather algorithm cross stripe boundaries? */
PNFS_GATHER_ACROSS_STRIPES = 1 << 1,

- /* Should the pNFS client issue a layoutget call in the
- * same compound as the OPEN operation?
- */
- PNFS_LAYOUTGET_ON_OPEN = 1 << 2,
-
/* Should the pNFS client commit and return the layout upon a setattr */
PNFS_LAYOUTRET_ON_SETATTR = 1 << 3,
};
@@ -198,15 +193,6 @@ pnfs_ld_gather_across_stripes(struct pnfs_layoutdriver_type *ld)
return ld->ld_policy_ops->flags & PNFS_GATHER_ACROSS_STRIPES;
}

-/* Should the pNFS client issue a layoutget call in the
- * same compound as the OPEN operation?
- */
-static inline int
-pnfs_ld_layoutget_on_open(struct pnfs_layoutdriver_type *ld)
-{
- return ld->ld_policy_ops->flags & PNFS_LAYOUTGET_ON_OPEN;
-}
-
/* Should the pNFS client commit and return the layout upon a setattr
*/
static inline int
--
1.6.6.1