We at IBM are receiving multiple customer requests to support NFSv4
server migration.
I have gone through Trond's and Chuck's presentations from the 2011
Connectathon, and there appears to be
sizable work remaining. I am not sure whether any progress has been made
since those talks.
I would like to open a discussion thread on the mailing list to
understand the latest status.
I would also like to get input on Volatile Filehandles (VFH). I
searched the mailing list and could not
find any recent discussion on this.
Some discussion points:
- What are the pieces left to attain full client/server support for
seamless server migration?
- Any discussion/suggestions on how to implement VFH, as described
in RFC 3530 sections 4.2.3 and 4.2.4?
- Are there any community efforts ongoing or about to start in this
area, so that we can partner and get
things done instead of duplicating the work?
Thanks a lot for your help
JV
On Wed, 2011-08-03 at 18:16 -0700, Malahal Naineni wrote:
> NeilBrown [[email protected]] wrote:
> >
> > I substantially agree, though I think the implication can be refined a
> > little.
> >
> > I would say that the implication is that a VFH is only really usable
> > when the complete path leading to the file in question is read-only.
> > We don't need to assume that other files in other parts of the
> > hierarchy which have stable file handles are read-only.
> >
> > So if the server presents us with a VFH, it seems reasonable to assume
> > that we can use a repeated lookup of the same name to refresh the
> > filehandle simply because there is no other credible way to respond to
> > a FHEXPIRED.
>
> The spec seems to imply that repeated lookup of the same name to refresh
> the file handle is OK as long as the file is OPEN! It doesn't seem to imply
> anything for files that are not opened.
I see no such implication for a migration situation, unless you also
migrate the open state.
Even if you do, then you still have to somehow recover directory
filehandles for the current directory and any other RPC call that
happened to be in progress when the original server was migrated.
> "RFC 3530, 4.2.3. Volatile Filehandle" states:
>
> "Servers which provide volatile filehandles that may expire while open
> (i.e., if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if
> FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set), should deny
> a RENAME or REMOVE that would affect an OPEN file of any of the
> components leading to the OPEN file. In addition, the server should
> deny all RENAME or REMOVE requests during the grace period upon server
> restart."
>
> On the other hand, if FH4_NOEXPIRE_WITH_OPEN is set, then the file can
> be allowed to be renamed or removed by the server.
Which completely violates the expectations of POSIX applications on the
client. Any idea how you would work around the above?
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Aug 2, 2011, at 7:58 AM, Venkateswararao Jujjuri wrote:
> We at IBM are receiving multiple customer requests for supporting NFSv4 server migration.
> I have referred to Trond and Chuck's presentations at 2011 Connectathon and there appears to be
> sizable work remaining. I am not sure if there is any progress made since those talks.
>
> I would like to open up a discussion thread on the mailing list to understand the latest status.
> Also would like to get the input on the Volatile Filehandles (VFH). I searched the mailing list, and could not
> find any recent discussion on this.
>
> Some discussion points:
> - What are the pieces left to attain full client/server support for seamless server migration?
The client migration implementation is code complete and in test now. This includes both minor version 0 and 1. We don't have any mv1 servers to test with at this time, so that support is provisional. I hope to have patches ready for the 3.2 merge window, but you can see what I've got now on git.linux-nfs.org.
A problem is that there are corner cases in the v4.0 migration specification that are still unresolved. We are working with the NFSv4 WG to get these addressed. But I expect some minor changes even after the patches are merged upstream.
We don't have firm plans for a server migration implementation on Linux at this time, but Bruce can maybe say more about that.
> - Any discussion/suggestions on the way to implement VFH? As described in RFC 3530 sections 4.2.3 and 4.2.4?
I think we are avoiding volatile file handles as long as possible. We don't have plans to implement them at the moment.
> - Are there any community efforts going / about to start in this area? so that we can partner and get
> things done instead of duplicating the work.
>
> Thanks a lot for your help
> JV
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Fri, Aug 05, 2011 at 03:16:10PM -0400, J. Bruce Fields wrote:
> > Another scheme is to disconnect the file handles from the inode number.
> > I implemented this a couple years ago for a customer. Basically add
> > an extended attribute into each inode that contains the nfs file handle,
> > and that handle stays the same independent of the inode number. The
> > added complexity is that you need a new lookup data structure mapping
>
> And that data structure should be persistent--how were you storing it?
In this case it was in a clustered database in userspace, which I didn't
really touch directly. If doing this outside of such an appliance, I would
add a btree to XFS to do the mapping.
> > from your nfs handle to something that can be used to find the inode
> > (inode number typically).
>
> Interesting, I've wondered before how well that would work. Any lessons
> learned?
Don't store the data in userspace, it just makes life hard :) The
basic idea actually was pretty straightforward, and if done using the
XFS btree I mentioned above, fairly easy. Other filesystems might or
might not have similar easily reusable persistent lookup data structures.
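For anyone trying to picture the first half of that scheme (stamping each
inode with its own handle), here is a rough userspace sketch. The xattr
name, the 128-bit handle and the use of libuuid are invented for the
example; the real thing lives in the filesystem/nfsd, and the reverse
handle-to-inode mapping would be a persistent index such as the XFS btree
mentioned above, which this sketch does not attempt.

/*
 * Userspace sketch only: tag each file with a stable handle in an xattr.
 * The xattr name and 16-byte handle are invented for this example; the
 * reverse mapping (handle -> inode) still needs its own persistent index.
 * Build with: cc -o stamp stamp.c -luuid
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <uuid/uuid.h>

#define FH_XATTR "user.nfs_fh"

/* Give the file a stable, inode-number-independent handle if it lacks one. */
static int stamp_handle(const char *path)
{
	uuid_t fh;

	if (getxattr(path, FH_XATTR, fh, sizeof(fh)) == (ssize_t)sizeof(fh))
		return 0;			/* already stamped */

	uuid_generate(fh);			/* random 128-bit handle */
	/* XATTR_CREATE: never overwrite a handle that appeared meanwhile */
	return setxattr(path, FH_XATTR, fh, sizeof(fh), XATTR_CREATE);
}

int main(int argc, char **argv)
{
	int i, err = 0;

	for (i = 1; i < argc; i++)
		if (stamp_handle(argv[i])) {
			perror(argv[i]);
			err = 1;
		}
	return err;
}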
On 08/02/2011 07:53 AM, Chuck Lever wrote:
> On Aug 2, 2011, at 7:58 AM, Venkateswararao Jujjuri wrote:
>
>> We at IBM are receiving multiple customer requests for supporting NFSv4 server migration.
>> I have referred to Trond and Chuck's presentations at 2011 Connectathon and there appears to be
>> sizable work remaining. I am not sure if there is any progress made since those talks.
>>
>> I would like to open up a discussion thread on the mailing list to understand the latest status.
>> Also would like to get the input on the Volatile Filehandles (VFH). I searched the mailing list, and could not
>> find any recent discussion on this.
>>
>> Some discussion points:
>> - What are the pieces left to attain full client/server support for seamless server migration?
> The client migration implementation is code complete and in test now. This includes both minor version 0 and 1. We don't have any mv1 servers to test with at this time, so that support is provisional. I hope to have patches ready for the 3.2 merge window, but you can see what I've got now on git.linux-nfs.org.
Great, I will take it from there. Is it your branch or Trond's? Can you
please tell me which project/branch
I should take from git.linux-nfs.org?
>
> A problem is that there are corner cases in the v4.0 migration specification that are still unresolved. We are working with the NFSv4 WG to get these addressed. But I expect some minor changes even after the patches are merged upstream.
What are those corner cases? Is it on any mailing list? Is it possible
for us to see that discussion?
>
> We don't have firm plans for a server migration implementation on Linux at this time, but Bruce can maybe say more about that.
Sure; would wait for Bruce's views on this. We are getting requirements
for both client and server support.
>> - Any discussion/suggestions on the way to implement VFH? As described in RFC 3530 sections 4.2.3 and 4.2.4?
> I think we are avoiding volatile file handles as long as possible. We don't have plans to implement them at the moment.
Hrm. How can we achieve complete migration support without volatile
filehandle support?
What are the reasons for avoiding it? Maybe we can start looking into
this ourselves, but we would first like to understand those reasons
(if any).
Thanks a lot for your quick response.
- JV
>> - Are there any community efforts going / about to start in this area? so that we can partner and get
>> things done instead of duplicating the work.
>>
>> Thanks a lot for your help
>> JV
On Wed, 3 Aug 2011 05:27:26 -0700 "Myklebust, Trond"
<[email protected]> wrote:
> > >> - Any discussion/suggestions on the way to implement VFH? As
> > >> described in RFC 3530 sections 4.2.3 and 4.2.4?
> > > I think we are avoiding volatile file handles as long as possible.
> > > We don't have plans to implement them at the moment.
> > Hrm. How can we achieve the complete migration support without volatile
> > filehandle support?
> > What are the reasons for avoiding it? Maybe we can start looking into
> > this but would like to understand
> > the reasons (if any) for avoiding it.
>
> POSIX allows the namespace to change at any time (rename() or unlink())
> and so you cannot rely on addressing files by pathname. That was the
> whole reason for introducing filehandles into NFSv2 in the first place.
>
> Volatile filehandles were introduced in NFSv4 without any attempt to fix
> those shortcomings. There is no real prescription for how to recover in
> a situation where a rename or unlink has occurred prior to the
> filehandle expiring. Nor is there a reliable prescription for dealing
> with the case where a new file of the same name has replaced the
> original.
> Basically, the implication is that volatile filehandles are only really
> usable in a situation where the whole Filesystem is read-only on the
> server.
I substantially agree, though I think the implication can be refined a little.
I would say that the implication is that a VFH is only really usable when the
complete path leading to the file in question is read-only. We don't need
to assume that other files in other parts of the hierarchy which have stable
file handles are read-only.
So if the server presents us with a VFH, it seems reasonable to assume that
we can use a repeated lookup of the same name to refresh the filehandle
simply because there is no other credible way to respond to a FHEXPIRED.
So while the spec doesn't explicitly say that the file behind an expired VFH
can be expected never to have been renamed, it does - as you say - strongly
imply it, so it seems reasonable to proceed with implementation on that basis...
Is that convincing?
NeilBrown
On Aug 3, 2011, at 3:28 AM, Venkateswararao Jujjuri wrote:
> On 08/02/2011 07:53 AM, Chuck Lever wrote:
>> On Aug 2, 2011, at 7:58 AM, Venkateswararao Jujjuri wrote:
>>
>>> We at IBM are receiving multiple customer requests for supporting NFSv4 server migration.
>>> I have referred to Trond and Chuck's presentations at 2011 Connectathon and there appears to be
>>> sizable work remaining. I am not sure if there is any progress made since those talks.
>>>
>>> I would like to open up a discussion thread on the mailing list to understand the latest status.
>>> Also would like to get the input on the Volatile Filehandles (VFH). I searched the mailing list, and could not
>>> find any recent discussion on this.
>>>
>>> Some discussion points:
>>> - What are the pieces left to attain full client/server support for seamless server migration?
>> The client migration implementation is code complete and in test now. This includes both minor version 0 and 1. We don't have any mv1 servers to test with at this time, so that support is provisional. I hope to have patches ready for the 3.2 merge window, but you can see what I've got now on git.linux-nfs.org.
> Great, I will take it from there. Is it your branch or Trond's? Can you please give me which project/branch
> I should take from git.linux-nfs.org?
This source code is just for review and experimentation. I would wait for the merged code upstream before basing any work on it, as it needs to be forward ported to 3.1 and I expect there will be some minor architectural changes soon.
git://git.linux-nfs.org/projects/cel/cel-2.6.git
>> A problem is that there are corner cases in the v4.0 migration specification that are still unresolved. We are working with the NFSv4 WG to get these addressed. But I expect some minor changes even after the patches are merged upstream.
> What are those corner cases? Is it on any mailing list? Is it possible for us to see that discussion?
There has been a lot of face-to-face discussion about this over the past six months or so. We are planning to bring the discussion to the [email protected] mailing list (which is public) very soon.
>> We don't have firm plans for a server migration implementation on Linux at this time, but Bruce can maybe say more about that.
> Sure; would wait for Bruce's views on this. We are getting requirements for both client and server support.
Are you planning to work with the kernel's NFSD, or with Ganesha?
>>> - Any discussion/suggestions on the way to implement VFH? As described in RFC 3530 sections 4.2.3 and 4.2.4?
>> I think we are avoiding volatile file handles as long as possible. We don't have plans to implement them at the moment.
> Hrm. How can we achieve the complete migration support without volatile filehandle support?
>> What are the reasons for avoiding it? Maybe we can start looking into this but would like to understand
> the reasons (if any) for avoiding it.
Migration itself does not require volatile file handles (FHs). If you are considering a simple-minded server implementation, like an rsync between heterogeneous physical file systems, then yes, volatile FHs may be required. But as Trond points out, volatile FHs are a troubled concept anyway, even without migration in the picture.
Passing a client between two servers that export the same cluster file system would be a simple and common use of migration where file handles don't have to (and probably won't) change. We expect that the most common use cases for migration in the near term are going to involve scenarios with homogeneous server OS and physical file systems, where FH format can be controlled and thus preserved across a migration event.
Robust server migration implementations, in other words, will migrate not just data, but also file handles, write verifiers, and NFSv4 state. This allows the greatest transparency for clients, and the smoothest possible migration recovery. Clients can be made simpler if FH recovery is not needed.
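To make the "FH format can be controlled and thus preserved" point concrete,
here is a purely hypothetical filehandle layout (emphatically not the Linux
knfsd wire format). The only point is that every field the source server
encodes has to be reproducible byte-for-byte on the destination, which is why
filesystem-wide UUIDs travel better across servers than local device numbers.

/*
 * Hypothetical filehandle layout, only to illustrate the point above;
 * this is NOT the Linux knfsd format.  A handle survives migration only
 * if the destination server encodes exactly the same bytes for the file.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct example_fh {
	uint8_t  version;	/* handle layout version */
	uint8_t  fsid_type;	/* how the filesystem is identified */
	uint8_t  fsid[16];	/* e.g. filesystem UUID: same on both servers */
	uint64_t fileid;	/* inode number within that filesystem */
	uint32_t generation;	/* inode generation, guards against reuse */
} __attribute__((packed));

static int fh_survives_migration(const struct example_fh *src,
				 const struct example_fh *dst)
{
	return memcmp(src, dst, sizeof(*src)) == 0;
}

int main(void)
{
	struct example_fh src = { .version = 1, .fileid = 42 }, dst = src;

	printf("client %s\n", fh_survives_migration(&src, &dst) ?
	       "keeps working with its cached handles" :
	       "sees stale/expired handles");
	return 0;
}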
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
> -----Original Message-----
> From: Venkateswararao Jujjuri [mailto:[email protected]]
> Sent: Wednesday, August 03, 2011 3:28 AM
> To: Chuck Lever
> Cc: [email protected]; Myklebust, Trond
> Subject: Re: State of NFSv4 VolatileFilehandles
>
> On 08/02/2011 07:53 AM, Chuck Lever wrote:
> > On Aug 2, 2011, at 7:58 AM, Venkateswararao Jujjuri wrote:
> >
> >> We at IBM are receiving multiple customer requests for supporting
> >> NFSv4 server migration.
> >> I have referred to Trond and Chuck's presentations at 2011
> >> Connectathon and there appears to be sizable work remaining. I am not
> >> sure if there is any progress made since those talks.
> >>
> >> I would like to open up a discussion thread on the mailing list to
> >> understand the latest status.
> >> Also would like to get the input on the Volatile Filehandles (VFH).
> >> I searched the mailing list, and could not find any recent discussion
> >> on this.
> >>
> >> Some discussion points:
> >> - What are the pieces left to attain full client/server support for
> >> seamless server migration?
> > The client migration implementation is code complete and in test now.
> > This includes both minor version 0 and 1. We don't have any mv1
> > servers to test with at this time, so that support is provisional. I
> > hope to have patches ready for the 3.2 merge window, but you can see
> > what I've got now on git.linux-nfs.org.
> Great, I will take it from there. Is it your branch or Trond's? Can you
> please give me which project/branch I should take from
> git.linux-nfs.org?
> >
> > A problem is that there are corner cases in the v4.0 migration
> > specification that are still unresolved. We are working with the
> > NFSv4 WG to get these addressed. But I expect some minor changes even
> > after the patches are merged upstream.
> What are those corner cases? Is it on any mailing list? Is it possible
> for us to see that discussion?
> >
> > We don't have firm plans for a server migration implementation on
> > Linux at this time, but Bruce can maybe say more about that.
> Sure; would wait for Bruce's views on this. We are getting requirements
> for both client and server support.
>
> >> - Any discussion/suggestions on the way to implement VFH? As
> >> described in RFC 3530 sections 4.2.3 and 4.2.4?
> > I think we are avoiding volatile file handles as long as possible.
> > We don't have plans to implement them at the moment.
> Hrm. How can we achieve the complete migration support without volatile
> filehandle support?
> What are the reasons for avoiding it? Maybe we can start looking into
> this but would like to understand
> the reasons (if any) for avoiding it.
POSIX allows the namespace to change at any time (rename() or unlink())
and so you cannot rely on addressing files by pathname. That was the
whole reason for introducing filehandles into NFSv2 in the first place.
Volatile filehandles were introduced in NFSv4 without any attempt to fix
those shortcomings. There is no real prescription for how to recover in
a situation where a rename or unlink has occurred prior to the
filehandle expiring. Nor is there a reliable prescription for dealing
with the case where a new file of the same name has replaced the
original.
Basically, the implication is that volatile filehandles are only really
usable in a situation where the whole Filesystem is read-only on the
server.
Cheers
Trond
NeilBrown [[email protected]] wrote:
> > POSIX allows the namespace to change at any time (rename() or unlink())
> > and so you cannot rely on addressing files by pathname. That was the
> > whole reason for introducing filehandles into NFSv2 in the first place.
> >
> > Volatile filehandles were introduced in NFSv4 without any attempt to fix
> > those shortcomings. There is no real prescription for how to recover in
> > a situation where a rename or unlink has occurred prior to the
> > filehandle expiring. Nor is there a reliable prescription for dealing
> > with the case where a new file of the same name has replaced the
> > original.
> > Basically, the implication is that volatile filehandles are only really
> > usable in a situation where the whole Filesystem is read-only on the
> > server.
>
> I substantially agree, though I think the implication can be refined a little.
>
> I would say that the implication is that a VFH is only really usable when the
> complete path leading to the file in question is read-only. We don't need
> to assume that other files in other parts of the hierarchy which have stable
> file handles are read-only.
The spec recommends the "change" attribute for validating the data cache,
name cache, etc. Some client implementations use the "change" attribute for
validating VFHs, though! Can we use it for validating VFHs?
Thanks, Malahal.
> -----Original Message-----
> From: J. Bruce Fields [mailto:[email protected]]
> Sent: Thursday, August 04, 2011 12:27 PM
> To: Myklebust, Trond
> Cc: Venkateswararao Jujjuri; Chuck Lever; [email protected]
> Subject: Re: State of NFSv4 VolatileFilehandles
>
> On Thu, Aug 04, 2011 at 12:10:44PM -0400, Trond Myklebust wrote:
> > On Thu, 2011-08-04 at 12:03 -0400, J. Bruce Fields wrote:
> > > On Thu, Aug 04, 2011 at 04:27:33AM -0700, Venkateswararao Jujjuri wrote:
> > > > One of the use cases is rsync between two physical filesystems;
> > > > but in this particular use case the export is read-only (rootfs).
> > > > As Trond mentioned, volatile FHs are fine in the case of read-only
> > > > exports.
> > > > Is it something we can consider for upstream? VFHs only for
> > > > read-only exports?
> > >
> > > The client has no way of knowing that an export is read only. (Or
> > > that the server guarantees the safety of looking up names again in
> > > the more general cases Neil describes.) Unless we decide that a
> > > server is making an implicit guarantee of that just by exposing
> > > volatile filehandles at all. Doesn't sound like the existing spec
> > > really says that, though.
> >
> > NFSv4.1 introduces the 'fs_status' recommended attribute (see section
> > 11.11 in RFC5661), which does, in fact, allow the client to deduce
> > that an export is read-only/won't ever change.
>
> Oh, neat, I'd forgotten that; you're thinking of STATUS4_FIXED? But I'm
> not sure it does the job:
>
> STATUS4_FIXED, which indicates a read-only image in the sense
> that it will never change. The possibility is allowed that, as
> a result of migration or switch to a different image, changed
> data can be accessed, but within the confines of this instance,
> no change is allowed. The client can use this fact to cache
> aggressively.
>
> OK, so permission to set your attribute cache timeout very high,
> perhaps, but I don't see why "changed data" couldn't mean changed
> paths....
No, but you can presumably use the FSLI4BX_CLSIMUL flag from
fs_locations_info in order to find an equivalent replica.
Cheers,
Trond
On Thu, 2011-08-04 at 12:03 -0400, J. Bruce Fields wrote:
> On Thu, Aug 04, 2011 at 04:27:33AM -0700, Venkateswararao Jujjuri wrote:
> > One of the use cases is rsync between two physical filesystems; but in
> > this particular use case the export
> > is read-only (rootfs). As Trond mentioned, volatile FHs are fine in
> > the case of read-only exports.
> > Is it something we can consider for upstream? VFHs only for read-only
> > exports?
>
> The client has no way of knowing that an export is read only. (Or that
> the server guarantees the safety of looking up names again in the more
> general cases Neil describes.) Unless we decide that a server is making
> an implicit guarantee of that just by exposing volatile filehandles at
> all. Doesn't sound like the existing spec really says that, though.
NFSv4.1 introduces the 'fs_status' recommended attribute (see section
11.11 in RFC5661), which does, in fact, allow the client to deduce that
an export is read-only/won't ever change.
> If an examination of existing implementations and/or some sort of new
> spec language could reassure us that servers will only ever expose
> volatile filehandles when it's safe to do so, then maybe it would make
> sense for the client to implement volatile filehandle recovery?
>
> But if there's a chance of "unsafe" servers out there, then it would
> seem like a trap for the unwary user....
>
> Your rootfs's probably aren't terribly large--could you copy around
> compressed block-level images instead of doing rsync?
Agreed.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Thu, Aug 04, 2011 at 12:10:44PM -0400, Trond Myklebust wrote:
> On Thu, 2011-08-04 at 12:03 -0400, J. Bruce Fields wrote:
> > On Thu, Aug 04, 2011 at 04:27:33AM -0700, Venkateswararao Jujjuri wrote:
> > > One of the use cases is rsync between two physical filesystems; but in
> > > this particular use case the export
> > > is read-only (rootfs). As Trond mentioned, volatile FHs are fine in
> > > the case of read-only exports.
> > > Is it something we can consider for upstream? VFHs only for read-only
> > > exports?
> >
> > The client has no way of knowing that an export is read only. (Or that
> > the server guarantees the safety of looking up names again in the more
> > general cases Neil describes.) Unless we decide that a server is making
> > an implicit guarantee of that just by exposing volatile filehandles at
> > all. Doesn't sound like the existing spec really says that, though.
>
> NFSv4.1 introduces the 'fs_status' recommended attribute (see section
> 11.11 in RFC5661), which does, in fact, allow the client to deduce that
> an export is read-only/won't ever change.
Oh, neat, I'd forgotten that; you're thinking of STATUS4_FIXED? But I'm
not sure it does the job:
STATUS4_FIXED, which indicates a read-only image in the sense
that it will never change. The possibility is allowed that, as
a result of migration or switch to a different image, changed
data can be accessed, but within the confines of this instance,
no change is allowed. The client can use this fact to cache
aggressively.
OK, so permission to set your attribute cache timeout very high,
perhaps, but I don't see why "changed data" couldn't mean changed
paths....
--b.
> > If an examination of existing implementations and/or some sort of new
> > spec language could reassure us that servers will only ever expose
> > volatile filehandles when it's safe to do so, then maybe it would make
> > sense for the client to implement volatile filehandle recovery?
> >
> > But if there's a chance of "unsafe" servers out there, then it would
> > seem like a trap for the unwary user....
> >
> > Your rootfs's probably aren't terribly large--could you copy around
> > compressed block-level images instead of doing rsync?
>
> Agreed.
NeilBrown [[email protected]] wrote:
>
> I substantially agree, though I think the implication can be refined a
> little.
>
> I would say that the implication is that a VFH is only really usable
> when the complete path leading to the file in question is read-only.
> We don't need to assume that other files in other parts of the
> hierarchy which have stable file handles are read-only.
>
> So if the server presents us with a VFH, it seems reasonable to assume
> that we can use a repeated lookup of the same name to refresh the
> filehandle simply because there is no other credible way to respond to
> a FHEXPIRED.
The spec seems to imply that repeated lookup of the same name to refresh
the file handle is OK as long as the file is OPEN! It doesn't seem to imply
anything for files that are not opened.
"RFC 3530, 4.2.3. Volatile Filehandle" states:
"Servers which provide volatile filehandles that may expire while open
(i.e., if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if
FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set), should deny
a RENAME or REMOVE that would affect an OPEN file of any of the
components leading to the OPEN file. In addition, the server should
deny all RENAME or REMOVE requests during the grace period upon server
restart."
On the other hand, if FH4_NOEXPIRE_WITH_OPEN is set, then the file can
be allowed to be renamed or removed by the server.
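Just to pin down what "any of the components leading to the OPEN file" means
in practice: a server enforcing the rule quoted above would refuse a RENAME
or REMOVE whose target is an OPEN file or any directory on the path to one.
A stand-in sketch, not nfsd code, with invented helper names:

/*
 * Sketch of the RFC 3530 rule quoted above, with stand-in types: before
 * honouring a RENAME or REMOVE, refuse it if the target is currently OPEN
 * or is one of the directories leading to an OPEN file.  Not nfsd code.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct inode {
	struct inode *parent;		/* parent directory, NULL at the root */
};

/* True if 'victim' is 'node' itself or one of the directories above it. */
static bool is_component_of(const struct inode *victim, const struct inode *node)
{
	for (; node; node = node->parent)
		if (node == victim)
			return true;
	return false;
}

/*
 * 'open_files' is whatever the server uses to track files clients hold OPEN.
 * Returns true if the RENAME/REMOVE should be denied (NFS4ERR_FILE_OPEN
 * would be a natural error to return).
 */
static bool must_deny_rename_remove(const struct inode *victim,
				    const struct inode **open_files,
				    size_t nopen)
{
	size_t i;

	for (i = 0; i < nopen; i++)
		if (is_component_of(victim, open_files[i]))
			return true;
	return false;
}

int main(void)
{
	struct inode root = { NULL };
	struct inode dir  = { &root };
	struct inode file = { &dir };	/* a client holds this file OPEN */
	const struct inode *open_files[] = { &file };

	/* Removing "dir" would pull a component out from under the OPEN file. */
	printf("deny REMOVE of dir: %s\n",
	       must_deny_rename_remove(&dir, open_files, 1) ? "yes" : "no");
	return 0;
}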
Thanks, Malahal.
On Thu, 2011-08-04 at 13:03 -0400, J. Bruce Fields wrote:
> On Thu, Aug 04, 2011 at 09:48:32AM -0700, Myklebust, Trond wrote:
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:[email protected]]
> > > Oh, neat, I'd forgotten that; you're thinking of STATUS4_FIXED? But
> > > I'm
> > > not sure it does the job:
> > >
> > > STATUS4_FIXED, which indicates a read-only image in the sense
> > > that it will never change. The possibility is allowed that, as
> > > a result of migration or switch to a different image, changed
> > > data can be accessed, but within the confines of this instance,
> > > no change is allowed. The client can use this fact to cache
> > > aggressively.
> > >
> > > OK, so permission to set your attribute cache timeout very high,
> > > perhaps, but I don't see why "changed data" couldn't mean changed
> > > paths....
> >
> > No, but you can presumably use the FSLI4BX_CLSIMUL flag from
> > fs_locations_info in order to find an equivalent replica.
>
> I lost you.
>
> Actually my real problem is that I don't understand the description of
> STATUS4_FIXED. What does "or switch to a different image" mean? Not
> "migration", or the sentence would have ended before the "or".
>
> I read it as allowing a server admin to replace the filesystem image in
> place, in which case from the client's point of view this allows the
> filesystem to change at any time. Which makes the whole thing not
> terribly useful, except (as the last sentence says) as a caching hint.
If the server admin replaces one filesystem with a different
filesystem, then nothing is going to work anyway. I don't see how that
is relevant. That's a case of 'doctor, it hurts...'
The bit that _is_ relevant is the 'migration' part, but since the
fs_locations_info FSLI4BX_CLSIMUL flag allows you to conclude that the
replica is an exact replica at all times (i.e. contents are guaranteed
to be the same even if filehandles, directory cookies, etc. are not),
the STATUS4_FIXED flag does allow you to assume that paths have not
changed.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
Venkateswararao Jujjuri [[email protected]] wrote:
> >I think we are avoiding volatile file handles as long as possible. We don't have plans to implement them at the moment.
> Hrm. How can we achieve the complete migration support without
> volatile filehandle support?
If you can generate persistent file handles that don't include server
specific information (maybe a bit hard, but not impossible), migration
can be done. Maybe that is what Chuck is talking about.
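As an aside, an easy way to see what goes into the handles a Linux
filesystem hands out today, and why two independently built servers will
generally not agree on them for "the same" file unless the contents are
deliberately controlled, is to dump them with name_to_handle_at(2) (needs a
recent kernel and glibc):

/*
 * Print the kernel's file handle for a path.  The bytes come from
 * filesystem-local data (typically inode number and generation), which is
 * exactly the server-specific information being discussed above.
 * Needs Linux >= 2.6.39 and glibc >= 2.14.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	struct file_handle *fh;
	unsigned int i;
	int mount_id;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	fh->handle_bytes = MAX_HANDLE_SZ;

	if (name_to_handle_at(AT_FDCWD, argv[1], fh, &mount_id, 0) < 0) {
		perror("name_to_handle_at");
		return 1;
	}

	printf("handle type %d, %u bytes:", fh->handle_type, fh->handle_bytes);
	for (i = 0; i < fh->handle_bytes; i++)
		printf(" %02x", fh->f_handle[i]);
	printf("\n");
	free(fh);
	return 0;
}

Running this against the same path on two different machines (or even on two
freshly made filesystems on one machine) will usually show different bytes,
which is the hard part alluded to above.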
On 08/03/2011 03:13 PM, Chuck Lever wrote:
> On Aug 3, 2011, at 3:28 AM, Venkateswararao Jujjuri wrote:
>
>> On 08/02/2011 07:53 AM, Chuck Lever wrote:
>>> On Aug 2, 2011, at 7:58 AM, Venkateswararao Jujjuri wrote:
>>>
>>>> We at IBM are receiving multiple customer requests for supporting NFSv4 server migration.
>>>> I have referred to Trond and Chuck's presentations at 2011 Connectathon and there appears to be
>>>> sizable work remaining. I am not sure if there is any progress made since those talks.
>>>>
>>>> I would like to open up a discussion thread on the mailing list to understand the latest status.
>>>> Also would like to get the input on the Volatile Filehandles (VFH). I searched the mailing list, and could not
>>>> find any recent discussion on this.
>>>>
>>>> Some discussion points:
>>>> - What are the pieces left to attain full client/server support for seamless server migration?
>>> The client migration implementation is code complete and in test now. This includes both minor version 0 and 1. We don't have any mv1 servers to test with at this time, so that support is provisional. I hope to have patches ready for the 3.2 merge window, but you can see what I've got now on git.linux-nfs.org.
>> Great, I will take it from there. Is it your branch or Trond's? Can you please give me which project/branch
>> I should take from git.linux-nfs.org?
> This source code is just for review and experimentation. I would wait for the merged code upstream before basing any work on it, as it needs to be forward ported to 3.1 and I expect there will be some minor architectural changes soon.
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
>>> A problem is that there are corner cases in the v4.0 migration specification that are still unresolved. We are working with the NFSv4 WG to get these addressed. But I expect some minor changes even after the patches are merged upstream.
>> What are those corner cases? Is it on any mailing list? Is it possible for us to see that discussion?
> There has been a lot of face-to-face discussion about this over the past six months or so. We are planning to bring the discussion to the [email protected] mailing list (which is public) very soon.
>
>>> We don't have firm plans for a server migration implementation on Linux at this time, but Bruce can maybe say more about that.
>> Sure; would wait for Bruce's views on this. We are getting requirements for both client and server support.
> Are you planning to work with the kernel's NFSD, or with Ganesha?
Currently we are looking at the kernel NFSD.
>
>>>> - Any discussion/suggestions on the way to implement VFH? As described in RFC 3530 sections 4.2.3 and 4.2.4?
>>> I think we are avoiding volatile file handles as long as possible. We don't have plans to implement them at the moment.
>> Hrm. How can we achieve the complete migration support without volatile filehandle support?
>> What are the reasons for avoiding it? Maybe we can start looking into this but would like to understand
>> the reasons (if any) for avoiding it.
> Migration itself does not require volatile file handles (FHs). If you are considering a simple-minded server implementation, like an rsync between heterogeneous physical file systems, then yes, volatile FHs may be required. But as Trond points out, volatile FHs are a troubled concept anyway, even without migration in the picture.
One of the use cases is rsync between two physical filesystems, but in
this particular use case the export
is read-only (rootfs). As Trond mentioned, volatile FHs are fine in the
case of read-only exports.
Is it something we can consider for upstream? VFHs only for read-only
exports?
>
> Passing a client between two servers that export the same cluster file system would be a simple and common use of migration where file handles don't have to (and probably won't) change. We expect that the most common use cases for migration in the near term are going to involve scenarios with homogeneous server OS and physical file systems, where FH format can be controlled and thus preserved across a migration event.
>
> Robust server migration implementations, in other words, will migrate not just data, but also file handles, write verifiers, and NFSv4 state. This allows the greatest transparency for clients, and the smoothest possible migration recovery. Clients can be made simpler if FH recovery is not needed.
Yes this totally makes sense.
Thanks,
JV
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
On Thu, Aug 04, 2011 at 01:21:32PM -0400, Trond Myklebust wrote:
> On Thu, 2011-08-04 at 13:03 -0400, J. Bruce Fields wrote:
> > On Thu, Aug 04, 2011 at 09:48:32AM -0700, Myklebust, Trond wrote:
> > > > -----Original Message-----
> > > > From: J. Bruce Fields [mailto:[email protected]]
> > > > Oh, neat, I'd forgotten that; you're thinking of STATUS4_FIXED? But
> > > > I'm
> > > > not sure it does the job:
> > > >
> > > > STATUS4_FIXED, which indicates a read-only image in the sense
> > > > that it will never change. The possibility is allowed that, as
> > > > a result of migration or switch to a different image, changed
> > > > data can be accessed, but within the confines of this instance,
> > > > no change is allowed. The client can use this fact to cache
> > > > aggressively.
> > > >
> > > > OK, so permission to set your attribute cache timeout very high,
> > > > perhaps, but I don't see why "changed data" couldn't mean changed
> > > > paths....
> > >
> > > No, but you can presumably use the FSLI4BX_CLSIMUL flag from
> > > fs_locations_info in order to find an equivalent replica.
> >
> > I lost you.
> >
> > Actually my real problem is that I don't understand the description of
> > STATUS4_FIXED. What does "or switch to a different image" mean? Not
> > "migration", or the sentence would have ended before the "or".
> >
> > I read it as allowing a server admin to replace the filesystem image in
> > place, in which case from the client's point of view this allows the
> > filesystem to change at any time. Which makes the whole thing not
> > terribly useful, except (as the last sentence says) as a caching hint.
>
> If the server admin replaces one filesystem, with a different
> filesystem, then nothing is going to work anyway. I don't see how that
> is relevant. That's a case of 'doctor it hurts...'
>
> The bit that _is_ relevant is the 'migration' part, but since the
> fs_locations_info FSLI4BX_CLSIMUL flag allows you to conclude that
> replica is an exact replica at all times (i.e. contents are guaranteed
> to be the same even if filehandles, directory cookies, etc are not) then
> the STATUS4_FIXED flag does allow you to assume that paths have not
> changed.
So your position is that "or switch to a different image" in the above
is redundant, or just a mistake?
--b.
On Thu, Aug 04, 2011 at 04:27:33AM -0700, Venkateswararao Jujjuri wrote:
> One of the use cases is rsync between two physical filesystems; but in
> this particular use case the export
> is read-only (rootfs). As Trond mentioned, volatile FHs are fine in
> the case of read-only exports.
> Is it something we can consider for upstream? VFHs only for read-only
> exports?
The client has no way of knowing that an export is read only. (Or that
the server guarantees the safety of looking up names again in the more
general cases Neil describes.) Unless we decide that a server is making
an implicit guarantee of that just by exposing volatile filehandles at
all. Doesn't sound like the existing spec really says that, though.
If an examination of existing implementations and/or some sort of new
spec language could reassure us that servers will only ever expose
volatile filehandles when it's safe to do so, then maybe it would make
sense for the client to implement volatile filehandle recovery?
But if there's a chance of "unsafe" servers out there, then it would
seem like a trap for the unwary user....
Your rootfs's probably aren't terribly large--could you copy around
compressed block-level images instead of doing rsync?
--b.
On Wed, Aug 03, 2011 at 12:28:20AM -0700, Venkateswararao Jujjuri wrote:
> On 08/02/2011 07:53 AM, Chuck Lever wrote:
> >We don't have firm plans for a server migration implementation on Linux at this time, but Bruce can maybe say more about that.
> Sure; would wait for Bruce's views on this. We are getting
> requirements for both client and server support.
We've been looking at migration and failover, backed by a cluster
filesystem, using floating IP's as a way to get most of the benefits
without quite as much fiddling with protocol issues and without
requiring the absolute latest clients.
We'd likely look into NFSv4 protocol-based migration after that. Is
there some reason you require that in particular? Is it only because
you want to be able to migrate using rsync and count on the client
recovering volatile filehandles?
--b.
On Fri, Aug 05, 2011 at 09:38:33AM -0400, Christoph Hellwig wrote:
> On Thu, Aug 04, 2011 at 12:03:44PM -0400, J. Bruce Fields wrote:
> > The client has no way of knowing that an export is read only. (Or that
> > the server guarantees the safety of looking up names again in the more
> > general cases Neil describes.) Unless we decide that a server is making
> > an implicit guarantee of that just by exposing volatile filehandles at
> > all. Doesn't sound like the existing spec really says that, though.
> >
> > If an examination of existing implementations and/or some sort of new
> > spec language could reassure us that servers will only ever expose
> > volatile filehandles when it's safe to do so, then maybe it would make
> > sense for the client to implement volatile filehandle recovery?
> >
> > But if there's a chance of "unsafe" servers out there, then it would
> > seem like a trap for the unwary user....
> >
> > Your rootfs's probably aren't terribly large--could you copy around
> > compressed block-level images instead of doing rsync?
>
> Another scheme is to disconnect the file handles from the inode number.
> I implemented this a couple years ago for a customer. Basically add
> an extended attribute into each inode that contains the nfs file handle,
> and that handle stays the same independent of the inode number. The
> added complexity is that you need a new lookup data structure mapping
And that data structure should be persistent--how were you storing it?
> from your nfs handle to something that can be used to find the inode
> (inode number typically).
Interesting, I've wondered before how well that would work. Any lessons
learned?
--b.
On Tue, 16 Aug 2011 08:59:39 -0700 Malahal Naineni <[email protected]> wrote:
> Trond Myklebust [[email protected]] wrote:
> > On Mon, 2011-08-15 at 13:49 -0700, Malahal Naineni wrote:
> > > NeilBrown [[email protected]] wrote:
> > > > > POSIX allows the namespace to change at any time (rename() or unlink())
> > > > > and so you cannot rely on addressing files by pathname. That was the
> > > > > whole reason for introducing filehandles into NFSv2 in the first place.
> > > > >
> > > > > Volatile filehandles were introduced in NFSv4 without any attempt to fix
> > > > > those shortcomings. There is no real prescription for how to recover in
> > > > > a situation where a rename or unlink has occurred prior to the
> > > > > filehandle expiring. Nor is there a reliable prescription for dealing
> > > > > with the case where a new file of the same name has replaced the
> > > > > original.
> > > > > Basically, the implication is that volatile filehandles are only really
> > > > > usable in a situation where the whole Filesystem is read-only on the
> > > > > server.
> > > >
> > > > I substantially agree, though I think the implication can be refined a little.
> > > >
> > > > I would say that the implication is that a VFH is only really usable when the
> > > > complete path leading to the file in question is read-only. We don't need
> > > > to assume that other files in other parts of the hierarchy which have stable
> > > > file handles are read-only.
> > >
> > > The spec recommends "change" attribute for validating data cache, name
> > > cache, etc. Some client implementations use "change" attribute for
> > > validating VFH though! Can we use it for validating VFH?
> >
> > The change attribute can only be used as a heuristic since it is not
> > guaranteed to be a value that is unique to one file.
>
> Agreed, it is a heuristic if we only use the file's "change id". If we
> want to be very strict, we could potentially use change ids of all the
> path components in the pathname... OR how about a mount option "use VFH
> at your own risk"?
I don't think the change-id is really useful even as a heuristic. Not only
is it not unique, but it is not guaranteed to be stable either (after all,
something might have changed when the file handle expired).
I think the *only* credible response to FHEXPIRED is to re-lookup the same
name, and as the spec doesn't make any promises about that, it is *only* safe
to do it with explicit permission through a mount option.
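In rough pseudo-C, that opt-in recovery would amount to something like the
following; every helper and the "volatile_fh" mount option are stand-ins,
none of this is the Linux client's actual code:

/*
 * Sketch only: the helpers and the "volatile_fh" mount option are stand-ins,
 * not the Linux client's real code.  It just pins down the recovery being
 * discussed: on NFS4ERR_FHEXPIRED, re-walk the remembered path with LOOKUP,
 * and only when the admin has explicitly opted in.
 */
#include <stdio.h>
#include <string.h>

struct nfs_fh { unsigned char data[128]; unsigned int len; };

/* Stand-in for a LOOKUP round trip: *child = lookup(dir, name). */
static int nfs_lookup(const struct nfs_fh *dir, const char *name,
		      struct nfs_fh *child)
{
	(void)dir;
	memset(child, 0, sizeof(*child));
	child->len = (unsigned int)(strlen(name) % sizeof(child->data));
	return 0;	/* pretend the server resolved the name */
}

/* Stand-in for a hypothetical "-o volatile_fh" mount option check. */
static int mount_allows_volatile_fh(void) { return 1; }

/*
 * Refresh an expired volatile filehandle by re-walking the path we cached
 * for it.  Returns 0 with *out filled in, or -1 meaning give up and let
 * the application see ESTALE.
 */
static int refresh_expired_fh(const struct nfs_fh *root,
			      const char **names, int n, struct nfs_fh *out)
{
	struct nfs_fh cur = *root;
	int i;

	if (!mount_allows_volatile_fh())
		return -1;	/* the spec makes no promises: require opt-in */

	for (i = 0; i < n; i++)
		if (nfs_lookup(&cur, names[i], &cur))
			return -1;	/* a component was renamed or removed */

	/* Even a fileid/change comparison here (discussed upthread) would
	 * only be a heuristic; it cannot prove this is still the same file. */
	*out = cur;
	return 0;
}

int main(void)
{
	const char *path[] = { "export", "data", "file.txt" };
	struct nfs_fh root = { .len = 0 }, fresh;

	if (refresh_expired_fh(&root, path, 3, &fresh))
		fprintf(stderr, "recovery failed: return ESTALE\n");
	else
		printf("recovered a fresh filehandle (%u bytes)\n", fresh.len);
	return 0;
}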
NeilBrown
On Thu, 2011-08-04 at 13:30 -0400, J. Bruce Fields wrote:
> On Thu, Aug 04, 2011 at 01:21:32PM -0400, Trond Myklebust wrote:
> > On Thu, 2011-08-04 at 13:03 -0400, J. Bruce Fields wrote:
> > > On Thu, Aug 04, 2011 at 09:48:32AM -0700, Myklebust, Trond wrote:
> > > > > -----Original Message-----
> > > > > From: J. Bruce Fields [mailto:[email protected]]
> > > > > Oh, neat, I'd forgotten that; you're thinking of STATUS4_FIXED? But
> > > > > I'm
> > > > > not sure it does the job:
> > > > >
> > > > > STATUS4_FIXED, which indicates a read-only image in the sense
> > > > > that it will never change. The possibility is allowed that, as
> > > > > a result of migration or switch to a different image, changed
> > > > > data can be accessed, but within the confines of this instance,
> > > > > no change is allowed. The client can use this fact to cache
> > > > > aggressively.
> > > > >
> > > > > OK, so permission to set your attribute cache timeout very high,
> > > > > perhaps, but I don't see why "changed data" couldn't mean changed
> > > > > paths....
> > > >
> > > > No, but you can presumably use the FSLI4BX_CLSIMUL flag from
> > > > fs_locations_info in order to find an equivalent replica.
> > >
> > > I lost you.
> > >
> > > Actually my real problem is that I don't understand the description of
> > > STATUS4_FIXED. What does "or switch to a different image" mean? Not
> > > "migration", or the sentence would have ended before the "or".
> > >
> > > I read it as allowing a server admin to replace the filesystem image in
> > > place, in which case from the client's point of view this allows the
> > > filesystem to change at any time. Which makes the whole thing not
> > > terribly useful, except (as the last sentence says) as a caching hint.
> >
> > If the server admin replaces one filesystem, with a different
> > filesystem, then nothing is going to work anyway. I don't see how that
> > is relevant. That's a case of 'doctor it hurts...'
> >
> > The bit that _is_ relevant is the 'migration' part, but since the
> > fs_locations_info FSLI4BX_CLSIMUL flag allows you to conclude that
> > replica is an exact replica at all times (i.e. contents are guaranteed
> > to be the same even if filehandles, directory cookies, etc are not) then
> > the STATUS4_FIXED flag does allow you to assume that paths have not
> > changed.
>
> So your position is that "or switch to a different image" in the above
> is redundant, or just a mistake?
It's redundant: STATUS4_VERSIONED, STATUS4_UPDATED, STATUS4_WRITABLE,
and STATUS4_REFERRAL are also subject to the 'or switch to a different
image' caveat.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
Trond Myklebust [[email protected]] wrote:
> On Mon, 2011-08-15 at 13:49 -0700, Malahal Naineni wrote:
> > NeilBrown [[email protected]] wrote:
> > > > POSIX allows the namespace to change at any time (rename() or unlink())
> > > > and so you cannot rely on addressing files by pathname. That was the
> > > > whole reason for introducing filehandles into NFSv2 in the first place.
> > > >
> > > > Volatile filehandles were introduced in NFSv4 without any attempt to fix
> > > > those shortcomings. There is no real prescription for how to recover in
> > > > a situation where a rename or unlink has occurred prior to the
> > > > filehandle expiring. Nor is there a reliable prescription for dealing
> > > > with the case where a new file of the same name has replaced the
> > > > original.
> > > > Basically, the implication is that volatile filehandles are only really
> > > > usable in a situation where the whole Filesystem is read-only on the
> > > > server.
> > >
> > > I substantially agree, though I think the implication can be refined a little.
> > >
> > > I would say that the implication is that a VFH is only really usable when the
> > > complete path leading to the file in question is read-only. We don't need
> > > to assume that other files in other parts of the hierarchy which have stable
> > > file handles are read-only.
> >
> > The spec recommends "change" attribute for validating data cache, name
> > cache, etc. Some client implementations use "change" attribute for
> > validating VFH though! Can we use it for validating VFH?
>
> The change attribute can only be used as a heuristic since it is not
> guaranteed to be a value that is unique to one file.
Agreed, it is a heuristic if we only use the file's "change id". If we
want to be very strict, we could potentially use change ids of all the
path components in the pathname... OR how about a mount option "use VFH
at your own risk"?
Thanks, Malahal.
On Mon, 2011-08-15 at 13:49 -0700, Malahal Naineni wrote:
> NeilBrown [[email protected]] wrote:
> > > POSIX allows the namespace to change at any time (rename() or unlink())
> > > and so you cannot rely on addressing files by pathname. That was the
> > > whole reason for introducing filehandles into NFSv2 in the first place.
> > >
> > > Volatile filehandles were introduced in NFSv4 without any attempt to fix
> > > those shortcomings. There is no real prescription for how to recover in
> > > a situation where a rename or unlink has occurred prior to the
> > > filehandle expiring. Nor is there a reliable prescription for dealing
> > > with the case where a new file of the same name has replaced the
> > > original.
> > > Basically, the implication is that volatile filehandles are only really
> > > usable in a situation where the whole Filesystem is read-only on the
> > > server.
> >
> > I substantially agree, though I think the implication can be refined a little.
> >
> > I would say that the implication is that a VFH is only really usable when the
> > complete path leading to the file in question is read-only. We don't need
> > to assume that other files in other parts of the hierarchy which have stable
> > file handles are read-only.
>
> The spec recommends "change" attribute for validating data cache, name
> cache, etc. Some client implementations use "change" attribute for
> validating VFH though! Can we use it for validating VFH?
The change attribute can only be used as a heuristic since it is not
guaranteed to be a value that is unique to one file.
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Thu, Aug 04, 2011 at 09:48:32AM -0700, Myklebust, Trond wrote:
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:[email protected]]
> > Oh, neat, I'd forgotten that; you're thinking of STATUS4_FIXED? But
> > I'm
> > not sure it does the job:
> >
> > STATUS4_FIXED, which indicates a read-only image in the sense
> > that it will never change. The possibility is allowed that, as
> > a result of migration or switch to a different image, changed
> > data can be accessed, but within the confines of this instance,
> > no change is allowed. The client can use this fact to cache
> > aggressively.
> >
> > OK, so permission to set your attribute cache timeout very high,
> > perhaps, but I don't see why "changed data" couldn't mean changed
> > paths....
>
> No, but you can presumably use the FSLI4BX_CLSIMUL flag from
> fs_locations_info in order to find an equivalent replica.
I lost you.
Actually my real problem is that I don't understand the description of
STATUS4_FIXED. What does "or switch to a different image" mean? Not
"migration", or the sentence would have ended before the "or".
I read it as allowing a server admin to replace the filesystem image in
place, in which case from the client's point of view this allows the
filesystem to change at any time. Which makes the whole thing not
terribly useful, except (as the last sentence says) as a caching hint.
--b.
On Thu, Aug 04, 2011 at 12:03:44PM -0400, J. Bruce Fields wrote:
> The client has no way of knowing that an export is read only. (Or that
> the server guarantees the safety of looking up names again in the more
> general cases Neil describes.) Unless we decide that a server is making
> an implicit guarantee of that just by exposing volatile filehandles at
> all. Doesn't sound like the existing spec really says that, though.
>
> If an examination of existing implementations and/or some sort of new
> spec language could reassure us that servers will only ever expose
> volatile filehandles when it's safe to do so, then maybe it would make
> sense for the client to implement volatile filehandle recovery?
>
> But if there's a chance of "unsafe" servers out there, then it would
> seem like a trap for the unwary user....
>
> Your rootfs's probably aren't terribly large--could you copy around
> compressed block-level images instead of doing rsync?
Another scheme is to disconnect the file handles from the inode number.
I implemented this a couple years ago for a customer. Basically add
an extended attribute into each inode that contains the nfs file handle,
and that handle stays the same independent of the inode number. The
added complexity is that you need a new lookup data structure mapping
from your nfs handle to something that can be used to find the inode
(inode number typically).