Hi,
At present one can probably not run NFS (or InterMezzo) on top of
tmpfs.
Is there a suggested solution for fh_to_dentry and dentry_to_fh for
tmpfs?
An "iget" based solution might work but at present tmpfs inodes are
not hashed.
Thanks for any suggestions!
"Peter J. Braam" wrote:
>
> Hi,
>
> At present one can probably not run NFS (or InterMezzo) on top of
> tmpfs.
>
> Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> tmpfs?
>
> An "iget" based solution might work but at present tmpfs inodes are
> not hashed.
I talked to neil brown about NFS and ramfs... he mentioned using
iunique()
--
Jeff Garzik | "Why is it that attractive girls like you
Building 1024 | always seem to have a boyfriend?"
MandrakeSoft | "Because I'm a nympho that owns a brewery?"
| - BBC TV show "Coupling"
Hi,
> "Peter J. Braam" wrote:
...
> > Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> > tmpfs?
> >
> > An "iget" based solution might work but at present tmpfs inodes are
> > not hashed.
On Wed, Feb 20, 2002 at 11:56:40AM -0500, Jeff Garzik wrote:
...
> I talked to neil brown about NFS and ramfs... he mentioned using
> iunique()
So do I understand that hashing tmpfs inodes is perhaps the way to go?
Would the following also work?
- have a 32 bit counter: set inode->i_ino to count++
- up the generation number each time the counter wraps.
Between boot cycles NFS could still get confused, that might be helped
by setting the initial generation to the system time.
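The counter idea above could be sketched roughly as follows. This is a user-space model only, not tmpfs code: struct ino_alloc, struct ino_id, ino_alloc_init() and ino_alloc_next() are made-up names, and reserving 0 as "no inode" is an assumption.

```c
#include <stdint.h>

/* Hypothetical allocator state; none of these names exist in tmpfs. */
struct ino_alloc {
    uint32_t next_ino;    /* next inode number to hand out */
    uint32_t generation;  /* bumped every time next_ino wraps */
};

/* The (ino, generation) pair that would identify one file. */
struct ino_id {
    uint32_t ino;
    uint32_t generation;
};

/* Seed the generation, e.g. from the system time at mount (assumption). */
static void ino_alloc_init(struct ino_alloc *a, uint32_t initial_generation)
{
    a->next_ino = 1;      /* assumption: 0 is kept free as "no inode" */
    a->generation = initial_generation;
}

/* Hand out the next pair; bump the generation when the counter wraps. */
static struct ino_id ino_alloc_next(struct ino_alloc *a)
{
    struct ino_id id = { a->next_ino, a->generation };

    a->next_ino++;
    if (a->next_ino == 0) {   /* 32-bit counter wrapped around */
        a->next_ino = 1;
        a->generation++;
    }
    return id;
}
```

Note that after a wrap the pair is still distinct, but the ino alone may match an inode that is still live; this sketch does not check for that.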
Thoughts anyone?
- Peter -
>>>>> " " == Peter J Braam <[email protected]> writes:
> Would the following also work?
> - have a 32 bit counter: set inode->i_ino to count++
That is exactly what iunique() does except that it also checks for
uniqueness and allows you to specify a minimum value. Sooner or later
your 32-bit counter will wrap round...
> - up the generation number each time the counter wraps.
> Between boot cycles NFS could still get confused, that might be
> helped by setting the initial generation to the system time.
Yep. That is what the 'fat' filesystem does.
Cheers,
Trond
On Wednesday February 20, [email protected] wrote:
> Hi,
>
> > "Peter J. Braam" wrote:
> ...
> > > Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> > > tmpfs?
> > >
> > > An "iget" based solution might work but at present tmpfs inodes are
> > > not hashed.
> On Wed, Feb 20, 2002 at 11:56:40AM -0500, Jeff Garzik wrote:
> ...
> > I talked to neil brown about NFS and ramfs... he mentioned using
> > iunique()
... but Trond had a better idea....
>
>
> So do I understand that hashing tmpfs inodes is perhaps the way to go?
>
> Would the following also work?
>
> - have a 32 bit counter: set inode->i_ino to count++
> - up the generation number each time the counter wraps.
You don't just need a number in inode->i_ino. You also need to be
able to find an inode given that number.
So you need to store all the inodes in a hash table.
But you don't want to penalise non-NFS users.
I would probably:
leave i_ino as set by new_inode
initialise inode->i_generation to CURRENT_TIME
in dentry_to_fh,
check if list_empty(&inode->i_hash)
if it is, then add the inode to some hash table indexed by the
address of the inode
put the address of the inode, i_ino and i_generation in the filehandle
in fh_to_dentry,
lookup the given address in the hash table.
if it is found, check the i_ino and i_generation
That means you are only hashing inodes exported by NFS, and you have
a pretty good guarantee of uniqueness (providing time doesn't go
backwards).
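As a rough illustration of the steps above, here is a user-space model in C. It is only a sketch under assumptions: struct model_inode, struct model_fh, the export_hash table and both functions are hypothetical stand-ins, not kernel interfaces, and the lookup returns the inode where a real fh_to_dentry would go on to produce a dentry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few inode fields the scheme uses. */
struct model_inode {
    uint32_t i_ino;
    uint32_t i_generation;
    struct model_inode *hash_next;  /* chain within one hash bucket */
    int hashed;                     /* models !list_empty(&inode->i_hash) */
};

/* The three numbers placed in the filehandle. */
struct model_fh {
    uintptr_t addr;
    uint32_t ino;
    uint32_t generation;
};

#define EXPORT_HASH_SIZE 64
static struct model_inode *export_hash[EXPORT_HASH_SIZE];

static unsigned hash_addr(uintptr_t addr)
{
    return (unsigned)((addr >> 4) % EXPORT_HASH_SIZE);
}

/* dentry_to_fh side: hash the inode on first export, then fill the handle. */
static struct model_fh model_dentry_to_fh(struct model_inode *inode)
{
    struct model_fh fh;

    if (!inode->hashed) {
        unsigned b = hash_addr((uintptr_t)inode);

        inode->hash_next = export_hash[b];
        export_hash[b] = inode;
        inode->hashed = 1;
    }
    fh.addr = (uintptr_t)inode;
    fh.ino = inode->i_ino;
    fh.generation = inode->i_generation;
    return fh;
}

/* fh_to_dentry side: the address from the handle is never dereferenced,
 * it is only compared against inodes already present in the table. */
static struct model_inode *model_fh_to_dentry(const struct model_fh *fh)
{
    struct model_inode *p;

    for (p = export_hash[hash_addr(fh->addr)]; p != NULL; p = p->hash_next) {
        if ((uintptr_t)p != fh->addr)
            continue;
        if (p->i_ino == fh->ino && p->i_generation == fh->generation)
            return p;
        return NULL;    /* address reused by a different file: stale */
    }
    return NULL;        /* address was never exported */
}
```

A real version would also have to remove the inode from the table when it is freed; that step is left out here.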
NeilBrown
On 2002-02-21 06:53, Neil Brown wrote:
> On Wednesday February 20, [email protected] wrote:
> > Hi,
> >
> > > "Peter J. Braam" wrote:
> > ...
> > > > Is there a suggested solution for fh_to_dentry and dentry_to_fh for
> > > > tmpfs?
> > > >
> > > > An "iget" based solution might work but at present tmpfs inodes are
> > > > not hashed.
> > On Wed, Feb 20, 2002 at 11:56:40AM -0500, Jeff Garzik wrote:
> > ...
> > > I talked to neil brown about NFS and ramfs... he mentioned using
> > > iunique()
> ... but Trond had a better idea....
> >
> >
> > So do I understand that hashing tmpfs inodes is perhaps the way to go?
> >
> > Would the following also work?
> >
> > - have a 32 bit counter: set inode->i_ino to count++
> > - up the generation number each time the counter wraps.
>
> You don't just need a number in inode->i_ino. You also need to be
> able to find an inode given that number.
> So you need to store all the inodes in a hash table.
> But you don't want to penalise non-NFS users.
>
> I would probably:
> leave i_ino as set by new_inode
> initialise inode->i_generation to CURRENT_TIME
>
> in dentry_to_fh,
> check if list_empty(&inode->i_hash)
> if it is, then add the inode to some hash table indexed by the
> address of the inode
> put the address of the inode, i_ino and i_generation in the filehandle
>
> in fh_to_dentry,
> lookup the given address in the hash table.
> if it is found, check the i_ino and i_generation
>
>
> That means you are only hashing inodes exported by NFS, and you have
> a pretty good guarantee of uniqueness (providing time doesn't go
> backwards).
>
> NeilBrown
What I suggest is nfsd should export a symbol called
generic_fh_to_dentry() such that it will be more generic like
generic_file_read() to handle generic calls for every fs.
Thanks,
David
On February 21, [email protected] wrote:
>
> What I suggest is nfsd should export a symbol called
> generic_fh_to_dentry() such that it will be more generic like
> generic_file_read() to handle generic calls for every fs.
But every filesystem is really very different in this regard.
What would you think this "generic_fh_to_dentry" should do?
We actually already have one. You set ->fh_to_dentry to NULL, and then
it uses "iget".
NeilBrown
Hi Peter,
On Wed, 20 Feb 2002, Peter J. Braam wrote:
> Between boot cycles NFS could still get confused, that might be
> helped by setting the initial generation to the system time.
Between boot cycles you lose _all_ tmpfs files. That's what the
'tmp' in tmpfs talks about ;-)
Greetings
Christoph
> That means you are only hashing inodes exported by NFS, and you have
> a pretty good guarantee of uniqueness (providing time doesn't go
> backwards).
this may be obvious... apologies.
don't use the TOD directly -- it can go backwards if ntpd or an admin
sets it back. better to use a monotonically increasing number that
you completely control yourself.
also, if your timer resolution isn't good enough, a window opens
where two generated "uniquifiers" can be the same for all intents
and purposes.
if there's nothing else we've learned from NFS, it's that using
timestamps is a lousy way of managing cache coherency and file
identity. ;-)
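A minimal sketch of such a self-controlled number, in C. Everything here is hypothetical: struct gen_source and its functions are made-up names, and putting a boot-time seed in the high 32 bits is an assumption added for illustration, not something proposed in this thread.

```c
#include <stdint.h>

/* Hypothetical generation source; not an existing kernel interface. */
struct gen_source {
    uint64_t next;
};

/* Assumption: the high 32 bits carry a seed chosen once at startup,
 * the low 32 bits simply count up from zero. */
static void gen_source_init(struct gen_source *g, uint32_t boot_seed)
{
    g->next = (uint64_t)boot_seed << 32;
}

/* Each call returns the previous value plus one, so later clock
 * adjustments cannot make it repeat or go backwards. */
static uint64_t gen_source_next(struct gen_source *g)
{
    return g->next++;
}
```

The clock is consulted at most once, for the seed, so a later step backwards has no effect on values already handed out. If the seed itself comes from a clock that was set back across a reboot, the original concern still applies to the seed.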
On Thursday February 21, [email protected] wrote:
> > That means you are only hashing inodes exported by NFS, and you have
> > a pretty good guarantee of uniqueness (providing time doesn't go
> > backwards).
>
> this may be obvious... apologies.
>
> don't use the TOD directly -- it can go backwards if ntpd or an admin
> sets it back. better to use a monotonically increasing number that
> you completely control yourself.
>
> also, if your timer resolution isn't good enough, a window opens
> where two generated "uniquifiers" can be the same for all intents
> and purposes.
Certainly timeofday by itself isn't enough for the various reasons you
mention. But it does help to avoid accepting filehandles from before
the last reboot.
In my proposal there were three numbers:
An address
A sequentially assigned inode number
A time of day.
Any two of these is probably adequate most of the time, but could
occasionally result in equal filehandles for different files. Adding
a third makes collision virtually impossible.
NeilBrown
On Monday February 25, [email protected] wrote:
>
>
> Neil Brown wrote:
>
> >On February 21, [email protected] wrote:
> >
> >>What I suggest is nfsd should export a symbol called
> >>generic_fh_to_dentry() such that it will be more generic like
> >>generic_file_read() to handle generic calls for every fs.
> >>
> >
> >But every filesystem is really very different in this regard.
> >
> >What would you think this "generic_fh_to_dentry" should do?
> >
> >We actually already have one. You set ->fh_to_dentry to NULL, and then
> >it uses "iget".
> >
> >NeilBrown
> >
> You know, we have a serious problem implementing non block device
> filesystems with NFS. I actually spent quite a long while understanding
> the nfsd code in 2.4. Here are the problems that I suffer from....
Well, you are not alone. NFS appears to have been designed with UFS -
the original Unix File System, or close relatives - specifically in
mind.
The further a file system diverges from this, the harder it is to
provide NFS support.
>
> nfsd calls iget to read an inode directly, even when no valid dentry
> in the dcache is linked to it (that is, list_empty(&inode->i_dentry)
> is true). In our non block device filesystem, however, inodes are
> generated dynamically by lookup(), which means an inode is only valid
> if we have dentry information and go through the normal
> lookup(neg-dentry)=>read_inode(ino) procedure. So when we implement a
> non block device fs and want to serve it with nfsd, we suffer from
> this problem. A non block device filesystem is usually tied to some
> name space, and in kernel terms a name space means dentries. I bet
> most non block device filesystems have a similar problem.
The "iget" approach is really a hack that sort-of works for ext2 and
pretty much doesn't work for any other filesystem.
Some time in 2.5, this "iget" approach will go away. Any filesystem
that wants to be NFS-exportable will have to define explicit methods
for exporting, quite similar to (but subtly different from) the
current fh_to_dentry and dentry_to_fh methods.
In 2.4, you should not even consider supporting iget usage by kNFSd,
you should supply fh_to_dentry and dentry_to_fh.
>
> I suggest that the file handle mechanism handle this kind of situation
> so that non block device filesystems can work with NFS. The current NFS
> server has to maintain a stateless design, so there is no easy way to
> keep the VFS from touching the dcache during request verification.
>
> I suggest that fh_verify go through a proper lookup process for
> inodes that have an empty inode->i_dentry list. This would make life
> much easier for non block device filesystems.
What do you mean by "a proper lookup process"? Do you mean having a
full path name from the filesystem root and following that?
That might make it easy for the filesystem, but it would make it
impossible for the NFS server.
Not only does the NFS server not know the name of the file, it cannot
know as the file might have been renamed by some other process since the
NFS server last knew the name.
NFSv4 has a concept of a "volatile" file handle that is supposed to
help with this: The server can tell the client "that file handle
doesn't work any more", and then the client might re-issue the
filename lookup requests. But there are still possible issues with
files being renamed between accesses.
>
> I think all non block device fs and non-Unix based fs (fs that don't use
> inode number identification) will have the same problem because they
> simply use iunique, and the ino has no meaning to them unless the
> lookup()=>read_inode() sequence is properly executed.
Yes. It is a difficult problem.
But to work with NFS (v2 or v3) the filesystem *MUST* be able to
provide a stable, fixed length identifier for a file that is not
changed by rename, truncate, or server restart (or anything else, but
those are often the difficult ones).
The extent to which your filesystem cannot provide such an identifier
is the extent to which it cannot support NFS.
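To make "fixed length identifier" concrete, here is a hedged sketch in C of packing an identity into a fixed-size opaque handle. The 32-byte size is that of an NFSv2 file handle (NFSv3 handles are variable length, up to 64 bytes); struct file_id, fh_pack() and fh_unpack() are hypothetical names, and using only ino and generation is an assumption that suits an ext2-like filesystem.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_FH_SIZE 32   /* size of an NFSv2 file handle in bytes */

/* Hypothetical identity: must not change on rename, truncate or restart. */
struct file_id {
    uint32_t ino;
    uint32_t generation;
};

/* Pack the identity into a fixed-size buffer. The handle is opaque to
 * the client, so host byte order is fine; unused bytes are zeroed so
 * that equal identities always give byte-identical handles. */
static void fh_pack(unsigned char fh[MODEL_FH_SIZE], const struct file_id *id)
{
    memset(fh, 0, MODEL_FH_SIZE);
    memcpy(fh, &id->ino, sizeof id->ino);
    memcpy(fh + sizeof id->ino, &id->generation, sizeof id->generation);
}

/* Recover the identity from a handle the client sent back. */
static struct file_id fh_unpack(const unsigned char fh[MODEL_FH_SIZE])
{
    struct file_id id;

    memcpy(&id.ino, fh, sizeof id.ino);
    memcpy(&id.generation, fh + sizeof id.ino, sizeof id.generation);
    return id;
}
```

Nothing in the handle depends on the file's name, size or in-memory state, which is why such an identifier can survive rename, truncate and a server restart, provided the filesystem can find the file again from these numbers.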
FAT based filesystems are a good example. We provide 90% support. If
you try to access a FAT filesystem from Linux (with no_subtree_check),
it will mostly work.
But if you open a file on the client, truncate it, rename it, reboot
the server, and then write to it, it will fail. The same is not true
of ext2.
I hope this helps.
NeilBrown