Hello,
Does NFS4 still depend on inode generation numbers? I understand that
earlier NFS versions used file handles based on inode and generation
number, but it seems to me that this shouldn't be required anymore with
the (stateful) NFS4.
Best,
-Nikolaus
--
»Time flies like an arrow, fruit flies like a Banana.«
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
On 10/22/2011 11:47 AM, Boaz Harrosh wrote:
> On 10/21/2011 10:44 AM, Nikolaus Rath wrote:
>> Yes, with 64bit inodes everything would be fine. But fuse uses 'long'
>> for inodes, so on 32bit systems you only have 32bit inodes even if ino_t
>> is 64bit.
>
> Without knowing any of the details, what kind of brain-dead thing is
> putting "long" in an API? What do you do with 32bit user-mode on a 64bit
> kernel?
>
API => ABI
On 10/21/2011 10:44 AM, Nikolaus Rath wrote:
> Yes, with 64bit inodes everything would be fine. But fuse uses 'long'
> for inodes, so on 32bit systems you only have 32bit inodes even if ino_t
> is 64bit.
Without knowing any of the details, what kind of brain-dead thing is
putting "long" in an API? What do you do with 32bit user-mode on a 64bit
kernel?
32 bits for inode numbers is a broken system for 15 years already.
Fix the broken FUSE that falsely calls itself File-System and
stop bitching about NFS. It's FUSE you should be hacking and fixing.
Really,
Boaz
Boaz Harrosh <[email protected]> wrote:
>On 10/21/2011 10:44 AM, Nikolaus Rath wrote:
>> Yes, with 64bit inodes everything would be fine. But fuse uses 'long'
>> for inodes, so on 32bit systems you only have 32bit inodes even if ino_t
>> is 64bit.
>
>Without knowing any of the details, what kind of brain-dead thing is
>putting "long" in an API? What do you do with 32bit user-mode on a 64bit
>kernel?
It's called a bug; bugs are known to appear occasionally in computer software.
>
>32 bits for inode numbers is a broken system for 15 years already.
>Fix the broken FUSE that falsely calls itself File-System and
>stop bitching about NFS. It's FUSE you should be hacking and fixing.
I merely answered the question of why I was interested in generation numbers in NFS4.
Best,
Nikolaus
On 10/21/2011 12:00 PM, Trond Myklebust wrote:
> No. The point is you don't need a generation number if you don't want to
> implement one...
>
> You can use any unique identifier + the inode number, and the unique
> identifier is only limited by the size of the filehandle.
So how do you choose the unique identifier? It's limited by FUSE to
32bit and therefore can't be a global counter, it can't be a timestamp
because the system clock may not have enough resolution, and it can't be
a per-inode counter because then I can't discard the counter after the
inode has been deleted.
Best,
-Nikolaus
Trond Myklebust <[email protected]> writes:
> The filehandle is between 32 (NFSv2) and 128 (NFSv4) bytes long. How long
> do you expect it to take you to create+destroy between 2^256 and 2^1024
> inodes? I'm guessing that we'll all be long dead and the universe will
> have undergone heat death before that happens...
Please stop assuming that I'm stupid or haven't thought about the
problem at all. The bottleneck is not the length of the NFS file handle,
but the length of the inode and generation number (both of which are
restricted to 32bit by FUSE) together with the requirement that not only
both of them together need to be unique forever, but the inode also
needs to be unique at any given instant (so they cannot be trivially
combined to form a 64bit value).
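The two requirements in the paragraph above (each (inode, generation) pair unique forever, each inode number unique among live inodes) can be written down as a small checker. This is a hypothetical Python sketch, not code from FUSE or S3QL; the 2-bit inode field is deliberately tiny so the wrap-around shows up after four allocations. It illustrates why splitting one counter into inode and generation bits is not enough by itself: once the inode bits wrap, the counter can hand out a number that a live inode still holds.

```python
# Hypothetical checker for the two constraints (names are illustrative,
# not part of FUSE, S3QL or any NFS server).

class HandleInvariants:
    """Tracks every (ino, gen) pair ever issued and the live inode numbers."""

    def __init__(self):
        self.issued = set()  # all (ino, gen) pairs handed out so far
        self.live = set()    # inode numbers currently in use

    def create(self, ino, gen):
        if (ino, gen) in self.issued:
            raise ValueError(f"(ino={ino}, gen={gen}) was issued before")
        if ino in self.live:
            raise ValueError(f"ino={ino} belongs to a live inode")
        self.issued.add((ino, gen))
        self.live.add(ino)

    def delete(self, ino):
        self.live.discard(ino)


def naive_split(counter, ino_bits):
    """Split one counter into (ino, gen) purely by bit position."""
    return counter & ((1 << ino_bits) - 1), counter >> ino_bits


# Tiny 2-bit inode field so the wrap-around is visible:
# inode 0 stays alive while inodes 1..3 are created and deleted.
inv = HandleInvariants()
inv.create(*naive_split(0, ino_bits=2))          # (0, 0), kept alive
for c in range(1, 4):
    ino, gen = naive_split(c, ino_bits=2)        # (1, 0), (2, 0), (3, 0)
    inv.create(ino, gen)
    inv.delete(ino)
try:
    inv.create(*naive_split(4, ino_bits=2))      # (0, 1): ino 0 is still live
except ValueError as err:
    print(err)                                   # ino=0 belongs to a live inode
```

Trond's later suggestion in this thread, skipping inode numbers that are still allocated, is what removes the second failure.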
Best,
-Nikolaus
On Fri, 2011-10-21 at 09:54 -0400, Nikolaus Rath wrote:
> Please stop assuming that I'm stupid or haven't thought about the
> problem at all. The bottleneck is not the length of the NFS file handle,
> but the length of the inode and generation number (both of which are
> restricted to 32bit by FUSE) together with the requirement that not only
> both of them together need to be unique forever, but the inode also
> needs to be unique at any given instant (so they cannot be trivially
> combined to form a 64bit value).
No. The point is you don't need a generation number if you don't want to
implement one...
You can use any unique identifier + the inode number, and the unique
identifier is only limited by the size of the filehandle.
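A rough sketch of what "unique identifier + inode number" could look like if the filesystem controlled the whole filehandle. This is hypothetical Python: the layout (an 8-byte inode number followed by a random 16-byte identifier from uuid4) is an assumption, the 128-byte limit is the NFSv4 maximum filehandle size, and a real server reserves part of that space for its own header. According to Nikolaus, FUSE as discussed here only lets the filesystem supply the inode and generation numbers, so this is not available to a FUSE filesystem without changes to FUSE.

```python
import struct
import uuid

NFS4_FHSIZE = 128  # maximum NFSv4 filehandle size in bytes

def make_handle(ino: int, unique_id: bytes, limit: int = NFS4_FHSIZE) -> bytes:
    """Hypothetical layout: big-endian 64-bit inode number + opaque unique id."""
    handle = struct.pack(">Q", ino) + unique_id
    if len(handle) > limit:
        raise ValueError(f"handle is {len(handle)} bytes, limit is {limit}")
    return handle

def split_handle(handle: bytes) -> tuple:
    """Inverse of make_handle: returns (ino, unique_id)."""
    (ino,) = struct.unpack(">Q", handle[:8])
    return ino, handle[8:]

# Reusing inode number 42 with a fresh random 128-bit identifier yields a
# different handle (uniqueness here is probabilistic, not guaranteed).
old = make_handle(42, uuid.uuid4().bytes)
new = make_handle(42, uuid.uuid4().bytes)
print(len(old), split_handle(new)[0])  # 24 42
```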
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Thu, 2011-10-20 at 16:37 -0400, Nikolaus Rath wrote:
> With current time I'm screwed if the system clock doesn't have
> sufficiently fine granularity. With a counter, I either have to remember
> counter values per-inode even after the inode is deleted, or the global
> counter will overflow at some point (in which case I may just as well
> require unique inodes in the first place).
The filehandle is between 32 (NFSv2) and 128 (NFSv4) bytes long. How long
do you expect it to take you to create+destroy between 2^256 and 2^1024
inodes? I'm guessing that we'll all be long dead and the universe will
have undergone heat death before that happens...
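A quick check of the arithmetic above, in Python. The rate of one create+destroy per nanosecond is an assumption chosen to be generous; the 4-byte row is included only for contrast with the 32-bit fields that Nikolaus says FUSE imposes.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 10**9  # assumed create+destroy operations per second

def years_to_exhaust(handle_bytes: int, rate: int = RATE) -> float:
    """Years needed to use up every distinct value of a handle_bytes-long identifier."""
    values = 2 ** (8 * handle_bytes)
    return values / rate / SECONDS_PER_YEAR  # exact int/int division, then float

for name, size in (("32-bit field", 4), ("NFSv2 handle", 32), ("NFSv4 handle", 128)):
    print(f"{name}: {8 * size} bits, ~{years_to_exhaust(size):.2g} years")
```

At this rate a 32-bit space lasts only about 4.3 seconds, while 2^256 values last on the order of 10^60 years.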
Trond
On Wed, Oct 19, 2011 at 10:17:43AM -0400, Nikolaus Rath wrote:
> Does NFS4 still depend on inode generation numbers? I understand that
> earlier NFS versions used file handles based on inode and generation
> number, but it seems to me that this shouldn't be required anymore with
> the (stateful) NFS4.
Yes, it's true that NFSv4 has open and close operations, but filehandles
are still used a great deal outside of that, and clients are still
allowed to assume that the same filehandle always refers to the same
object.
http://www.ietf.org/rfc/rfc3530.txt, section 4.2.1:
"If two filehandles from the same server are equal, they MUST
refer to the same file."
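To illustrate why that rule needs a generation number once inode numbers are reused, here is a hypothetical Python sketch of a server-side lookup (class and method names are made up). A handle is modelled as an (inode, generation) pair; when the generation stored for the inode no longer matches, the handle is reported as stale instead of silently resolving to the new file. NFSv4 servers signal this case with NFS4ERR_STALE; the sketch maps it to Python's errno.ESTALE.

```python
import errno

class InodeTable:
    """Hypothetical table mapping inode numbers to (generation, name)."""

    def __init__(self):
        self._live = {}  # ino -> (gen, name)

    def create(self, ino, gen, name):
        self._live[ino] = (gen, name)
        return (ino, gen)  # the "filehandle" handed to clients

    def delete(self, ino):
        del self._live[ino]

    def lookup(self, handle):
        ino, gen = handle
        entry = self._live.get(ino)
        if entry is None or entry[0] != gen:
            # Equal handles must mean the same file, so a handle whose
            # generation no longer matches is rejected as stale.
            raise OSError(errno.ESTALE, "stale file handle")
        return entry[1]

table = InodeTable()
fh_old = table.create(7, gen=1, name="report.txt")
table.delete(7)
fh_new = table.create(7, gen=2, name="photo.jpg")  # inode number 7 reused
print(table.lookup(fh_new))                       # photo.jpg
try:
    table.lookup(fh_old)
except OSError as e:
    print(e.errno == errno.ESTALE)                # True
```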
--b.
On Fri, 2011-10-21 at 12:09 -0400, Nikolaus Rath wrote:
> So how do you choose the unique identifier? It's limited by FUSE to
> 32bit and therefore can't be a global counter, it can't be a timestamp
AFAICS fuse gives you a 64-bit inode number and a 32-bit generation
counter. That is still 96 bits == 79 228 162 514 264 337 593 543 950 336
(or roughly 8*10^28) unique values if you use it as a single counter.
IOW: start allocating inode numbers incrementally from 0 to 2^64 - 1, then
each time you overflow the 64-bit inode number counter, bump the
generation number. You'll have to skip those inode numbers that are
already allocated in the subsequent generations, but the total number of
unique combinations is still likely to be more than large enough not to
be a worry.
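The scheme described above can be sketched as follows. This is a minimal Python illustration under assumptions: the class name is hypothetical, the counter state and the set of live inodes are kept in memory (a real filesystem such as S3QL would have to persist them, e.g. in its database), and the widths default to the 64-bit inode and 32-bit generation mentioned above. Because the combined (gen, ino) counter only ever moves forward, no pair is issued twice, and deleted inodes do not need to be remembered.

```python
class HandleAllocator:
    """Sketch: treat (gen, ino) as one counter, skipping live inode numbers."""

    def __init__(self, ino_bits=64, gen_bits=32):
        self.ino_limit = 1 << ino_bits
        self.gen_limit = 1 << gen_bits
        self.next_ino = 0
        self.gen = 0
        self.live = set()

    def allocate(self):
        if len(self.live) >= self.ino_limit:
            raise RuntimeError("every inode number is in use")
        while True:
            if self.gen >= self.gen_limit:
                raise RuntimeError("(ino, gen) space exhausted")
            ino, gen = self.next_ino, self.gen
            # Advance the counter; when the inode part overflows, bump gen.
            self.next_ino += 1
            if self.next_ino == self.ino_limit:
                self.next_ino = 0
                self.gen += 1
            if ino not in self.live:  # skip numbers still held by live inodes
                self.live.add(ino)
                return ino, gen

    def release(self, ino):
        self.live.discard(ino)


alloc = HandleAllocator(ino_bits=2, gen_bits=8)  # tiny widths for illustration
handles = [alloc.allocate() for _ in range(4)]   # (0, 0) (1, 0) (2, 0) (3, 0)
alloc.release(1)
print(alloc.allocate())                          # (1, 1): inode 1 reused with a new generation
```

On the 32-bit systems Nikolaus describes, the same sketch can be constructed with ino_bits=32, which leaves 2^64 distinct (ino, gen) pairs.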
> because the system clock may not have enough resolution, and it can't be
> a per-inode counter because then I can't discard the counter after the
> inode has been deleted.
If you need more unique values, then modify fuse to allow your
filesystem to manage the exportfs interface. The fuse ABI is versioned,
and can be extended to support new features.
Trond
On Thu, Oct 20, 2011 at 01:21:31PM -0400, Nikolaus Rath wrote:
> I'm working on a FUSE file system that stores file system metadata in an
> SQL database (http://code.google.com/p/s3ql/). Not having to keep track
> of inode generation numbers would keep the code much simpler, because I
> want to delete inode-rows from the SQL table when the last reference to
> the inode is deleted (so I can't keep track of the generation no).
You can use current time, or a counter, or something, as the generation
number.
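A timestamp-based generation number has two failure modes, which this hypothetical Python sketch (function names made up) makes concrete: with fine ticks a 32-bit value wraps quickly, and with coarse ticks an inode number freed and reused within one tick gets the same generation as before.

```python
GEN_BITS = 32  # generation field width discussed in the thread

def wrap_period_seconds(tick_seconds: float, bits: int = GEN_BITS) -> float:
    """Time until a tick count truncated to `bits` bits repeats."""
    return (1 << bits) * tick_seconds

def time_generation(t_ns: int, tick_ns: int, bits: int = GEN_BITS) -> int:
    """Hypothetical: use the current tick count, truncated to `bits` bits, as generation."""
    return (t_ns // tick_ns) % (1 << bits)

for label, tick in (("1 s", 1.0), ("1 ms", 1e-3), ("1 us", 1e-6), ("1 ns", 1e-9)):
    print(f"{label} ticks: generation wraps after {wrap_period_seconds(tick):.3g} s")

# With 1-second ticks, an inode number reused 0.9 s after deletion gets the
# same generation, so the old and new (ino, gen) handles are identical.
same = time_generation(1_000_000_000, 10**9) == time_generation(1_900_000_000, 10**9)
print(same)  # True
```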
> Now I'll either have to make inodes unique (and run into trouble after
> 2^32 inodes have been used), or keep with the current scheme of
> randomizing new inodes (which keeps the probability of problems low
> enough but is ugly).
With 2^32 inode numbers plus 2^32 generation numbers it should be
possible to work something out that doesn't require remembering every
old inode.
--b.
"J. Bruce Fields" <[email protected]> writes:
> Just curious--why do you care?
I'm working on a FUSE file system that stores file system metadata in an
SQL database (http://code.google.com/p/s3ql/). Not having to keep track
of inode generation numbers would keep the code much simpler, because I
want to delete inode-rows from the SQL table when the last reference to
the inode is deleted (so I can't keep track of the generation no).
Now I'll either have to make inodes unique (and run into trouble after
2^32 inodes have been used), or stick with the current scheme of
randomizing new inodes (which keeps the probability of problems low
enough but is ugly).
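For a sense of scale, the birthday bound estimates the probability that n values drawn uniformly at random from a space of 2^bits contain at least one repeat. The thread does not say exactly which fields S3QL randomizes, so this Python sketch simply evaluates the bound for a 32-bit and a 64-bit space. Counting every repeat as a problem is pessimistic, since a repeat only matters if a client still holds the old handle.

```python
import math

def repeat_probability(n: int, bits: int) -> float:
    """Birthday-bound estimate: P(at least one repeat among n random `bits`-bit values)."""
    space = 2 ** bits
    # 1 - exp(-n(n-1) / 2N), computed with expm1 for accuracy when the result is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))

for bits in (32, 64):
    for n in (10_000, 100_000, 1_000_000):
        print(f"{bits}-bit space, {n:>9,} values: p ~ {repeat_probability(n, bits):.2g}")
```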
Best,
-Nikolaus
On 10/21/2011 01:10 PM, Trond Myklebust wrote:
> On Fri, 2011-10-21 at 12:09 -0400, Nikolaus Rath wrote:
>> So how do you choose the unique identifier? It's limited by FUSE to
>> 32bit and therefore can't be a global counter, it can't be a timestamp
>
> AFAICS fuse gives you a 64-bit inode number and a 32-bit generation
> counter.
Yes, with 64bit inodes everything would be fine. But fuse uses 'long'
for inodes, so on 32bit systems you only have 32bit inodes even if ino_t
is 64bit.
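A small Python illustration of the truncation described above, assuming a 32-bit unsigned long: ctypes.c_uint32 stands in for FUSE's inode type, and the inode values are made up. Two different 64-bit inode numbers become indistinguishable once stored in 32 bits.

```python
import ctypes

def store_in_32bit(ino: int) -> int:
    """Simulate assigning an inode number to a 32-bit unsigned integer (silent truncation)."""
    return ctypes.c_uint32(ino).value

a = 5
b = 5 + (1 << 32)                              # differs from a only above bit 31
print(a == b)                                  # False
print(store_in_32bit(a) == store_in_32bit(b))  # True: both become 5

# The width of C's unsigned long depends on the platform (32 bits on 32-bit Linux).
print(ctypes.sizeof(ctypes.c_ulong) * 8)
```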
> IOW: start allocating inode numbers incrementally from 0 to 2^64 - 1, then
> each time you overflow the 64-bit inode number counter, bump the
> generation number. You'll have to skip those inode numbers that are
> already allocated in the subsequent generations, but the total number of
> unique combinations is still likely to be more than large enough not to
> be a worry.
Yes, as I said earlier, it is possible to do with the available 32 + 32
bits, but it does introduce additional complexity.
>> because the system clock may not have enough resolution, and it can't be
>> a per-inode counter because then I can't discard the counter after the
>> inode has been deleted.
>
> If you need more unique values, then modify fuse to allow your
> filesystem to manage the exportfs interface. The fuse ABI is versioned,
> and can be extended to support new features.
FUSE 3 will have 64bit inodes, and I don't think this feature would make
it into 2.x.
Best,
-Nikolaus
"J. Bruce Fields" <[email protected]> writes:
> On Thu, Oct 20, 2011 at 01:21:31PM -0400, Nikolaus Rath wrote:
>> I'm working on a FUSE file system that stores file system metadata in an
>> SQL database (http://code.google.com/p/s3ql/). Not having to keep track
>> of inode generation numbers would keep the code much simpler, because I
>> want to delete inode-rows from the SQL table when the last reference to
>> the inode is deleted (so I can't keep track of the generation no).
>
> You can use current time, or a counter, or something, as the generation
> number.
With current time I'm screwed if the system clock doesn't have
sufficiently fine granularity. With a counter, I either have to remember
counter values per-inode even after the inode is deleted, or the global
counter will overflow at some point (in which case I may just as well
require unique inodes in the first place).
>> Now I'll either have to make inodes unique (and run into trouble after
>> 2^32 inodes have been used), or keep with the current scheme of
>> randomizing new inodes (which keeps the probability of problems low
>> enough but is ugly).
>
> With 2^32 inode numbers plus 2^32 generation numbers it should be
> possible to work something out that doesn't require remembering every
> old inode.
Certainly. All I'm saying is that the code would be simpler if there was
no need for generation numbers.
Best,
-Nikolaus
On Wed, Oct 19, 2011 at 03:11:27PM -0400, Nikolaus Rath wrote:
> Duh - that's a disappointment. Thanks for the pointer!
Just curious--why do you care?
--b.
"J. Bruce Fields" <[email protected]> writes:
> On Wed, Oct 19, 2011 at 10:17:43AM -0400, Nikolaus Rath wrote:
>> Does NFS4 still depend on inode generation numbers? I understand that
>> earlier NFS versions used file handles based on inode and generation
>> number, but it seems to me that this shouldn't be required anymore with
>> the (stateful) NFS4.
>
> Yes, it's true that NFSv4 has open and close operations, but filehandles
> are still used a great deal outside of that, and clients are still
> allowed to assume that the same filehandle always refers to the same
> object.
>
> http://www.ietf.org/rfc/rfc3530.txt, section 4.2.1:
>
> "If two filehandles from the same server are equal, they MUST
> refer to the same file."
Duh - that's a disappointment. Thanks for the pointer!
Best,
-Nikolaus