2012-08-18 09:42:11

by Namjae Jeon

Subject: [PATCH 0/4] fat: fix ESTALE errors

From: Namjae Jeon <[email protected]>

This patch-set eliminates the client side ESTALE errors when
a FAT partition exported over NFS has its dentries evicted
from the cache.

One of the reasons for this error is lack of permanent inode
numbers on FAT which makes it difficult to construct persistent
file handles. This can be overcome by using the on-disk location
of the directory entries (i_pos) as the inode number.

Once the i_pos is available, it is only a matter of reading the
directory entries from the disk clusters to locate the matching
entry and rebuild the corresponding inode.

Namjae Jeon (4):
fat: allocate persistent inode numbers
fat (exportfs): rebuild inode if ilookup() fails
fat (exportfs): rebuild directory-inode if fat_dget() fails
Documentation: update nfs option in filesystem/vfat.txt
---
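
A small sketch of what "on-disk location of the directory entry" can mean as a single number. To my understanding the fat driver forms i_pos by shifting the entry's block number left by dir_per_block_bits and OR-ing in the entry's index within that block. Treat that layout, the 512-byte block size, and the helper names below (make_i_pos, split_i_pos, entry_byte_offset) as assumptions made for illustration, not as code from this patch-set.

```python
# Sketch: packing and unpacking an i_pos-style value.
# Assumed geometry: 512-byte blocks, 32-byte FAT directory entries.

BLOCK_SIZE = 512                                      # assumed block size in bytes
DIR_ENTRY_SIZE = 32                                   # size of one FAT directory entry
DIR_PER_BLOCK = BLOCK_SIZE // DIR_ENTRY_SIZE          # 16 entries per block
DIR_PER_BLOCK_BITS = DIR_PER_BLOCK.bit_length() - 1   # log2(16) = 4


def make_i_pos(block_nr, slot):
    """Pack (block number, entry index within the block) into one integer."""
    if not 0 <= slot < DIR_PER_BLOCK:
        raise ValueError("slot out of range")
    return (block_nr << DIR_PER_BLOCK_BITS) | slot


def split_i_pos(i_pos):
    """Recover (block number, entry index) from an i_pos value."""
    return i_pos >> DIR_PER_BLOCK_BITS, i_pos & (DIR_PER_BLOCK - 1)


def entry_byte_offset(i_pos):
    """Byte offset of the directory entry from the start of the device."""
    block_nr, slot = split_i_pos(i_pos)
    return block_nr * BLOCK_SIZE + slot * DIR_ENTRY_SIZE
```

Given such a value, locating the entry again is just arithmetic, which is why rebuilding an evicted inode from it looks attractive; whether the value stays tied to the same file is a separate question.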


2012-08-18 13:25:33

by Al Viro

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

On Sat, Aug 18, 2012 at 05:41:39AM -0400, Namjae Jeon wrote:
> From: Namjae Jeon <[email protected]>
>
> This patch-set eliminates the client side ESTALE errors when
> a FAT partition exported over NFS has its dentries evicted
> from the cache.
>
> One of the reasons for this error is lack of permanent inode
> numbers on FAT which makes it difficult to construct persistent
> file handles. This can be overcome by using the on-disk location
> of the directory entries (i_pos) as the inode number.

The hell it can. You've just made them unstable on rename(2).
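
The rename objection can be seen in a toy model. Everything below (ToyFatDir, the slot numbering, the rename helper) is hypothetical and only mimics the idea that the inode number is the position of the directory entry; it is not kernel code.

```python
class ToyFatDir:
    """A directory as a fixed array of entry slots; None marks a free slot."""

    def __init__(self, first_i_pos, nslots=4):
        self.first_i_pos = first_i_pos      # i_pos of slot 0 in this directory
        self.slots = [None] * nslots

    def add(self, name):
        """Store name in the first free slot and return that slot's i_pos."""
        idx = self.slots.index(None)
        self.slots[idx] = name
        return self.first_i_pos + idx

    def remove(self, name):
        """Free the slot that holds name."""
        self.slots[self.slots.index(name)] = None


def rename(src_dir, dst_dir, old_name, new_name):
    """Move an entry to another directory and return its new i_pos."""
    src_dir.remove(old_name)
    return dst_dir.add(new_name)


dir_a = ToyFatDir(first_i_pos=0x100)
dir_b = ToyFatDir(first_i_pos=0x200)

ino_before = dir_a.add("report.txt")
ino_after = rename(dir_a, dir_b, "report.txt", "report.txt")
print(hex(ino_before), hex(ino_after))
```

The file is the same before and after, but a number derived from the entry's position is not, so a handle built from it stops matching after rename(2).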

2012-08-18 14:09:47

by OGAWA Hirofumi

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

Al Viro <[email protected]> writes:

> On Sat, Aug 18, 2012 at 05:41:39AM -0400, Namjae Jeon wrote:
>> From: Namjae Jeon <[email protected]>
>>
>> This patch-set eliminates the client side ESTALE errors when
>> a FAT partition exported over NFS has its dentries evicted
>> from the cache.
>>
>> One of the reasons for this error is lack of permanent inode
>> numbers on FAT which makes it difficult to construct persistent
>> file handles. This can be overcome by using the on-disk location
>> of the directory entries (i_pos) as the inode number.
>
> The hell it can. You've just made them unstable on rename(2).

One more hint: we can't use i_pos as the inode number.

E.g. if an inode is unlinked but still open (an orphaned inode), its dir
entry is free and you can create a new inode at the same i_pos. After
that, both inodes have the same i_pos (and so the same inode number).

Thanks.
--
OGAWA Hirofumi <[email protected]>
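
The orphaned-inode collision described above, as a toy model. The names (ToyInode, dir_slots, create, unlink) and the single slot at 0x140 are made up for illustration and assume the scheme under discussion, where the inode number is simply the i_pos of the directory entry.

```python
class ToyInode:
    """In-memory inode whose number is taken from the dir entry position."""

    def __init__(self, ino, data):
        self.ino = ino
        self.data = data


dir_slots = {0x140: None}    # one directory slot, keyed by its i_pos
open_files = []              # inodes still referenced by a file descriptor


def create(i_pos, data):
    """Create a file in a free slot; its inode number is the slot's i_pos."""
    if dir_slots[i_pos] is not None:
        raise FileExistsError("slot in use")
    inode = ToyInode(ino=i_pos, data=data)
    dir_slots[i_pos] = inode
    return inode


def unlink(i_pos):
    """Free the directory slot; an open inode lives on as an orphan."""
    dir_slots[i_pos] = None


old = create(0x140, b"old")
open_files.append(old)        # a process keeps the file open
unlink(0x140)                 # the dir entry is free, the inode is orphaned
new = create(0x140, b"new")   # a new file lands in the same slot

print(old is new, old.ino == new.ino)
```

Two distinct live inodes now report the same number, so anything that looks inodes up by number (an inode cache, an NFS file handle) can no longer tell them apart.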

2012-08-20 04:19:55

by Namjae Jeon

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

2012/8/18, OGAWA Hirofumi <[email protected]>:
> Al Viro <[email protected]> writes:
>
>> On Sat, Aug 18, 2012 at 05:41:39AM -0400, Namjae Jeon wrote:
>>> From: Namjae Jeon <[email protected]>
>>>
>>> This patch-set eliminates the client side ESTALE errors when
>>> a FAT partition exported over NFS has its dentries evicted
>>> from the cache.
>>>
>>> One of the reasons for this error is lack of permanent inode
>>> numbers on FAT which makes it difficult to construct persistent
>>> file handles. This can be overcome by using the on-disk location
>>> of the directory entries (i_pos) as the inode number.
>>
>> The hell it can. You've just made them unstable on rename(2).
>
> One more hint: we can't use i_pos as the inode number.
>
> E.g. if an inode is unlinked but still open (an orphaned inode), its dir
> entry is free and you can create a new inode at the same i_pos. After
> that, both inodes have the same i_pos (and so the same inode number).
>
> Thanks.
Hi, Ogawa.
Thanks for the specific explanation. I will check it.
> --
> OGAWA Hirofumi <[email protected]>
>

2012-08-20 04:21:03

by Namjae Jeon

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

2012/8/18, Al Viro <[email protected]>:
> On Sat, Aug 18, 2012 at 05:41:39AM -0400, Namjae Jeon wrote:
>> From: Namjae Jeon <[email protected]>
>>
>> This patch-set eliminates the client side ESTALE errors when
>> a FAT partition exported over NFS has its dentries evicted
>> from the cache.
>>
>> One of the reasons for this error is lack of permanent inode
>> numbers on FAT which makes it difficult to construct persistent
>> file handles. This can be overcome by using the on-disk location
>> of the directory entries (i_pos) as the inode number.
>
> The hell it can. You've just made them unstable on rename(2).
Hello, Al.
I will check the rename(2) case.
Thanks for the reply.
>

2012-08-20 20:52:43

by J. Bruce Fields

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

On Mon, Aug 20, 2012 at 01:19:51PM +0900, Namjae Jeon wrote:
> 2012/8/18, OGAWA Hirofumi <[email protected]>:
> > Al Viro <[email protected]> writes:
> >
> >> On Sat, Aug 18, 2012 at 05:41:39AM -0400, Namjae Jeon wrote:
> >>> From: Namjae Jeon <[email protected]>
> >>>
> >>> This patch-set eliminates the client side ESTALE errors when
> >>> a FAT partition exported over NFS has its dentries evicted
> >>> from the cache.
> >>>
> >>> One of the reasons for this error is lack of permanent inode
> >>> numbers on FAT which makes it difficult to construct persistent
> >>> file handles. This can be overcome by using the on-disk location
> >>> of the directory entries (i_pos) as the inode number.
> >>
> >> The hell it can. You've just made them unstable on rename(2).
> >
> > One more hint: we can't use i_pos as the inode number.
> >
> > E.g. if an inode is unlinked but still open (an orphaned inode), its dir
> > entry is free and you can create a new inode at the same i_pos. After
> > that, both inodes have the same i_pos (and so the same inode number).
> >
> > Thanks.
> Hi, Ogawa.
> Thanks for the specific explanation. I will check it.

For somebody that knows more about fat than me--is there really any hope
of making it play well with nfs?

--b.

2012-08-21 05:19:51

by Namjae Jeon

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

2012/8/21, J. Bruce Fields <[email protected]>:
> On Mon, Aug 20, 2012 at 01:19:51PM +0900, Namjae Jeon wrote:
>> 2012/8/18, OGAWA Hirofumi <[email protected]>:
>> > Al Viro <[email protected]> writes:
>> >
>> >> On Sat, Aug 18, 2012 at 05:41:39AM -0400, Namjae Jeon wrote:
>> >>> From: Namjae Jeon <[email protected]>
>> >>>
>> >>> This patch-set eliminates the client side ESTALE errors when
>> >>> a FAT partition exported over NFS has its dentries evicted
>> >>> from the cache.
>> >>>
>> >>> One of the reasons for this error is lack of permanent inode
>> >>> numbers on FAT which makes it difficult to construct persistent
>> >>> file handles. This can be overcome by using the on-disk location
>> >>> of the directory entries (i_pos) as the inode number.
>> >>
>> >> The hell it can. You've just made them unstable on rename(2).
>> >
>> > One more hint: we can't use i_pos as the inode number.
>> >
>> > E.g. if an inode is unlinked but still open (an orphaned inode), its dir
>> > entry is free and you can create a new inode at the same i_pos. After
>> > that, both inodes have the same i_pos (and so the same inode number).
>> >
>> > Thanks.
>> Hi, Ogawa.
>> Thanks for the specific explanation. I will check it.
>
Hi Bruce.
> For somebody that knows more about fat than me--is there really any hope
> of making it play well with nfs?

I think this patch is the only solution to fix the ESTALE issue caused
by inode cache eviction.
FAT uses iunique() to get a unique inode number, which is just the next
value of an incrementing counter. So there is no way to reconstruct the
inode from its inode number, as other filesystems can.
We can check it easily like this:

1. ls -al /directory on the NFS client.
2. echo 3 > /proc/sys/vm/drop_caches (on the NFS server)
3. ls -al /directory on the NFS client again. An ESTALE error will occur.

With this patch, there is no ESTALE issue from reclaim.

And.. Hi Ogawa.
I checked how other filesystems handle the unlinked-inode issue, and I
found that ext4 has the same issue.
If other filesystems have this issue too, can we really treat it as a
FAT-only issue?
>
> --b.
>
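
The iunique() point and the three reproduction steps in this message can be mimicked with a toy server. ToyServer and its methods are hypothetical; the only property carried over from the message is that inode numbers come from an in-memory counter and are recorded nowhere on disk.

```python
import errno
import itertools
import os


class ToyServer:
    """Inode numbers come from a counter, so they exist only in memory."""

    def __init__(self):
        self._next_ino = itertools.count(1000)
        self.icache = {}                  # ino -> path for cached inodes

    def lookup(self, path):
        """Return the inode number for path, allocating one if not cached."""
        for ino, cached_path in self.icache.items():
            if cached_path == path:
                return ino
        ino = next(self._next_ino)
        self.icache[ino] = path
        return ino

    def drop_caches(self):
        """Evict every cached inode, like step 2 in the list above."""
        self.icache.clear()

    def open_by_handle(self, ino):
        """Resolve a client handle; nothing on disk maps ino back to a file."""
        if ino not in self.icache:
            raise OSError(errno.ESTALE, os.strerror(errno.ESTALE))
        return self.icache[ino]


server = ToyServer()
handle = server.lookup("/directory/a")        # step 1: client lists the dir
server.drop_caches()                          # step 2: server evicts inodes
try:
    server.open_by_handle(handle)             # step 3: old handle is stale
except OSError as exc:
    stale_errno = exc.errno
new_handle = server.lookup("/directory/a")    # a fresh lookup gets a new number
```

In this model the handle cannot be honoured after eviction, and a fresh lookup hands out a different number for the same file.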

2012-08-21 06:41:45

by OGAWA Hirofumi

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

Namjae Jeon <[email protected]> writes:

> And.. Hi Ogawa.
> I checked how other filesystems handle the unlinked-inode issue, and I
> found that ext4 has the same issue.
> If other filesystems have this issue too, can we really treat it as a
> FAT-only issue?

(I assume this issue == orphaned inode issue).

ext* doesn't have this issue. When ext* creates an orphaned inode, it
doesn't delete the inode from the inode table until the last referencer
calls iput().

In the FAT case, the inode is embedded in the dir entry. So when an inode
is unlinked (the orphaned inode is then detached with fat_detach()), FAT
deletes the inode (dir entry) from the dir.
--
OGAWA Hirofumi <[email protected]>
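
The orphan behaviour described above can be observed from user space. This is a minimal sketch that assumes a POSIX system and a local filesystem with ordinary inode semantics; orphan_demo and the file name "victim" are made up for the example.

```python
import os
import tempfile


def orphan_demo(directory):
    """Unlink a file that is still open, recreate the name, compare inodes."""
    path = os.path.join(directory, "victim")
    with open(path, "wb") as old:
        old_ino = os.fstat(old.fileno()).st_ino
        os.unlink(path)                    # 'old' is now an orphaned inode
        with open(path, "wb") as new:      # same name, different inode
            new_ino = os.fstat(new.fileno()).st_ino
            new.write(b"new contents")
        old.write(b"late write to the orphan")
    # The orphan was released when 'old' was closed; read what is left.
    with open(path, "rb") as check:
        survivor = check.read()
    return old_ino, new_ino, survivor


with tempfile.TemporaryDirectory() as tmp:
    old_ino, new_ino, survivor = orphan_demo(tmp)
```

While the orphan is open the two files are separate live inodes, so they report different st_ino values, and the late write to the orphan does not reach the new file.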

2012-08-21 10:58:39

by Namjae Jeon

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

2012/8/21, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
>
>> And.. Hi Ogawa.
>> I checked how other filesystems handle the unlinked-inode issue, and I
>> found that ext4 has the same issue.
>> If other filesystems have this issue too, can we really treat it as a
>> FAT-only issue?
>
> (I assume this issue == orphaned inode issue).
>
> ext* doesn't have this issue. When ext* creates an orphaned inode, it
> doesn't delete the inode from the inode table until the last referencer
> calls iput().
>
> In the FAT case, the inode is embedded in the dir entry. So when an inode
> is unlinked (the orphaned inode is then detached with fat_detach()), FAT
> deletes the inode (dir entry) from the dir.
Sorry for my mistake. You're right.
I am looking for another solution.
Thanks
> --
> OGAWA Hirofumi <[email protected]>
>

2012-08-21 13:20:53

by Steven J. Magnani

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

On Mon, 2012-08-20 at 16:52 -0400, J. Bruce Fields wrote:

> For somebody that knows more about fat than me--is there really any hope
> of making it play well with nfs?

I spent a lot of time looking into FAT ESTALE issues and had proposed
something similar to Namjae (https://lkml.org/lkml/2012/7/3/378). In the
discussions and experiments that followed, I eventually came to the
conclusion that the best I could do was make FAT play "better" with NFS
(https://lkml.org/lkml/2012/7/10/252).

If you define "well" as an absence of server-reported ESTALE due purely
to inode/dentry cache eviction I'm skeptical that this is possible in
any way that would be accepted into the mainline kernel. Code that
addresses ESTALE on the client side (https://lkml.org/lkml/2012/8/8/244)
seems to be successful, but since not everyone has control of the client
code there are cases where a server-side solution is desirable.

There's just no unique way to identify FAT inodes that works across all
the possible scenarios (renames, power cycles, zero-length files,
deletion of an object and creation of another - potentially of a
different type - at the same on-disk position, etc.). Also, clients are
sensitive to changes in a file handle - any server "reconstitution" of
an evicted inode must result in a NFS file handle that is identical to
the "original" reported to the client, or the client will cry ESTALE
despite the server's best efforts.

------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"

#include <standard.disclaimer>

2012-08-21 18:03:36

by Ravishankar

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

On Tue, Aug 21, 2012 at 6:50 PM, Steven J. Magnani
<[email protected]> wrote:
> On Mon, 2012-08-20 at 16:52 -0400, J. Bruce Fields wrote:
>
>> For somebody that knows more about fat than me--is there really any hope
>> of making it play well with nfs?
>
> I spent a lot of time looking into FAT ESTALE issues and had proposed
> something similar to Namjae (https://lkml.org/lkml/2012/7/3/378). In the
> discussions and experiments that followed, I eventually came to the
> conclusion that the best I could do was make FAT play "better" with NFS
> (https://lkml.org/lkml/2012/7/10/252).
>

We had initially tried both spins of Steve's patches for our test
scenario (ls -lR on the client, while continually doing a drop_caches
on the server) but still got ESTALE errors on the client. Since
Steve's v2 patch set got accepted, we tried to build on top of that.

> If you define "well" as an absence of server-reported ESTALE due purely
> to inode/dentry cache eviction I'm skeptical that this is possible in
> any way that would be accepted into the mainline kernel.

We thought we got it right until the 'orphaned inode' flaw was pointed
out. At the expense of exposing my ignorance, could someone give me a
hint on what would happen if there were continual writes to two files
on a FAT disk: (i) one unlinked but still referenced by an open file
descriptor and (ii) the other newly created but assigned the same
inode number (by virtue of this patch-set)? From what I tested, the
writes to the first (zombie) file did not affect the integrity of the
contents of the second file and were in fact 'rolled back' when the
file descriptor was closed. Apparently, I'm missing something here.

Thank you.

> Code that addresses ESTALE on the client side (https://lkml.org/lkml/2012/8/8/244)
> seems to be successful, but since not everyone has control of the client
> code there are cases where a server-side solution is desirable.
>
> There's just no unique way to identify FAT inodes that works across all
> the possible scenarios (renames, power cycles, zero-length files,
> deletion of an object and creation of another - potentially of a
> different type - at the same on-disk position, etc.). Also, clients are
> sensitive to changes in a file handle - any server "reconstitution" of
> an evicted inode must result in a NFS file handle that is identical to
> the "original" reported to the client, or the client will cry ESTALE
> despite the server's best efforts.
>
> ------------------------------------------------------------------------
> Steven J. Magnani "I claim this network for MARS!
> http://www.digidescorp.com Earthling, return my space modulator!"
>
> #include <standard.disclaimer>
>
>

--
.-. .- ...- ..

2012-08-21 21:12:11

by Bastien Roucariès

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

On Tue, Aug 21, 2012 at 8:41 AM, OGAWA Hirofumi
<[email protected]> wrote:
> Namjae Jeon <[email protected]> writes:
>
>> And.. Hi Ogawa.
>> I checked how other filesystems handle the unlinked-inode issue, and I
>> found that ext4 has the same issue.
>> If other filesystems have this issue too, can we really treat it as a
>> FAT-only issue?
>
> (I assume this issue == orphaned inode issue).
>
> ext* doesn't have this issue. When ext* creates an orphaned inode, it
> doesn't delete the inode from the inode table until the last referencer
> calls iput().
>
> In the FAT case, the inode is embedded in the dir entry. So when an inode
> is unlinked (the orphaned inode is then detached with fat_detach()), FAT
> deletes the inode (dir entry) from the dir.

Would it be possible not to delete it?

I mean using a special value for this case: mark the entry deleted (using
0xe5 as the first character) but set, for instance, the creation month to 15.

The entry would therefore be kept and not overwritten by subsequent
file creation.

At least this solves the deleted-file issue (not the rename issue, unfortunately).

Bastien
> OGAWA Hirofumi <[email protected]>
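
A sketch of the marking idea proposed in this message. It relies on the standard 32-byte FAT directory entry layout: the first name byte 0xE5 marks a deleted entry, and the creation date is a little-endian 16-bit field at offset 16 with the month in bits 5 to 8. The month value 15 as an "orphan, do not reuse" flag is the proposal itself; mark_orphan, is_reserved_orphan and pack_fat_date are hypothetical helper names, and no driver is claimed to do this.

```python
import struct

DELETED_MARK = 0xE5      # first name byte of a deleted directory entry
CRT_DATE_OFFSET = 16     # little-endian u16 creation date
ORPHAN_MONTH = 15        # outside the valid 1..12 range, fits in 4 bits


def pack_fat_date(year, month, day):
    """Encode a date in the FAT on-disk format (years counted from 1980)."""
    return ((year - 1980) << 9) | (month << 5) | day


def mark_orphan(entry):
    """Return a copy of a 32-byte entry marked deleted-but-reserved."""
    buf = bytearray(entry)
    (date,) = struct.unpack_from("<H", buf, CRT_DATE_OFFSET)
    date = (date & ~(0xF << 5)) | (ORPHAN_MONTH << 5)   # replace month bits
    buf[0] = DELETED_MARK
    struct.pack_into("<H", buf, CRT_DATE_OFFSET, date)
    return bytes(buf)


def is_reserved_orphan(entry):
    """True if the entry is deleted and carries the reserved month value."""
    (date,) = struct.unpack_from("<H", entry, CRT_DATE_OFFSET)
    return entry[0] == DELETED_MARK and (date >> 5) & 0xF == ORPHAN_MONTH


entry = bytearray(32)
entry[0:11] = b"REPORT  TXT"                            # 8.3 name, space padded
struct.pack_into("<H", entry, CRT_DATE_OFFSET, pack_fat_date(2012, 8, 21))
marked = mark_orphan(bytes(entry))
```

An allocator that skips entries for which is_reserved_orphan() is true would leave the slot alone until the last close, at the cost of the extra check on every entry creation.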

2012-08-22 06:14:34

by OGAWA Hirofumi

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

Bastien ROUCARIES <[email protected]> writes:

>> (I assume this issue == orphaned inode issue).
>>
>> ext* doesn't have this issue. When ext* creates an orphaned inode, it
>> doesn't delete the inode from the inode table until the last referencer
>> calls iput().
>>
>> In the FAT case, the inode is embedded in the dir entry. So when an inode
>> is unlinked (the orphaned inode is then detached with fat_detach()), FAT
>> deletes the inode (dir entry) from the dir.
>
> Would it be possible not to delete it?

It should be deletable on Linux, because many apps assume that
orphaned inodes work.

> I mean using a special value for this case: mark the entry deleted (using
> 0xe5 as the first character) but set, for instance, the creation month to 15.
>
> The entry would therefore be kept and not overwritten by subsequent
> file creation.
>
> At least this solves the deleted-file issue (not the rename issue, unfortunately).

I assume you mean preventing creation somehow, not deletion. Yes, it is
possible, though it would add overhead and complexity for us.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2012-08-22 06:17:55

by OGAWA Hirofumi

Subject: Re: [PATCH 0/4] fat: fix ESTALE errors

OGAWA Hirofumi <[email protected]> writes:

>> I mean using a special value for this case: mark the entry deleted (using
>> 0xe5 as the first character) but set, for instance, the creation month to 15.
>>
>> The entry would therefore be kept and not overwritten by subsequent
>> file creation.
>>
>> At least this solves the deleted-file issue (not the rename issue, unfortunately).
>
> I assume you mean preventing creation somehow, not deletion. Yes, it is
> possible, though it would add overhead and complexity for us.

In short, I meant that I think it is not so simple to do.

Thanks.
--
OGAWA Hirofumi <[email protected]>