2013-07-08 01:13:09

by Ben Hutchings

Subject: NFS 'readdir loop' error on JFS

Jonathan McDowell and Karl Schmidt reported that when sharing a JFS
filesystem through NFS and Samba, NFS clients can report 'readdir loop'
and the directories in question then appear to have duplicate entries on
the client system.

This was seen with Linux 3.2 on the server and client. The JFS
directory code is basically unchanged since then, but NFS has changed
somewhat.

The original bug reports were:
http://bugs.debian.org/685407#85
http://bugs.debian.org/714974

The log messages are:
[593351.877678] NFS: directory fs/nfsd contains a readdir loop.Please contact your server vendor. The file: .nfs3proc.o.cmda.com has duplicate cookie 73
[593351.904689] NFS: directory fs/nfsd contains a readdir loop.Please contact your server vendor. The file: .nfs3proc.o.cmda.com has duplicate cookie 73
[280774.570555] NFS: directory //accounting contains a readdir loop.Please contact your server vendor. The file: .~lock.credit.rtf1.rtf# has duplicate cookie 199

Is this likely to be a problem with JFS, the NFS client or server? Can
anyone suggest how to investigate this further?

Ben.

--
Ben Hutchings
It is easier to write an incorrect program than to understand a correct one.



2013-07-08 16:15:07

by Karl Schmidt

Subject: Re: NFS 'readdir loop' error on JFS

On 07/07/2013 08:13 PM, Ben Hutchings wrote:
> Jonathan McDowell and Karl Schmidt reported that when sharing a JFS
> filesystem through NFS and Samba, NFS clients can report 'readdir loop'
> and the directories in question then appear to have duplicate entries on
> the client system.
>
> This was seen with Linux 3.2 on the server and client. The JFS
> directory code is basically unchanged since then, but NFS has changed
> somewhat.
>
> The original bug reports were:
> http://bugs.debian.org/685407#85
> http://bugs.debian.org/714974
>
> The log messages are:
> [593351.877678] NFS: directory fs/nfsd contains a readdir loop.Please contact your server vendor. The file: .nfs3proc.o.cmda.com has duplicate cookie 73
> [593351.904689] NFS: directory fs/nfsd contains a readdir loop.Please contact your server vendor. The file: .nfs3proc.o.cmda.com has duplicate cookie 73
> [280774.570555] NFS: directory //accounting contains a readdir loop.Please contact your server vendor. The file: .~lock.credit.rtf1.rtf# has duplicate cookie 199
>
> Is this likely to be a problem with JFS, the NFS client or server? Can
> anyone suggest how to investigate this further?
>
> Ben.

The experiment with the NFS client on Windows XP went badly: performance was very poor. I'm told
that it might work better on Windows 7. I have removed it and gone back to Samba.

For now, I am not having problems after commenting out this line in smb.conf

#strict locking = yes

Previously, I had already commented out :

#oplocks = no
#level2oplocks = no

But could still reproduce the problem.

I'm not sure it is fixed, as there were times before when everything worked well for some weeks
before the errors returned.

Question: Is samba still using NLM protocol to lock files? ( I think NFS no longer uses that? )


,.,.,.
I did find a different error during all this testing. I'm including it here as it might be
related, perhaps due to the last nfs update.

In /etc/exports on the server called malaysia we have:
/home/accounting 192.168.1.0/24(rw,sync,no_root_squash,no_all_squash)

fstab on a Debian wheezy client:
malaysia:/home/accounting /mnt/accounting nfs defaults,rsize=8192,wsize=8192,intr 0 0

/mnt# ll
total 112
drwxrwsrwx 16 nobody nogroup 4096 2013-07-08 08:00 accounting/

This is wrong: the owner and group show up as nobody/nogroup.

I think this may have to do with a bug workaround where the installer added
127.0.1.1 to the /etc/hosts file?

Then there are /etc/idmapd.conf and /etc/default/nfs-common on the client, which I didn't use to
have to mess with?

It is also important to have the FQDN as the first name in /etc/hosts;
hostname --fqdn is correct on both machines.


I think this might have to do with a fix for CVE-2013-1923

The changes I made included adding

NEED_IDMAPD=yes

to /etc/default/nfs-common

And removing no_all_squash which is now the default.

Is NEED_IDMAPD=yes now needed as reverse lookups have been turned off?

If this is really the case, the Debian wiki needs updating to let people know.

It has been a long night..



--------------------------------------------------------------------------------
Karl Schmidt EMail [email protected]
Transtronics, Inc. WEB http://secure.transtronics.com
3209 West 9th Street Ph (785) 841-3089
Lawrence, KS 66049 FAX (785) 841-0434

Wrapping people up in the symbols of success when they are unearned, is very destructive. kps

--------------------------------------------------------------------------------

2013-07-08 14:02:56

by Bryan Schumaker

Subject: Re: NFS 'readdir loop' error on JFS

Hi Ben,

I remember hitting this problem as I was working on the new-ish
readdir code. The problem has to do with the readdir cookies
generated by JFS. In a large directory the same cookie might be
reused to refer to different files, and this can confuse the client
(it entered an infinite loop before Trond came up with the loop
detection). I ended up switching to ext4 to finish the project.
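
As a rough illustration of the loop detection described above: the check comes down to noticing that a cookie has already been seen during one pass over a directory. The C fragment below is only a sketch of that idea. The name `first_duplicate_cookie` is hypothetical, and the quadratic scan is chosen for readability. It is not taken from the Linux NFS client.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: return the index of the first entry whose cookie
 * already appeared earlier in the same pass over a directory, or -1 if
 * every cookie is unique.
 */
long first_duplicate_cookie(const uint64_t *cookies, size_t n)
{
	size_t i, j;

	assert(cookies != NULL || n == 0);

	for (i = 1; i < n; i++) {
		for (j = 0; j < i; j++) {
			if (cookies[i] == cookies[j])
				return (long)i;
		}
	}
	return -1;
}
```

With the cookies 3, 4, 73, 5, 73 this reports index 4, the second entry carrying cookie 73, which is the kind of repeat the "duplicate cookie 73" log message complains about.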

- Bryan

On Sun, Jul 7, 2013 at 9:13 PM, Ben Hutchings <[email protected]> wrote:
> Jonathan McDowell and Karl Schmidt reported that when sharing a JFS
> filesystem through NFS and Samba, NFS clients can report 'readdir loop'
> and the directories in question then appear to have duplicate entries on
> the client system.
>
> This was seen with Linux 3.2 on the server and client. The JFS
> directory code is basically unchanged since then, but NFS has changed
> somewhat.
>
> The original bug reports were:
> http://bugs.debian.org/685407#85
> http://bugs.debian.org/714974
>
> The log messages are:
> [593351.877678] NFS: directory fs/nfsd contains a readdir loop.Please contact your server vendor. The file: .nfs3proc.o.cmda.com has duplicate cookie 73
> [593351.904689] NFS: directory fs/nfsd contains a readdir loop.Please contact your server vendor. The file: .nfs3proc.o.cmda.com has duplicate cookie 73
> [280774.570555] NFS: directory //accounting contains a readdir loop.Please contact your server vendor. The file: .~lock.credit.rtf1.rtf# has duplicate cookie 199
>
> Is this likely to be a problem with JFS, the NFS client or server? Can
> anyone suggest how to investigate this further?
>
> Ben.
>
> --
> Ben Hutchings
> It is easier to write an incorrect program than to understand a correct one.

2013-08-17 20:02:02

by Ben Hutchings

Subject: Re: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On Thu, 2013-08-15 at 14:26 -0700, Christian Kujau wrote:
> On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
> > This patch replaces the one I posted yesterday. I like this better since
> > it doesn't require fixing existing on-disk cookies or skipping a
> > position in the in-inode index table.
>
> Thanks. Applied to 3.11-rc5 and tested, no more "readdir loop" messages
> and with unique inode numbers, great!
>
> Tested-by: Christian Kujau <[email protected]>

Karl and Jonathan, could you test the attached backport to 3.2?

(Instructions for rebuilding the Debian kernel package are at:
<http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>)

Ben.

--
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.


Attachments:
jfs-fix-readdir-cookie-incompatibility-with-nfsv4-3.2.patch (3.42 kB)

2013-08-15 07:09:49

by Christian Kujau

Subject: Re: [PATCH] jfs: avoid misuse of cookie value of 2

On Wed, 14 Aug 2013 at 21:29, Christian Kujau wrote:

> On Wed, 14 Aug 2013 at 22:54, Dave Kleikamp wrote:
> > It looks like the problem is that jfs was using a cookie value of 2 for
> > a real directory entry, where NFSv4 expect 2 to represent "..". This
> > patch has so far only been lightly tested.
>
> Hm, a first compile of 3.11-rc5 errors out with:
>
> CC [M] fs/jfs/jfs_dtree.o
> /usr/local/src/linux-git/fs/jfs/jfs_dtree.c: In function ‘add_index’:
> /usr/local/src/linux-git/fs/jfs/jfs_dtree.c:493:13: error: invalid storage class for function ‘free_index’
[...]
>
> I'll run mrproper and try again...

This did not help, but adding a closing bracket did, in fs/jfs/jfs_dtree.c:354

if (jfs_ip->next_index < 3) {
jfs_ip->next_index = 3;
}
-----^

This compiled and booted and now I can run find(1) over that whole NFS
share, without any "readdir loop" messages and with unique inode numbers,
yay!

Tested-by: Christian Kujau <[email protected]>

Thanks!
Christian.
--
BOFH excuse #36:

dynamic software linking table corrupted

2013-08-19 19:40:59

by ben

Subject: Re: [JunkMail] Re: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On 8/17/2013 2:01 PM, Ben Hutchings wrote:
> On Thu, 2013-08-15 at 14:26 -0700, Christian Kujau wrote:
>> On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
>>> This patch replaces the one I posted yesterday. I like this better since
>>> it doesn't require fixing existing on-disk cookies or skipping a
>>> position in the in-inode index table.
>>
>> Thanks. Applied to 3.11-rc5 and tested, no more "readdir loop" messages
>> and with unique inode numbers, great!
>>
>> Tested-by: Christian Kujau <[email protected]>
>
> Karl and Jonathan, could you test the attached backport to 3.2?
>
> (Instructions for rebuilding the Debian kernel package are at:
> <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>)
>
> Ben.
>
When I read this the other day I thought it was a great idea, but after
sleeping on it I have a gut feeling that there is a
jfs_dirent->position--; missing somewhere. I don't have enough familiarity
with the code to begin to guess where. Hopefully I am wrong.

ben

--
Ben Hildred
Support Services
Applied Plastic Coatings, Inc.
5000 Tabor St.
Wheat Ridge, CO 80033
303 424 9200
F: 303 424 8800
[email protected]
http://appliedplastic.com


2013-08-15 13:39:06

by Dave Kleikamp

Subject: Re: [PATCH] jfs: avoid misuse of cookie value of 2

On 08/15/2013 02:09 AM, Christian Kujau wrote:
> On Wed, 14 Aug 2013 at 21:29, Christian Kujau wrote:
>
>> On Wed, 14 Aug 2013 at 22:54, Dave Kleikamp wrote:
>>> It looks like the problem is that jfs was using a cookie value of 2 for
>>> a real directory entry, where NFSv4 expect 2 to represent "..". This
>>> patch has so far only been lightly tested.
>>
>> Hm, a first compile of 3.11-rc5 errors out with:
>>
>> CC [M] fs/jfs/jfs_dtree.o
>> /usr/local/src/linux-git/fs/jfs/jfs_dtree.c: In function ‘add_index’:
>> /usr/local/src/linux-git/fs/jfs/jfs_dtree.c:493:13: error: invalid storage class for function ‘free_index’
> [...]
>>
>> I'll run mrproper and try again...
>
> This did not help, but adding a closing bracket did, in fs/jfs/jfs_dtree.c:354
>
> if (jfs_ip->next_index < 3) {
> jfs_ip->next_index = 3;
> }
> -----^
>
> This compiled and booted and now I can run find(1) over that whole NFS
> share, without any "readdir loop" messages and with unique inode numbers,
> yay!

My bad. That's what happens when you clean up the patch after you test
it. I intended to remove the opening bracket when I removed a warning.

> Tested-by: Christian Kujau <[email protected]>

Thanks. After sleeping on it, I'm contemplating a simpler patch. I'll
keep you up to date.

>
> Thanks!
> Christian.
>

2013-08-24 22:22:12

by Christian Kujau

Subject: Re: [Jfs-discussion] [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
> This patch replaces the one I posted yesterday. I like this better since
> it doesn't require fixing existing on-disk cookies or skipping a
> position in the in-inode index table.
>
> NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..),
> but jfs allows a value of 2 for a non-special entry. This incompatibility
> can result in the nfs client reporting a readdir loop.
>
> This patch doesn't change the value stored internally, but adds one to
> the value exposed to the iterate method.

Out of curiosity, will this land on 3.11?

Christian.
--
BOFH excuse #102:

Power company testing new voltage spike (creation) equipment

2013-08-15 13:38:25

by J. Bruce Fields

Subject: Re: [PATCH] jfs: avoid misuse of cookie value of 2

On Wed, Aug 14, 2013 at 10:54:31PM -0500, Dave Kleikamp wrote:
> For the sake of those not watching
> https://bugzilla.kernel.org/show_bug.cgi?id=60737
>
> It looks like the problem is that jfs was using a cookie value of 2 for
> a real directory entry, where NFSv4 expect 2 to represent "..". This
> patch has so far only been lightly tested.
>
> NFSv4 reserves cookie values 0, 1 and 2 for a rewind, and the "." and ".."
> entries. jfs was using 0 and 1 for "." and "..", but 2 for a regular entry.
> This patch makes jfs conform by using 1 and 2 for "." and ".." and fixes
> any regular entry using the value 2.

Oh, I'd forgotten that. From rfc 5661:

For some file system environments, the directory entries "." and
".." have special meaning, and in other environments, they do
not. If the server supports these special entries within a
directory, they SHOULD NOT be returned to the client as part of
the READDIR response. To enable some client environments, the
cookie values of zero, 1, and 2 are to be considered reserved.
Note that the UNIX client will use these values when combining
the server's response and local representations to enable a
fully formed UNIX directory presentation to the application.

OK!
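
To make the reserved range concrete, here is a small C sketch of a check for regular entries that use cookie 0, 1 or 2. It assumes only what the RFC text quoted above states. The names `NFS4_RESERVED_COOKIE_MAX`, `cookie_is_reserved` and `count_reserved_cookies` are made up for illustration and do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: cookies 0, 1 and 2 are reserved per the RFC text. */
#define NFS4_RESERVED_COOKIE_MAX 2

/* True if a cookie falls in the reserved range. */
bool cookie_is_reserved(uint64_t cookie)
{
	return cookie <= NFS4_RESERVED_COOKIE_MAX;
}

/* Count cookies of regular (non-special) entries that collide with the range. */
size_t count_reserved_cookies(const uint64_t *cookies, size_t n)
{
	size_t i, hits = 0;

	assert(cookies != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (cookie_is_reserved(cookies[i]))
			hits++;
	}
	return hits;
}
```

A directory whose regular entries carry the cookies 2, 3 and 4 would give a count of 1, which is the situation the patch description attributes to jfs.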

--b.

>
> Signed-off-by: Dave Kleikamp <[email protected]>
>
> diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c
> index 8743ba9..93466e8 100644
> --- a/fs/jfs/jfs_dtree.c
> +++ b/fs/jfs/jfs_dtree.c
> @@ -349,11 +349,8 @@ static u32 add_index(tid_t tid, struct inode *ip, s64 bn, int slot)
>
> ASSERT(DO_INDEX(ip));
>
> - if (jfs_ip->next_index < 2) {
> - jfs_warn("add_index: next_index = %d. Resetting!",
> - jfs_ip->next_index);
> - jfs_ip->next_index = 2;
> - }
> + if (jfs_ip->next_index < 3) {
> + jfs_ip->next_index = 3;
>
> index = jfs_ip->next_index++;
>
> @@ -2864,7 +2861,7 @@ void dtInitRoot(tid_t tid, struct inode *ip, u32 idotdot)
> } else
> ip->i_size = 1;
>
> - jfs_ip->next_index = 2;
> + jfs_ip->next_index = 3;
> } else
> ip->i_size = IDATASIZE;
>
> @@ -2951,7 +2948,7 @@ static void add_missing_indices(struct inode *inode, s64 bn)
> for (i = 0; i < p->header.nextindex; i++) {
> d = (struct ldtentry *) &p->slot[stbl[i]];
> index = le32_to_cpu(d->index);
> - if ((index < 2) || (index >= JFS_IP(inode)->next_index)) {
> + if ((index < 3) || (index >= JFS_IP(inode)->next_index)) {
> d->index = cpu_to_le32(add_index(tid, inode, bn, i));
> if (dtlck->index >= dtlck->maxcnt)
> dtlck = (struct dt_lock *) txLinelock(dtlck);
> @@ -3031,7 +3028,7 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
> struct jfs_dirent *jfs_dirent;
> int jfs_dirents;
> int overflow, fix_page, page_fixed = 0;
> - static int unique_pos = 2; /* If we can't fix broken index */
> + static int unique_pos = 3; /* If we can't fix broken index */
>
> if (ctx->pos == DIREND)
> return 0;
> @@ -3039,15 +3036,16 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
> if (DO_INDEX(ip)) {
> /*
> * persistent index is stored in directory entries.
> - * Special cases: 0 = .
> - * 1 = ..
> + * Special cases: 0 = rewind
> + * 1 = .
> + * 2 = ..
> * -1 = End of directory
> */
> do_index = 1;
>
> dir_index = (u32) ctx->pos;
>
> - if (dir_index > 1) {
> + if (dir_index > 2) {
> struct dir_table_slot dirtab_slot;
>
> if (dtEmpty(ip) ||
> @@ -3090,18 +3088,18 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
> return 0;
> }
> } else {
> - if (dir_index == 0) {
> + if (dir_index < 2) {
> /*
> * self "."
> */
> - ctx->pos = 0;
> + ctx->pos = 1;
> if (!dir_emit(ctx, ".", 1, ip->i_ino, DT_DIR))
> return 0;
> }
> /*
> * parent ".."
> */
> - ctx->pos = 1;
> + ctx->pos = 2;
> if (!dir_emit(ctx, "..", 2, PARENT(ip), DT_DIR))
> return 0;
>
> @@ -3122,22 +3120,24 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
> /*
> * Legacy filesystem - OS/2 & Linux JFS < 0.3.6
> *
> - * pn = index = 0: First entry "."
> - * pn = 0; index = 1: Second entry ".."
> + * pn = 0; index = 1: First entry "."
> + * pn = 0; index = 2: Second entry ".."
> * pn > 0: Real entries, pn=1 -> leftmost page
> * pn = index = -1: No more entries
> */
> dtpos = ctx->pos;
> - if (dtpos == 0) {
> + if (dtpos < 2) {
> + ctx->pos = 1;
> /* build "." entry */
> if (!dir_emit(ctx, ".", 1, ip->i_ino, DT_DIR))
> return 0;
> - dtoffset->index = 1;
> + dtoffset->index = 2;
> ctx->pos = dtpos;
> }
>
> if (dtoffset->pn == 0) {
> - if (dtoffset->index == 1) {
> + if (dtoffset->index == 2) {
> + ctx->pos = 2;
> /* build ".." entry */
> if (!dir_emit(ctx, "..", 2, PARENT(ip), DT_DIR))
> return 0;
> @@ -3210,8 +3210,12 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
> * directory index for the lost+found
> * directory. Rather than let it go,
> * we can try to fix it.
> + *
> + * Additionally, a value of 2 used to be
> + * valid, but it didn't work well with
> + * NFSv4, so if found, we need to change it
> */
> - if ((jfs_dirent->position < 2) ||
> + if ((jfs_dirent->position < 3) ||
> (jfs_dirent->position >=
> JFS_IP(ip)->next_index)) {
> if (!page_fixed && !isReadOnly(ip)) {

2013-08-15 04:29:37

by Christian Kujau

Subject: Re: [PATCH] jfs: avoid misuse of cookie value of 2

On Wed, 14 Aug 2013 at 22:54, Dave Kleikamp wrote:
> It looks like the problem is that jfs was using a cookie value of 2 for
> a real directory entry, where NFSv4 expect 2 to represent "..". This
> patch has so far only been lightly tested.

Hm, a first compile of 3.11-rc5 errors out with:

CC [M] fs/jfs/jfs_dtree.o
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c: In function ‘add_index’:
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:493:13: error: invalid storage class for function ‘free_index’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:493:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:521:13: error: invalid storage class for function ‘modify_index’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:546:12: error: invalid storage class for function ‘read_index’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:927:12: error: invalid storage class for function ‘dtSplitUp’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:1327:12: error: invalid storage class for function ‘dtSplitPage’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:1639:12: error: invalid storage class for function ‘dtExtendPage’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:1872:12: error: invalid storage class for function ‘dtSplitRoot’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:2234:12: error: invalid storage class for function ‘dtDeleteUp’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:2744:12: error: invalid storage class for function ‘dtRelink’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:2915:13: error: invalid storage class for function ‘add_missing_indices’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:2982:34: error: invalid storage class for function ‘next_jfs_dirent’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3333:12: error: invalid storage class for function ‘dtReadFirst’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3405:12: error: invalid storage class for function ‘dtReadNext’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3581:12: error: invalid storage class for function ‘dtCompare’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3657:12: error: invalid storage class for function ‘ciCompare’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3765:12: error: invalid storage class for function ‘ciGetLeafPrefixKey’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3832:13: error: invalid storage class for function ‘dtGetKey’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:3896:13: error: invalid storage class for function ‘dtInsertEntry’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:4054:13: error: invalid storage class for function ‘dtMoveEntry’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:4255:13: error: invalid storage class for function ‘dtDeleteEntry’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:4350:13: error: invalid storage class for function ‘dtTruncateEntry’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:4430:13: error: invalid storage class for function ‘dtLinelockFreelist’
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:4565:1: error: expected declaration or statement at end of input
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c: At top level:
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:152:12: warning: ‘dtSplitUp’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:155:12: warning: ‘dtSplitPage’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:158:12: warning: ‘dtExtendPage’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:161:12: warning: ‘dtSplitRoot’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:164:12: warning: ‘dtDeleteUp’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:167:12: warning: ‘dtRelink’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:169:12: warning: ‘dtReadFirst’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:171:12: warning: ‘dtReadNext’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:174:12: warning: ‘dtCompare’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:176:12: warning: ‘ciCompare’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:179:13: warning: ‘dtGetKey’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:182:12: warning: ‘ciGetLeafPrefixKey’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:185:13: warning: ‘dtInsertEntry’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:188:13: warning: ‘dtMoveEntry’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:192:13: warning: ‘dtDeleteEntry’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:194:13: warning: ‘dtTruncateEntry’ used but never defined [enabled by default]
/usr/local/src/linux-git/fs/jfs/jfs_dtree.c:196:13: warning: ‘dtLinelockFreelist’ used but never defined [enabled by default]
make[7]: *** [fs/jfs/jfs_dtree.o] Error 1
make[7]: *** Waiting for unfinished jobs....
CC drivers/acpi/acpica/utmutex.o
CC drivers/acpi/acpica/utobject.o
make[6]: *** [fs/jfs] Error 2
make[5]: *** [fs] Error 2
make[5]: *** Waiting for unfinished jobs....


I'll run mrproper and try again...

Christian.
--
BOFH excuse #219:

Recursivity. Call back if it happens again.

2013-08-12 20:05:06

by Christian Kujau

Subject: Re: [Jfs-discussion] NFS 'readdir loop' error on JFS

On Mon, 12 Aug 2013 at 12:29, J. Bruce Fields wrote:
> It might be interesting to get a network trace (something like tcpdump
> -s0 -wtmp.pcap; then "wireshark tmp.pcap" and look at the "cookie"
> fields in the readdir calls and replies.

I've created #60737[0] to track this issue upstream and attached a pcap to
the bug, obtained while running "find <dir> -ls" on the client. But I seem
to be looking at the wrong details in tcpdump/wireshark: I don't see any
cookie information...

> You could also just run "strace -egetdents64 -v ls" on the server on
> the exported filesystem, in a problem directory, and see if the offsets
> are unique.

strace returned nothing for "getdents64", only "getdents". My test
filesystems are 256 MB in size, maybe this is too small for getdents64 to
be used? All the calls to "getdents" however return unique offsets, if I
did this right:

$ strace -egetdents -v ls /mnt/disk_jfs/usr/share/terminfo/q 2>&1 | egrep -o "d_off=[0-9]*" | sort

When running "ls" (even w/o "-l") on the client on that NFS share, this
"readdir loop" message is printed.

HTH,
Christian.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=60737

2013-08-29 23:09:47

by Jonathan McDowell

Subject: Re: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On Sat, Aug 17, 2013 at 10:01:31PM +0200, Ben Hutchings wrote:
> On Thu, 2013-08-15 at 14:26 -0700, Christian Kujau wrote:
> > On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
> > > This patch replaces the one I posted yesterday. I like this better since
> > > it doesn't require fixing existing on-disk cookies or skipping a
> > > position in the in-inode index table.
> >
> > Thanks. Applied to 3.11-rc5 and tested, no more "readdir loop" messages
> > and with unique inode numbers, great!
> >
> > Tested-by: Christian Kujau <[email protected]>
>
> Karl and Jonathan, could you test the attached backport to 3.2?

I finally managed to be able to schedule some downtime yesterday for one
of the machines I've been seeing this issue with. So far since the
reboot to this kernel (3.2.46-1 + the patch) I haven't seen a recurrence
of the problem; will update if I see anything.

J.

--
Just 'cause I remembered one thing doesn't make me smart!



2013-08-15 21:26:32

by Christian Kujau

Subject: Re: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
> This patch replaces the one I posted yesterday. I like this better since
> it doesn't require fixing existing on-disk cookies or skipping a
> position in the in-inode index table.

Thanks. Applied to 3.11-rc5 and tested, no more "readdir loop" messages
and with unique inode numbers, great!

Tested-by: Christian Kujau <[email protected]>

Thanks for the fix!
Christian.
--
BOFH excuse #442:

Trojan horse ran out of hay

2013-08-15 22:10:26

by Dave Kleikamp

Subject: Re: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On 08/15/2013 04:26 PM, Christian Kujau wrote:
> On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
>> This patch replaces the one I posted yesterday. I like this better since
>> it doesn't require fixing existing on-disk cookies or skipping a
>> position in the in-inode index table.
>
> Thanks. Applied to 3.11-rc5 and tested, no more "readdir loop" messages
> and with unique inode numbers, great!
>
> Tested-by: Christian Kujau <[email protected]>
>
> Thanks for the fix!
> Christian.

Thanks for reporting, investigating and testing!

Dave


2013-08-12 08:18:49

by Christian Kujau

Subject: Re: [Jfs-discussion] NFS 'readdir loop' error on JFS

FWIW, this still happens when both client & server are running Linux
3.11.0-rc5 (vanilla).

$ dpkg -l | grep nfs | cut -c-70
ii libnfsidmap2:amd64 0.25-4 amd64
ii nfs-common 1:1.2.6-4 amd64
ii nfs-kernel-server 1:1.2.6-4 amd64

2013-08-26 22:28:26

by Dave Kleikamp

Subject: Re: [Jfs-discussion] [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On 08/24/2013 05:21 PM, Christian Kujau wrote:
> On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
>> This patch replaces the one I posted yesterday. I like this better since
>> it doesn't require fixing existing on-disk cookies or skipping a
>> position in the in-inode index table.
>>
>> NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..),
>> but jfs allows a value of 2 for a non-special entry. This incompatibility
>> can result in the nfs client reporting a readdir loop.
>>
>> This patch doesn't change the value stored internally, but adds one to
>> the value exposed to the iterate method.
>
> Out of curiosity, will this land on 3.11?

Just sent a pull request to Linus. Once it's picked up, I'll submit a
patch to the stable trees.
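
For readers following the quoted description, the idea of leaving the stored index alone and adding one to the value exposed to callers can be sketched as below. This is an assumption-laden illustration, not the patch itself. The function names are hypothetical, and the end-of-directory marker (-1 in the jfs comments) is assumed to be handled separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cookie handed out for an internally stored index. */
uint32_t exposed_cookie(uint32_t stored_index)
{
	/* Assumption: the end-of-directory marker never reaches this helper. */
	assert(stored_index != UINT32_MAX);
	return stored_index + 1;
}

/* Hypothetical inverse: stored index for a cookie a caller passes back. */
uint32_t stored_index_for(uint32_t cookie)
{
	/* Cookie 0 means "start from the beginning", so it maps to itself. */
	return cookie ? cookie - 1 : 0;
}
```

Under this mapping the stored values 0 and 1 for "." and ".." are exposed as 1 and 2, and a regular entry stored as 2 is exposed as 3, so nothing on disk has to change.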

2013-08-12 16:29:38

by J. Bruce Fields

Subject: Re: [Jfs-discussion] NFS 'readdir loop' error on JFS

On Mon, Aug 12, 2013 at 01:29:15AM -0700, Christian Kujau wrote:
> Sorry for the noise, here's another oddity, same setup (client & server
> running 3.11-rc5):
>
> $ find /mnt/nfs/usr/share/ -name getopt.awk -ls
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
>
> It's the same file, but gets reported 10 times! Hence the error when
> trying to tar(1) the directory:
>
> $ tar -cf - /mnt/nfs/usr/share/awk/ > /dev/null
> tar: Removing leading `/' from member names
> tar: /mnt/nfs/usr/share/awk/: Cannot savedir: Too many levels of symbolic links
> tar: Exiting with failure status due to previous errors
>
> On the server:
>
> $ find /mnt/disk/usr/share/ -name getopt.awk -ls
> 25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/disk/usr/share/awk/getopt.awk
>
> So, is "JFS && NFS" really br0ken and nobody noticed?

It does sound like a jfs bug, and I don't know if anyone tests nfs
exports of jfs regularly.

It might be interesting to get a network trace (something like tcpdump
-s0 -wtmp.pcap; then "wireshark tmp.pcap" and look at the "cookie"
fields in the readdir calls and replies. The server shouldn't return
the same one twice on one read through the directory. And when the
client uses a cookie it should get the next entries, not
already-returned entries.)

You could also just run "strace -egetdents64 -v ls" on the server on
the exported filesystem, in a problem directory, and see if the offsets
are unique.
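
If strace is not convenient, the same uniqueness check can be approximated with a few lines of C. The sketch below walks one directory with readdir() and records the position reported by telldir() after each entry. On Linux with glibc that position is expected to correspond to the entry's d_off, but that correspondence is an assumption, not something POSIX guarantees. The helper name `count_duplicate_offsets` is made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for long values. */
static int cmp_long(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: count directory positions that occur more than
 * once in a single pass. Returns the number of repeats, or -1 if the
 * directory cannot be opened or memory runs out.
 */
int count_duplicate_offsets(const char *path)
{
	DIR *dir = opendir(path);
	long *offs = NULL;
	size_t n = 0, cap = 0, i;
	int dups = 0;

	if (!dir)
		return -1;

	while (readdir(dir) != NULL) {
		if (n == cap) {
			size_t ncap = cap ? cap * 2 : 64;
			long *tmp = realloc(offs, ncap * sizeof(*offs));

			if (!tmp) {
				free(offs);
				closedir(dir);
				return -1;
			}
			offs = tmp;
			cap = ncap;
		}
		/* Position after the entry just read. */
		offs[n++] = telldir(dir);
	}
	closedir(dir);

	if (n > 1)
		qsort(offs, n, sizeof(*offs), cmp_long);
	for (i = 1; i < n; i++) {
		if (offs[i] == offs[i - 1]) {
			printf("duplicate offset %ld\n", offs[i]);
			dups++;
		}
	}
	free(offs);
	return dups;
}
```

Calling count_duplicate_offsets() on a problem directory on the server and getting 0 back would suggest the offsets are unique there, which points the investigation at how particular values are interpreted and away from plain repetition.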

--b.

2013-08-15 03:55:21

by Dave Kleikamp

Subject: [PATCH] jfs: avoid misuse of cookie value of 2

For the sake of those not watching
https://bugzilla.kernel.org/show_bug.cgi?id=60737

It looks like the problem is that jfs was using a cookie value of 2 for
a real directory entry, where NFSv4 expects 2 to represent "..". This
patch has so far only been lightly tested.

NFSv4 reserves cookie values 0, 1 and 2 for a rewind, and the "." and ".."
entries. jfs was using 0 and 1 for "." and "..", but 2 for a regular entry.
This patch makes jfs conform by using 1 and 2 for "." and ".." and fixes
any regular entry using the value 2.

Signed-off-by: Dave Kleikamp <[email protected]>

diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c
index 8743ba9..93466e8 100644
--- a/fs/jfs/jfs_dtree.c
+++ b/fs/jfs/jfs_dtree.c
@@ -349,11 +349,8 @@ static u32 add_index(tid_t tid, struct inode *ip, s64 bn, int slot)

ASSERT(DO_INDEX(ip));

- if (jfs_ip->next_index < 2) {
- jfs_warn("add_index: next_index = %d. Resetting!",
- jfs_ip->next_index);
- jfs_ip->next_index = 2;
- }
+ if (jfs_ip->next_index < 3) {
+ jfs_ip->next_index = 3;

index = jfs_ip->next_index++;

@@ -2864,7 +2861,7 @@ void dtInitRoot(tid_t tid, struct inode *ip, u32 idotdot)
} else
ip->i_size = 1;

- jfs_ip->next_index = 2;
+ jfs_ip->next_index = 3;
} else
ip->i_size = IDATASIZE;

@@ -2951,7 +2948,7 @@ static void add_missing_indices(struct inode *inode, s64 bn)
for (i = 0; i < p->header.nextindex; i++) {
d = (struct ldtentry *) &p->slot[stbl[i]];
index = le32_to_cpu(d->index);
- if ((index < 2) || (index >= JFS_IP(inode)->next_index)) {
+ if ((index < 3) || (index >= JFS_IP(inode)->next_index)) {
d->index = cpu_to_le32(add_index(tid, inode, bn, i));
if (dtlck->index >= dtlck->maxcnt)
dtlck = (struct dt_lock *) txLinelock(dtlck);
@@ -3031,7 +3028,7 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
struct jfs_dirent *jfs_dirent;
int jfs_dirents;
int overflow, fix_page, page_fixed = 0;
- static int unique_pos = 2; /* If we can't fix broken index */
+ static int unique_pos = 3; /* If we can't fix broken index */

if (ctx->pos == DIREND)
return 0;
@@ -3039,15 +3036,16 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
if (DO_INDEX(ip)) {
/*
* persistent index is stored in directory entries.
- * Special cases: 0 = .
- * 1 = ..
+ * Special cases: 0 = rewind
+ * 1 = .
+ * 2 = ..
* -1 = End of directory
*/
do_index = 1;

dir_index = (u32) ctx->pos;

- if (dir_index > 1) {
+ if (dir_index > 2) {
struct dir_table_slot dirtab_slot;

if (dtEmpty(ip) ||
@@ -3090,18 +3088,18 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
return 0;
}
} else {
- if (dir_index == 0) {
+ if (dir_index < 2) {
/*
* self "."
*/
- ctx->pos = 0;
+ ctx->pos = 1;
if (!dir_emit(ctx, ".", 1, ip->i_ino, DT_DIR))
return 0;
}
/*
* parent ".."
*/
- ctx->pos = 1;
+ ctx->pos = 2;
if (!dir_emit(ctx, "..", 2, PARENT(ip), DT_DIR))
return 0;

@@ -3122,22 +3120,24 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
/*
* Legacy filesystem - OS/2 & Linux JFS < 0.3.6
*
- * pn = index = 0: First entry "."
- * pn = 0; index = 1: Second entry ".."
+ * pn = 0; index = 1: First entry "."
+ * pn = 0; index = 2: Second entry ".."
* pn > 0: Real entries, pn=1 -> leftmost page
* pn = index = -1: No more entries
*/
dtpos = ctx->pos;
- if (dtpos == 0) {
+ if (dtpos < 2) {
+ ctx->pos = 1;
/* build "." entry */
if (!dir_emit(ctx, ".", 1, ip->i_ino, DT_DIR))
return 0;
- dtoffset->index = 1;
+ dtoffset->index = 2;
ctx->pos = dtpos;
}

if (dtoffset->pn == 0) {
- if (dtoffset->index == 1) {
+ if (dtoffset->index == 2) {
+ ctx->pos = 2;
/* build ".." entry */
if (!dir_emit(ctx, "..", 2, PARENT(ip), DT_DIR))
return 0;
@@ -3210,8 +3210,12 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
* directory index for the lost+found
* directory. Rather than let it go,
* we can try to fix it.
+ *
+ * Additionally, a value of 2 used to be
+ * valid, but it didn't work well with
+ * NFSv4, so if found, we need to change it
*/
- if ((jfs_dirent->position < 2) ||
+ if ((jfs_dirent->position < 3) ||
(jfs_dirent->position >=
JFS_IP(ip)->next_index)) {
if (!page_fixed && !isReadOnly(ip)) {

2013-08-15 20:49:27

by Dave Kleikamp

Subject: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

This patch replaces the one I posted yesterday. I like this better since
it doesn't require fixing existing on-disk cookies or skipping a
position in the in-inode index table.

NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..),
but jfs allows a value of 2 for a non-special entry. This incompatibility
can result in the nfs client reporting a readdir loop.

This patch doesn't change the value stored internally, but adds one to
the value exposed to the iterate method.

Signed-off-by: Dave Kleikamp <[email protected]>
---
fs/jfs/jfs_dtree.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c
index 8743ba9..0ec767e 100644
--- a/fs/jfs/jfs_dtree.c
+++ b/fs/jfs/jfs_dtree.c
@@ -3047,6 +3047,14 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)

dir_index = (u32) ctx->pos;

+ /*
+ * NFSv4 reserves cookies 1 and 2 for . and .. so
+ * the value we return to the vfs is one greater than the
+ * one we use internally.
+ */
+ if (dir_index)
+ dir_index--;
+
if (dir_index > 1) {
struct dir_table_slot dirtab_slot;

@@ -3086,7 +3094,7 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
if (p->header.flag & BT_INTERNAL) {
jfs_err("jfs_readdir: bad index table");
DT_PUTPAGE(mp);
- ctx->pos = -1;
+ ctx->pos = DIREND;
return 0;
}
} else {
@@ -3094,14 +3102,14 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
/*
* self "."
*/
- ctx->pos = 0;
+ ctx->pos = 1;
if (!dir_emit(ctx, ".", 1, ip->i_ino, DT_DIR))
return 0;
}
/*
* parent ".."
*/
- ctx->pos = 1;
+ ctx->pos = 2;
if (!dir_emit(ctx, "..", 2, PARENT(ip), DT_DIR))
return 0;

@@ -3122,22 +3130,23 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
/*
* Legacy filesystem - OS/2 & Linux JFS < 0.3.6
*
- * pn = index = 0: First entry "."
- * pn = 0; index = 1: Second entry ".."
+ * pn = 0; index = 1: First entry "."
+ * pn = 0; index = 2: Second entry ".."
* pn > 0: Real entries, pn=1 -> leftmost page
* pn = index = -1: No more entries
*/
dtpos = ctx->pos;
- if (dtpos == 0) {
+ if (dtpos < 2) {
/* build "." entry */
+ ctx->pos = 1;
if (!dir_emit(ctx, ".", 1, ip->i_ino, DT_DIR))
return 0;
- dtoffset->index = 1;
+ dtoffset->index = 2;
ctx->pos = dtpos;
}

if (dtoffset->pn == 0) {
- if (dtoffset->index == 1) {
+ if (dtoffset->index == 2) {
/* build ".." entry */
if (!dir_emit(ctx, "..", 2, PARENT(ip), DT_DIR))
return 0;
@@ -3228,6 +3237,12 @@ int jfs_readdir(struct file *file, struct dir_context *ctx)
}
jfs_dirent->position = unique_pos++;
}
+ /*
+ * We add 1 to the index because we may
+ * use a value of 2 internally, and NFSv4
+ * doesn't like that.
+ */
+ jfs_dirent->position++;
} else {
jfs_dirent->position = dtpos;
len = min(d_namleft, DTLHDRDATALEN_LEGACY);
--
1.8.3.4
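
The off-by-one translation this patch describes can be sketched outside the kernel as a pair of helpers. This is a simplified illustration only: index_to_pos and pos_to_index are hypothetical names that do not exist in fs/jfs, and the sketch ignores the legacy (non-indexed) directory path.

```c
#include <assert.h>
#include <stdint.h>

/* Internal jfs directory index values, as described in the patch. */
enum {
	JFS_IDX_DOT    = 0,	/* "."  */
	JFS_IDX_DOTDOT = 1,	/* ".." */
	JFS_IDX_FIRST  = 2	/* lowest index a regular entry may use */
};

/*
 * Position exposed to the VFS (and so to NFS) for an internal index.
 * Adding one moves regular entries to 3 and above, clear of the
 * values NFSv4 treats as special.
 */
uint32_t index_to_pos(uint32_t index)
{
	assert(index != UINT32_MAX);	/* -1 marks end of directory */
	return index + 1;
}

/*
 * Internal index to resume from for a position handed back by the
 * caller. Position 0 means "rewind" and maps to the start.
 */
uint32_t pos_to_index(uint32_t pos)
{
	return pos ? pos - 1 : 0;
}
```

With this mapping the on-disk index of 2 is never shown to the client as 2, which is why no on-disk values need rewriting.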


2013-08-12 08:29:20

by Christian Kujau

Subject: Re: [Jfs-discussion] NFS 'readdir loop' error on JFS

Sorry for the noise, here's another oddity, same setup (client & server
running 3.11-rc5):

$ find /mnt/nfs/usr/share/ -name getopt.awk -ls
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/nfs/usr/share/awk/getopt.awk

It's the same file, but gets reported 10 times! Hence the error when
trying to tar(1) the directory:

$ tar -cf - /mnt/nfs/usr/share/awk/ > /dev/null
tar: Removing leading `/' from member names
tar: /mnt/nfs/usr/share/awk/: Cannot savedir: Too many levels of symbolic links
tar: Exiting with failure status due to previous errors

On the server:

$ find /mnt/disk/usr/share/ -name getopt.awk -ls
25072 4 -rw-r--r-- 1 root root 2237 Mar 16 04:46 /mnt/disk/usr/share/awk/getopt.awk
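
A quick way to spot this kind of duplicate listing without eyeballing find output is to compare the names a directory read returns. The following is a minimal sketch, not an existing utility; count_name and has_duplicate_name are names invented for this example, and the name list is assumed to come from something like the d_name fields returned by readdir(3).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many entries in names[0..n) are equal to name. */
size_t count_name(const char *const *names, size_t n, const char *name)
{
	size_t hits = 0;

	assert(names != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			hits++;
	return hits;
}

/* Return 1 if any name occurs more than once in names[0..n), else 0. */
int has_duplicate_name(const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (count_name(names, n, names[i]) > 1)
			return 1;
	return 0;
}
```

On a healthy directory has_duplicate_name() should return 0 on both the server and the NFS client; a difference between the two points at the readdir path rather than the on-disk data.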

So, is "JFS && NFS" really br0ken and nobody noticed?

2013-08-10 19:43:58

by Karl Schmidt

Subject: Re: [Jfs-discussion] NFS 'readdir loop' error on JFS

On 08/10/2013 02:28 AM, Christian Kujau wrote:
> Interesting stuff. Out of curiosity I just tried this myself, both client
> & server are virtual machines running Debian/stable (3.2.0-4-amd64) and I
> was able to reproduce this. A test case would be:
>

I still haven't rebooted that machine - last chance to ask for any test info - as it looks like you
have a test case anyway.

I haven't lost any data that I know of - just programs complaining etc.

IMO, at one time, jfs was really a better choice (good set of tools). Even in a few cases where
hardware failed the jfs tools worked well. Today, with everyone banging on ext4, it has become the
better choice. (I don't think IBM is interested in supporting jfs - no idea if they are phasing out
jfs2?)




--------------------------------------------------------------------------------
Karl Schmidt EMail [email protected]
Transtronics, Inc. WEB http://secure.transtronics.com
3209 West 9th Street Ph (785) 841-3089
Lawrence, KS 66049 FAX (785) 841-0434

The world runs on individuals pursuing their separate interests.
The great achievements of civilization have not come from
government bureaus. Einstein didn't construct his theory under
order from a bureaucrat. Henry Ford didn't revolutionize the
automobile industry that way.
--------------------------------------------------------------------------------

2013-08-10 07:38:58

by Christian Kujau

Subject: Re: [Jfs-discussion] NFS 'readdir loop' error on JFS

Interesting stuff. Out of curiosity I just tried this myself, both client
& server are virtual machines running Debian/stable (3.2.0-4-amd64) and I
was able to reproduce this. A test case would be:

## server:
$ apt-get install nfs-kernel-server jfsutils
$ dd if=/dev/zero bs=1M count=256 > /var/test.img
$ losetup -f /var/test.img
$ mkfs.jfs /dev/loop0
$ mount -t jfs /dev/loop0 /mnt/disk
$ tar -C / -cf - usr/share | tar -C /mnt/disk/ -xf -
$ tail -1 /etc/exports
/mnt/disk 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)
$ service nfs-kernel-server restart

## client
$ apt-get install nfs-common
$ showmount -e server | tail -1
/mnt/disk 192.168.0.0/24
$ tail -1 /etc/fstab
server:/mnt/disk /mnt/nfs nfs rsize=8192,wsize=8192,intr 0 0
$ mount /mnt/nfs
$ mount | tail -1
server:/mnt/disk on /mnt/nfs type nfs4 (rw,relatime,vers=4,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.137,minorversion=0,local_lock=none,addr=192.168.0.138)

$ tar -cf - /mnt/nfs/ > /dev/null
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
tar: /mnt/nfs/usr/share/perl/5.14.2/Pod/: Cannot savedir: Too many levels of symbolic links
tar: Exiting with failure status due to previous errors

$ dmesg | tail
[ 63.912327] RPC: Registered named UNIX socket transport module.
[ 63.913801] RPC: Registered udp transport module.
[ 63.914713] RPC: Registered tcp transport module.
[ 63.915644] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 63.949485] FS-Cache: Loaded
[ 63.972688] FS-Cache: Netfs 'nfs' registered for caching
[ 63.993300] Installing knfsd (copyright (C) 1996 [email protected]).
[ 284.733629] loop: module loaded
[ 840.372846] NFS: directory 5.14.2/Pod contains a readdir loop.Please contact your server vendor. The file: Simple has duplicate cookie 18
[ 840.375842] NFS: directory 5.14.2/Pod contains a readdir loop.Please contact your server vendor. The file: Simple has duplicate cookie 18

There are no messages on the server when this happens. The message on the
client repeats on every attempt; the "Cannot savedir" failure above may be
what triggers it.

2013-08-09 20:44:52

by Karl Schmidt

Subject: Re: NFS 'readdir loop' error on JFS

This problem is still alive - took a while for it to show up again:

(From the client - but speaks of the server )

Aug 9 14:32:20 singapore kernel: [497500.291867] NFS: directory timeless/JFK-prep contains a readdir loop.Please contact your server vendor. The file: 234t.jpg has duplicate cookie 18
Aug 9 14:32:20 singapore kernel: [497500.291930] NFS: directory timeless/JFK-prep contains a readdir loop.Please contact your server vendor. The file: 234t.jpg has duplicate cookie 18
Aug 9 14:32:20 singapore kernel: [497500.346824] NFS: directory pictures/wells_index contains a readdir loop.Please contact your server vendor. The file: img_1260.jpgQ▒J@K���j�^ has duplicate cookie 24
Aug 9 14:32:20 singapore kernel: [497500.346884] NFS: directory pictures/wells_index contains a readdir loop.Please contact your server vendor. The file: img_1260.jpgQ▒J@K���j�^ has duplicate cookie 24


This is a wheezy production server - I can run tests, provide information, but I'm about to purchase
some spare drives and move the server to ext4 to protect the data. So if anyone wants information,
now is the time to ask.

I will reboot this system in a few hours - waiting for your response.

( this problem took 25 days to reappear - so you might want to get some information ).

I'm an aging assembly programmer - I can do most things from the shell in Debian - I once did a
little C programming - but digging into this code is beyond me. On the other hand, I am able and
willing to supply whatever I can to help.

Server has:
# uname -a
Linux malaysia 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
# wajig list nfs
ii libnfsidmap2:amd64 0.25-4 amd64 NFS idmapping library
ii nfs-common 1:1.2.6-4 amd64 NFS support files common to client and server
ii nfs-kernel-server 1:1.2.6-4 amd64 support for NFS kernel server

Export with loop:
/home/content 192.168.1.0/22(rw,sync,no_root_squash)

-


Client has:
# uname -a
Linux singapore 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux


# wajig list nfs
ii libnfsidmap2:amd64 0.25-4 amd64 NFS idmapping library
ii nfs-common 1:1.2.6-4 amd64 NFS support files common to client and server

fstab mount with loop:
malaysia:/home/content /mnt/content nfs defaults,rsize=8192,wsize=8192,intr 0 0





--------------------------------------------------------------------------------
Karl Schmidt EMail [email protected]
Transtronics, Inc. WEB http://secure.transtronics.com
3209 West 9th Street Ph (785) 841-3089
Lawrence, KS 66049 FAX (785) 841-0434

Wrapping people up in the symbols of success when they are unearned, is very destructive. kps

--------------------------------------------------------------------------------

2013-09-05 03:41:14

by Jonathan McDowell

Subject: Re: Bug#714974: [PATCH] jfs: fix readdir cookie incompatibility with NFSv4

On Thu, Aug 29, 2013 at 03:48:03PM -0700, Jonathan McDowell wrote:
> On Sat, Aug 17, 2013 at 10:01:31PM +0200, Ben Hutchings wrote:
> > On Thu, 2013-08-15 at 14:26 -0700, Christian Kujau wrote:
> > > On Thu, 15 Aug 2013 at 15:48, Dave Kleikamp wrote:
> > > > This patch replaces the one I posted yesterday. I like this
> > > > better since it doesn't require fixing existing on-disk cookies
> > > > or skipping a position in the in-inode index table.
> > >
> > > Thanks. Applied to 3.11-rc5 and tested, no more "readdir loop"
> > > messages and with unique inode numbers, great!
> > >
> > > Tested-by: Christian Kujau <[email protected]>
> >
> > Karl and Jonathan, could you test the attached backport to 3.2?
>
> I finally managed to be able to schedule some downtime yesterday for
> one of the machines I've been seeing this issue with. So far since the
> reboot to this kernel (3.2.46-1 + the patch) I haven't seen a
> recurrence of the problem; will update if I see anything.

I see the patch hit 3.11, but just to confirm the 3.2 backport on top of
the Debian 3.2.46-1 kernel has seen no recurrence or issues in the past
week (with a set of JFS filesystems that are fairly extensively used
over NFS).

J.

--
Web [ Aunt Em: Hate Kansas. Hate you. Taking dog. Bye. Dorothy. ]
site: http:// [ ] Made by
http://www.earth.li/~noodles/ [ ] HuggieTag 0.0.24


Attachments:
(No filename) (1.41 kB)
signature.asc (836.00 B)
Digital signature