2011-07-27 22:44:20

by Justin Piszcz

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop



On Wed, 27 Jul 2011, Justin Piszcz wrote:

>
>
> On Wed, 27 Jul 2011, Trond Myklebust wrote:
>
> > On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
> >> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
> >>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
> >>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> >>>>>
> >>>>>
> >>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> >>>>>
> >>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> >>>>>>> Currently I do not see any dupes, however I have a script that moves
> >>>>>>> images out of the directory once an hour:
> >>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> >>>>>>
> >>>>>> Do you keep adding files to the directory while you move files out?
> >>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
> >>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> >>>>> it around 5,000 pictures or less.
> >>>>>
> >>>>>> What's the rate of additions/removals to the directory?
> >>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> >>>>>
> >>>>> atom:/d1/motion# find cam1|wc
> >>>>> 5215 5215 166853
> >>>>> atom:/d1/motion# find cam2|wc
> >>>>> 5069 5069 162181
> >>>>> atom:/d1/motion# find cam3|wc
> >>>>> 5594 5594 178981
> >>>>> atom:/d1/motion#
> >>>>
> >>>> This sounds a lot like xfs simply filling up the directory index slots
> >>>> of files that you just moved out with new files, and nfs falsely
> >>>> claiming that this is a problem.
> >>>
> >>> Yep. There is an existing bugzilla report for this bug at
> >>>
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=38572
> >>>
> >>> I have a preliminary patch there that attempts to turn off the loop
> >>> detection when the directory is seen to change, however that patch still
> >>> appears to have a bug in it, and I haven't had time to figure out what
> >>> is wrong yet.
> >>>
> >>> Can you perhaps take a look, Bryan?
> >>
> >> Actually, Justin, can you test the following slight variant on the patch
> >> in the bugzilla?
> >
> > Doh! This one will actually compile....
>
> Hi,
>
> Should I try 3.0 first or retry 2.6.38 w/ this patch?
>
> Justin.
>
>

I'll give 3.0 a go first.


Justin.


2011-07-28 20:49:53

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop

On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote:
>
> On Wed, 27 Jul 2011, Justin Piszcz wrote:
>
> >
> >
> > On Wed, 27 Jul 2011, Trond Myklebust wrote:
> >
> > > On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
> > >> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
> > >>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
> > >>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
> > >>>>>
> > >>>>>
> > >>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
> > >>>>>
> > >>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
> > >>>>>>> Currently I do not see any dupes, however I have a script that moves
> > >>>>>>> images out of the directory once an hour:
> > >>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
> > >>>>>>
> > >>>>>> Do you keep adding files to the directory while you move files out?
> > >>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
> > >>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
> > >>>>> it around 5,000 pictures or less.
> > >>>>>
> > >>>>>> What's the rate of additions/removals to the directory?
> > >>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
> > >>>>>
> > >>>>> atom:/d1/motion# find cam1|wc
> > >>>>> 5215 5215 166853
> > >>>>> atom:/d1/motion# find cam2|wc
> > >>>>> 5069 5069 162181
> > >>>>> atom:/d1/motion# find cam3|wc
> > >>>>> 5594 5594 178981
> > >>>>> atom:/d1/motion#
> > >>>>
> > >>>> This sounds a lot like xfs simply filling up the directory index slots
> > >>>> of files that you just moved out with new files, and nfs falsely
> > >>>> claiming that this is a problem.
> > >>>
> > >>> Yep. There is an existing bugzilla report for this bug at
> > >>>
> > >>> https://bugzilla.kernel.org/show_bug.cgi?id=38572
> > >>>
> > >>> I have a preliminary patch there that attempts to turn off the loop
> > >>> detection when the directory is seen to change, however that patch still
> > >>> appears to have a bug in it, and I haven't had time to figure out what
> > >>> is wrong yet.
> > >>>
> > >>> Can you perhaps take a look, Bryan?
> > >>
> > >> Actually, Justin, can you test the following slight variant on the patch
> > >> in the bugzilla?
> > >
> > > Doh! This one will actually compile....
> >
> > Hi,
> >
> > Should I try 3.0 first or retry 2.6.38 w/ this patch?
> >
> > Justin.
> >
> >
>
> I'll give 3.0 a go first.

I had Bryan do some more tests, which revealed a couple more issues. The
attached patch should fix those, and has resisted everything we've
thrown at it so far. It should apply to 2.6.39 and newer.

Cheers
Trond
8<-----------------------------------------------------------------------

2011-07-29 22:23:23

by Justin Piszcz

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop



On Fri, 29 Jul 2011, Trond Myklebust wrote:

> On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote:
>> On Fri, 29 Jul 2011, Bryan Schumaker wrote:
>>
>>> How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.
>>
>>
>> Hi,
>>
>> This fails against 2.6.38:
>>
>> patching file fs/nfs/dir.c
>> Hunk #1 FAILED at 134.
>> Hunk #2 FAILED at 173.
>> Hunk #3 FAILED at 323.
>> Hunk #4 FAILED at 336.
>> Hunk #5 FAILED at 349.
>> Hunk #6 succeeded at 320 (offset -48 lines).
>> Hunk #7 FAILED at 741.
>> Hunk #8 succeeded at 716 (offset -59 lines).
>> Hunk #9 succeeded at 749 (offset -59 lines).
>> Hunk #10 succeeded at 763 (offset -59 lines).
>> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
>> patching file include/linux/nfs_fs.h
>> Hunk #1 FAILED at 99.
>> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
>> atom:/usr/src/linux#
>>
>> And the 3.0 kernel is broken for my wireless adapter:
>> http://www.gossamer-threads.com/lists/linux/kernel/1411576
>>
>> If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a
>> horrible driver (rt2800usb) and 1 person emailed me as well stating the
>> same thing off-list (they stick with the manufacturer's driver or the *sta
>> one).
>
> I don't understand. The readdir loop detection code was first merged
> upstream in 2.6.39. 2.6.38 doesn't report any loops...

Hi,

Sorry--(my error) this is meant for the client, patched & will e-mail when
it happens again.

# patch -p1 < /home/jpiszcz/patch1
patching file fs/nfs/dir.c
patching file include/linux/nfs_fs.h

# patch -p1 < /home/jpiszcz/patch2
patching file fs/nfs/dir.c

(recompile->reboot->waiting for next error)

Justin.


2011-07-29 20:59:17

by Justin Piszcz

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop


On Fri, 29 Jul 2011, Bryan Schumaker wrote:

> How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.


Hi,

This fails against 2.6.38:

patching file fs/nfs/dir.c
Hunk #1 FAILED at 134.
Hunk #2 FAILED at 173.
Hunk #3 FAILED at 323.
Hunk #4 FAILED at 336.
Hunk #5 FAILED at 349.
Hunk #6 succeeded at 320 (offset -48 lines).
Hunk #7 FAILED at 741.
Hunk #8 succeeded at 716 (offset -59 lines).
Hunk #9 succeeded at 749 (offset -59 lines).
Hunk #10 succeeded at 763 (offset -59 lines).
6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
patching file include/linux/nfs_fs.h
Hunk #1 FAILED at 99.
1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
atom:/usr/src/linux#

And the 3.0 kernel is broken for my wireless adapter:
http://www.gossamer-threads.com/lists/linux/kernel/1411576

If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a
horrible driver (rt2800usb) and 1 person emailed me as well stating the
same thing off-list (they stick with the manufacturer's driver or the *sta
one).

Justin.

2011-07-29 20:52:26

by Anna Schumaker

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop

On 07/28/2011 04:48 PM, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote:
>>
>> On Wed, 27 Jul 2011, Justin Piszcz wrote:
>>
>>>
>>>
>>> On Wed, 27 Jul 2011, Trond Myklebust wrote:
>>>
>>>> On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
>>>>> On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
>>>>>> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
>>>>>>> On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 27 Jul 2011, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote:
>>>>>>>>>> Currently I do not see any dupes, however I have a script that moves
>>>>>>>>>> images out of the directory once an hour:
>>>>>>>>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1
>>>>>>>>>
>>>>>>>>> Do you keep adding files to the directory while you move files out?
>>>>>>>> Yes, otherwise there are too many files in the directory and viewers, e.g.,
>>>>>>>> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep
>>>>>>>> it around 5,000 pictures or less.
>>>>>>>>
>>>>>>>>> What's the rate of additions/removals to the directory?
>>>>>>>> Additions it depends, around 5,000 over a 12hr period, 416/hr, current:
>>>>>>>>
>>>>>>>> atom:/d1/motion# find cam1|wc
>>>>>>>> 5215 5215 166853
>>>>>>>> atom:/d1/motion# find cam2|wc
>>>>>>>> 5069 5069 162181
>>>>>>>> atom:/d1/motion# find cam3|wc
>>>>>>>> 5594 5594 178981
>>>>>>>> atom:/d1/motion#
>>>>>>>
>>>>>>> This sounds a lot like xfs simply filling up the directory index slots
>>>>>>> of files that you just moved out with new files, and nfs falsely
>>>>>>> claiming that this is a problem.
>>>>>>
>>>>>> Yep. There is an existing bugzilla report for this bug at
>>>>>>
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=38572
>>>>>>
>>>>>> I have a preliminary patch there that attempts to turn off the loop
>>>>>> detection when the directory is seen to change, however that patch still
>>>>>> appears to have a bug in it, and I haven't had time to figure out what
>>>>>> is wrong yet.
>>>>>>
>>>>>> Can you perhaps take a look, Bryan?
>>>>>
>>>>> Actually, Justin, can you test the following slight variant on the patch
>>>>> in the bugzilla?
>>>>
>>>> Doh! This one will actually compile....
>>>
>>> Hi,
>>>
>>> Should I try 3.0 first or retry 2.6.38 w/ this patch?
>>>
>>> Justin.
>>>
>>>
>>
>> I'll give 3.0 a go first.
>
> I had Bryan do some more tests, which revealed a couple more issues. The
> attached patch should fix those, and has resisted everything we've
> thrown at it so far. It should apply to 2.6.39 and newer.

This patch still looks good (after testing it a bit more today).

How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.

- Bryan

8<-----------------------------------------------------------------------
>From 4d74863dc2bcd4e603a873b3725f0a05afd21f1f Mon Sep 17 00:00:00 2001
From: Bryan Schumaker <[email protected]>
Date: Fri, 29 Jul 2011 11:49:06 -0400
Subject: [PATCH] Additional readdir cookie loop information

Print out the name of the file that triggers the cookie loop message to
make it slightly easier to track down the cause.

Signed-off-by: Bryan Schumaker <[email protected]>
---
fs/nfs/dir.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d23108b..b238d95 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -365,9 +365,10 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
if (printk_ratelimit()) {
pr_notice("NFS: directory %s/%s contains a readdir loop."
"Please contact your server vendor. "
- "Offending cookie: %llu\n",
+ "The file: %s has duplicate cookie %llu\n",
desc->file->f_dentry->d_parent->d_name.name,
desc->file->f_dentry->d_name.name,
+ array->array[i].string.name,
*desc->dir_cookie);
}
status = -ELOOP;
--
1.7.6


>
> Cheers
> Trond
> 8<-----------------------------------------------------------------------
> From 75c0387540737a6663338d4ec0538bd6fb724173 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Thu, 28 Jul 2011 16:34:33 -0400
> Subject: [PATCH v3] NFS: Fix spurious readdir cookie loop messages
>
> If the directory contents change, then we have to accept that the
> file->f_pos value may shrink if we do a 'search-by-cookie'. In that
> case, we should turn off the loop detection and let the NFS client
> try to recover.
>
> The patch also fixes a second loop detection bug by ensuring
> that after turning on the ctx->duped flag, we read at least one new
> cookie into ctx->dir_cookie before attempting to match with
> ctx->dup_cookie.
>
> Reported-by: Petr Vandrovec <[email protected]>
> Cc: [email protected] [2.6.39+]
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/dir.c | 56 ++++++++++++++++++++++++++++-------------------
> include/linux/nfs_fs.h | 3 +-
> 2 files changed, 35 insertions(+), 24 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 57f578e..d23108b 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -134,18 +134,19 @@ const struct inode_operations nfs4_dir_inode_operations = {
>
> #endif /* CONFIG_NFS_V4 */
>
> -static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
> +static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
> {
> struct nfs_open_dir_context *ctx;
> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> if (ctx != NULL) {
> ctx->duped = 0;
> + ctx->attr_gencount = NFS_I(dir)->attr_gencount;
> ctx->dir_cookie = 0;
> ctx->dup_cookie = 0;
> ctx->cred = get_rpccred(cred);
> - } else
> - ctx = ERR_PTR(-ENOMEM);
> - return ctx;
> + return ctx;
> + }
> + return ERR_PTR(-ENOMEM);
> }
>
> static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
> @@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
> cred = rpc_lookup_cred();
> if (IS_ERR(cred))
> return PTR_ERR(cred);
> - ctx = alloc_nfs_open_dir_context(cred);
> + ctx = alloc_nfs_open_dir_context(inode, cred);
> if (IS_ERR(ctx)) {
> res = PTR_ERR(ctx);
> goto out;
> @@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
> {
> loff_t diff = desc->file->f_pos - desc->current_index;
> unsigned int index;
> - struct nfs_open_dir_context *ctx = desc->file->private_data;
>
> if (diff < 0)
> goto out_eof;
> @@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
> index = (unsigned int)diff;
> *desc->dir_cookie = array->array[index].cookie;
> desc->cache_entry_index = index;
> - ctx->duped = 0;
> return 0;
> out_eof:
> desc->eof = 1;
> @@ -349,14 +348,33 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
> int i;
> loff_t new_pos;
> int status = -EAGAIN;
> - struct nfs_open_dir_context *ctx = desc->file->private_data;
>
> for (i = 0; i < array->size; i++) {
> if (array->array[i].cookie == *desc->dir_cookie) {
> + struct nfs_inode *nfsi = NFS_I(desc->file->f_path.dentry->d_inode);
> + struct nfs_open_dir_context *ctx = desc->file->private_data;
> +
> new_pos = desc->current_index + i;
> - if (new_pos < desc->file->f_pos) {
> + if (ctx->attr_gencount != nfsi->attr_gencount
> + || (nfsi->cache_validity & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA))) {
> + ctx->duped = 0;
> + ctx->attr_gencount = nfsi->attr_gencount;
> + } else if (new_pos < desc->file->f_pos) {
> + if (ctx->duped > 0
> + && ctx->dup_cookie == *desc->dir_cookie) {
> + if (printk_ratelimit()) {
> + pr_notice("NFS: directory %s/%s contains a readdir loop."
> + "Please contact your server vendor. "
> + "Offending cookie: %llu\n",
> + desc->file->f_dentry->d_parent->d_name.name,
> + desc->file->f_dentry->d_name.name,
> + *desc->dir_cookie);
> + }
> + status = -ELOOP;
> + goto out;
> + }
> ctx->dup_cookie = *desc->dir_cookie;
> - ctx->duped = 1;
> + ctx->duped = -1;
> }
> desc->file->f_pos = new_pos;
> desc->cache_entry_index = i;
> @@ -368,6 +386,7 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
> if (*desc->dir_cookie == array->last_cookie)
> desc->eof = 1;
> }
> +out:
> return status;
> }
>
> @@ -740,19 +759,6 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
> struct nfs_cache_array *array = NULL;
> struct nfs_open_dir_context *ctx = file->private_data;
>
> - if (ctx->duped != 0 && ctx->dup_cookie == *desc->dir_cookie) {
> - if (printk_ratelimit()) {
> - pr_notice("NFS: directory %s/%s contains a readdir loop. "
> - "Please contact your server vendor. "
> - "Offending cookie: %llu\n",
> - file->f_dentry->d_parent->d_name.name,
> - file->f_dentry->d_name.name,
> - *desc->dir_cookie);
> - }
> - res = -ELOOP;
> - goto out;
> - }
> -
> array = nfs_readdir_get_array(desc->page);
> if (IS_ERR(array)) {
> res = PTR_ERR(array);
> @@ -774,6 +780,8 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
> *desc->dir_cookie = array->array[i+1].cookie;
> else
> *desc->dir_cookie = array->last_cookie;
> + if (ctx->duped != 0)
> + ctx->duped = 1;
> }
> if (array->eof_index >= 0)
> desc->eof = 1;
> @@ -805,6 +813,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
> struct page *page = NULL;
> int status;
> struct inode *inode = desc->file->f_path.dentry->d_inode;
> + struct nfs_open_dir_context *ctx = desc->file->private_data;
>
> dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
> (unsigned long long)*desc->dir_cookie);
> @@ -818,6 +827,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
> desc->page_index = 0;
> desc->last_cookie = *desc->dir_cookie;
> desc->page = page;
> + ctx->duped = 0;
>
> status = nfs_readdir_xdr_to_array(desc, page, inode);
> if (status < 0)
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 8b579be..b96fb99 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -99,9 +99,10 @@ struct nfs_open_context {
>
> struct nfs_open_dir_context {
> struct rpc_cred *cred;
> + unsigned long attr_gencount;
> __u64 dir_cookie;
> __u64 dup_cookie;
> - int duped;
> + signed char duped;
> };
>
> /*


2011-07-29 22:04:13

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop

On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote:
> On Fri, 29 Jul 2011, Bryan Schumaker wrote:
>
> > How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.
>
>
> Hi,
>
> This fails against 2.6.38:
>
> patching file fs/nfs/dir.c
> Hunk #1 FAILED at 134.
> Hunk #2 FAILED at 173.
> Hunk #3 FAILED at 323.
> Hunk #4 FAILED at 336.
> Hunk #5 FAILED at 349.
> Hunk #6 succeeded at 320 (offset -48 lines).
> Hunk #7 FAILED at 741.
> Hunk #8 succeeded at 716 (offset -59 lines).
> Hunk #9 succeeded at 749 (offset -59 lines).
> Hunk #10 succeeded at 763 (offset -59 lines).
> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
> patching file include/linux/nfs_fs.h
> Hunk #1 FAILED at 99.
> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
> atom:/usr/src/linux#
>
> And the 3.0 kernel is broken for my wireless adapter:
> http://www.gossamer-threads.com/lists/linux/kernel/1411576
>
> If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a
> horrible driver (rt2800usb) and 1 person emailed me as well stating the
> same thing off-list (they stick with the manufacturer's driver or the *sta
> one).

I don't understand. The readdir loop detection code was first merged
upstream in 2.6.39. 2.6.38 doesn't report any loops...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-07-30 09:58:14

by Justin Piszcz

[permalink] [raw]
Subject: Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop



On Fri, 29 Jul 2011, Justin Piszcz wrote:

>
>
> On Fri, 29 Jul 2011, Trond Myklebust wrote:
>
> > On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote:
> >> On Fri, 29 Jul 2011, Bryan Schumaker wrote:
> >>
> >>> How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.
> >>
> >>
> >> Hi,
> >>
> >> This fails against 2.6.38:
> >>
> >> patching file fs/nfs/dir.c
> >> Hunk #1 FAILED at 134.
> >> Hunk #2 FAILED at 173.
> >> Hunk #3 FAILED at 323.
> >> Hunk #4 FAILED at 336.
> >> Hunk #5 FAILED at 349.
> >> Hunk #6 succeeded at 320 (offset -48 lines).
> >> Hunk #7 FAILED at 741.
> >> Hunk #8 succeeded at 716 (offset -59 lines).
> >> Hunk #9 succeeded at 749 (offset -59 lines).
> >> Hunk #10 succeeded at 763 (offset -59 lines).
> >> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
> >> patching file include/linux/nfs_fs.h
> >> Hunk #1 FAILED at 99.
> >> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
> >> atom:/usr/src/linux#
> >>
> >> And the 3.0 kernel is broken for my wireless adapter:
> >> http://www.gossamer-threads.com/lists/linux/kernel/1411576
> >>
> >> If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a
> >> horrible driver (rt2800usb) and 1 person emailed me as well stating the
> >> same thing off-list (they stick with the manufacturer's driver or the *sta
> >> one).
> >
> > I don't understand. The readdir loop detection code was first merged
> > upstream in 2.6.39. 2.6.38 doesn't report any loops...
>
> Hi,
>
> Sorry--(my error) this is meant for the client, patched & will e-mail when
> it happens again.
>
> # patch -p1 < /home/jpiszcz/patch1
> patching file fs/nfs/dir.c
> patching file include/linux/nfs_fs.h
>
> # patch -p1 < /home/jpiszcz/patch2
> patching file fs/nfs/dir.c
>
> (recompile->reboot->waiting for next error)
>
> Justin.

So I have been running Linux 2.6.37-(.. 3.0 recently) since Jan of this year on these new hosts and I have never had so much as a kernel OOPS, with these
patches, there were several kernel lockups/problems but the nfs/loop did
not show up.

I've went back to the previous (non-patched) kernel, is there a less invasive
patch?

http://home.comcast.net/~jpiszcz/20110730/kernel-error.txt

Justin.