2007-12-03 23:58:47

by Christoph Bartoschek

Subject: Invisible files and input/output errors

Hi,

We use a DFS-NFS-Gateway on AIX 5.1 as our NFS server. This works fine most of
the time, but for the past four months we have had strange problems. There are
basically two issues:

1. Running "ls" in specific directories after a certain idle time gives us
an "Input/Output Error" without any results.

2. In some directories some entries are not shown. For example, there is a
directory containing the subdirectory ak_fcp. "ls" does not show this
subdirectory, but it is possible to do "cd ak_fcp" and then work with it
normally. It is also possible to access files in that subdirectory from the
parent directory: "less ak_fcp/logfile" works.
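
The second symptom (a name that can be opened but is missing from the listing) can be checked for in bulk. The following is only a minimal sketch in Python: the function name find_hidden_entries and the list of candidate names are assumptions made for illustration, and client-side caching may influence what the lookup by name reports.

```python
import os

def find_hidden_entries(directory, candidates, list_dir=os.listdir):
    """Return the candidate names that resolve by name but are
    missing from the directory listing."""
    listed = set(list_dir(directory))
    hidden = []
    for name in candidates:
        path = os.path.join(directory, name)
        # lexists() resolves the name directly; it does not rely on
        # the listing that "ls" uses.
        if name not in listed and os.path.lexists(path):
            hidden.append(name)
    return hidden
```

Calling find_hidden_entries("/path/to/parent", ["ak_fcp"]) on an affected directory would be expected to return ["ak_fcp"]; the path here is a placeholder.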

Does anybody know how to get rid of these problems? They occur on
several different Linux versions.

Greetings
Christoph


2007-12-04 00:23:58

by NeilBrown

Subject: Re: Invisible files and input/output errors

On Tuesday December 4, Christoph Bartoschek wrote:
> Hi,
>
> We use a DFS-NFS-Gateway on AIX 5.1 as our NFS server. This works fine most of
> the time, but for the past four months we have had strange problems. There are
> basically two issues:
>
> 1. Running "ls" in specific directories after a certain idle time gives us
> an "Input/Output Error" without any results.
>
> 2. In some directories some entries are not shown. For example, there is a
> directory containing the subdirectory ak_fcp. "ls" does not show this
> subdirectory, but it is possible to do "cd ak_fcp" and then work with it
> normally. It is also possible to access files in that subdirectory from the
> parent directory: "less ak_fcp/logfile" works.
>
> Does anybody know how to get rid of these problems? They occur on
> several different Linux versions.

Sounds a lot like directory cookie problems.

Can you get a network trace of the NFS packets?
Something like

tcpdump -s 0 -w /tmp/trace host NAME-OF-SERVER

then cause the failure while that is running.
Either compress and attach the result to an email, or if it is too
big, stick it on a website somewhere.

NeilBrown

2007-12-04 10:07:00

by Christoph Bartoschek

Subject: Re: Invisible files and input/output errors


> Sounds a lot like directory cookie problems.
>
> Can you get a network trace of the NFS packets?
> Something like
>
> tcpdump -s 0 -w /tmp/trace host NAME-OF-SERVER
>
> then cause the failure while that is running.
> Either compress and attach the result to an email, or if it is too
> big, stick it on a website somewhere.

I have attached a trace of an "ls" that does not show some directories.

I still have to run some tests to reproduce the input/output errors.

Greetings
Christoph


Attachments:
(No filename) (489.00 B)
missing_files.trace.bz2 (7.46 kB)

2007-12-05 23:31:23

by NeilBrown

Subject: Re: Invisible files and input/output errors

On Tuesday December 4, Christoph Bartoschek wrote:
>
> > Sounds a lot like directory cookie problems.
> >
> > Can you get a network trace of the NFS packets?
> > Something like
> >
> > tcpdump -s 0 -w /tmp/trace host NAME-OF-SERVER
> >
> > then cause the failure while that is running.
> > Either compress and attach the result to an email, or if it is too
> > big, stick it on a website somewhere.
>
> I have attached a trace of an "ls" that does not show some directories.

There is one interesting feature of this trace.

The 'cookie' assigned to each directory entry is normally 32 more than
that of the previous entry, though in some places the gap is 64 or
another larger multiple of 32.

The whole directory is fetched in three requests - each request
sending the cookie from the last name in the previous reply.
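
The resume-by-cookie pattern described above can be sketched as a small simulation. This is not the NFS wire protocol and not the gateway's code: read_dir_page, list_directory and skipping_read_page are hypothetical names, the entries are invented, and the skipping server is only one imaginable way for entries to vanish at a reply boundary.

```python
def read_dir_page(entries, cookie, max_entries):
    """Well-behaved server: return up to max_entries (cookie, name)
    pairs whose cookie is greater than the one the client sent."""
    page = [e for e in entries if e[0] > cookie][:max_entries]
    eof = not page or page[-1][0] == entries[-1][0]
    return page, eof

def skipping_read_page(entries, cookie, max_entries):
    """Hypothetical faulty server: resumes 64 past the cookie it was given."""
    resume = cookie + 64 if cookie else 0
    return read_dir_page(entries, resume, max_entries)

def list_directory(entries, max_entries, read_page=read_dir_page):
    """Client loop: each request sends the cookie of the last name
    received in the previous reply."""
    names = []
    cookie = 0
    while True:
        page, eof = read_page(entries, cookie, max_entries)
        names.extend(name for _, name in page)
        if page:
            cookie = page[-1][0]
        if eof:
            return names

# Invented directory: seven entries with cookies 32, 64, ..., 224.
entries = [(32 * i, "f%d" % i) for i in range(1, 8)]
complete = list_directory(entries, 3)
partial = list_directory(entries, 3, read_page=skipping_read_page)
```

With the faulty server, partial lacks f4 and f5 even though the client loop is unchanged. The client has no way to notice the omission, because it only ever sees the cookies the server chooses to return.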

For the last request, the cookie for the first entry returned is 992
more than the last cookie in the previous reply.
This is by far the largest gap, and the fact that it aligns with a
break between two requests seems a little suspicious.
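
The gap check itself is simple arithmetic and can be sketched as follows. The cookie values below are invented for illustration and are not taken from the attached trace; cookie_gaps is a hypothetical helper name.

```python
def cookie_gaps(replies):
    """For each pair of consecutive cookies, return a tuple
    (gap, crosses_reply_boundary, reply_index)."""
    gaps = []
    prev = None
    for reply_index, cookies in enumerate(replies):
        for position, cookie in enumerate(cookies):
            if prev is not None:
                # The first cookie of a later reply follows the last
                # cookie of the previous reply.
                gaps.append((cookie - prev, position == 0, reply_index))
            prev = cookie
    return gaps

# Invented cookies, grouped by reply (three replies).
replies = [[32, 64, 96], [128, 192, 224], [1216, 1248]]
largest = max(cookie_gaps(replies), key=lambda g: g[0])
# largest == (992, True, 2): the biggest jump sits on a reply boundary
```

A largest gap that coincides with a reply boundary, as in this made-up data, is the pattern that looks suspicious; a large gap in the middle of a reply would be less so.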

It might be interesting to remove some of the files that appear early
in the list, e.g.

COUNT8SF manga.tcl test-run tanga.tcl flex sopec mopec thumper

and see how that affects the returned list of files.

In any case, it looks from the trace as though the Linux client is doing
the right thing based on the information returned, so I would
suggest reporting this to whoever provided your NFS-DFS gateway.

NeilBrown


2007-12-06 06:22:17

by Christoph Bartoschek

Subject: Re: Invisible files and input/output errors


> In any case, it looks from the trace as though the Linux client is doing
> the right thing based on the information returned, so I would
> suggest reporting this to whoever provided your NFS-DFS gateway.

Thanks, I am going to contact the server guys.

Christoph