2008-06-11 07:18:38

by Tobias Oetiker

[permalink] [raw]
Subject: .Xauthority going stale (but not really)

Experts,

I have a strange problem on a diskless client. At random intervals
(hours) when I try to open an xterm I get the message

> xterm
No protocol specified
xterm Xt error: Can't open display: :13.0

Doing an strace on xterm I find that it gets a 'stale nfs handle'
when trying to open .Xauthority.

If I do a

cat .Xauthority >/dev/null

things go back to normal and I can open xterms again ...

So how can it be that xterm gets a stale NFS handle error while
cat can read the file just fine, and that once cat has read it,
xterm works as well?

Our home directories are automounted via nfs and then from there
bindmounted into /home if that rings a bell ...

cheers
tobi


--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch [email protected] ++41 62 213 9902


2008-06-12 00:03:25

by NeilBrown

[permalink] [raw]
Subject: Re: .Xauthority going stale (but not really)

On Wednesday June 11, [email protected] wrote:
> Experts,
>
> I have a strange problem on a diskless client. At random intervals
> (hours) when I try to open an xterm I get the message
>
> > xterm
> No protocol specified
> xterm Xt error: Can't open display: :13.0
>
> Doing an strace on xterm I find that it gets a 'stale nfs handle'
> when trying to open .Xauthority.
>
> If I do a
>
> cat .Xauthority >/dev/null
>
> things go back to normal and I can open xterms again ...
>
> so how can it be, that xterm gets a stale nfs handle error while
> cat can read the file just fine and after cat did it it works for
> xterm as well ?

xterm is setuid root. cat is not.

The filesystem is exported in a way that causes 'root' accesses to be
treated as accesses by 'nobody'. If the NFS server is Linux, then the
export option "no_root_squash" can fix this.

If the .Xauthority file is in cache on the client, xterm will be able
to read it with no problem. If not, it will send a request to the
server for root to be able to read the file, and the server will
reject the request.

It should really return EACCES rather than ESTALE though ... what is
the NFS server?
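
A minimal sketch for checking the first half of that explanation on a
client, assuming a POSIX shell. The helper name check_setuid is
hypothetical and not part of any tool mentioned in this thread:

```shell
#!/bin/sh
# Hypothetical helper: report whether a file carries the setuid bit.
# Prints "setuid", "not setuid", or "missing" (and returns 2 if missing).
check_setuid() {
    target=$1
    if [ ! -e "$target" ]; then
        echo "missing"
        return 2
    fi
    # POSIX test -u is true when the set-user-ID flag is set.
    if [ -u "$target" ]; then
        echo "setuid"
    else
        echo "not setuid"
    fi
}

# Example: inspect whichever xterm is first in PATH, if one is installed.
xterm_path=$(command -v xterm 2>/dev/null || true)
if [ -n "$xterm_path" ]; then
    check_setuid "$xterm_path"
fi
```

On a Linux NFS server, "exportfs -v" should list the options in effect
for each export, which shows whether root_squash applies.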

NeilBrown

Subject: RE: .Xauthority going stale (but not really)

> -----Original Message-----
> From: Tobi Oetiker [mailto:[email protected]]
> Sent: 19 August 2008 10:29
>
> Hi Vincent,
>
> Today Fortier,Vincent [Montreal] wrote:
>
> > Hi Thomas,
> >
> > I saw this post from you:
> > http://marc.info/?l=linux-nfs&m=121345713729459&w=2
> >
> > Have you found any way to solve the problem?
> >
> > Until now I have only seen that behaviour on a 64bit 2.6.24 kernel
> > (running debian etchnhalf). I was about to check whether it does it
> > or not in 32bit.
> >
> > Help greatly appreciated!
> >
>
> we have not yet solved the problem ... we see it with a
> 2.6.24 server and 2.6.24 client both 64bit ....
>
> since the message comes from the server, I imagine the problem
> is with the server ... but maybe the client strokes the
> server the wrong way ...
>
> did you figure anything out ?

It has to be client-side... I have not seen this with any previous
kernels on the client side (either old redhat 7.3 with 2.4.20 kernels or
Debian Sarge with 2.6.8 to 2.6.23.17). Also, my server side is unchanged,
running RHEL 4 (2.6.9-based kernel).

This is totally new behaviour to me, which only occurred with the
installation of 64-bit 2.6.24 kernel-based Debian etch clients (I have not
yet tried 32-bit to confirm whether it is 2.6.24 OR amd64 related).
However, I cannot reproduce it with a 32-bit 2.6.23.17 kernel.

If the problem resides in the 2.6.24 kernel (and is not 64-bit specific),
it should be "easy" to git bisect where it comes from (although I have not
caught the details of git bisect just yet :)
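
For reference, a rough sketch of how such a bisect session is usually
started, assuming a clone of the kernel git tree. The function name
start_bisect is hypothetical, and the tag names in the usage comment are
an assumption about which revisions count as good and bad here:

```shell
#!/bin/sh
# Hypothetical helper: begin a bisect between a known-good and a known-bad
# revision. Must be run from inside a git working tree.
start_bisect() {
    good=$1
    bad=$2
    git bisect start || return 1
    git bisect bad "$bad" || return 1
    git bisect good "$good" || return 1
}

# Usage, not executed here (tags assumed):
#   start_bisect v2.6.23 v2.6.24
# git then checks out a revision in between. Build and boot that kernel on
# the client, try to reproduce the stale handle, and report the result:
#   git bisect good    # problem did not appear
#   git bisect bad     # problem appeared
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```

Each round roughly halves the remaining range, so the number of kernels to
build grows with the logarithm of the commit count between the two tags.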

Note: Added linux-nfs in CC.

- vin

2008-08-19 16:41:30

by Tobias Oetiker

[permalink] [raw]
Subject: Re: RE: .Xauthority going stale (but not really)

Hi Vincent,

Today Fortier,Vincent [Montreal] wrote:

> It has to be client-side... I have not seen this with any
> previous kernels on the client-side (either old redhat 7.3 with
> 2.4.20 kernels or Debian Sarge with 2.6.8 to 2.6.23.17). Also my
> server side is unchanged with a RHEL 4 (2.6.9 based kernel).
>
> This is a totally new behaviour to me which only occurred with
> installation of 64bit 2.6.24 kernel based debian etch clients
> (have not yet tried 32bit to confirm whether it is 2.6.24 OR amd64
> based). Although I cannot reproduce it with a 2.6.23.17 32-bit.
>
> If the problem resides with 2.6.24 kernel (and not 64bit specific)
> it should be "easy" to git bisect where it comes from (although
> did not catch the details of git bisect just yet :)

Well can you reproduce the problem 'on demand' ? We have not yet
found a pattern.

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch [email protected] ++41 62 775 9902 / sb: -9900

Subject: RE: RE: .Xauthority going stale (but not really)

> -----Original Message-----
> From: Tobias Oetiker [mailto:[email protected]]
>
> Hi Vincent,
>
> Today Fortier,Vincent [Montreal] wrote:
>
> > It has to be client-side... I have not seen this with any previous
> > kernels on the client-side (either old redhat 7.3 with 2.4.20 kernels
> > or Debian Sarge with 2.6.8 to 2.6.23.17). Also my server side is
> > unchanged with a RHEL 4 (2.6.9 based kernel).
> >
> > This is a totally new behaviour to me which only occurred with
> > installation of 64bit 2.6.24 kernel based debian etch clients (have
> > not yet tried 32bit to confirm whether it is 2.6.24 OR amd64 based).
> > Although I cannot reproduce it with a 2.6.23.17 32-bit.
> >
> > If the problem resides with 2.6.24 kernel (and not 64bit specific) it
> > should be "easy" to git bisect where it comes from
> > (although did not catch the details of git bisect just yet :)
>
> Well can you reproduce the problem 'on demand' ? We have not
> yet found a pattern.

Yes. We have a common startup script that is used on the client side.
The script is called with a keyword and, based on a configuration file,
generates on the fly an execution string that fires up a remote (although
sometimes local) application with a bunch of required parameters that
depend on, for example, the language or the login user. The execution
string covers multiple connection protocols, since not all servers work
with ssh yet (we still have an old crappy hpux 10.20 K box and we never
took the time to install openssh on it!). Also, since we have a pool of
workstations that can be used for different purposes, some of them,
depending on that day's usage, need to receive popups from different
servers, hence a few xhost +serverABC calls somehow became necessary.

To conclude, using those scripts I can generate a Stale NFS file handle
within 1 or 2 remote application launches. I have found a way to get rid
of the lock by forcing a read or write on the file (usually a simple
cat $HOME/.Xauthority works). A combination of these commands seems to
solve the problem:
chmod 600 $HOME/.Xauthority
cat $HOME/.Xauthority
xauth list 1>/dev/null
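
The three commands above could be wrapped in a small function and called
before each application launch. This is only a sketch: the function name
refresh_xauthority is hypothetical, and it is an assumption that forcing a
read this way is enough in every case:

```shell
#!/bin/sh
# Hypothetical helper: force a fresh read of the X authority file so that
# a later open by another process finds it in the client cache.
refresh_xauthority() {
    xauth_file=${1:-$HOME/.Xauthority}
    if [ ! -f "$xauth_file" ]; then
        echo "no such file: $xauth_file" >&2
        return 1
    fi
    chmod 600 "$xauth_file" || return 1
    cat "$xauth_file" >/dev/null || return 1
    # xauth may not be installed everywhere; skip it quietly if absent.
    if command -v xauth >/dev/null 2>&1; then
        xauth -f "$xauth_file" list >/dev/null 2>&1 || true
    fi
    echo "refreshed"
}
```

Calling refresh_xauthority with no argument acts on $HOME/.Xauthority.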

- vin