2003-05-14 08:15:07

by Nicolas Turro

Subject: Re: FW: am-utils or kernel bug ? Seems to be kernel or glibc bug...

On Wed, 2003-05-14 at 04:23, Ion Badulescu wrote:
> > > I am running Redhat 9.0 (kernel 2.4.20)
> > > and am-utils (am-utils-6.0.9-2), because I need the browsing feature
> > > that automount doesn't support.
> > >
> > > Unfortunately, amd sometimes hangs at boot time during its
> > > initialization (/etc/rc.d/init.d/amd).
> > > I can reproduce this bug with /etc/rc.d/init.d/amd start / stop
> > > sequences: sometimes the start hangs, sometimes it works.
> > > This bug occurs on ALL RedHat 9.0 boxes we have (7 PCs with totally
> > > different hardware).

...

> > > [root@redhat-serv root]# strace -p 2454
> > > futex(0x4212e1c8, FUTEX_WAIT, -2, NULL <unfinished ...>
> > >
> > >
> > > [root@redhat-serv root]# strace -p 2455
> > > select(1024, [4 5 6 7], NULL, NULL, {932, 980000} <unfinished ...>
>
> I'll be damned if I understand what the futex is used for here. But since
> that's the parent amd, presumably it's waiting for the child to complete
> something, probably a mount.
>
> As for the second trace, we need to know what the four file descriptors
> are for. 'lsof -p 2455' should shed some light...
>
> I suspect either a bug in glibc (likely), or a bug in the way amd uses
> some Unix primitives and which just happen to work on older glibc's (less
> likely). It's going to be rather hard to debug, however, if we can't
> reproduce it locally.
>
> Another suggestion I have is this: boot into an older kernel without futex
> support (2.4.18-27.7.x should do just fine, ignore the missing
> dependencies because they are not fatal). Glibc will adjust to the older
> kernel and use other mechanisms, and we'll see if the hang still occurs.
> Basically, since futexes were back-ported by Red Hat from 2.5 kernels, I
> suspect there might be some bugs or races in there, and this test would
> help to confirm or rule that out.

You were right, Ion:
switching to an RH8 kernel (2.4.18-14) solved the issue. I can no longer
reproduce this futex hang in the parent process...

Who should I contact to get this fixed?

--
Nicolas Turro <[email protected]>
INRIA


2003-05-14 13:23:19

by Ion Badulescu

Subject: Re: FW: am-utils or kernel bug ? Seems to be kernel or glibc bug...

On 14 May 2003, Nicolas Turro wrote:

> You were right, Ion:
> switching to an RH8 kernel (2.4.18-14) solved the issue. I can no longer
> reproduce this futex hang in the parent process...
>
> Who should I contact to get this fixed?

bugzilla.redhat.com is a good place to start.

Hmm. I just realized that I was also running my script on a plain vanilla
2.4.20 kernel, NOT rh9's own kernel, so that's probably why I couldn't
reproduce the problem. I'll try again tonight, but as you said this points
strongly towards some new RH kernel feature which is less than stable or
which modifies certain semantics in ways that occasionally break amd.
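
For anyone who wants to repeat the test on both kernels, here is a minimal
sketch of a start/stop loop that counts how often the start fails to return.
It is not the script mentioned above; the function name count_hangs, the
iteration count, and the 60-second timeout are assumptions chosen for
illustration.

```python
import subprocess
import sys


def count_hangs(start_cmd, stop_cmd, iterations=20, timeout=60.0):
    """Run start/stop cycles and return how many starts did not
    return within `timeout` seconds (treated here as a hang)."""
    hangs = 0
    for i in range(iterations):
        try:
            # subprocess.run kills the child if the timeout expires.
            subprocess.run(start_cmd, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            hangs += 1
            print(f"cycle {i}: start did not return within {timeout}s",
                  file=sys.stderr)
        try:
            # Always attempt a stop so the next cycle begins from a clean state.
            subprocess.run(stop_cmd, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            print(f"cycle {i}: stop did not return within {timeout}s",
                  file=sys.stderr)
    return hangs
```

Run as root, count_hangs(["/etc/rc.d/init.d/amd", "start"],
["/etc/rc.d/init.d/amd", "stop"]) should return a non-zero count on an
affected kernel and zero on one where the hang does not occur. Note that
killing a hung init script may leave amd processes behind, so check with ps
between runs.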

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.