2006-10-20 18:38:59

by Ben Greear

Subject: futex hang with rpm in 2.6.17.1-2174_FC5

I had a dead nfs server that was causing some programs to pause,
in particular 'yum install foo' was paused. I kill -9'd the
yum related processes.

I fixed up the nfs server and was able to un-mount the file system.
I subsequently killed many backed up updatedb and similar processes.

Now, there are no rpm processes, but if I try 'rpm [anything]' it
hangs waiting on a futex:

open("/var/lib/rpm/Packages", O_RDONLY|O_LARGEFILE) = 4
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=41390080, ...}) = 0
futex(0xb7ba178c, FUTEX_WAIT, 1, NULL <unfinished ...>

Is there any way to figure out what is causing this futex-wait?

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2006-10-21 05:24:27

by Dave Jones

Subject: Re: futex hang with rpm in 2.6.17.1-2174_FC5

On Fri, Oct 20, 2006 at 11:38:58AM -0700, Ben Greear wrote:
> I had a dead nfs server that was causing some programs to pause,
> in particular 'yum install foo' was paused. I kill -9'd the
> yum related processes.
>
> I fixed up the nfs server and was able to un-mount the file system.
> I subsequently killed many backed up updatedb and similar processes.
>
> Now, there are no rpm processes, but if I try 'rpm [anything]' it
> hangs waiting on a futex:
>
> open("/var/lib/rpm/Packages", O_RDONLY|O_LARGEFILE) = 4
> fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
> fstat64(4, {st_mode=S_IFREG|0644, st_size=41390080, ...}) = 0
> futex(0xb7ba178c, FUTEX_WAIT, 1, NULL <unfinished ...>
>
> Is there any way to figure out what is causing this futex-wait?

The dead rpm you killed left behind locks in its databases.
rm -f /var/lib/rpm/__db* and it should work again.

Dave

--
http://www.codemonkey.org.uk

2006-10-21 17:43:52

by Ben Greear

Subject: Re: futex hang with rpm in 2.6.17.1-2174_FC5

Dave Jones wrote:
> On Fri, Oct 20, 2006 at 11:38:58AM -0700, Ben Greear wrote:
> > I had a dead nfs server that was causing some programs to pause,
> > in particular 'yum install foo' was paused. I kill -9'd the
> > yum related processes.
> >
> > I fixed up the nfs server and was able to un-mount the file system.
> > I subsequently killed many backed up updatedb and similar processes.
> >
> > Now, there are no rpm processes, but if I try 'rpm [anything]' it
> > hangs waiting on a futex:
> >
> > open("/var/lib/rpm/Packages", O_RDONLY|O_LARGEFILE) = 4
> > fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
> > fstat64(4, {st_mode=S_IFREG|0644, st_size=41390080, ...}) = 0
> > futex(0xb7ba178c, FUTEX_WAIT, 1, NULL <unfinished ...>
> >
> > Is there any way to figure out what is causing this futex-wait?
>
> The dead rpm you killed left behind locks in its databases.
> rm -f /var/lib/rpm/__db* and it should work again.
>
I'll give that a try, but shouldn't these locks clean themselves up when the
process is killed or shouldn't rpm notice the previous process is dead and
clean it up itself?

Thanks,
Ben

> Dave
>
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2006-10-21 18:00:10

by Dave Jones

Subject: Re: futex hang with rpm in 2.6.17.1-2174_FC5

On Sat, Oct 21, 2006 at 10:45:02AM -0700, Ben Greear wrote:
> Dave Jones wrote:
> > On Fri, Oct 20, 2006 at 11:38:58AM -0700, Ben Greear wrote:
> > > I had a dead nfs server that was causing some programs to pause,
> > > in particular 'yum install foo' was paused. I kill -9'd the
> > > yum related processes.
> > >
> > The dead rpm you killed left behind locks in its databases.
> > rm -f /var/lib/rpm/__db* and it should work again.
> >
> I'll give that a try, but shouldn't these locks clean themselves up when the
> process is killed

If you kill -9'd the processes, what do you expect to do
the cleanup work?

> or shouldn't rpm notice the previous process is dead and
> clean it up itself?

Sounds sensible to me and you, but in the past sensible ideas and
rpm maintainers haven't gone hand in hand.

Dave

--
http://www.codemonkey.org.uk

2006-10-21 18:07:30

by Ben Greear

Subject: Re: futex hang with rpm in 2.6.17.1-2174_FC5

Dave Jones wrote:
> On Sat, Oct 21, 2006 at 10:45:02AM -0700, Ben Greear wrote:
> > Dave Jones wrote:
> > > On Fri, Oct 20, 2006 at 11:38:58AM -0700, Ben Greear wrote:
> > > > I had a dead nfs server that was causing some programs to pause,
> > > > in particular 'yum install foo' was paused. I kill -9'd the
> > > > yum related processes.
> > > >
> > > The dead rpm you killed left behind locks in its databases.
> > > rm -f /var/lib/rpm/__db* and it should work again.
> > >
> > I'll give that a try, but shouldn't these locks clean themselves up when the
> > process is killed
>
> If you kill -9'd the processes, what do you expect to do
> the clean up work ?
>

Well, you can do tricks with file handles so that they are automatically
closed/deleted when a process exits, even with kill -9. Since this lock is
evidently something in the kernel (the kernel call is what blocks), it
seems like a similar trick could be crafted.
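
The file-handle behaviour described above can be sketched in C. This is only
an illustration under stated assumptions, not what rpm or its database layer
does: the helper name flock_released_after_kill and the lock file path are
hypothetical, and the lock is an advisory flock(2) lock. A child takes the
lock and is then killed with SIGKILL; because the kernel closes the child's
descriptors on exit, the parent can take the lock afterwards.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the lock was busy while the child
 * lived and free after the child was killed, 1 if not, -1 on setup error. */
int flock_released_after_kill(const char *path)
{
    assert(path != NULL);

    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: take an exclusive lock, tell the parent, then sleep
         * until it is killed. It never unlocks or closes the file. */
        close(pipefd[0]);
        int cfd = open(path, O_RDWR | O_CREAT, 0600);
        if (cfd < 0 || flock(cfd, LOCK_EX) != 0)
            _exit(1);
        if (write(pipefd[1], "L", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    /* Parent: wait until the child reports that it holds the lock. */
    close(pipefd[1]);
    char c;
    ssize_t n = read(pipefd[0], &c, 1);
    close(pipefd[0]);

    int fd = (n == 1) ? open(path, O_RDWR) : -1;
    if (fd < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    /* The child still holds the lock, so a non-blocking attempt fails. */
    int busy = flock(fd, LOCK_EX | LOCK_NB) != 0;

    /* kill -9: the child gets no chance to run any cleanup code. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);

    /* The kernel closed the child's descriptor, which dropped its lock. */
    int acquired = flock(fd, LOCK_EX | LOCK_NB) == 0;

    close(fd);
    unlink(path);
    return (busy && acquired) ? 0 : 1;
}
```

Calling flock_released_after_kill("/tmp/flock_demo.lock") should return 0 on
Linux. The cleanup works here only because the kernel owns the lock state,
which is not the case for a lock word stored inside a shared database file.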

> > or shouldn't rpm notice the previous process is dead and
> > clean it up itself?
>
> Sounds sensible to me and you, but in the past sensible ideas and
> rpm maintainers haven't gone hand in hand.
>
Ahhh :)

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2006-10-21 21:55:37

by Ulrich Drepper

Subject: Re: futex hang with rpm in 2.6.17.1-2174_FC5

On 10/21/06, Ben Greear <[email protected]> wrote:
> Well, you can do tricks with file handles so that they are automatically
> closed/deleted when a process exits, even with kill -9.

Nope, that cannot work. The lock object must be visible in the
filesystem space.

The correct solution would be to use robust mutexes. But, as Dave
said, it's easier said than implemented in rpm.
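
For readers unfamiliar with robust mutexes, a minimal sketch follows. It
assumes a glibc that provides pthread_mutexattr_setrobust and
pthread_mutex_consistent (the POSIX.1-2008 names), and it places the mutex in
anonymous shared memory purely for the demo, where a real user would keep it
in a shared file. The helper name lock_after_owner_death is hypothetical, and
nothing here is rpm code. A child locks the mutex and is killed with SIGKILL;
the parent's lock call then returns EOWNERDEAD instead of hanging.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns what pthread_mutex_lock() reports in the
 * parent after the child died holding the mutex, or -1 on setup error. */
int lock_after_owner_death(void)
{
    /* The mutex must live in memory that both processes can see. */
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0 ||
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(m, &attr) != 0)
        return -1;
    pthread_mutexattr_destroy(&attr);

    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: take the lock, report it, and never unlock. */
        close(pipefd[0]);
        if (pthread_mutex_lock(m) != 0)
            _exit(1);
        if (write(pipefd[1], "L", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    /* Parent: wait until the child holds the lock, then kill -9 it. */
    close(pipefd[1]);
    char c;
    ssize_t n = read(pipefd[0], &c, 1);
    close(pipefd[0]);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    if (n != 1)
        return -1;

    /* With a non-robust mutex this call would block forever. */
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        /* We now own the lock; a real program would repair the data it
         * protects here before marking the mutex usable again. */
        int ok = pthread_mutex_consistent(m);
        assert(ok == 0);
        (void)ok;
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(m);

    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return rc;
}
```

On Linux with glibc, lock_after_owner_death() is expected to return
EOWNERDEAD. Build with -pthread. The EOWNERDEAD branch is the hard part in
practice, since the new owner has to decide whether the shared data is still
trustworthy.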

2006-11-27 00:03:27

by Denys Vlasenko

Subject: Re: futex hang with rpm in 2.6.17.1-2174_FC5

On Saturday 21 October 2006 20:08, Ben Greear wrote:
> > > or shouldn't rpm notice the previous process is dead and
> > > clean it up itself?
> >
> > Sounds sensible to me and you, but in the past sensible ideas and
> > rpm maintainers haven't gone hand in hand.
> >
> Ahhh :)

Well said. rpm's source tarball size doubles with each release,
and it contains such unexpected things as an ELF manipulation
library. I have no idea what business rpm can possibly have with
parsing ELF headers.
--
vda