2001-04-23 01:55:37

by Manuel McLure

Subject: Kernel hang on multi-threaded X process crash

Well, this is what I get for being on the cutting edge... :-(

I'm now running into problems where the machine will totally hang (no
network, no SysRq, no nothin') regularly. The triggers seem to be running
aviplay or mozilla.

The symptom is always the same: while I am running aviplay or mozilla, the
machine suddenly hangs and needs a hard reset. I can trigger it 100% of the
time by telling aviplay to zoom 2x.

I finally managed to reproduce it while I was on a console (I telneted in
from another machine, and ran aviplay on the X display that was on console
7 while the machine was displaying console 1) - the only message before the
hang was "Trying to vfree() nonexistent vm area (d0992000)" - no Oops was
shown.

Whenever this happens, the e2fsck step at reboot shows a
"Entry 'core.XXXX' in <dir> (XXXXXX) had deleted/unused inode XXXXXX.
CLEARED" message. The core file is always in whatever directory I was
running the process that seems to cause the crash. It seems like either the
core is a symptom of the underlying problem, or the process coredumping is
causing the hang.

The machine is an Athlon Thunderbird 900MHz with 256M of PC133 DRAM on an
MSI K7T Turbo R motherboard. I am running 2.4.3-ac12 currently, 2.4.3-ac11
and 2.4.3-ac5 hung the same way at least once each before I started
tracking this down. I am running Red Hat 7.1, and am using the
XFree86-4.0.3 RPMs that come with RH71 with the CVS DRI trunk installed
over it. The kernel was built with kgcc, a gcc-2.96 built kernel has the
same problem.

Any ideas?

--
Manuel A. McLure KE6TAW | ...for in Ulthar, according to an ancient
<[email protected]> | and significant law, no man may kill a cat.
<http://www.mclure.org> | -- H.P. Lovecraft


2001-04-23 02:27:32

by Manuel McLure

Subject: Re: Kernel hang on multi-threaded X process crash


On 2001.04.22 18:55 Manuel McLure wrote:

> The machine is an Athlon Thunderbird 900MHz with 256M of PC133 DRAM on an
> MSI K7T Turbo R motherboard. I am running 2.4.3-ac12 currently, 2.4.3-ac11
> and 2.4.3-ac5 hung the same way at least once each before I started
> tracking this down. I am running Red Hat 7.1, and am using the
> XFree86-4.0.3 RPMs that come with RH71 with the CVS DRI trunk installed
> over it. The kernel was built with kgcc, a gcc-2.96 built kernel has the
> same problem.

Following up on myself, I replaced the contents of /usr/X11R6 server with
the standard 4.0.3 RPMs that come with RH 7.1 and it made no difference.
Also, if it's important my video card is a Voodoo 5 5500.

--
Manuel A. McLure KE6TAW | ...for in Ulthar, according to an ancient
<[email protected]> | and significant law, no man may kill a cat.
<http://www.mclure.org> | -- H.P. Lovecraft

2001-04-23 03:17:23

by Manuel McLure

Subject: Re: Kernel hang on multi-threaded X process crash


On 2001.04.22 19:27 Manuel McLure wrote:
>
> On 2001.04.22 18:55 Manuel McLure wrote:
>
> > The machine is an Athlon Thunderbird 900MHz with 256M of PC133 DRAM on an
> > MSI K7T Turbo R motherboard. I am running 2.4.3-ac12 currently, 2.4.3-ac11
> > and 2.4.3-ac5 hung the same way at least once each before I started
> > tracking this down. I am running Red Hat 7.1, and am using the
> > XFree86-4.0.3 RPMs that come with RH71 with the CVS DRI trunk installed
> > over it. The kernel was built with kgcc, a gcc-2.96 built kernel has the
> > same problem.
>
> Following up on myself, I replaced the contents of /usr/X11R6 server with
> the standard 4.0.3 RPMs that come with RH 7.1 and it made no difference.
> Also, if it's important my video card is a Voodoo 5 5500.

To follow up on my followup, I can now reproduce this 100% and get the
"Trying to vfree()..." message on the console. To do this I start Mozilla,
switch to a text console, and do a "killall -QUIT mozilla". A couple of
"Trying to vfree()..." messages later, it's big red switch time.

I'm going to try this with standard 2.4.3 as well as the 2.4.2 that comes
with RH71 - hopefully my filesystem will handle all the fscks :-(
--
Manuel A. McLure KE6TAW | ...for in Ulthar, according to an ancient
<[email protected]> | and significant law, no man may kill a cat.
<http://www.mclure.org> | -- H.P. Lovecraft

2001-04-23 03:24:06

by Manuel McLure

Subject: Re: Kernel hang on multi-threaded X process crash


On 2001.04.22 20:17 Manuel McLure wrote:
>
> On 2001.04.22 19:27 Manuel McLure wrote:
> >
> > On 2001.04.22 18:55 Manuel McLure wrote:
> >
> > > The machine is an Athlon Thunderbird 900MHz with 256M of PC133 DRAM on an
> > > MSI K7T Turbo R motherboard. I am running 2.4.3-ac12 currently, 2.4.3-ac11
> > > and 2.4.3-ac5 hung the same way at least once each before I started
> > > tracking this down. I am running Red Hat 7.1, and am using the
> > > XFree86-4.0.3 RPMs that come with RH71 with the CVS DRI trunk installed
> > > over it. The kernel was built with kgcc, a gcc-2.96 built kernel has the
> > > same problem.
> >
> > Following up on myself, I replaced the contents of /usr/X11R6 server with
> > the standard 4.0.3 RPMs that come with RH 7.1 and it made no difference.
> > Also, if it's important my video card is a Voodoo 5 5500.
>
> To follow up on my followup, I can now reproduce this 100% and get the
> "Trying to vfree()..." message on the console. To do this I start Mozilla,
> switch to a text console, and do a "killall -QUIT mozilla". A couple of
                                                    ^^^^^^^

Make that "killall -QUIT mozilla-bin"...


> "Trying to vfree()..." messages later, it's big red switch time.
>
> I'm going to try this with standard 2.4.3 as well as the 2.4.2 that comes
> with RH71 - hopefully my filesystem will handle all the fscks :-(

--
Manuel A. McLure KE6TAW | ...for in Ulthar, according to an ancient
<[email protected]> | and significant law, no man may kill a cat.
<http://www.mclure.org> | -- H.P. Lovecraft

2001-04-23 04:29:18

by idalton

Subject: Re: Kernel hang on multi-threaded X process crash

On Sun, Apr 22, 2001 at 08:23:39PM -0700, Manuel McLure wrote:
>
> On 2001.04.22 20:17 Manuel McLure wrote:
> >
> > To follow up on my followup, I can now reproduce this 100% and get the
> > "Trying to vfree()..." message on the console. To do this I start Mozilla,
> > switch to a text console, and do a "killall -QUIT mozilla". A couple of
>
> Make that "killall -QUIT mozilla-bin"...

I'll see if mine does this too. I've been having hard locks
intermittently too, sometimes starting X, mostly with mozilla. My
system's a dual Pmmx200 with ATI rage pro and Matrox Mill 2 PCI.

-- Ferret

2001-04-23 06:00:43

by Manuel McLure

Subject: Re: Kernel hang on multi-threaded X process crash


On 2001.04.22 20:23 Manuel McLure wrote:
>
> On 2001.04.22 20:17 Manuel McLure wrote:
> >
> > On 2001.04.22 19:27 Manuel McLure wrote:
> > >
> > > On 2001.04.22 18:55 Manuel McLure wrote:
> > >
> > > > The machine is an Athlon Thunderbird 900MHz with 256M of PC133 DRAM on an
> > > > MSI K7T Turbo R motherboard. I am running 2.4.3-ac12 currently, 2.4.3-ac11
> > > > and 2.4.3-ac5 hung the same way at least once each before I started
> > > > tracking this down. I am running Red Hat 7.1, and am using the
> > > > XFree86-4.0.3 RPMs that come with RH71 with the CVS DRI trunk installed
> > > > over it. The kernel was built with kgcc, a gcc-2.96 built kernel has the
> > > > same problem.
> > >
> > > Following up on myself, I replaced the contents of /usr/X11R6 server with
> > > the standard 4.0.3 RPMs that come with RH 7.1 and it made no difference.
> > > Also, if it's important my video card is a Voodoo 5 5500.
> >
> > To follow up on my followup, I can now reproduce this 100% and get the
> > "Trying to vfree()..." message on the console. To do this I start Mozilla,
> > switch to a text console, and do a "killall -QUIT mozilla". A couple of
>                                                     ^^^^^^^
>
> Make that "killall -QUIT mozilla-bin"...
>
>
> > "Trying to vfree()..." messages later, it's big red switch time.
> >
> > I'm going to try this with standard 2.4.3 as well as the 2.4.2 that comes
> > with RH71 - hopefully my filesystem will handle all the fscks :-(

At Andrew Morton's suggestion, I added a BUG() to mm/vmalloc.c:vfree() and
copied down the stack trace. According to gdb on vmlinux, the symbols are:

vfree
release_segments+50
exit_mmap+17
elf_core_dump
elf_core_dump
mmput+38
do_exit+157
do_invalid_op
die+79
do_invalid_op+127
vfree+110
__call_console_drivers+59
_call_console_drivers+87
call_console_drivers+228
release_console_sem+21
error_code+52
elf_core_dump
vfree+110
release_segments+50
exit_mmap+17
elf_core_dump
elf_core_dump
mmput+38
do_coredump+556
collect_signal+168
elf_core_dump
do_signal+418
do_general_protection
force_sig_info+121
do_general_protection
force_sig+17
do_general_protection+53
error_code+52
signal_return+20

(at least one more BUG() output scrolled off the console).
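
[Editorial sketch, not part of the original message.] The trace above is long enough that repeated symbols are easy to miss. A small script can tally them, which may help when looking for re-entry into the same code path. The symbol list is copied verbatim from the message. The helper name `parse_frame` is made up for illustration, and the offsets are assumed to be decimal.

```python
from collections import Counter

# Symbols copied from the trace in the message, in the order listed there.
TRACE = """
vfree release_segments+50 exit_mmap+17 elf_core_dump elf_core_dump
mmput+38 do_exit+157 do_invalid_op die+79 do_invalid_op+127 vfree+110
__call_console_drivers+59 _call_console_drivers+87
call_console_drivers+228 release_console_sem+21 error_code+52
elf_core_dump vfree+110 release_segments+50 exit_mmap+17 elf_core_dump
elf_core_dump mmput+38 do_coredump+556 collect_signal+168 elf_core_dump
do_signal+418 do_general_protection force_sig_info+121
do_general_protection force_sig+17 do_general_protection+53
error_code+52 signal_return+20
""".split()


def parse_frame(entry):
    """Split 'symbol+offset' into (symbol, offset); offset is None if absent.

    Assumption: offsets are decimal, as they appear in the message.
    """
    symbol, sep, offset = entry.partition("+")
    return symbol, (int(offset) if sep else None)


frames = [parse_frame(entry) for entry in TRACE]
counts = Counter(symbol for symbol, _ in frames)

if __name__ == "__main__":
    # Print only the symbols that occur more than once in the trace.
    for symbol, n in counts.most_common():
        if n > 1:
            print(f"{symbol}: {n}")
```

Run on this trace, the tally shows `vfree`, `release_segments`, `exit_mmap` and `mmput` each appearing more than once. This only counts names. It does not tell real call frames apart from stale addresses left on the stack.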

I also tested this on the 2.4.2-2 kernel that comes with RH 7.1 and I did
not see the problem. Also, with 2.4.3-ac12, if I used Alt-SysRq-s and
Alt-SysRq-u to sync and remount my filesystems read-only before doing the
killall, the problem did not occur either. Could the core dumping code be
causing the problem?

I'll now try with the base 2.4.3 to see what happens.

--
Manuel A. McLure KE6TAW | ...for in Ulthar, according to an ancient
<[email protected]> | and significant law, no man may kill a cat.
<http://www.mclure.org> | -- H.P. Lovecraft

2001-04-23 09:22:30

by Alan

Subject: Re: Kernel hang on multi-threaded X process crash

> > Following up on myself, I replaced the contents of /usr/X11R6 server with
> > the standard 4.0.3 RPMs that come with RH 7.1 and it made no difference.
> > Also, if it's important my video card is a Voodoo 5 5500.
>
> To follow up on my followup, I can now reproduce this 100% and get the
> "Trying to vfree()..." message on the console. To do this I start Mozilla,
> switch to a text console, and do a "killall -QUIT mozilla". A couple of
> "Trying to vfree()..." messages later, it's big red switch time.
>
> I'm going to try this with standard 2.4.3 as well as the 2.4.2 that comes
> with RH71 - hopefully my filesystem will handle all the fscks :-(

Do you have DRI enabled and if so does disabling DRI help ?

2001-04-23 09:36:40

by Alan

Subject: Re: Kernel hang on multi-threaded X process crash

Strange trace, but it looks like a bug in the -ac experimental multithreaded
core dump patches. I've got a couple of other reports consistent with them
being broken somewhere.

Does it have to be something like mozilla (xmms also probably breaks it) that
does this? If so, I suspect it's specific to multithreaded apps and it's a bug
in the core dump changes.

If so, I guess I'll revert them.


2001-04-23 15:26:04

by Manuel McLure

Subject: Re: Kernel hang on multi-threaded X process crash


On 2001.04.23 02:38 Alan Cox wrote:
> Strange trace but it looks like a bug in the -ac experimental multithreaded
> core dump patches. I've got a couple of other reports consistent with them
> being broken somewhere
>
> Does it have to be something like mozilla (xmms also probably breaks it) that
> does this. If so I suspect its specific to multithreaded apps and its a bug
> in the core dump changes.
>
> If so I guess I revert them

Both mozilla and aviplay (which are both multithreaded) trigger this - I
haven't tried with xmms. Simpler programs like xclock or cat don't trigger
it.
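
[Editorial sketch, not part of the original message.] The observation above, that multithreaded programs trigger the hang and simple ones do not, could be checked with a small harness that sends SIGQUIT to a child process with and without extra threads. This assumes CPython on Linux, where each `threading.Thread` is backed by a native thread. The names `CHILD` and `run_trial` are made up for illustration. A core file is written only if the core size limit (`ulimit -c`) allows it. On an affected kernel this would presumably hang the machine, so it belongs on a test box only.

```python
import signal
import subprocess
import sys

# Child program: start N idle threads, announce readiness, then sleep.
CHILD = """
import sys, threading, time
for _ in range(int(sys.argv[1])):
    threading.Thread(target=time.sleep, args=(60,), daemon=True).start()
print("ready", flush=True)
time.sleep(60)
"""


def run_trial(n_threads):
    """Start a child with n_threads extra threads and send it SIGQUIT.

    Returns the child's return code; a negative value -N means the child
    was terminated by signal N.
    """
    with subprocess.Popen(
        [sys.executable, "-c", CHILD, str(n_threads)],
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        child.stdout.readline()            # block until the child prints "ready"
        child.send_signal(signal.SIGQUIT)  # same signal as "killall -QUIT"
        return child.wait()


if __name__ == "__main__":
    for n in (0, 4):
        print(f"{n} extra threads: return code {run_trial(n)}")
```

On a healthy kernel both trials should simply report that the child died from SIGQUIT. The comparison of interest is whether only the threaded trial takes the machine down.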

To answer the question in your other email, I don't have DRI enabled (since
tdfx.o won't load for me due to rwsem fixes - see other thread).

Thanks for your help.
--
Manuel A. McLure KE6TAW | ...for in Ulthar, according to an ancient
<[email protected]> | and significant law, no man may kill a cat.
<http://www.mclure.org> | -- H.P. Lovecraft

2001-04-23 15:41:23

by Alan

Subject: Re: Kernel hang on multi-threaded X process crash

> Both mozilla and aviplay (which are both multithreaded) trigger this - I
> haven't tried with xmms. Simpler programs like xclock or cat don't trigger
> it.

Thanks. I'll revert the core dump stuff for 2.4.4-ac unless anyone cares to
fix the fix

2001-04-24 00:20:35

by Don Dugger

Subject: Re: Kernel hang on multi-threaded X process crash

Alan-

I certainly care to fix it (since I wrote the patch). Since `aviplay' seems
to be the easy way to trigger it I'll look into it.

On Mon, Apr 23, 2001 at 04:40:12PM +0100, Alan Cox wrote:
> > Both mozilla and aviplay (which are both multithreaded) trigger this - I
> > haven't tried with xmms. Simpler programs like xclock or cat don't trigger
> > it.
>
> Thanks. I'll revert the core dump stuff for 2.4.4-ac unless anyone cares to
> fix the fix

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
[email protected]
Ph: 303/938-9838