2001-02-13 13:05:28

by Tony Gale

Subject: 2.4.x SMP blamed for Xfree 4.0 crashes

Having experienced a number of crashes with Xfree 4.0 with 2.4
kernels, that I wasn't getting with 2.2 kernels, a quick search on
the xfree Xpert mailing list reveals this:

--------------------------------------------------------------------
(http://www.xfree86.org/pipermail/xpert/2001-January/004666.html)

Mark Vojkovich [email protected]
Wed, 10 Jan 2001 10:49:05 -0800 (PST)

On Wed, 10 Jan 2001, Martin Schenk wrote:
> I'm using the nv.o driver of XFree-4.0.1 with a TNT card
> on a SMP system under the new linux kernel 2.4.0.
>
> Occasionally XFree segfaults and dumps me back to the command
> line.
> I assume the problem has to do with my using the new kernel,
> which provides finer grained kernel locks.
>
> I attach the startup info from XFree.log and a backtrace done
> on the coredump (if someone is interested in the coredump,
> it is about 2MB bzip2'd).

This is a long-standing problem with 2.3 and 2.4 SMP
kernels. I believe it is a kernel bug and isn't the XFree86
project's problem. The problem does not exist on 2.2
SMP kernels nor on 2.3/4 UP kernels. The symptoms are
random segfaults in perfectly fine XFree86 code.
---------------------------------------------------------------------

Anyone looking into this?

-tony


---
E-Mail: Tony Gale <[email protected]>
If a nation expects to be ignorant and free,
... it expects what never was and never will be.
-- Thomas Jefferson

The views expressed above are entirely those of the writer
and do not represent the views, policy or understanding of
any other person or official body.


2001-02-13 13:17:50

by Alan

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes

> Having experienced a number of crashes with Xfree 4.0 with 2.4
> kernels, that I wasn't getting with 2.2 kernels, a quick search on
> the xfree Xpert mailing list reveals this:

Yeah I've seen this claim repeatedly. XFree 4.0.2 crashes for me in similar
ways on 3dfx and matrox cards and it happens with 2.2 kernels as well. What
makes me suspicious it's XFree triggered is that there isn't really anything
XFree does that would trigger mm bugs on x86 platforms. It isn't threaded, it
doesn't make extensive threaded use of mmap. But of course it does touch
hardware directly, particularly the AGPgart. That might be an obvious first
candidate but having looked at it I see no problems.

> Anyone looking into this?

I believe it to be Xfree or glibc problems. So I'm not. Since I can't get
XFree 4 stable on 2.2 I don't have a useful setup to study this.

Alan



2001-02-13 13:21:19

by Richard B. Johnson

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes

On Tue, 13 Feb 2001, Tony Gale wrote:

> Having experienced a number of crashes with Xfree 4.0 with 2.4
> kernels, that I wasn't getting with 2.2 kernels, a quick search on
> the xfree Xpert mailing list reveals this:
>
> --------------------------------------------------------------------
> (http://www.xfree86.org/pipermail/xpert/2001-January/004666.html)
>
> Mark Vojkovich [email protected]
> Wed, 10 Jan 2001 10:49:05 -0800 (PST)
>
> On Wed, 10 Jan 2001, Martin Schenk wrote:
> > I'm using the nv.o driver of XFree-4.0.1 with a TNT card
> > on a SMP system under the new linux kernel 2.4.0.
> >
> > Occasionally XFree segfaults and dumps me back to the command
> > line.
> > I assume the problem has to do with my using the new kernel,
> > which provides finer grained kernel locks.
> >
> > I attach the startup info from XFree.log and a backtrace done
> > on the coredump (if someone is interested in the coredump,
> > it is about 2MB bzip2'd).
>
> This is a long-standing problem with 2.3 and 2.4 SMP
> kernels. I believe it is a kernel bug and isn't the XFree86
> project's problem. The problem does not exist on 2.2
> SMP kernels nor on 2.3/4 UP kernels. The symptoms are
> random segfaults in perfectly fine XFree86 code.
> ---------------------------------------------------------------------
>
> Anyone looking into this?
>
> -tony

A work-around seems to be `option "noaccel"`. This may not
be either an XFree86 or a Linux problem, but a problem with MMX
and certain screen-cards, in kernels so compiled. I would
try another screen card if you have one.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-02-13 14:16:39

by Tony Gale

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes


On 13-Feb-2001 Alan Cox wrote:
>> Having experienced a number of crashes with Xfree 4.0 with 2.4
>> kernels, that I wasn't getting with 2.2 kernels, a quick search on
>> the xfree Xpert mailing list reveals this:
>
> Yeah I've seen this claim repeatedly. XFree 4.0.2 crashes for me in
> similar
> ways on 3dfx and matrox cards and it happens with 2.2 kernels as
> well.

Mine's rock solid with 2.2 though. I have two Matrox Millennium IIs
multi-headed, on SMP - asking for trouble :-)

>
> I believe it to be Xfree or glibc problems. So I'm not. Since I
> can't get
> XFree 4 stable on 2.2 I don't have a useful setup to study this.
>

I've had a report that 2.4.2pre3 has sorted out the problem, so am
trying that. Grabs straw: maybe the VM accounting changes have helped?

-tony


---
E-Mail: Tony Gale <[email protected]>
Q: What's the difference between a dead dog in the road and a dead
lawyer in the road?
A: There are skid marks in front of the dog.

The views expressed above are entirely those of the writer
and do not represent the views, policy or understanding of
any other person or official body.

2001-02-13 14:39:11

by Zdenek Kabelac

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes

Alan Cox wrote:
> Yeah I've seen this claim repeatedly. XFree 4.0.2 crashes for me in similar
> ways on 3dfx and matrox cards and it happens with 2.2 kernels as well. What
> makes me suspicious it's XFree triggered is that there isn't really anything
> XFree does that would trigger mm bugs on x86 platforms. It isn't threaded, it
> doesn't make extensive threaded use of mmap. But of course it does touch
> hardware directly, particularly the AGPgart. That might be an obvious first
> candidate but having looked at it I see no problems.
>
> > Anyone looking into this?
>
> I believe it to be Xfree or glibc problems. So I'm not. Since I can't get
> XFree 4 stable on 2.2 I don't have a useful setup to study this.

I'll try to repeat my problem here in the hope that someone will notice
it and help me.

The problem is with the use of RTL & XFree86 4.0.
In the old days of XFree 3.3.6 I had no problems at all. After upgrading
to XFree 4.0 this problem appeared:

The RTL scheduler calls my module code and this randomly segfaults
(usually it's in a spinlock, and the trace shows that it was looping in
page_fault). This happens with any RTL module.

The problem appears only IF my module has been loaded AFTER the mga.o drm
kernel driver. When I'm not using accelerated XFree (without the kernel
module), or I'm preloading my module before mga.o (it even helps to stop
XFree, remove mga, load my driver, and restart XFree), everything runs
just fine.

As the drm code cooperates with AGP and DMA, and I'm not skilled enough
to know about these memory mapping problems, I've no idea how it could
happen that my kernel module code is not present in the specified memory.
As I've noticed, drm contains some code for handling the page fault
exception; however, for an RTL task it's "a must" that the code is
present in memory.

I have a couple of questions - is it correct to assume that kernel
memory, including the memory used for kernel module drivers, is
unswappable and will always stay in the same physical place?

Should I use PG_reserved or PG_locked for the pages I want the MMU
not to touch?
Is it useful to increment the counter in mem_map_t as is done in the drm
driver?

thanks

bye

--
There are three types of people in the world:
those who can count, and those who can't.
Zdenek Kabelac http://i.am/kabi/ [email protected] {debian.org; fi.muni.cz}

2001-02-13 14:44:31

by David Woodhouse

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes


[email protected] said:
>
> This is a long-standing problem with 2.3 and 2.4 SMP kernels. I
> believe it is a kernel bug and isn't the XFree86 project's problem.
> The problem does not exist on 2.2 SMP kernels nor on 2.3/4 UP kernels.
> The symptoms are random segfaults in perfectly fine XFree86 code.

> Anyone looking into this?

I had an XFree86 setup which was perfectly stable in RH6.2, and had been
for some months. Upon upgrading to RH7 - with glibc-2.2 and new
screensavers, it started falling over almost every night.

The crashes got less frequent when I started running X against glibc-2.1
(note _running_ against glibc-2.1, not just compiling against it and running
against glibc-2.2). But they were still happening.

In the week or so since killing xscreensaver, neither of the boxen on which
I was seeing this have fallen over. Another box on which I killed
xscreensaver but didn't run X against glibc-2.1 is still suffering, albeit
less frequently.

Looks like two separate bugs - I see no reason to expect that there's only
one such bug causing X to fall over :)

If others can try these remedies and confirm that they really do help, and
it's not just the statistics playing silly buggers with my brain, that'd
hopefully give us some lead on getting it fixed.


passion /etc/X11 $ cat X
#!/bin/sh

logger "X restarting"
free | logger
cp /var/log/XFree86.0.log /var/log/XFree86.0.log-old
sync
sync
sync

exec /usr/i386-glibc21-linux/lib/ld-linux.so.2 --library-path /usr/i386-glibc21-linux/lib /usr/X11R6/bin/XFree86-4.0.2-glibc2.1 -modulepath /usr/X11R6/lib/modules-4.0.2-glibc2.1/ "$@"
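
One way to confirm that the server really is _running_ against the
glibc-2.1 libraries, rather than just compiled against them, is to look
at which libc file the process has mapped. A minimal sketch, assuming a
Linux /proc filesystem; the helper name mapped_libc and the process name
XFree86 in the usage comment are illustrative assumptions, not part of
the setup above:

```shell
#!/bin/sh
# Print the distinct libc files mapped by a process, given a file in
# /proc/<pid>/maps format (the pathname is the 6th field of each line).
# mapped_libc is a hypothetical helper name used only for this sketch.
mapped_libc() {
    awk '$6 ~ "/libc[-.]" { print $6 }' "$1" | sort -u
}

# Usage (assumes the server process is called XFree86):
#   mapped_libc "/proc/$(pidof XFree86)/maps"
```

If the wrapper is doing its job, the path printed should be under
/usr/i386-glibc21-linux/lib rather than the system /lib.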


--
dwmw2


2001-02-13 15:13:42

by Tony Gale

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes


On 13-Feb-2001 David Woodhouse wrote:
> The crashes got less frequent when I started running X against
> glibc-2.1
> (note _running_ against glibc-2.1, not just compiling against it
> and running
> against glibc-2.2). But they were still happening.

I'm running RH6.2 with glibc-2.1.3-22

>
> In the week or so since killing xscreensaver, neither of the boxen
> on which
> I was seeing this have fallen over. Another box on which I killed
> xscreensaver but didn't run X against glibc-2.1 is still suffering,
> albeit
> less frequently.

Yes, a number of the xscreensaver modules cause XFree to crash - they
always have. I have previously tested them all and disabled the ones
that crash Xfree.

But, under 2.4 I have had X crash whilst I've been using it - which
didn't happen under 2.2.

>
> Looks like two separate bugs - I see no reason to expect that
> there's only
> one such bug causing X to fall over :)

It's a good premise :-)

-tony


---
E-Mail: Tony Gale <[email protected]>
I base my fashion taste on what doesn't itch.
-- Gilda Radner

The views expressed above are entirely those of the writer
and do not represent the views, policy or understanding of
any other person or official body.

2001-02-13 15:37:05

by Pete Toscano

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes

i have been running 4.0.2 on my smp system using the 2.4.1 kernel. the
one thing is, i was using the xfree out of precision insight's cvs with
the g400 binary-only hal lib dri module loaded. every-so-often,
especially when closing windows or switching virtual desktops, the
kernel would crash. luckily, i'm also running kdb on a serial console,
so i am able to check things out and keep a log. unfortunately, when i
btp'd all the processes, i found no text.lock, which is as far as i know
how to "debug" a kernel crash.

of course, this could very well be something wrong with the binary-only
module from matrox, so i'm seeing if the same problem presents itself
with the original mga.o loaded (which also disables hardware dri).

pete

On Tue, 13 Feb 2001, Tony Gale wrote:

> Having experienced a number of crashes with Xfree 4.0 with 2.4
> kernels, that I wasn't getting with 2.2 kernels, a quick search on
> the xfree Xpert mailing list reveals this:

--
Pete Toscano [email protected] 703.948.3364
GPG fingerprint: D8F5 A087 9A4C 56BB 8F78 B29C 1FF0 1BA7 9008 2736



2001-02-13 17:02:23

by David Howells

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes

I had problems with XFree86 4.0 and 4.0.1 locking solidly up under Linux 2.4.x
after about 10mins until I upgraded to XFree86 4.0.2. Now it seems rock solid.
XFree86 3.3.x was always okay.

I've got a Dual-PII machine and an NVidia GeForce.

David

2001-02-13 19:55:51

by Rogerio Brito

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes

On Feb 13 2001, Alan Cox wrote:
> Yeah I've seen this claim repeatedly. XFree 4.0.2 crashes for me in similar
> ways on 3dfx and matrox cards and it happens with 2.2 kernels as well.

I thought that I was going crazy or that it was just my
inability to configure things correctly, but it is kind of
comforting to see that I'm not the only one seeing problems
with XFree86 4.0.2 + matrox + kernel 2.2.18 (UP system -- an
AMD Duron with chipset KT133).

When X 4.0.2 entered the Debian testing distribution, I
immediately upgraded (I had used X 4.0.1 before with very good
results, but that system had an HD crash and I reinstalled
Debian potato, that comes with X 3.3.6). I got all these
strange Segfaults and crashes with a vanilla 2.2.18 kernel. I
went back to X 3.3.6 and everything is running perfectly fine
since then, but I'd like to use the new features of X 4.


[]s, Roger...

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rogerio Brito - [email protected] - http://www.ime.usp.br/~rbrito/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

2001-02-13 20:16:06

by Aaron Dewell

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes


BTW, same result on sparc32 smp+X. xfs segfaults, so does X itself. It
doesn't even get to trying to start. I didn't even think of trying to
go back to a UP kernel. I'll have to try that now.
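
When comparing reports like these it helps to record whether each
kernel is an SMP or a UP build. A minimal sketch, assuming the usual
Linux convention that SMP builds carry the token "SMP" in the `uname -v`
string; the function name kernel_flavour is made up for illustration:

```shell
#!/bin/sh
# Classify a kernel build as SMP or UP from its `uname -v` string.
# Assumption: SMP builds include the word "SMP" in that string.
kernel_flavour() {
    case " $1 " in
        *" SMP "*) echo SMP ;;
        *)         echo UP ;;
    esac
}

# Report the running kernel's release and flavour.
echo "kernel $(uname -r): $(kernel_flavour "$(uname -v)")"
```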

Aaron

On Tue, 13 Feb 2001, Rogerio Brito wrote:
> On Feb 13 2001, Alan Cox wrote:
> > Yeah I've seen this claim repeatedly. XFree 4.0.2 crashes for me in similar
> > ways on 3dfx and matrox cards and it happens with 2.2 kernels as well.
>
> I thought that I was going crazy or that it was just my
> inability to configure things correctly, but it is kind of
> comforting to see that I'm not the only one seeing problems
> with XFree86 4.0.2 + matrox + kernel 2.2.18 (UP system -- an
> AMD Duron with chipset KT133).
>
> When X 4.0.2 entered the Debian testing distribution, I
> immediately upgraded (I had used X 4.0.1 before with very good
> results, but that system had an HD crash and I reinstalled
> Debian potato, that comes with X 3.3.6). I got all these
> strange Segfaults and crashes with a vanilla 2.2.18 kernel. I
> went back to X 3.3.6 and everything is running perfectly fine
> since then, but I'd like to use the new features of X 4.
>
>
> []s, Roger...

2001-02-13 22:09:49

by Anton Blanchard

Subject: Re: 2.4.x SMP blamed for Xfree 4.0 crashes


> Yes, actually it is... So I'm wrong then, it's not the same problem.

A rebuild of the binaries in question should fix it.

Anton