2006-09-11 19:22:51

by Marc Perkel

Subject: Segfault Error - what does this mean?

Just put a new server online trying out the new AMD AM2 processor. I compiled a custom kernel because the regular Fedora Core kernels aren't yet compatible with my Asus M2NPV-VM motherboard using the nVidia chipset.

I compiled 2.6.18rc6 and got segfault errors. So I tried the 2.6.17.13 kernel and same thing. About every 20 minutes or so I get one of these.

Sep 11 12:05:18 pascal kernel: exim[19840]: segfault at 0000000000000000 rip 0000003f53e73ee0 rsp 00007fff9e561d18 error 4

At one point the server locked up. Before I put it online I did several days of memory testing with no errors. I believe the Exim code is solid as it has been running flawlessly on all my other servers.

It's probably hardware or some incompatibility but I'm not sure where to start looking. Trying to understand what this error means. What is Error 4? How do I track this down?

Thanks in advance.




2006-09-11 19:55:32

by Jesper Juhl

Subject: Re: Segfault Error - what does this mean?

On 11/09/06, Marc Perkel wrote:
> Just put a new server online trying out the new AMD AM2 processor. I compiled a custom kernel because the regular Fedora Core kernels aren't yet compatible with my Asus M2NPV-VM motherboard using the nVidia chipset.
>
> I compiled 2.6.18rc6 and got segfault errors. So I tried the 2.6.17.13 kernel and same thing. About every 20 minutes or so I get one of these.
>
> Sep 11 12:05:18 pascal kernel: exim[19840]: segfault at 0000000000000000 rip 0000003f53e73ee0 rsp 00007fff9e561d18 error 4
>
> At one point the server locked up. Before I put it online I did several days of memory testing with no errors. I believe the Exim code is solid as it has been running flawlessly on all my other servers.
>
> It's probably hardware or some incompatibility but I'm not sure where to start looking. Trying to understand what this error means. What is Error 4? How do I track this down?
>

Hi, I don't have a solution to your problem unfortunately, just want
to give you a bit of advice.

1) Try searching the archives and Google a bit for info before posting
(and when you do post, make it clear that you've done so). As it turns
out, this problem has surfaced before and there are several
interesting threads on the subject. Here are a few I found (there
are others):
http://lkml.org/lkml/2005/9/8/4
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-06/3234.html
http://www.mhonarc.org/archive/html/procmail/2006-06/msg00140.html

2) If you find previous posts related to your problem you'll often see
requests for specific info in them (like Andi asking "... catch a
crash in gdb and type x/i $pc what do you see?" in
http://lkml.org/lkml/2005/9/8/17 ). Try to provide such info in your
initial post - it'll greatly enhance your chances of getting useful
replies (and/or getting the bug fixed).

3) Please read the REPORTING-BUGS document in the root of the kernel
source dir and provide the info it outlines - usually saves a bunch of
ping-pong mails back and forth asking for more info.

4) Try to add people who may have insight into the problem to Cc:.
Looking at previous threads on the subject and finding the people who
have previously shown an interest is usually a good starting point.
The right people may miss your report if you only send to LKML.


--
Jesper Juhl
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-09-11 21:04:43

by Phillip Susi

Subject: Re: Segfault Error - what does this mean?

A segfault means the program tried to access a part of memory in a way
that it is not allowed to. In this case the faulting address was 0, which
would indicate that exim attempted to dereference a null pointer. It
would appear this is a bug in exim, not the kernel.
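
On the "error 4" part of the question: as far as I know, that number is
the x86 page-fault error code, printed in hex, and its low bits describe
what kind of access faulted. Below is a minimal sketch in Python of
decoding the log line from the report. The constant names, the regular
expression and both helper functions are made up for illustration; they
are not taken from the kernel source or from exim.

```python
import re

# Low bits of the x86 page-fault error code (names are illustrative labels).
PF_PROT = 0x01   # 0: page not present, 1: protection violation
PF_WRITE = 0x02  # 0: read access, 1: write access
PF_USER = 0x04   # 0: fault in kernel mode, 1: fault in user mode
PF_RSVD = 0x08   # 1: reserved bit set in a page-table entry
PF_INSTR = 0x10  # 1: fault was an instruction fetch

# Matches the "segfault at ... rip ... rsp ... error ..." message format
# shown in the report; other kernel versions may print it differently.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Describe a page-fault error code in words."""
    parts = [
        "user mode" if code & PF_USER else "kernel mode",
        "write" if code & PF_WRITE else "read",
        "protection violation" if code & PF_PROT else "page not present",
    ]
    if code & PF_RSVD:
        parts.append("reserved bit set")
    if code & PF_INSTR:
        parts.append("instruction fetch")
    return ", ".join(parts)


def parse_segfault(line):
    """Pull the fields out of one segfault log line, or return None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        # Assumption: the error code is printed in hex, like the addresses.
        "error": int(m.group("err"), 16),
    }


line = ("Sep 11 12:05:18 pascal kernel: exim[19840]: segfault at "
        "0000000000000000 rip 0000003f53e73ee0 rsp 00007fff9e561d18 error 4")
info = parse_segfault(line)
print(info["comm"], hex(info["addr"]), decode_error(info["error"]))
```

For the line in the report this decodes to a user-mode read of a page
that is not present, at address 0, which is consistent with a null
pointer read. The error code by itself does not say where the bad
pointer came from.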

Marc Perkel wrote:
> Just put a new server online trying out the new AMD AM2 processor. I
> compiled a custom kernel because the regular Fedora Core kernels aren't
> yet compatible with my Asus M2NPV-VM motherboard using the nVidia chipset.
>
> I compiled 2.6.18rc6 and got segfault errors. So I tried the 2.6.17.13
> kernel and same thing. About every 20 minutes or so I get one of these.
>
> Sep 11 12:05:18 pascal kernel: exim[19840]: segfault at 0000000000000000
> rip 0000003f53e73ee0 rsp 00007fff9e561d18 error 4
>
> At one point the server locked up. Before I put it online I did several
> days of memory testing with no errors. I believe the Exim code is solid
> as it has been running flawlessly on all my other servers.
>
> It's probably hardware or some incompatibility but I'm not sure where to
> start looking. Trying to understand what this error means. What is Error
> 4? How do I track this down?
>
> Thanks in advance.
>