On Tue, Oct 16, 2007 at 07:17:32PM +0200, Arnaud Fontaine wrote:
> Hello,
>
> We often get the following error from the kernel:
>
> sshd[1551] trap invalid opcode rip:2aeacc0677a0 rsp:7fffe0c7e688 error:0
> Eeek! page_mapcount(page) went negative! (-1)
> page pfn = 7f7a8
> page->flags = 400000000001002c
> page->count = 1
> page->mapping = ffff810056170550
> vma->vm_ops = 0xffffffff80667ba0
> vma->vm_ops->nopage = _stext+0x7fdf7000/0x20
> vma->vm_ops->fault = filemap_fault+0x0/0x450
> vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x50
> ....
>
> We have tested with different kernels (2.6.23.1 and 2.6.22) and the same
> error happens with different processes. Any idea how to find out what could
> cause this error?
Many of these that I've seen have turned out to be a hardware problem.
Try running memtest86+ on that machine for a while.
It doesn't catch all problems, but it will highlight more common memory faults.
Dave
--
http://www.codemonkey.org.uk
>>>>> "Dave" == Dave Jones <[email protected]> writes:
Dave> Many of these that I've seen have turned out to be a hardware
Dave> problem. Try running memtest86+ on that machine for a while.
Dave> It doesn't catch all problems, but it will highlight more
Dave> common memory faults.
Hello,
We ran memtest86+ before production, it was about one month ago. Do you
think it could come from that anyway?
Regards,
Arnaud Fontaine
On Wed, Oct 17, 2007 at 01:03:02AM +0200, Arnaud Fontaine wrote:
> >>>>> "Dave" == Dave Jones <[email protected]> writes:
>
> Dave> Many of these that I've seen have turned out to be a hardware
> Dave> problem. Try running memtest86+ on that machine for a while.
> Dave> It doesn't catch all problems, but it will highlight more
> Dave> common memory faults.
>
> Hello,
>
> We ran memtest86+ before production, it was about one month ago. Do you
> think it could come from that anyway?
Not impossible. Hardware failures can occur at any time.
Somewhat unlikely though. As I mentioned, memtest also doesn't trap
all hardware problems. I have a board that passes memtest with flying
colours, yet dies under even slight load. Examination of the board
shows that it has leaking capacitors.
Dave
--
http://www.codemonkey.org.uk
Arnaud Fontaine <[email protected]> writes:
>>>>>> "Dave" == Dave Jones <[email protected]> writes:
>
> Dave> Many of these that I've seen have turned out to be a hardware
> Dave> problem. Try running memtest86+ on that machine for a while.
> Dave> It doesn't catch all problems, but it will highlight more
> Dave> common memory faults.
>
> Hello,
>
> We ran memtest86+ before production, it was about one month ago. Do you
> think it could come from that anyway?
I find that a lot of the time memtest does not reveal an error. Errors
often show up only when you combine multiple sources of load or under
random access, for example compiling a kernel while doing heavy I/O on
the disk. But that might just be me. Errors are rather random occurrences.
Compiling a kernel repeatedly, with multiple builds in parallel, is usually
a good test. If it sometimes fails to compile, then it is almost certainly
a hardware error.
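
A minimal sketch of such a repeated, parallel build loop might look like the
following. It is only an illustration under stated assumptions: the function
names (run_once, stress), the idea of one build directory per parallel copy,
and the example build command are all hypothetical and not taken from this
thread.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, cwd=None):
    """Run one build command in `cwd` and return its exit status."""
    result = subprocess.run(
        cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def stress(cmd, cwds, rounds=10):
    """Run `cmd` once per directory in `cwds`, all at the same time,
    and repeat that for `rounds` rounds.

    Returns the total number of runs that exited with a non-zero status.
    """
    failures = 0
    with ThreadPoolExecutor(max_workers=len(cwds)) as pool:
        for _ in range(rounds):
            # One concurrent run per directory; wait for all to finish.
            codes = list(pool.map(lambda d: run_once(cmd, d), cwds))
            failures += sum(1 for code in codes if code != 0)
    return failures
```

For example (hypothetical paths), with two separate copies of the kernel
source, stress(["sh", "-c", "make clean && make"], ["/tmp/linux-a",
"/tmp/linux-b"]) would rebuild both trees concurrently ten times. A non-zero
return value from an otherwise known-good tree is a hint worth following up,
not proof of a hardware fault.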
MfG
Goswin