2006-11-27 10:02:09

by Zhao Forrest

Subject: Which patch fix the 8G memory problem on x64 platform?

Hi Andi,

Kernel 2.6.18.3 runs very well on my x64 server with 2 CPUs and
8G of memory; however, kernel 2.6.16.32 panics ("Kernel panic - not
syncing: Attempted to kill init") under the stress test. When I boot
2.6.16.32 with mem=4000M, the panic doesn't happen under the stress
test.

This bug also happens with the latest SLES10 kernel (2.6.16.21-0.25-smp),
which is based on 2.6.16.21.

Do you know which patch fixed this bug between 2.6.16.32 and 2.6.18.3?
Then we could backport it to both 2.6.16.32 and the SLES10 kernel.

Thanks,
Forrest


2006-11-27 10:16:03

by Andi Kleen

Subject: Re: Which patch fix the 8G memory problem on x64 platform?

On Monday 27 November 2006 11:02, Zhao Forrest wrote:
> Hi Andi,
>
> Kernel 2.6.18.3 runs very well on my x64 server with 2 CPUs and
> 8G of memory; however, kernel 2.6.16.32 panics ("Kernel panic - not
> syncing: Attempted to kill init") under the stress test. When I boot
> 2.6.16.32 with mem=4000M, the panic doesn't happen under the stress
> test.

I'm not aware of an "8G memory problem".

Best you write a full bug report and possibly git bisect it.

-Andi

2006-11-28 05:33:57

by Zhao Forrest

Subject: Re: Which patch fix the 8G memory problem on x64 platform?

On 11/27/06, Andi Kleen wrote:
> On Monday 27 November 2006 11:02, Zhao Forrest wrote:
> > Hi Andi,
> >
> > Kernel 2.6.18.3 runs very well on my x64 server with 2 CPUs and
> > 8G of memory; however, kernel 2.6.16.32 panics ("Kernel panic - not
> > syncing: Attempted to kill init") under the stress test. When I boot
> > 2.6.16.32 with mem=4000M, the panic doesn't happen under the stress
> > test.
>
> I'm not aware of a "8G memory problem"
>
> Best you write a full bug report and possibly git bisect it.
>

Hi Andi,

My bad. After further testing, we found that this bug is not related to
the amount of physical memory. During the stress test, when the system
halts, there is only "Kernel panic - not syncing: Attempted to kill init"
on the screen; no stack trace is printed. We also found that the memory
at the address pointed to by rSP is all 0xff, so we don't know how to
debug it.
This bug is reproduced with kernel 2.6.16.32 on both IBM and SUN MP servers.

I first need to ask the author of the test case whether we can release
it as open source. The test case is called "crashme", and its main idea
is:
A signal handler is set up so that in most cases the machine exceptions
generated by illegal instructions, bad operands, etc. in the procedure
made up of random data are caught, and another round of randomness may
be tried. Eventually a random instruction may corrupt the program or
the machine state in such a way that the program must halt. This is
a test of the robustness of the hardware/software for instruction
fault handling.
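
The catch-and-retry idea described above can be sketched in C as follows.
This is only a minimal illustration, not the crashme source: every name in
it (install_fault_handlers, try_procedure, faulting_stand_in, and so on) is
hypothetical, and the "procedure" is a stand-in that raises SIGILL instead
of executing random bytes, which would be unsafe and architecture-specific.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Where the handler jumps back to after a fault (hypothetical names). */
static sigjmp_buf recover_point;
static volatile sig_atomic_t last_signal;

/* Record which signal arrived and abandon the faulting procedure. */
void on_fault(int signo)
{
    last_signal = signo;
    siglongjmp(recover_point, 1);
}

/* Catch the signals that bad instructions or operands typically raise. */
void install_fault_handlers(void)
{
    static const int sigs[] = { SIGILL, SIGSEGV, SIGBUS, SIGFPE, SIGTRAP };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        sigaction(sigs[i], &sa, NULL);
}

/*
 * Run one procedure. Returns 0 if it returned normally, otherwise the
 * number of the signal that interrupted it. sigsetjmp(..., 1) saves the
 * signal mask so that the caught signal is unblocked again afterwards.
 */
int try_procedure(void (*proc)(void))
{
    last_signal = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        proc();
        return 0;
    }
    return last_signal;
}

/* Stand-in for "a procedure made up of random data": always faults. */
void faulting_stand_in(void)
{
    raise(SIGILL);
}

/* Stand-in for a round that happens to execute without faulting. */
void harmless_stand_in(void)
{
}

/* "Another round of randomness": retry and count the caught faults. */
int count_caught_faults(int rounds)
{
    int caught = 0;
    int i;

    for (i = 0; i < rounds; i++)
        if (try_procedure(faulting_stand_in) != 0)
            caught++;
    return caught;
}
```

A caller would run install_fault_handlers() once and then call
try_procedure() in a loop. In the real test the procedure is a buffer of
random bytes, so a round can also corrupt state in ways the handler cannot
recover from, which is the failure mode the description mentions.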

We are now running git bisect, which will take some time.

Thanks,
Forrest

2006-11-28 09:51:25

by Andi Kleen

Subject: Re: Which patch fix the 8G memory problem on x64 platform?


> I first need to ask the author of the test case whether we can release
> it as open source. The test case is called "crashme",

Is that the classical crashme as found in LTP or an enhanced one?
Do you run it in a special way? Is the crash reproducible?

We run crashme regularly as part of LTP, Cerberus, etc.,
so at least any obvious bugs should in theory be caught.

-Andi