Message-ID: <46F3188C.1040902@andrew.cmu.edu>
Date: Thu, 20 Sep 2007 21:04:12 -0400
From: Yucheng Low <ylow@andrew.cmu.edu>
User-Agent: Thunderbird 1.5.0.13 (X11/20070824)
MIME-Version: 1.0
To: Ray Lee <ray-lk@madrabbit.org>
CC: linux-kernel@vger.kernel.org
Subject: Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6
References: <46F0E19D.8000400@andrew.cmu.edu> <2c0942db0709200914p5ba04307pee519d4991f62299@mail.gmail.com>
In-Reply-To: <2c0942db0709200914p5ba04307pee519d4991f62299@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2533
Lines: 60

Hi all,

Thanks all. After lots of testing, I isolated the problem to one of the
memory modules.

Thought it might have been a kernel problem as I thought memtest should
be exhaustive enough considering I ran it for so long, but apparently not...
Even now, the bad module still does not show any errors in memtest...

Thanks,
Yucheng

Ray Lee wrote:
> On 9/19/07, Low Yucheng <ylow@andrew.cmu.edu> wrote:
>   
>> [1.] Summary
>> System Freeze on Particular workload with kernel 2.6.22.6
>>
>> [2.] Description
>> System freezes on repeated application of the following command
>> for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done
>>
>> Problem is consistent and repeatable.
>> Problem persists when running on a different drive, and also in pure console (no X).
>>
>> One time, the following error logged in syslog:
>> Sep 19 04:22:11 mossnew kernel: [  301.883919] VM: killing process convert
>> Sep 19 04:22:11 mossnew kernel: [  301.884382] swap_free: Unused swap offset entry 0000ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884421] swap_free: Unused swap offset entry 00000300
>> Sep 19 04:22:11 mossnew kernel: [  301.884456] swap_free: Unused swap offset entry 00000200
>> Sep 19 04:22:11 mossnew kernel: [  301.884491] swap_free: Unused swap offset entry 0000ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884527] swap_free: Unused swap offset entry 0000ff00
>> Sep 19 04:22:11 mossnew kernel: [  301.884562] swap_free: Unused swap offset entry 00000100
>>
>> Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no errors.
>> Should not be a CPU problem either. I have been running CPU intensive tasks for days.
>>     
>
> The "Unused swap offset entry" is almost always a sign of bad memory,
> if google can be trusted. Your workload is *extremely* CPU and memory
> intensive (and even hits the disk!), so this looks like bad RAM, bad
> cooling, or a marginal power supply that is failing under load.
>
> memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
> show all the problems.
>
> Take your RAM down to one stick and try again (looks like you have 2G
> installed?). If that still fails, try different RAM. If that still
> fails, then swap out the power supply for another if you can, and try
> again.
>
> Ray
>
>   

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/