Date: Thu, 24 Feb 2011 00:30:22 +0100
From: Jan Niehusmann <jan@gondor.com>
To: linux-kernel@vger.kernel.org
Subject: memory corruption when (un)plugging VGA cable
Message-ID: <20110223233022.GA3439@x61s.reliablesolutions.de>

On a Thinkpad x61s, I noticed some memory corruption when
plugging/unplugging the external VGA connection.

Symptoms:
---------

Four bytes at the beginning of a page get overwritten with zeroes.
The address of the corruption varies between reboots, but stays
constant while the machine is running (so it's possible to repeatedly
write some data and then corrupt it again by plugging in the cable).

Some example addresses I observed were:
0x998da000
0xb4e6a000
0xb4854000
0xb4843000
(These are offsets into /dev/mem, so they should be physical addresses,
right?)
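
As an illustration only (this is a sketch, not necessarily how the values
above were obtained), the following Python snippet checks that the listed
addresses are page-aligned and shows one way to read four bytes at a given
offset from a file such as /dev/mem. The 4 KiB page size and the helper
names is_page_aligned and read_at are assumptions made for this example.
Reading RAM through /dev/mem needs root and, depending on the kernel
configuration (for example CONFIG_STRICT_DEVMEM), may not be permitted.

```python
import os

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 base page size

# Addresses quoted in the report above.
OBSERVED = [0x998da000, 0xb4e6a000, 0xb4854000, 0xb4843000]


def is_page_aligned(addr, page_size=PAGE_SIZE):
    """Return True if addr is the first byte of a page."""
    return addr % page_size == 0


def read_at(path, offset, length=4):
    """Read `length` bytes at `offset` from `path` (for example /dev/mem)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an absolute offset without a separate seek.
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)
```

With sufficient privileges, read_at("/dev/mem", 0x998da000) would return the
four bytes at the first listed address; all-zero bytes there after plugging
in the cable would match the symptom described.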

Environment:
------------

Kernel 2.6.37.1, x86_64, Thinkpad x61s, Intel GM965 with integrated
graphics, 6GB of RAM (this may be what triggers the problem, as
officially only 4GB are supported).

I first observed the issue in November 2010, running 2.6.36, and then
again in January, with 2.6.37. Obviously, it doesn't get triggered too
often in day-to-day use (or just isn't noticed), so I'm not sure when
it first started happening.

How to trigger:
---------------

I first noticed the problem after suspend-to-ram. Later I noticed it's
also possible to trigger the issue by just plugging in the VGA cable, so
I guess suspending/resuming only triggers it because it also disables
and re-enables the VGA output.

Today I spent some time actively trying to reproduce the issue, and
succeeded.

To reliably detect the problem, I filled up a tmpfs mount with a big
file (3GB) with known content, which makes the corruption hit part of
that file very often. That way, I can reproduce & detect the corruption
within a few minutes (usually after 1-2 reboots).
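
A minimal sketch of such a detector in Python follows. It is an
illustration under stated assumptions, not the procedure actually used
for this report: the pattern (every 4-byte word encodes the page index
and is never zero), the 4 KiB page size, and the function names
make_page, write_pattern and find_corruption are all made up for this
example.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages
WORDS_PER_PAGE = PAGE_SIZE // 4


def make_page(index):
    """Build one page of known content for page number `index`.

    Every 4-byte word holds (index + 1) with the top bit set, so no word
    is ever all zeroes and each page identifies itself.
    """
    value = ((index + 1) & 0x7FFFFFFF) | 0x80000000
    return struct.pack("<I", value) * WORDS_PER_PAGE


def write_pattern(path, num_pages):
    """Fill `path` with `num_pages` pages of known content."""
    with open(path, "wb") as f:
        for i in range(num_pages):
            f.write(make_page(i))


def find_corruption(path):
    """Return (file_offset, expected, found) for every 4-byte word that differs."""
    bad = []
    with open(path, "rb") as f:
        index = 0
        while True:
            page = f.read(PAGE_SIZE)
            if not page:
                break
            expected = make_page(index)
            if page != expected:
                # Only scan word by word when the page as a whole mismatches.
                for off in range(0, len(page), 4):
                    if page[off:off + 4] != expected[off:off + 4]:
                        bad.append((index * PAGE_SIZE + off,
                                    expected[off:off + 4],
                                    page[off:off + 4]))
            index += 1
    return bad
```

For a 3GB file, write_pattern would be called with 3 * 2**30 // PAGE_SIZE
pages on a path inside the tmpfs mount (the mount point is whatever was
chosen locally). Since a tmpfs file lives in RAM, running find_corruption
after plugging in the cable re-reads that memory, and a corrupted page
should show up as a single mismatching word at a page-aligned file offset.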

When plugging in the cable while a text console is displayed, the
backlight of the internal display dims a few moments after the cable is
actually plugged in. That seems to be the moment when the corruption
occurs.


I tried to find the cause by setting a dataw (data write) breakpoint
with kdb. The breakpoint works (verified by writing to the affected
file), but doesn't trigger when the corruption occurs. Not sure what
that means.


Suggestions on what I could try to find the cause are very welcome!
