Message-ID: <49671B77.7080705@rs-labs.com>
Date: Fri, 09 Jan 2009 10:40:07 +0100
From: Roman Medina-Heigl Hernandez <roman@rs-labs.com>
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
Subject: Oops with Gigabyte motherboard "GA-X48-DQ6" and r8168 Realtek driver
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2639
Lines: 51

Hello kernel-hackers,

I bought a PC intended to be used as a server with the recent Gigabyte
motherboard "GA-X48-DQ6":
http://www.giga-byte.es/products/mb/specs/ga_x48_dq6.html

It includes two RTL8111/8168B NICs, which seem the cause of the kernel
crashes (*but I'm not sure*)..., a Quad core, 4GB RAM and two 500GB HDs.

I installed Debian Linux (4.0) on it and I'm getting *kernel panic*
errors (r8168 module) when set to production state. I couldn't get any
kernel debug messages since I'm using Debian stock kernel
(2.6.18-6-686-bigmem) and I'm not a kernel hacker either. It happens from
time to time so it is not easily reproduceable, or at least I couldn't find
the way to reproduce it on purpose (I stressed the server forcing huge sftp
transfers and/or using iperf tool, without success). But the problem is
there and it seems a NIC driver problem.

Moreover, the machine is located at a remote location so I have to trust
other people reading screen messages for me, etc. It's not easy to debug in
this situation. I've been told that the oops were produced in r8168 module,
from time to time, and that they saw also one only crash while booting the
server "due to SATA" (???). The server uses software-raid and was
re-syncing to a second disk since the second disk was new (so expect high
disk load). I'm a bit confused, it seems that the problem could be the
r8168 driver but I cannot be sure at all.

Could you help me? Do you have reports of similar problems with similar NIC
/ motherboard? How to solve it? I'm sorry for not providing more info
(lspci, etc) but as I previously said the server is in a remote location
and it's currently powered-off. Any hints?

I'm using latest Realtek driver for RTL8111/8168B at:
ftp://202.65.194.212/cn/nic/r8168-8.010.00.tar.bz2
(I have r8169 driver *not* loaded and blacklisted in
/etc/modprobe.d/blacklist -since it didn't work for my 8111B-, and initrd
was rebuilt with new r8168 driver; so the only NIC driver loaded is r8168,
compiled from the former .tar.bz2 (8.010 version).

As a quick workaround (since I need to put that server on production state
*ASAP*), would you recommend to boot with safer options (noacpi, etc)?
Which ones exactly? (no problem if they degrade performance a bit or if
they cause less power-saving; in this case, stability and uptime is preferred).

Thank you from your comprehension and cooperation.
-Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/