2006-09-12 08:49:19

by Denys Vlasenko

Subject: MSI K9N Neo: crash under heavy IDE read

Hello,

I bought a new Athlon64 mobo with AM2 socket and recently
I noticed that copying large amounts of data reliably
crashes 2.6.17.11 64-bit on it.

memtest runs ok on this machine overnight.
Machine is not overclocked.

Copying movies from SATA drive to PATA drive oopses
after few gigabytes transferred. Creating iso image
with mkisofs (done entirely on PATA drive, no SATA attached)
does the same.

After some testing I found out that rw load crashes
machine rather fast, while read load usually runs for several
minutes before crash. Setting udma4 or udma3 instead of udma5
doesn't help. Pity I don't have my own SATA drive to run tests
with it, ran most of the tests on PATA drive.

rw loads crashed twice on the same instruction, movl (%rdx), %eax
in fs/mpage.o, and I tracked it down to corresponding C line:

general protection fault: 0000 [1] PREEMPT
CPU 0
Modules linked in: nls_koi8_r nls_cp866 snd_pcm_oss snd_mixer_oss snd_hda_intel snd_hda_codec snd_pcm snd_timer snd soundcore
snd_page_alloc ehci_hcd usb_storage usbcore nfsd exportfs autofs4 ip_conntrack_irc ip_conntrack_ftp ip_conntrack
Pid: 3396, comm: sync Not tainted 2.6.17.11_64 #1
RIP: 0010:[<ffffffff8018ff5b>] <ffffffff8018ff5b>{__mpage_writepage+206}
RSP: 0000:ffff810029119b08 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff8100011174d8 RCX: ffff81007f744440
RDX: ffff010004fccb50 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff810029119c58 R08: ffff810004fccaf0 R09: ffff810029119ea8
R10: 0000000000000000 R11: ffff810001117468 R12: ffff8100011174d8
R13: 0000000000000008 R14: ffffffff801fcef1 R15: ffff81004c85e6c0
FS: 0000000000000000(0000) GS:ffffffff806b4000(0000) knlGS:00000000f7e09ba0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000806a40f CR3: 0000000041824000 CR4: 00000000000006e0
Process sync (pid: 3396, threadinfo ffff810029118000, task ffff81007d8c4b20)
Stack: ffff810029119ea8 ffff810029119d64 ffff810029119d58 ffffffff801fce1f
0000000900000008 ffff8100458e4848 ffff8100458e4700 0000000800000035
ffff810029119bc8 ffff81007f744440
Call Trace: <ffffffff801fce1f>{fat_get_block+0} <ffffffff801fcef1>{fat_writepage+0}
<ffffffff80190c8c>{mpage_writepages+561} <ffffffff801fcef1>{fat_writepage+0}
<ffffffff801fce1f>{fat_get_block+0} <ffffffff801fceda>{fat_writepages+16}
<ffffffff8015192a>{do_writepages+40} <ffffffff8018f1b0>{__writeback_single_inode+485}
<ffffffff8018f65e>{sync_sb_inodes+473} <ffffffff8018f7f7>{sync_inodes_sb+142}
<ffffffff8018f8c5>{__sync_inodes+158} <ffffffff8018f956>{sync_inodes+25}
<ffffffff8016ee0a>{do_sync+26} <ffffffff8016ee57>{sys_sync+14}
<ffffffff8011b12e>{ia32_sysret+0}

Code: 8b 02 a8 04 74 0a 0f 0b 68 12 ae 4c 80 c2 e9 01 8b 02 a8 20
RIP <ffffffff8018ff5b>{__mpage_writepage+206} RSP <ffff810029119b08>

objdump -d:
170: 4c 89 c2 mov %r8,%rdx
==> 173: 8b 02 mov (%rdx),%eax
175: a8 04 test $0x4,%al
177: 74 0a je 183 <__mpage_writepage+0xde>
179: 0f 0b ud2a
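
As a cross-check that the marked instruction really is the one named in the oops, the offsets in the listing can be reconciled by hand. This is only a sketch of the arithmetic; the constant names are made up for illustration, and the values are copied from the objdump output and the RIP line above.

```python
# Values copied from the objdump listing and the oops above.
JE_TARGET_FILE_OFFSET = 0x183  # "je 183 <__mpage_writepage+0xde>"
JE_TARGET_FUNC_OFFSET = 0xDE   # same target, relative to the function start
FAULT_FILE_OFFSET = 0x173      # the line marked "==>"

# Where __mpage_writepage starts inside the object file.
func_start = JE_TARGET_FILE_OFFSET - JE_TARGET_FUNC_OFFSET

# Offset of the marked instruction relative to the function start.
fault_func_offset = FAULT_FILE_OFFSET - func_start

print(hex(func_start), fault_func_offset)
```

The result is 206, which agrees with "__mpage_writepage+206" in the RIP line. The "Code:" line also begins with 8b 02 a8 04 74 0a 0f 0b, the same bytes as the listing from offset 173 on.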

mpage.s:
movq $0, -232(%rbp) #, boundary_block
movq $0, -224(%rbp) #, boundary_bdev
movq %r8, %rdx # head, bh
.L19:
==> movl (%rdx), %eax #* bh, D.16458
/*
Corresponding part of mpage.c:
        if (page_has_buffers(page)) {
                struct buffer_head *head = page_buffers(page);
                struct buffer_head *bh = head;
                /* If they're all mapped and dirty, do it */
                page_block = 0;
                do {
                        asm("#just before movl (%rdx), %eax");
                        BUG_ON(buffer_locked(bh));
                        if (!buffer_mapped(bh)) {
*/
testb $4, %al #, D.16458
je .L20 #,
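
One possible reading of the register dump, offered as a sketch rather than a diagnosis: the faulting pointer in RDX is not a canonical x86-64 address, which would explain a general protection fault rather than a page fault. The helper names below are hypothetical; the register values come from the oops and the 48-bit virtual address width from the cpuinfo posted later in this thread.

```python
# Register values copied from the oops above.
RDX = 0xFFFF010004FCCB50  # bh pointer dereferenced by movl (%rdx), %eax
R08 = 0xFFFF810004FCCAF0  # head pointer, copied into %rdx before the loop

VIRT_BITS = 48  # "48 bits virtual" according to cpuinfo


def is_canonical(addr, virt_bits=VIRT_BITS):
    """True if bits 63..(virt_bits - 1) are all zeros or all ones."""
    top = addr >> (virt_bits - 1)
    all_ones = (1 << (64 - virt_bits + 1)) - 1
    return top in (0, all_ones)


def differing_bits(a, b):
    """Bit positions at which two 64-bit values differ."""
    x = a ^ b
    return [i for i in range(64) if (x >> i) & 1]


print(is_canonical(R08))         # head looks like a normal kernel pointer
print(is_canonical(RDX))         # the faulting pointer does not
print(differing_bits(RDX, R08))  # low bits differ by design; note the high one
```

Apart from the low bits (the two buffer heads sit 0x60 bytes apart), the only difference between the two pointers is bit 47. Since .L19 is the loop label, RDX was presumably reloaded from memory on a later iteration, so a single flipped bit in a stored pointer would be consistent with this dump. That is an assumption, not something the oops proves.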


However, I'm afraid it may not be useful, because read
load tests crash in random places. Reads are done by cat >/dev/null.
A few assorted traces (written down by hand):

unable to handle kernel paging request at ffffa5007fdff6c0
RIP: free_block+140

GP at reiserfs_releasepage+103

unable to handle kernel paging request at ffff81001a7289c0
RIP: ffff81001a7289c0 (so it jumped into nirvana...)
trace: sys_read+71 ia32_sysret+0
comm: cat

NULL deref 00000011 at __generic_file_aio_read+275

Hardware info:
AMD Athlon64 socket AM2 @ 1800MHz
RAM DDRII 2 Gb
lspci:
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:01.2 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
06:00.0 VGA compatible controller: ATI Technologies Inc RV370 [Sapphire X550 Silent]
06:00.1 Display controller: ATI Technologies Inc RV370 secondary [Sapphire X550 Silent]

hdparm (switched to udma3 by hand; usually it is in udma5):
/dev/hda:
Model=WDC WD2500JB-55GVC0, FwRev=08.02D08, SerialNo=WD-WCAL78337950
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=74
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 *udma3 udma4 udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version:
* signifies the current active mode

Compressed dmesg and lspci -vvvxxx attached.
--
vda


Attachments:
(No filename) (6.75 kB)
dmesg.bz2 (4.88 kB)
lspci-vvvxxx.bz2 (4.10 kB)

2006-09-12 16:21:46

by Denys Vlasenko

Subject: Re: MSI K9N Neo: crash under heavy IDE read

On Tuesday 12 September 2006 10:46, Denis Vlasenko wrote:
> Hello,
>
> I bought a new Athlon64 mobo with AM2 socket and recently
> I noticed that copying large amounts of data reliably
> crashes 2.6.17.11 64-bit on it.
>
> memtest runs ok on this machine overnight.
> Machine is not overclocked.
>
> Copying movies from SATA drive to PATA drive oopses
> after few gigabytes transferred. Creating iso image
> with mkisofs (done entirely on PATA drive, no SATA attached)
> does the same.
>
> After some testing I found out that rw load crashes
> machine rather fast, while read load usually runs for several
> minutes before crash. Setting udma4 or udma3 instead of udma5
> doesn't help. Pity I don't have my own SATA drive to run tests
> with it, ran most of the tests on PATA drive.

I obtained PCI config space dumps under Windows XP on this machine
and compared them to Linux settings. Integrated PATA IDE controller
has some differences in rows 5x and 8x. Grep for "IDE interface".

Maybe this sheds some light.

URLs to chipset docs, anyone?...
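
To turn the changed rows in the diff below into individual register offsets, a small helper along these lines can be used. It is only a sketch: the function name is made up, and the two row pairs are copied from the 00:04.0 IDE hunk, where the "-" lines are the Linux dump and the "+" lines the Windows XP dump.

```python
def diff_row(base_offset, linux_row, windows_row):
    """Return (offset, linux_byte, windows_byte) for each differing byte."""
    linux = bytes.fromhex(linux_row)
    windows = bytes.fromhex(windows_row)
    return [
        (base_offset + i, a, b)
        for i, (a, b) in enumerate(zip(linux, windows))
        if a != b
    ]


# Rows 50 and 80 of dev 00:04.0 (IDE interface), taken from the diff.
row_50 = diff_row(
    0x50,
    "02 F0 00 00 00 00 00 00 A8 A8 20 20 0F 00 20 20",
    "02 F0 00 00 00 00 00 00 A8 A8 20 20 5F 00 20 20",
)
row_80 = diff_row(
    0x80,
    "00 00 00 00 00 D0 77 7C 00 00 02 30 00 00 00 00",
    "00 00 00 00 00 D0 1F 22 00 00 C0 3F 00 00 00 00",
)

for offset, linux_byte, windows_byte in row_50 + row_80:
    print(f"0x{offset:02X}: linux={linux_byte:02X} windows={windows_byte:02X}")
```

For the PATA controller this narrows the differences down to offsets 0x5C, 0x86, 0x87, 0x8A and 0x8B. What those registers mean is not stated anywhere in this thread and would need the chipset documentation.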

--- lspci-vvvxxx3 2006-09-12 16:35:37.000000000 +0200
+++ lspci-vvvxxx_win3 2006-09-12 16:35:42.000000000 +0200
@@ -24,8 +24,8 @@ dev 00:01.0
40: 62 14 60 72 00 00 00 00 FA 3E FF 00 FA 3E FF 00
50: FA 3E FF 00 00 5A 62 02 00 00 00 01 00 00 FE 7F
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 F9 FF
- 70: 10 00 FF FF C1 00 00 00 00 00 05 19 00 00 00 00
- 80: 09 80 00 2D 00 11 00 00 F0 00 00 01 F0 00 00 00
+ 70: 10 00 FF FF C1 00 00 00 00 00 45 19 00 00 30 00
+ 80: 09 D0 00 D2 01 12 00 00 F0 00 00 01 F0 00 00 00
90: 00 08 00 00 00 00 00 00 21 47 95 86 EF CD AB 00
A0: 03 00 10 80 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 00 00 00 0A 7F 0A 00 00 00 00 00 00 00 00
@@ -74,7 +74,7 @@ dev 00:02.0
00: DE 10 6C 03 07 00 B0 00 A1 10 03 0C 00 00 80 00
10: 00 B0 AF FE 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 60 72
- 30: 00 00 00 00 44 00 00 00 00 00 00 00 0F 01 03 01
+ 30: 00 00 00 00 44 00 00 00 00 00 00 00 17 01 03 01
40: 62 14 60 72 01 00 02 FE 00 00 00 00 00 00 00 00
50: 00 00 00 00 1D 47 40 00 10 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -92,14 +92,14 @@ dev 00:02.1
00: DE 10 6D 03 06 00 B0 00 A2 20 03 0C 00 00 80 00
10: 00 AC AF FE 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 60 72
- 30: 00 00 00 00 44 00 00 00 00 00 00 00 07 02 03 01
+ 30: 00 00 00 00 44 00 00 00 00 00 00 00 15 02 03 01
40: 62 14 60 72 0A 80 98 20 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 20 20 01 00 00 60 18 85 03 3C 0A 01 00 00 00 00
- 70: 00 00 08 00 00 10 20 80 89 3D B6 22 77 25 94 00
+ 70: 00 00 08 00 00 10 20 80 89 3D B6 22 77 25 B4 00
80: 01 00 02 FE 00 00 00 00 00 00 00 00 15 16 00 00
90: 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- A0: 01 00 00 00 00 00 08 C0 00 00 00 00 00 00 00 00
+ A0: 01 00 00 00 00 00 00 C0 00 00 00 00 00 00 00 00
B0: 00 11 22 33 44 00 00 00 FF 03 00 00 00 00 00 00
C0: 10 10 2D 0D 00 00 00 00 00 00 00 00 00 00 00 00
D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -112,10 +112,10 @@ dev 00:04.0 IDE interface
20: A1 FF 00 00 00 00 00 00 00 00 00 00 62 14 60 72
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01
40: 62 14 60 72 01 00 02 00 00 00 00 00 00 00 00 00
- 50: 02 F0 00 00 00 00 00 00 A8 A8 20 20 0F 00 20 20
+ 50: 02 F0 00 00 00 00 00 00 A8 A8 20 20 5F 00 20 20
60: 00 00 C0 C6 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- 80: 00 00 00 00 00 D0 77 7C 00 00 02 30 00 00 00 00
+ 80: 00 00 00 00 00 D0 1F 22 00 00 C0 3F 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 01
A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -128,12 +128,12 @@ dev 00:05.0 IDE interface (SATA)
00: DE 10 7F 03 07 00 B0 00 A2 85 01 01 00 00 80 00
10: 01 D8 00 00 81 D4 00 00 01 D4 00 00 81 D0 00 00
20: 01 D0 00 00 00 90 AF FE 00 00 00 00 62 14 60 72
- 30: 00 00 00 00 44 00 00 00 00 00 00 00 0A 01 03 01 3C: PCI_INTERRUPT_LINE
+ 30: 00 00 00 00 44 00 00 00 00 00 00 00 15 01 03 01
40: 62 14 60 72 01 B0 02 00 00 00 00 00 00 00 00 00
50: 0B 68 08 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 41 0C 00 00 00 0F 06 42 00 00 00 00
70: 2C 78 C4 40 01 10 00 00 01 10 00 00 20 00 20 00
- 80: 00 00 00 00 D5 FF FF FD 00 00 5F 0D FF EA F7 F2
+ 80: 00 00 00 00 D5 FF FF FD 00 00 5F 0D FF EA FF F3
90: 00 00 FF 3D 00 00 00 00 06 00 06 10 7F 03 01 01
A0: 12 0A 00 00 00 00 00 00 00 00 00 00 33 33 00 02
B0: 05 CC 84 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -146,13 +146,13 @@ dev 00:05.1 IDE interface (SATA)
00: DE 10 7F 03 07 00 B0 00 A2 85 01 01 00 00 80 00
10: 01 CC 00 00 81 C8 00 00 01 C8 00 00 81 C4 00 00
20: 01 C4 00 00 00 80 AF FE 00 00 00 00 62 14 60 72
- 30: 00 00 00 00 44 00 00 00 00 00 00 00 0B 02 03 01 3C: PCI_INTERRUPT_LINE
+ 30: 00 00 00 00 44 00 00 00 00 00 00 00 16 02 03 01
40: 62 14 60 72 01 B0 02 00 00 00 00 00 00 00 00 00
50: 0B 68 08 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 41 0C 00 00 00 0F 06 42 00 00 00 00
70: 2C 78 C4 40 01 10 00 00 01 10 00 00 20 00 20 00
- 80: 00 00 00 00 BD 59 FF FF 00 00 0B 12 1B FF FB FF
- 90: 00 00 73 3C 00 00 00 00 06 00 06 10 7F 03 01 01
+ 80: 00 00 00 00 BC 79 FF FF 00 00 0B 18 1E FF FB DF
+ 90: 00 00 53 3C 00 00 00 00 06 00 06 10 7F 03 01 01
A0: 12 0A 00 00 00 00 00 00 00 00 00 00 33 33 00 02
B0: 05 CC 84 00 00 00 00 00 00 00 00 00 00 00 00 00
C0: 00 00 00 00 00 00 00 00 0A 00 0A 00 08 00 02 A8
@@ -161,7 +161,7 @@ dev 00:05.1 IDE interface (SATA)
F0: 00 00 00 00 00 00 00 00 00 00 0C 00 00 00 00 00

dev 00:06.0
- 00: DE 10 70 03 04 00 B0 00 A2 01 04 06 00 00 81 00
+ 00: DE 10 70 03 07 00 B0 00 A2 01 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 40 F0 00 80 02
20: F0 FF 00 00 F0 FF 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 B8 00 00 00 00 00 00 00 00 00 02 02
@@ -182,7 +182,7 @@ dev 00:06.1
00: DE 10 71 03 06 00 B0 00 A2 00 03 04 00 00 80 00
10: 00 40 AF FE 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 60 72
- 30: 00 00 00 00 44 00 00 00 00 00 00 00 05 02 02 05
+ 30: 00 00 00 00 44 00 00 00 00 00 00 00 17 02 02 05
40: 62 14 60 72 01 50 02 C0 00 00 00 00 01 01 0F 00
50: 05 6C 80 01 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 0F 00 00 00 08 00 02 A8
@@ -200,7 +200,7 @@ dev 00:08.0
00: DE 10 73 03 07 00 B0 00 A2 00 80 06 00 00 00 00
10: 00 30 AF FE 81 C0 00 00 00 A8 AF FE 00 A4 AF FE
20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 60 72
- 30: 00 00 00 00 44 00 00 00 00 00 00 00 0F 01 01 14
+ 30: 00 00 00 00 44 00 00 00 00 00 00 00 16 01 01 14
40: 62 14 60 72 01 70 02 FE 00 00 00 00 0C 00 00 00
50: 05 6C 86 01 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 FF 00 00 00 08 00 02 A8
@@ -215,10 +215,10 @@ dev 00:08.0
F0: 00 00 00 00 10 00 00 00 42 01 00 00 00 00 00 00

dev 00:0B.0
- 00: DE 10 74 03 04 00 10 00 A2 00 04 06 10 00 01 00
+ 00: DE 10 74 03 07 00 10 00 A2 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 02 02 00 F1 01 00 00
20: F0 FF 00 00 F1 FF 01 00 00 00 00 00 00 00 00 00
- 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 02 00
+ 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 06 00
40: 0D 48 00 00 DE 10 00 00 01 50 02 F8 00 00 00 00
50: 05 60 82 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 08 80 00 A8 00 00 E0 FE 00 00 00 00 00 00 00 00
@@ -233,10 +233,10 @@ dev 00:0B.0
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev 00:0C.0
- 00: DE 10 74 03 04 00 10 00 A2 00 04 06 10 00 01 00
+ 00: DE 10 74 03 07 00 10 00 A2 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 03 03 00 F1 01 00 00
20: F0 FF 00 00 F1 FF 01 00 00 00 00 00 00 00 00 00
- 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 02 00
+ 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 06 00
40: 0D 48 00 00 DE 10 00 00 01 50 02 F8 00 00 00 00
50: 05 60 82 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 08 80 00 A8 00 00 E0 FE 00 00 00 00 00 00 00 00
@@ -251,10 +251,10 @@ dev 00:0C.0
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev 00:0D.0
- 00: DE 10 78 03 04 00 10 00 A2 00 04 06 10 00 01 00
+ 00: DE 10 78 03 07 00 10 00 A2 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 04 04 00 F1 01 00 00
20: F0 FF 00 00 F1 FF 01 00 00 00 00 00 00 00 00 00
- 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 02 00
+ 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 06 00
40: 0D 48 00 00 DE 10 00 00 01 50 02 F8 00 00 00 00
50: 05 60 82 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 08 80 00 A8 00 00 E0 FE 00 00 00 00 00 00 00 00
@@ -269,10 +269,10 @@ dev 00:0D.0
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev 00:0E.0
- 00: DE 10 75 03 04 00 10 00 A2 00 04 06 10 00 01 00
+ 00: DE 10 75 03 07 00 10 00 A2 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 05 05 00 F1 01 00 00
20: F0 FF 00 00 F1 FF 01 00 00 00 00 00 00 00 00 00
- 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 02 00
+ 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 06 00
40: 0D 48 00 00 DE 10 00 00 01 50 02 F8 00 00 00 00
50: 05 60 82 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 08 80 00 A8 00 00 E0 FE 00 00 00 00 00 00 00 00
@@ -288,7 +288,7 @@ dev 00:0E.0

dev 00:0F.0
00: DE 10 77 03 07 00 10 00 A2 00 04 06 10 00 01 00
- 10: 00 00 00 00 00 00 00 00 00 06 06 00 E1 E1 00 00
+ 10: 00 00 00 00 00 00 00 00 00 06 06 00 E1 E1 00 20
20: B0 FE B0 FE 01 D0 F1 DF 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 1A 00
40: 0D 48 00 00 DE 10 00 00 01 50 02 F8 00 00 00 00
@@ -352,10 +352,10 @@ dev 00:18.2
80: 02 00 00 00 00 00 00 00 24 C2 6A 00 20 03 13 00
90: 10 08 01 00 5B 80 10 74 10 00 00 80 31 00 00 00
A0: E9 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- B0: 13 8B B1 10 3A 00 00 00 D9 FF 05 E0 0C FF 7F F7
- C0: 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00
- D0: FF 00 F0 1F FF FF 01 00 01 80 3F 01 FF 07 00 00
- E0: 7F 00 F0 0F FF FF 01 00 00 C0 9F 01 FF 07 00 00
+ B0: 5B D5 3F 15 39 00 00 00 07 00 05 20 0B FF 01 00
+ C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+ D0: FE 80 7F F0 F8 FF 01 1E 3C 88 3F FF FE 0F 00 00
+ E0: 7E 80 7F F0 F8 FF 01 1F 3E C0 9F FF FE 07 00 00
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev 00:18.3
@@ -364,26 +364,26 @@ dev 00:18.3
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 F0 00 00 00 00 00 00 00 00 00 00 00
40: FF 3B 04 00 40 00 10 02 00 00 00 00 00 00 00 00
- 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 F8 FF 00
+ 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 F8 FF 07
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 01 02 51 11 80 00 50 00 2A 00 08 2B 23 00 00
80: 00 00 07 23 13 21 13 21 00 00 00 00 00 00 00 00
- 90: 00 00 00 00 3F 40 00 00 F0 AB FF 7F 00 00 00 00
+ 90: 00 00 00 00 3F 40 00 00 F0 AB FF FF 00 00 00 00
A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
D0: 00 00 00 00 01 A7 0D 00 00 00 60 00 25 25 25 00
- E0: 00 00 00 00 20 22 51 00 19 07 00 00 00 00 00 00
+ E0: 00 00 00 00 20 22 54 00 19 07 00 00 00 00 00 00
F0: 0F 00 10 00 00 04 00 00 02 00 00 00 F2 0F 04 00

dev 06:00.0
00: 02 10 63 5B 07 00 10 00 00 00 00 03 10 00 80 00
10: 08 00 00 D0 01 E0 00 00 00 00 BF FE 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 4B 17 90 14
- 30: 00 00 BC FE 50 00 00 00 00 00 00 00 0A 01 00 00
+ 30: 00 00 BC FE 50 00 00 00 00 00 00 00 10 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 4B 17 90 14
50: 01 58 02 06 00 00 00 00 10 80 01 00 60 02 00 00
- 60: 10 08 00 00 01 1D 00 00 00 00 01 11 00 00 00 00
+ 60: 10 09 00 00 01 1D 00 00 00 00 01 11 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 79
model name : AMD Athlon(tm) 64 Processor 3000+
stepping : 2
cpu MHz : 1809.300
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm svm cr8_legacy
bogomips : 3620.20
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

--
vda

2006-09-13 07:44:53

by Denys Vlasenko

Subject: MSI K9N Neo: corruption on streaming PATA/SATA i/o - testers needed

Hello lkml'ers

If you happen to have the same mobo, MSI AM2 K9N NEO-F

http://www.msi.com.tw/program/products/mainboard/mbd/pro_mbd_detail.php?UID=733

and streaming disk i/o works for you, please send me:
* kernel version and .config
* hdparm -i output
* lspci -vvvxxx output

Thanks... now with gory details.

On Tuesday 12 September 2006 18:18, Denis Vlasenko wrote:
> > I bought a new Athlon64 mobo with AM2 socket and recently
> > I noticed that copying large amounts of data reliably
> > crashes 2.6.17.11 64-bit on it.
> >
> > memtest runs ok on this machine overnight.
> > Machine is not overclocked.
> >
> > Copying movies from SATA drive to PATA drive oopses
> > after few gigabytes transferred. Creating iso image
> > with mkisofs (done entirely on PATA drive, no SATA attached)
> > does the same.
> >
> > After some testing I found out that rw load crashes
> > machine rather fast, while read load usually runs for several
> > minutes before crash. Setting udma4 or udma3 instead of udma5
> > doesn't help. Pity I don't have my own SATA drive to run tests
> > with it, ran most of the tests on PATA drive.
>
> I obtained PCI config space dumps under Windows XP on this machine
> and compared them to Linux settings. Integrated PATA IDE controller
> has some differences in rows 5x and 8x.

As Alan pointed out, there aren't any obvious differences which
may affect IDE. They are mostly in the AMD CPU northbridge...

I bought a SATA II Samsung drive yesterday and created a 32 GB file on it.
Windows XP can do "copy 32g nul" - no problems.
The very same thing on Linux randomly crashes in a few minutes.

Please note that I was using Linux on this machine for a bit more than
a month and I had just one unexplained (at that time) crash.
I compiled stuff on it and copied a lot of files (kernel trees, for one),
without crashes. It felt like everything was working okay.

So it seems likely that memory corruption and the resulting crashes
occur only when there are large amounts of streaming I/O; "short"
reads/writes (less than ten megabytes at once) are affected much less
or not affected at all.

BTW, I tried running with mem=500 and tried 32-bit kernels,
still crashes.

I also did a direct device-to-device copy using dd
(translation: no filesystem code involved) and it crashed too.
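
One way to look for silent corruption without waiting for an oops is to stream the same large file twice and compare digests. The following is a minimal sketch under stated assumptions: the function names are hypothetical, SHA-256 is an arbitrary choice of digest, and the file should be much larger than RAM, otherwise the second pass may be served from the page cache instead of the disk.

```python
import hashlib


def stream_digest(path, chunk_size=1024 * 1024):
    """Read the file sequentially in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def reads_agree(path, passes=2):
    """True if every streaming pass over the file yields the same digest."""
    digests = {stream_digest(path) for _ in range(passes)}
    return len(digests) == 1


# Example call (hypothetical file name):
#   reads_agree("/path/to/32g-test-file")
```

A mismatch between passes says that data changed somewhere between the platter and user space. It does not say whether the controller, the driver, or the memory is at fault.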

And to repeat: it happens both on PATA IDE Western Digital drive
(WDC WD2500JB-55GVC0, FwRev=08.02D08, SerialNo=WD-WCAL78337950)
and this new Samsung SATA drive...

Does this ring any bells? What is different when I do streaming i/o?
IOW: where should I start poking in the driver? Because Windows
somehow works, dammit... it must be possible to make it work...

I do not have another mobo for testing.
--
vda

2006-09-13 19:51:51

by Krzysztof Halasa

Subject: Re: MSI K9N Neo: crash under heavy IDE read

Hello,

Denis Vlasenko writes:

> I bought a new Athlon64 mobo with AM2 socket and recently
> I noticed that copying large amounts of data reliably
> crashes 2.6.17.11 64-bit on it.
>
> memtest runs ok on this machine overnight.
> Machine is not overclocked.
>
> Copying movies from SATA drive to PATA drive oopses
> after few gigabytes transferred. Creating iso image
> with mkisofs (done entirely on PATA drive, no SATA attached)
> does the same.

I don't know about K9N Neo, but I have MSI K9N Ultra-2F (the same
MCP55) and have no such issues. But:
- I'm not using drivers/ide anymore (I was using 2.6.17.11 with
Alan's libata-PATA patch and now I have ca. 2.6.18-rc6 merged
with Jeff's pata-drivers git branch)
- I have only PATA CD-ROM (and SATA disk, Seagate ST3250823AS).
- just 1 GB of RAM.

Copied CD/DVD-ROM discs to HDD a few times, no problems.

$ dd bs=$((1024*1024)) count=$((32*1024)) if=/dev/zero of=temp.tmp
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 670.17 seconds, 51.3 MB/s
$ time cat temp.tmp > /dev/null

real 10m0.175s
user 0m3.210s
sys 1m11.240s
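
For reference, the throughput figures implied by the two runs above can be worked out as follows. This is just the arithmetic, assuming decimal megabytes (10^6 bytes) as dd does; the variable names are illustrative.

```python
SIZE_BYTES = 32 * 1024 * 1024 * 1024  # bs=1 MiB, count=32*1024 -> 34359738368


def mb_per_s(size_bytes, seconds):
    """Throughput in decimal megabytes per second."""
    return size_bytes / seconds / 1e6


write_rate = mb_per_s(SIZE_BYTES, 670.17)         # dd write time
read_rate = mb_per_s(SIZE_BYTES, 10 * 60 + 0.175)  # "real 10m0.175s" for cat

print(f"write: {write_rate:.1f} MB/s, read: {read_rate:.1f} MB/s")
```

The write rate reproduces the 51.3 MB/s that dd printed, and the read pass works out to roughly 57 MB/s, so this box streamed the full 32 GiB in both directions without trouble.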
--
Krzysztof Halasa

2006-09-14 08:33:24

by Thomas Richter

Subject: Re: MSI K9N Neo: crash under heavy IDE read

Hi,

> > I bought a new Athlon64 mobo with AM2 socket and recently
> > I noticed that copying large amounts of data reliably
> > crashes 2.6.17.11 64-bit on it.
> >
> > memtest runs ok on this machine overnight.
> > Machine is not overclocked.
> >
> > Copying movies from SATA drive to PATA drive oopses
> > after few gigabytes transferred. Creating iso image
> > with mkisofs (done entirely on PATA drive, no SATA attached)
> > does the same.
> >
> > After some testing I found out that rw load crashes
> > machine rather fast, while read load usually runs for several
> > minutes before crash. Setting udma4 or udma3 instead of udma5
> > doesn't help. Pity I don't have my own SATA drive to run tests
> > with it, ran most of the tests on PATA drive.
>
> I obtained PCI config space dumps under Windows XP on this machine
> and compared them to Linux settings. Integrated PATA IDE controller
> has some differences in rows 5x and 8x. Grep for "IDE interface".
>
> Maybe this sheds some light.
>
> URLs to chipset docs, anyone?...

Not really, but allow me to make another comment: MSI seems to have
massive problems with the K9N based boards. A good percentage of them
just "freaks out" from time to time and shuts the system down with no
apparent reason. This happens under Linux as well as under Windows, and
is, interestingly, related to copying large amounts of data to the
PATA controller, and transferring large amounts of data over the LAN
causes the same problem. It thus would be interesting to know whether
for the affected board the same or a similar problem appears under
win32. If so, it's just the defective board, and not a linux kernel
problem.

Reference for the problem: Look into the MSI forum at http://www.msi.com,
check for the thread "Post your K9N problems here...".

So long,
Thomas


2006-09-14 09:41:35

by Denys Vlasenko

Subject: Re: MSI K9N Neo: crash under heavy IDE read

Thanks to everyone who answered.

The problem is resolved - one DDR2 module is bad now.

One month ago it survived an overnight memtest run,
but now it is definitely bad. memtest detects problems
in a few seconds.

On Wednesday 13 September 2006 21:51, Krzysztof Halasa wrote:
> > Copying movies from SATA drive to PATA drive oopses
> > after few gigabytes transferred. Creating iso image
> > with mkisofs (done entirely on PATA drive, no SATA attached)
> > does the same.
>
> I don't know about K9N Neo, but I have MSI K9N Ultra-2F (the same
> MCP55) and have no such issues. But:
> - I'm not using drivers/ide anymore (I was using 2.6.17.11 with
> Alan's libata-PATA patch and now I have ca. 2.6.18-rc6 merged
> with Jeff's pata-drivers git branch)
> - I have only PATA CD-ROM (and SATA disk, Seagate ST3250823AS).
> - just 1 GB of RAM.
>
> Copied CD/DVD-ROM discs to HDD a few times, no problems.

On Thursday 14 September 2006 10:33, Thomas Richter wrote:
> Not really, but allow me to make another comment: MSI seems to have
> massive problems with the K9N based boards. A good percentage of them
> just "freaks out" from time to time and shuts the system down with no
> apparent reason.

I found info about such mobos; preliminary analysis is said
to indicate that one of the mobo capacitors had the wrong specs.
MSI is said to have started replacing affected mobos.

My mobo's serial number doesn't seem to fall into the range of
affected ones, but the entire affair doesn't sound encouraging at all.

Not to mention that the BIOS is somewhat more buggy than I expected, too.
--
vda