2013-07-31 23:05:05

by Tom Weber

Subject: PROBLEM: memory corruption with numa_balancing on 3.10.4, 3.10.3 (>3.8.13)

I have reproducible memory corruption with numa_balancing enabled.

On a 2-CPU (Xeon E5-2630) box with 64 GB RAM, running 3 memtester 18G jobs
(http://pyropus.ca/software/memtester/) and 1 KVM session (Win7 64-bit,
2 CPUs, 3 GB RAM), I can trigger the problem, usually within an hour.

3.10.3 (and every kernel I've tested since 3.8.13) gives me memory
corruption reports in at least one of the memtester jobs, usually in all
of them. The KVM session and other programs (like aptitude) randomly
crash too.

3.10.4 seems to take much longer to produce memory corruption reports
in the memtester jobs, but the KVM session crashes rather quickly.

I haven't seen a kernel oops or anything yet, just reports like
traps: qemu:vlorenz[7878] general protection ip:7f4748890639 sp:7f472ecdcc68 error:0 in kvm[7f4748681000+37e000]
qemu:vlorenz[8085]: segfault at 3b5c8 ip 00007fa211813e56 sp 00007fa1f7c60c30 error 4 in kvm[7fa211605000+37e000]

and memtester prints endless lines of
...
FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c06e0.
FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c06e8.
FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c0700.
...
once it starts reporting corruption (the patterns change, of course).
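
When there are many of these lines, it can help to summarize them by page to see whether the failures cluster. The following is only a minimal sketch, not part of memtester: the helper names are hypothetical, and it assumes the reported offsets are byte offsets and that pages are 4 KiB.

```python
import re
from collections import Counter

# Matches memtester lines such as:
#   FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c06e0.
FAILURE_RE = re.compile(
    r"FAILURE: 0x([0-9a-fA-F]+) != 0x([0-9a-fA-F]+) at offset 0x([0-9a-fA-F]+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_failures(lines):
    """Return a list of (offset, differing_bits) for each FAILURE line."""
    results = []
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            a, b, offset = (int(g, 16) for g in m.groups())
            # XOR of the two compared words marks the bits that differ.
            results.append((offset, a ^ b))
    return results


def failures_per_page(failures):
    """Count failures per page index (offset >> PAGE_SHIFT)."""
    return Counter(offset >> PAGE_SHIFT for offset, _ in failures)


# The three lines quoted in the report above.
sample = [
    "FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c06e0.",
    "FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c06e8.",
    "FAILURE: 0x00000000 != 0x00000001 at offset 0x17b6c0700.",
]
failures = parse_failures(sample)
pages = failures_per_page(failures)
for page, count in sorted(pages.items()):
    print(f"page 0x{page:x}: {count} failure(s)")
```

For the three quoted lines, all offsets fall in the same page index under these assumptions; a full log would be needed to say whether that pattern holds in general.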

Running the same kernels with numa_balancing=disable is stable.
Running the same kernels on a single-CPU machine is also stable.

The .config for 3.10.3/4 (identical) is attached, as is the dmesg
output after booting.

The machine is a Supermicro X9DR3-LN4F+ board with 2 Xeon E5-2630 CPUs
and 64 GB RAM (32 GB on each CPU), running Debian 7 and a self-compiled
kernel.

Let me know if I can provide further info,
Tom




Attachments:
config-3.10.4 (104.27 kB)
dmesg.3.10.4 (65.86 kB)
lspci-vvv (160.27 kB)
modules (4.56 kB)
ioports (1.55 kB)
cpuinfo (22.10 kB)
iomem (4.40 kB)