LinuxLists.cc - Memory issues with Opteron 6220

2012-02-08 14:37:50

Subject: Memory issues with Opteron 6220

Hey,

We're seeing unexpected slowdowns and other memory issues with a new system.
Enough to render it unusable. For example:

Error: open3: fork failed: Cannot allocate memory

at times where there's no real memory pressure:
             total       used       free     shared    buffers     cached
Mem:     132270720  131942388     328332          0     299768  103334420
-/+ buffers/cache:   28308200  103962520
Swap:      7811068      13760    7797308

The simplest test that triggers the slowdowns is running
'dpkg -l perl'. On our other systems, this takes a fraction of a second, at
least with a hot cache. Here it takes somewhere between two and four seconds
even when there's no load on the machine. Several other things, including our own
software, are similarly slowed down by an order of magnitude or more.

The system is a Dell Poweredge R715, with two eight-core Opteron 6220
processors and 128G of memory. We have several similar systems, such as the one
this should replace: R715, 2x8 core Opteron 6140, 128G memory, and they do not
exhibit any similar symptoms.

We have tried with 2.6.37, 2.6.38, 3.2.5 and 3.3-rc1 with no luck. The
microcode updates from AMD have not helped either.

stracing dpkg -l perl yields
$ time strace -cf dpkg -l perl >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.91    0.017821        1782        10           munmap
  3.40    0.000632           1      1181           read
  0.35    0.000065           1        77        37 open
[..]
  0.00    0.000000           0         2           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.018580                  2197        49 total

real 0m4.005s
user 0m3.250s
sys 0m0.720s

It might just be a red herring though, since it doesn't account for the real
time anyway. On a functioning system the output looks like:
$ time strace -cf dpkg -l perl >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000123           1       117           read
  0.00    0.000000           0       160           write
[..]
  0.00    0.000000           0         2           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000123                   588        47 total

real 0m0.276s
user 0m0.160s
sys 0m0.090s

The two most obvious differences between a system that works and one that does
not are the newer CPU and the newer memory. The older machines have Samsung
M393B1K70CHD-YH9 chips (8G DDR3 1333MHz ECC REG) and the new one has Samsung
M393B2G70BH0-CK0 chips (16G DDR3 1600MHz ECC REG).

/proc/cpuinfo:
processor : 15
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD Opteron(TM) Processor 6220
stepping : 2
microcode : 0x6000613
cpu MHz : 3000.048
cache size : 2048 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 39
initial apicid : 39
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni
pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm
cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core arat cpb npt lbrv
svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold
bogomips : 6000.40
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [9]

DMI info:
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: 6
Locator: DIMM_B4
Bank Locator: Not Specified
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 1600 MHz (0.6 ns)
Manufacturer: 80CE80B380CE
Part Number: M393B2G70BH0-CK0

If it all seems a bit vague, it's because we're at our wits' end with how to debug
this issue. Consistent slowdowns and occasional failures to allocate memory for
no apparent reason are what we're seeing. Any help or suggestions are very
welcome.

dmesg is available at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5.txt
--
Anders Ossowicki

2012-02-09 08:33:37

by Ingo Molnar

Subject: Re: Memory issues with Opteron 6220

* Anders Ossowicki <[email protected]> wrote:

> Hey,
>
> We're seeing unexpected slowdowns and other memory issues with a new system.
> Enough to render it unusable. For example:
>
> Error: open3: fork failed: Cannot allocate memory
>
> at times where there's no real memory pressure:
> total used free shared buffers cached
> Mem: 132270720 131942388 328332 0 299768 103334420
> -/+ buffers/cache: 28308200 103962520
> Swap: 7811068 13760 7797308
>
> [...]

> The system is a Dell Poweredge R715, with two eight-core
> Opteron 6220 processors and 128G of memory. We have several
> similar systems, such as the one this should replace: R715,
> 2x8 core Opteron 6140, 128G memory, and they do not exhibit
> any similar symptoms.

130 MB of RAM visible to Linux isn't the expected bootup default
indeed. Around 130 *GB* would be expected ...

> We have tried with 2.6.37, 2.6.38, 3.2.5 and 3.3-rc1 with no luck. The
> microcode updates from AMD have not helped either.

Nasty.

No smoking gun in the dmesg:

> dmesg is available at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5.txt

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000df679000 (usable)
[ 0.000000] BIOS-e820: 00000000df679000 - 00000000df68f000 (reserved)
[ 0.000000] BIOS-e820: 00000000df68f000 - 00000000df6ce000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000df6ce000 - 00000000e0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fe000000 - 00000000fec90000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec94000 - 00000000fecd0000 (reserved)
[ 0.000000] BIOS-e820: 00000000fecd4000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 000000201f000000 (usable)

that 0x201f000000 is slightly above 128 GB.

The lowlevel x86 RAM init code seems to be fine:

[ 0.000000] last_pfn = 0x201f000 max_arch_pfn = 0x400000000

that 0x201f000 correctly points to slightly above 128 GB
physical.

[ 0.000000] init_memory_mapping: 0000000100000000-000000201f000000

that too shows that the lowlevel x86 platform memory init code
still sees 128 GB.

it's spread out amongst 4 nodes, 32 GB each:

[ 0.000000] Initmem setup node 0 0000000000000000-0000000820000000
[ 0.000000] NODE_DATA [000000081fffb000 - 000000081fffffff]
[ 0.000000] Initmem setup node 1 0000000820000000-0000001020000000
[ 0.000000] NODE_DATA [000000101fffb000 - 000000101fffffff]
[ 0.000000] Initmem setup node 2 0000001020000000-0000001820000000
[ 0.000000] NODE_DATA [000000181fffb000 - 000000181fffffff]
[ 0.000000] Initmem setup node 3 0000001820000000-000000201f000000
[ 0.000000] NODE_DATA [000000201effa000 - 000000201effefff]

the NORMAL zone gets set up properly:

[ 0.000000] Normal 0x00100000 -> 0x0201f000

and each node zone got 32 GB of RAM:

[ 0.000000] Normal zone: 7354368 pages, LIFO batch:31
[ 0.000000] Normal zone: 8257536 pages, LIFO batch:31
[ 0.000000] Normal zone: 8257536 pages, LIFO batch:31
[ 0.000000] Normal zone: 8253504 pages, LIFO batch:31

and it's all visible in the end to the MM:

[ 0.000000] Built 4 zonelists in Zone order, mobility grouping on. Total pages: 33021506

that's still 125 GB. (cgroup_page appears to pick up 1GB of RAM
btw.)

So where has the rest of the RAM gone? What does /proc/meminfo
look like?

Thanks,

Ingo

2012-02-09 09:11:54

by Eric Dumazet

Subject: Re: Memory issues with Opteron 6220

On Thursday, 9 February 2012 at 09:33 +0100, Ingo Molnar wrote:
> * Anders Ossowicki <[email protected]> wrote:
>
> > Hey,
> >
> > We're seeing unexpected slowdowns and other memory issues with a new system.
> > Enough to render it unusable. For example:
> >
> > Error: open3: fork failed: Cannot allocate memory
> >
> > at times where there's no real memory pressure:
> > total used free shared buffers cached
> > Mem: 132270720 131942388 328332 0 299768 103334420
> > -/+ buffers/cache: 28308200 103962520
> > Swap: 7811068 13760 7797308
> >
> > [...]
>
> > The system is a Dell Poweredge R715, with two eight-core
> > Opteron 6220 processors and 128G of memory. We have several
> > similar systems, such as the one this should replace: R715,
> > 2x8 core Opteron 6140, 128G memory, and they do not exhibit
> > any similar symptoms.
>
> 130 MB of RAM visible to Linux isn't the expected bootup default
> indeed. Around 130 *GB* would be expected ...

Not sure what you mean, I see 128GB in the "free" output, as expected.

I don't understand why there are 4 nodes, given "The system is a Dell
Poweredge R715, with two eight-core Opteron 6220".

Or is each 6220 split across two nodes?

2012-02-09 12:44:08

by Anders Ossowicki

Subject: Re: Memory issues with Opteron 6220

On Wed, Feb 08, 2012 at 09:56:28PM +0100, Andreas Herrmann wrote:
> I assume you have the latest BIOS on your system?
Yep, 2.3.0 is the newest available on Dell's website for this machine.

> After glancing through attached dmesg I wonder whether you have "Cool
> and quiet" disabled in BIOS, see
>
> [ 8.936505] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found.
> [ 8.936514] [Firmware Bug]: powernow-k8: Try again with latest BIOS.
>
> Is this on purpose?
I went digging through the power management options of the bios and found that
CPU performance was set to System DBPM[1] by default. After switching it to OS
DBPM, powernow-k8 seemed a lot happier:

[ 5.272938] powernow-k8: Found 4 AMD Opteron(TM) Processor 6220 (16 cpu cores) (version 2.20.00)
[ 5.273111] powernow-k8: Core Performance Boosting: on.
[ 5.273256] powernow-k8: 0 : pstate 0 (3000 MHz)
[..]
[ 5.274601] powernow-k8: 4 : pstate 4 (1400 MHz)

full dmesg at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209.txt

From cursory investigation, it appears we've gotten the expected performance
back when all CPUs are running at max frequency. So far so good.

I am curious though... a few observations:
1) With System DBPM, /proc/cpuinfo said 3GHz, the performance of the machine
was crappy.
2) With OS DBPM, /proc/cpuinfo said 1.4GHz, the performance of the machine was
equally crappy, as expected.
3) With OS DBPM, and the performance cpufreq governor, /proc/cpuinfo said 3GHz,
the performance of the machine was good. Again as expected.

The conclusion I draw from this is that something (the BIOS?) is lying to the
OS. Bad Dell!

The manual is sparse on explanations of this System DBPM. It basically says that
it is a Dell proprietary implementation in BIOS, that provides improved
performance/watt over the OS implementation of AMD PowerNow!.

I apologise if that made you spit out a mouthful of coffee but that really is
what it says. It doesn't seem to be doing its job very well.

This leaves the issue of randomly failing memory allocations. I can't see why
that would be related to the power management woes, but I am by no means an
expert. I'll see if we can still trigger the problem, but if someone can see a
causal link, please enlighten me.

> To rule out memory from being the culprit ...
> Have you tested the newer CPU system with the old memory?
Nope.

> Have you observed any MCEs (e.g. DRAM ECC errors) on the failing system?
> EDAC should report them in dmesg if this is the case.
Nothing in dmesg or the iDRAC's service event log (where ECC errors usually get
logged as well).

[1] Demand-based power management, apparently.

--
Anders Ossowicki

2012-02-09 13:24:25

by Ingo Molnar

Subject: Re: Memory issues with Opteron 6220

* Eric Dumazet <[email protected]> wrote:

> On Thursday, 9 February 2012 at 09:33 +0100, Ingo Molnar wrote:
> > * Anders Ossowicki <[email protected]> wrote:
> >
> > > Hey,
> > >
> > > We're seeing unexpected slowdowns and other memory issues with a new system.
> > > Enough to render it unusable. For example:
> > >
> > > Error: open3: fork failed: Cannot allocate memory
> > >
> > > at times where there's no real memory pressure:
> > > total used free shared buffers cached
> > > Mem: 132270720 131942388 328332 0 299768 103334420
> > > -/+ buffers/cache: 28308200 103962520
> > > Swap: 7811068 13760 7797308
> > >
> > > [...]
> >
> > > The system is a Dell Poweredge R715, with two eight-core
> > > Opteron 6220 processors and 128G of memory. We have several
> > > similar systems, such as the one this should replace: R715,
> > > 2x8 core Opteron 6140, 128G memory, and they do not exhibit
> > > any similar symptoms.
> >
> > 130 MB of RAM visible to Linux isn't the expected bootup default
> > indeed. Around 130 *GB* would be expected ...
>
> Not sure what you mean, I see 128GB in the "free" output, as
> expected.

Erm, yes. I plead temporary blindness!

So all RAM is visible properly. This error:

> > > Error: open3: fork failed: Cannot allocate memory

suggests allocation failure. How is that possible with so much
RAM?

Thanks,

Ingo

2012-02-09 13:28:53

by Ingo Molnar

Subject: Re: Memory issues with Opteron 6220

* Anders Ossowicki <[email protected]> wrote:

> I went digging through the power management options of the
> bios and found that CPU performance was set to System DBPM[1]
> by default. After switching it to OS DBPM, powernow-k8 seemed
> a lot happier:

Your bootlog says:

[ 0.330000] Performance Events: Broken BIOS detected, complain to your hardware vendor.
[ 0.330000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 1430076)

Do you get that message if DBPM is enabled?

If the message disappeared then I'd suggest to do what that
kernel message suggests and ask the vendor to disable that BIOS
option by default, it breaks stuff.

Thanks,

Ingo

2012-02-09 13:49:30

by Anders Ossowicki

Subject: Re: Memory issues with Opteron 6220

On Thu, Feb 09, 2012 at 02:28:25PM +0100, Ingo Molnar wrote:
> Your bootlog says:
>
> [ 0.330000] Performance Events: Broken BIOS detected, complain to your hardware vendor.
> [ 0.330000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 1430076)
>
> Do you get that message if DBPM is enabled?

It's there with System DBPM, OS DBPM and with power management disabled (i.e.
set to maximum performance).
--
Anders Ossowicki

2012-02-09 16:11:08

by Yinghai Lu

Subject: Re: Memory issues with Opteron 6220

On Thu, Feb 9, 2012 at 5:49 AM, Anders Ossowicki <[email protected]> wrote:
> On Thu, Feb 09, 2012 at 02:28:25PM +0100, Ingo Molnar wrote:
>> Your bootlog says:
>>
>> [ 0.330000] Performance Events: Broken BIOS detected, complain to your hardware vendor.
>> [ 0.330000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 1430076)
>>
>> Do you get that message if DBPM is enabled?
>
> It's there with System DBPM, OS DBPM and with power management disabled (i.e.
> set to maximum performance).

mtrr setting has some problem too.

[ 3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
[ 3.100001] mtrr: probably your BIOS does not setup all CPUs.
[ 3.110000] mtrr: corrected configuration.

can you boot with "debug ignore_loglevel show_msr=16" ?

Yinghai

2012-02-09 17:51:23

by Anders Ossowicki

Subject: Re: Memory issues with Opteron 6220

On Thu, Feb 09, 2012 at 05:11:04PM +0100, Yinghai Lu wrote:
> mtrr setting has some problem too.
>
> [ 3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 3.100001] mtrr: probably your BIOS does not setup all CPUs.
> [ 3.110000] mtrr: corrected configuration.
>
> can you boot with "debug ignore_loglevel show_msr=16" ?

Yep, right here:
http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt

--
Anders

2012-02-09 18:56:46

by Yinghai Lu

Subject: Re: Memory issues with Opteron 6220

On Thu, Feb 9, 2012 at 9:51 AM, Anders Ossowicki <[email protected]> wrote:
> On Thu, Feb 09, 2012 at 05:11:04PM +0100, Yinghai Lu wrote:
>> mtrr setting has some problem too.
>>
>> [ 3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
>> [ 3.100001] mtrr: probably your BIOS does not setup all CPUs.
>> [ 3.110000] mtrr: corrected configuration.
>>
>> can you boot with "debug ignore_loglevel show_msr=16" ?
>
> Yep, right here:
> http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt
>

Too bad, the print_cpu_info() call for APs got removed by some commit.

Now we cannot print the initial AP registers anymore.

Yinghai

2012-02-09 21:15:38

by Jesper Krogh

Subject: Re: Memory issues with Opteron 6220

On 2012-02-09 09:33, Ingo Molnar wrote:
> * Anders Ossowicki<[email protected]> wrote:
>> Hey,
>>
>> We're seeing unexpected slowdowns and other memory issues with a new system.
>> Enough to render it unusable. For example:
>>
>> Error: open3: fork failed: Cannot allocate memory
>>
>> at times where there's no real memory pressure:
>> total used free shared buffers cached
>> Mem: 132270720 131942388 328332 0 299768 103334420
>> -/+ buffers/cache: 28308200 103962520
>> Swap: 7811068 13760 7797308
>>
>> [...]
Anders' co-worker here. The C code below (summary: fork -t processes that
repeatedly allocate and deallocate 2GB of memory) can exercise the bug
pretty frequently using -t 32 on this machine. On the other 128GB
machine it runs without issues.

It actually ended up toasting the machine:
jk@nysvin:~$ ./foo -t 32
-bash: fork: Cannot allocate memory
jk@nysvin:~$ w
-bash: fork: Cannot allocate memory
jk@nysvin:~$ top
-bash: fork: Cannot allocate memory
jk@nysvin:~$ ls
-bash: fork: Cannot allocate memory

I don't know what to conclude.

jk@nysvin:~$ ./foo -t 32
Upper bound: 1953 MB
malloc(1953) MB failed. iterations: 6
malloc(1953) MB failed. iterations: 2
malloc(1953) MB failed. iterations: 8

foo.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>

/* Child loop: alternately malloc() and free() one ~2GB block, forever.
 * Exits with status 1 on the first allocation failure. */
void worker(void)
{
    long long i;
    char *p = NULL;
    int action;
    int mult = 500000;
    int size = mult * 4096;   /* 2,048,000,000 bytes; still fits in a 32-bit int */

    fprintf(stderr, "Upper bound: %ld MB\n", (long int)(size / 1024 / 1024));
    for (i = 0; ; i++) {
        action = i % 2;
        switch (action) {
        case 0:   /* even iteration: allocate */
            p = malloc(size);
            if (!p) {
                fprintf(stderr, "malloc(%ld) MB failed. iterations: %lli\n",
                        (long int)size / 1024 / 1024, i);
                exit(1);
            }
            break;
        case 1:   /* odd iteration: release the block again */
            free(p);
            break;
        }
    }
}

void usage(const char *cmd)
{
    /* -t sets the number of forked worker processes, not threads */
    fprintf(stderr, "Usage: %s [-t numthreads]\n", cmd);
    exit(1);
}

int main(int argc, char **argv)
{
    int c, i;
    int nproc = sysconf(_SC_NPROCESSORS_ONLN);   /* default: one worker per online CPU */

    while ((c = getopt(argc, argv, "t:")) != -1) {
        switch (c) {
        case 't':
            nproc = (int)strtol(optarg, NULL, 0);
            break;
        default:
            usage(argv[0]);
        }
    }

    //printf("forking %d children\n", nproc);
    for (i = 0; i < nproc; i++) {
        switch (fork()) {
        case -1:
            fprintf(stderr, "fork: %s\n", strerror(errno));
            exit(1);
        case 0: /* child */
            worker();
            exit(0);
        default: /* parent */
            /* nothing */
            break;
        }
    }

    /* Reap every child; the exit status is not inspected. */
    for (i = 0; i < nproc; i++)
        wait(NULL);

    return 0;
}

Can also be found here: http://shrek.krogh.cc/~jesper/foo.c

--
Jesper Krogh

2012-02-10 15:21:21

by Jesper Krogh

Subject: Re: Memory issues with Opteron 6220

Long story short, this is a red herring.

The system we migrated the configuration from had
vm.overcommit_memory = 2, so the new one got that too
(commit limit: 50% of actual memory + swap).

That worked fine. We set it back in 2008 because the
heuristic mode was not doing the correct thing. What
has happened over the years is that memory has grown
and the swap/memory ratio has shrunk, both due
to memory growth and to swap becoming more and more irrelevant.

So the new system was set up with reduced swap, 8GB vs. 100GB,
which means that the algorithm used by overcommit_memory
ended up not allowing more than 64GB+8GB of memory to be
used (less than physical memory). The system we migrated from would
by this rule allow 64+100GB, which fits quite well.

I guess it took so long to realize because "overcommit"
isn't what springs to mind when you don't think you're anywhere
close to the limit, combined with the misleading power-saving issue
that just confused the problem.

I will admit that we could have saved a significant amount of time/frustration
if dmesg had shown a message saying that it was the overcommit limit being
hit and thus knocking off the processes.

Another change to suggest would be to not kill off processes due
to overcommit, at least not before actual memory size has been reached.

But, long story short: system misconfiguration.

--
Jesper

2012-02-11 13:49:08

by Ingo Molnar

Subject: Re: Memory issues with Opteron 6220

* Jesper Krogh <[email protected]> wrote:

> [...]
>
> I would admit that we could have saved a significant of
> time/fustration if dmesg had revealed a message that it was
> the overcommit limits being hit and thus knocking off the
> processes.

It would be helpful if you could enhance the printk in such a
way, and if you tested it with your workload that triggers it -
and send us the resulting patch. Having more information in the
dmesg is never bad.

Thanks,

Ingo

2012-02-11 13:50:36

by Ingo Molnar

Subject: Re: Memory issues with Opteron 6220

* Yinghai Lu <[email protected]> wrote:

> On Thu, Feb 9, 2012 at 9:51 AM, Anders Ossowicki <[email protected]> wrote:
> > On Thu, Feb 09, 2012 at 05:11:04PM +0100, Yinghai Lu wrote:
> >> mtrr setting has some problem too.
> >>
> >> [ 3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
> >> [ 3.100001] mtrr: probably your BIOS does not setup all CPUs.
> >> [ 3.110000] mtrr: corrected configuration.
> >>
> >> can you boot with "debug ignore_loglevel show_msr=16" ?
> >
> > Yep, right here:
> > http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt
> >
>
> Too bad, print_cpu_info() calling for AP get removed by some
> commit.
>
> now we can not print initial AP register anymore.

Mind sending a patch that puts it back, so that it's printed via
KERN_DEBUG or such, i.e. does not get emitted in the default
log. Maybe even tie it to apic=debug or so.

Thanks,

Ingo

2012-02-14 09:32:33

by Anders Ossowicki

Subject: Re: Memory issues with Opteron 6220

On Sat, Feb 11, 2012 at 02:48:47PM +0100, Ingo Molnar wrote:
> It would be helpful if you could enhance the printk in such a
> way, and if you tested it with your workload that triggers it -
> and send us the resulting patch. Having more information in the
> dmesg is never bad.
I'd be happy to do so, but I'm not sure where the appropriate place to add it
is. I'm guessing a printk wrapped in a printk_ratelimit somewhere in mm but I
know next to nothing about the internals of the kernel.
--
Anders Ossowicki