Hello,
http://google-perftools.googlecode.com/svn-history/r48/trunk/src/base/atomicops-internals-x86.cc
says
  // Opteron Rev E has a bug in which on very rare occasions a locked
  // instruction doesn't act as a read-acquire barrier if followed by a
  // non-locked read-modify-write instruction. Rev F has this bug in
  // pre-release versions, but not in versions released to customers,
  // so we test only for Rev E, which is family 15, model 32..63 inclusive.
  if (strcmp(vendor, "AuthenticAMD") == 0 &&  // AMD
      family == 15 &&
      32 <= model && model <= 63) {
    AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = true;
  } else {
    AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = false;
  }

Does the kernel have a quirk/workaround for this? I'm looking at
arch/x86/kernel/cpu but I don't see a workaround related to this
(possibly I'm overlooking it).
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
Arkadiusz Miskiewicz writes:
> Does the kernel have a quirk/workaround for this? I'm looking at
> arch/x86/kernel/cpu but I don't see a workaround related to this
> (possibly I'm overlooking it).
I can find no reference to this alleged RevE erratum in the
Athlon64/Opteron revision guide (25759.pdf).
But if this bug is real then we need to know about it. Could
you ask the author of the code you quoted above to clarify?
/Mikael
On Monday 04 August 2008, Mikael Pettersson wrote:
> But if this bug is real then we need to know about it. Could
> you ask the author of the code you quoted above to clarify?
I got an answer: OpenSolaris has a workaround for this bug, but I still
don't know which erratum number it is:
http://groups.google.com/group/google-perftools/browse_thread/thread/3d1b78d4a9db8c6e
By the way, I learned about this bug after hitting this problem:
http://bugs.mysql.com/bug.php?id=26081
--
Arkadiusz Miśkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/
On Mon, 4 Aug 2008 15:56:05 +0200, Arkadiusz Miskiewicz wrote:
>I got an answer: OpenSolaris has a workaround for this bug, but I still
>don't know which erratum number it is:
>
>http://groups.google.com/group/google-perftools/browse_thread/thread/3d1b78d4a9db8c6e
>
>By the way, I learned about this bug after hitting this problem:
>http://bugs.mysql.com/bug.php?id=26081
Thanks, found the Solaris code in question and the mysql discussion.
I'll dig deeper tomorrow.
/Mikael
Mikael Pettersson writes:
> Thanks, found the Solaris code in question and the mysql discussion.
> I'll dig deeper tomorrow.
I investigated the Solaris track, but I've found no detailed
explanation of the alleged bug. I've asked the Sun engineer
who committed the fix for an explanation, but so far there's
been no reply.
Anyway, here's what I've found out.
It's Solaris bug # 6323525.
They call it "Mutex primitives don't work as expected."

    if (number_of_cores() < 2)                    then don't have bug
    if (family == 0xf && model < 0x40)            then have bug
    if (rdmsr(MSR_BU_CFG /* 0xC0011023 */) & 2)   then bug is masked

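To make the three checks above concrete, here is a minimal C sketch of how they might be combined into one predicate. It is my reading of the summary, not the Solaris code: the function names, the BU_CFG_MASK_BIT macro and the choice to pass the core count, the CPUID leaf 1 EAX value and the MSR contents in as plain arguments are all assumptions made for illustration. Reading the MSR needs ring 0, and the caller is assumed to have checked for an AMD vendor string already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed name for the "& 2" test in the summary above. */
#define BU_CFG_MASK_BIT 0x2ULL

/* Decode family and model from the CPUID leaf 1 EAX signature.
 * On AMD the extended fields apply when the base family is 0xf. */
static void decode_signature(uint32_t eax, unsigned *family, unsigned *model)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;
    unsigned ext_family  = (eax >> 20) & 0xff;
    unsigned ext_model   = (eax >> 16) & 0xf;

    *family = base_family;
    *model  = base_model;
    if (base_family == 0xf) {
        *family += ext_family;
        *model  |= ext_model << 4;
    }
}

/* Hypothetical predicate following the three checks listed above.
 * cores:      number of cores in the system
 * cpuid1_eax: EAX returned by CPUID leaf 1
 * bu_cfg:     value read from the MSR at 0xC0011023 */
static bool needs_lfence_workaround(unsigned cores, uint32_t cpuid1_eax,
                                    uint64_t bu_cfg)
{
    unsigned family, model;

    if (cores < 2)
        return false;               /* fewer than two cores: no bug */

    decode_signature(cpuid1_eax, &family, &model);
    if (family != 0xf || model >= 0x40)
        return false;               /* not family 0xf with model < 0x40 */

    if (bu_cfg & BU_CFG_MASK_BIT)
        return false;               /* bug is masked */

    return true;
}
```

For example, a signature of 0x00020F12 decodes to family 0xf, model 0x21, so with two or more cores and bit 1 of the MSR clear the predicate returns true.
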

    lock:   // mutex_lock, spin_lock, etc
            ...
            lock; cmpxchg ..
            jnz fail
            ret; nop; nop; nop      // patched to "lfence; ret" if bug

The workaround is to place a fencing instruction (lfence) between
the mutex operation and the subsequent read-modify-write instruction.
(This provides the necessary load memory barrier.)
There's no change to the unlock code.
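
As a rough illustration of where the fence goes, here is a C sketch of a trylock with the extra lfence on the success path. It is not the Solaris implementation: Solaris patches the instruction in at boot, whereas this sketch tests a flag at run time. The names cpu_has_lock_mb_bug, erratum_load_fence, spin_trylock_sketch and spin_unlock_sketch are made up for the example, and it assumes GCC or Clang (for the __atomic builtins and inline asm).

```c
#include <stdbool.h>

/* Hypothetical flag; a real implementation would set it once at startup
 * from a CPU check like the one described above. */
static bool cpu_has_lock_mb_bug = false;

/* Extra load fence, intended for affected CPUs only. */
static inline void erratum_load_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("lfence" ::: "memory");
#else
    /* Non-x86 fallback so the sketch still builds elsewhere. */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

/* Try to take a simple lock word: 0 = free, 1 = held.
 * Returns true if the lock was acquired. */
static bool spin_trylock_sketch(int *lock)
{
    int expected = 0;

    /* On x86 this builtin is typically compiled to "lock cmpxchg". */
    bool ok = __atomic_compare_exchange_n(lock, &expected, 1, false,
                                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);

    /* Success path only, mirroring "lfence; ret" after the cmpxchg. */
    if (ok && cpu_has_lock_mb_bug)
        erratum_load_fence();

    return ok;
}

/* Unlock is unchanged by the workaround: a plain release store. */
static void spin_unlock_sketch(int *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

The run-time flag costs a branch on every acquisition; patching the code once at boot, as the Solaris fix appears to do, avoids that on unaffected CPUs.
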
Does anyone know whom to contact at AMD to confirm or deny this?
/Mikael