2005-03-30 21:46:07

by Dave Jones

Subject: x86-64 bad pmds in 2.6.11.6

[apologies to Andi for getting this twice, I goofed the l-k address
the first time]


I arrived at the office today to find my workstation had this spew
in its dmesg buffer..

mm/memory.c:97: bad pmd ffff81004b017438(00000038a5500a88).
mm/memory.c:97: bad pmd ffff81004b017440(0000000000000003).
mm/memory.c:97: bad pmd ffff81004b017448(00007ffffffff73b).
mm/memory.c:97: bad pmd ffff81004b017450(00007ffffffff73c).
mm/memory.c:97: bad pmd ffff81004b017458(00007ffffffff73d).
mm/memory.c:97: bad pmd ffff81004b017468(00007ffffffff73e).
mm/memory.c:97: bad pmd ffff81004b017470(00007ffffffff73f).
mm/memory.c:97: bad pmd ffff81004b017478(00007ffffffff740).
mm/memory.c:97: bad pmd ffff81004b017480(00007ffffffff741).
mm/memory.c:97: bad pmd ffff81004b017488(00007ffffffff742).
mm/memory.c:97: bad pmd ffff81004b017490(00007ffffffff743).
mm/memory.c:97: bad pmd ffff81004b017498(00007ffffffff744).
mm/memory.c:97: bad pmd ffff81004b0174a0(00007ffffffff745).
mm/memory.c:97: bad pmd ffff81004b0174a8(00007ffffffff746).
mm/memory.c:97: bad pmd ffff81004b0174b0(00007ffffffff747).
mm/memory.c:97: bad pmd ffff81004b0174b8(00007ffffffff748).
mm/memory.c:97: bad pmd ffff81004b0174c0(00007ffffffff749).
mm/memory.c:97: bad pmd ffff81004b0174c8(00007ffffffff74a).
mm/memory.c:97: bad pmd ffff81004b0174d0(00007ffffffff74b).
mm/memory.c:97: bad pmd ffff81004b0174d8(00007ffffffff74c).
mm/memory.c:97: bad pmd ffff81004b0174e0(00007ffffffff74d).
mm/memory.c:97: bad pmd ffff81004b0174e8(00007ffffffff74e).
mm/memory.c:97: bad pmd ffff81004b0174f0(00007ffffffff74f).
mm/memory.c:97: bad pmd ffff81004b0174f8(00007ffffffff750).
mm/memory.c:97: bad pmd ffff81004b017500(00007ffffffff751).
mm/memory.c:97: bad pmd ffff81004b017508(00007ffffffff752).
mm/memory.c:97: bad pmd ffff81004b017510(00007ffffffff753).
mm/memory.c:97: bad pmd ffff81004b017518(00007ffffffff754).
mm/memory.c:97: bad pmd ffff81004b017520(00007ffffffff755).
mm/memory.c:97: bad pmd ffff81004b017528(00007ffffffff756).
mm/memory.c:97: bad pmd ffff81004b017530(00007ffffffff757).
mm/memory.c:97: bad pmd ffff81004b017538(00007ffffffff758).
mm/memory.c:97: bad pmd ffff81004b017540(00007ffffffff759).
mm/memory.c:97: bad pmd ffff81004b017548(00007ffffffff75a).
mm/memory.c:97: bad pmd ffff81004b017550(00007ffffffff75b).
mm/memory.c:97: bad pmd ffff81004b017558(00007ffffffff75c).
mm/memory.c:97: bad pmd ffff81004b017560(00007ffffffff75d).
mm/memory.c:97: bad pmd ffff81004b017568(00007ffffffff75e).
mm/memory.c:97: bad pmd ffff81004b017570(00007ffffffff75f).
mm/memory.c:97: bad pmd ffff81004b017578(00007ffffffff760).
mm/memory.c:97: bad pmd ffff81004b017580(00007ffffffff761).
mm/memory.c:97: bad pmd ffff81004b017588(00007ffffffff762).
mm/memory.c:97: bad pmd ffff81004b017590(00007ffffffff763).
mm/memory.c:97: bad pmd ffff81004b017598(00007ffffffff764).
mm/memory.c:97: bad pmd ffff81004b0175a0(00007ffffffff765).
mm/memory.c:97: bad pmd ffff81004b0175a8(00007ffffffff766).
mm/memory.c:97: bad pmd ffff81004b0175b0(00007ffffffff767).
mm/memory.c:97: bad pmd ffff81004b0175b8(00007ffffffff768).
mm/memory.c:97: bad pmd ffff81004b0175c0(00007ffffffff769).
mm/memory.c:97: bad pmd ffff81004b0175c8(00007ffffffff76a).
mm/memory.c:97: bad pmd ffff81004b0175d0(00007ffffffff76b).
mm/memory.c:97: bad pmd ffff81004b0175d8(00007ffffffff76c).
mm/memory.c:97: bad pmd ffff81004b0175e0(00007ffffffff76d).
mm/memory.c:97: bad pmd ffff81004b0175e8(00007ffffffff76e).
mm/memory.c:97: bad pmd ffff81004b0175f0(00007ffffffff76f).
mm/memory.c:97: bad pmd ffff81004b0175f8(00007ffffffff770).
mm/memory.c:97: bad pmd ffff81004b017600(00007ffffffff771).
mm/memory.c:97: bad pmd ffff81004b017608(00007ffffffff772).
mm/memory.c:97: bad pmd ffff81004b017610(00007ffffffff773).
mm/memory.c:97: bad pmd ffff81004b017618(00007ffffffff774).
mm/memory.c:97: bad pmd ffff81004b017628(0000000000000010).
mm/memory.c:97: bad pmd ffff81004b017630(00000000078bfbff).
mm/memory.c:97: bad pmd ffff81004b017638(0000000000000006).
mm/memory.c:97: bad pmd ffff81004b017640(0000000000001000).
mm/memory.c:97: bad pmd ffff81004b017648(0000000000000011).
mm/memory.c:97: bad pmd ffff81004b017650(0000000000000064).
mm/memory.c:97: bad pmd ffff81004b017658(0000000000000003).
mm/memory.c:97: bad pmd ffff81004b017660(0000000000400040).
mm/memory.c:97: bad pmd ffff81004b017668(0000000000000004).
mm/memory.c:97: bad pmd ffff81004b017670(0000000000000038).
mm/memory.c:97: bad pmd ffff81004b017678(0000000000000005).
mm/memory.c:97: bad pmd ffff81004b017680(0000000000000008).
mm/memory.c:97: bad pmd ffff81004b017688(0000000000000007).
mm/memory.c:97: bad pmd ffff81004b017698(0000000000000008).
mm/memory.c:97: bad pmd ffff81004b0176a8(0000000000000009).
mm/memory.c:97: bad pmd ffff81004b0176b0(0000000000403840).
mm/memory.c:97: bad pmd ffff81004b0176b8(000000000000000b).
mm/memory.c:97: bad pmd ffff81004b0176c0(00000000000001f4).
mm/memory.c:97: bad pmd ffff81004b0176c8(000000000000000c).
mm/memory.c:97: bad pmd ffff81004b0176d0(00000000000001f4).
mm/memory.c:97: bad pmd ffff81004b0176d8(000000000000000d).
mm/memory.c:97: bad pmd ffff81004b0176e0(00000000000001f4).
mm/memory.c:97: bad pmd ffff81004b0176e8(000000000000000e).
mm/memory.c:97: bad pmd ffff81004b0176f0(00000000000001f4).
mm/memory.c:97: bad pmd ffff81004b0176f8(0000000000000017).
mm/memory.c:97: bad pmd ffff81004b017708(000000000000000f).
mm/memory.c:97: bad pmd ffff81004b017710(00007ffffffff734).
mm/memory.c:97: bad pmd ffff81004b017730(5f36387800000000).
mm/memory.c:97: bad pmd ffff81004b017738(0000000000003436).


I've not done a memtest86 run on this (yet), but I'll be very
surprised if this is bad RAM, especially considering other
folks also seem to have hit the same thing when they moved
to 2.6.11. (My workstation ran 2.6.9/2.6.10 without incident
previously).

http://lkml.org/lkml/2005/3/11/42 for example lists a similar
dump (though obviously differing addresses).
Googling around reveals a bunch of other similar dumps.

Dave


2005-03-31 10:41:59

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Wed, Mar 30, 2005 at 04:44:55PM -0500, Dave Jones wrote:
> [apologies to Andi for getting this twice, I goofed the l-k address
> the first time]
>
>
> I arrived at the office today to find my workstation had this spew
> in its dmesg buffer..

Looks like random memory corruption to me.

Can you enable slab debugging etc.?
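On 2.6.11 that is CONFIG_DEBUG_SLAB under "Kernel hacking"; assuming a
configured source tree, a quick way to check a build is:

$ grep CONFIG_DEBUG_SLAB .config
CONFIG_DEBUG_SLAB=y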

> mm/memory.c:97: bad pmd ffff81004b017438(00000038a5500a88).
> mm/memory.c:97: bad pmd ffff81004b017440(0000000000000003).
> mm/memory.c:97: bad pmd ffff81004b017448(00007ffffffff73b).
> mm/memory.c:97: bad pmd ffff81004b017450(00007ffffffff73c).
> mm/memory.c:97: bad pmd ffff81004b017458(00007ffffffff73d).
> mm/memory.c:97: bad pmd ffff81004b017468(00007ffffffff73e).
> mm/memory.c:97: bad pmd ffff81004b017470(00007ffffffff73f).
> mm/memory.c:97: bad pmd ffff81004b017478(00007ffffffff740).
> mm/memory.c:97: bad pmd ffff81004b017480(00007ffffffff741).
> mm/memory.c:97: bad pmd ffff81004b017488(00007ffffffff742).
> mm/memory.c:97: bad pmd ffff81004b017490(00007ffffffff743).
> mm/memory.c:97: bad pmd ffff81004b017498(00007ffffffff744).
> mm/memory.c:97: bad pmd ffff81004b0174a0(00007ffffffff745).
> mm/memory.c:97: bad pmd ffff81004b0174a8(00007ffffffff746).
> mm/memory.c:97: bad pmd ffff81004b0174b0(00007ffffffff747).
> mm/memory.c:97: bad pmd ffff81004b0174b8(00007ffffffff748).
> mm/memory.c:97: bad pmd ffff81004b0174c0(00007ffffffff749).
> mm/memory.c:97: bad pmd ffff81004b0174c8(00007ffffffff74a).
> mm/memory.c:97: bad pmd ffff81004b0174d0(00007ffffffff74b).
> mm/memory.c:97: bad pmd ffff81004b0174d8(00007ffffffff74c).
> mm/memory.c:97: bad pmd ffff81004b0174e0(00007ffffffff74d).
> mm/memory.c:97: bad pmd ffff81004b0174e8(00007ffffffff74e).
> mm/memory.c:97: bad pmd ffff81004b0174f0(00007ffffffff74f).
> mm/memory.c:97: bad pmd ffff81004b0174f8(00007ffffffff750).
> mm/memory.c:97: bad pmd ffff81004b017500(00007ffffffff751).
> mm/memory.c:97: bad pmd ffff81004b017508(00007ffffffff752).
> mm/memory.c:97: bad pmd ffff81004b017510(00007ffffffff753).
> mm/memory.c:97: bad pmd ffff81004b017518(00007ffffffff754).
> mm/memory.c:97: bad pmd ffff81004b017520(00007ffffffff755).
> mm/memory.c:97: bad pmd ffff81004b017528(00007ffffffff756).
> mm/memory.c:97: bad pmd ffff81004b017530(00007ffffffff757).
> mm/memory.c:97: bad pmd ffff81004b017538(00007ffffffff758).
> mm/memory.c:97: bad pmd ffff81004b017540(00007ffffffff759).
> mm/memory.c:97: bad pmd ffff81004b017548(00007ffffffff75a).
> mm/memory.c:97: bad pmd ffff81004b017550(00007ffffffff75b).
> mm/memory.c:97: bad pmd ffff81004b017558(00007ffffffff75c).
> mm/memory.c:97: bad pmd ffff81004b017560(00007ffffffff75d).
> mm/memory.c:97: bad pmd ffff81004b017568(00007ffffffff75e).
> mm/memory.c:97: bad pmd ffff81004b017570(00007ffffffff75f).
> mm/memory.c:97: bad pmd ffff81004b017578(00007ffffffff760).
> mm/memory.c:97: bad pmd ffff81004b017580(00007ffffffff761).
> mm/memory.c:97: bad pmd ffff81004b017588(00007ffffffff762).
> mm/memory.c:97: bad pmd ffff81004b017590(00007ffffffff763).
> mm/memory.c:97: bad pmd ffff81004b017598(00007ffffffff764).
> mm/memory.c:97: bad pmd ffff81004b0175a0(00007ffffffff765).
> mm/memory.c:97: bad pmd ffff81004b0175a8(00007ffffffff766).
> mm/memory.c:97: bad pmd ffff81004b0175b0(00007ffffffff767).
> mm/memory.c:97: bad pmd ffff81004b0175b8(00007ffffffff768).
> mm/memory.c:97: bad pmd ffff81004b0175c0(00007ffffffff769).
> mm/memory.c:97: bad pmd ffff81004b0175c8(00007ffffffff76a).
> mm/memory.c:97: bad pmd ffff81004b0175d0(00007ffffffff76b).
> mm/memory.c:97: bad pmd ffff81004b0175d8(00007ffffffff76c).
> mm/memory.c:97: bad pmd ffff81004b0175e0(00007ffffffff76d).
> mm/memory.c:97: bad pmd ffff81004b0175e8(00007ffffffff76e).
> mm/memory.c:97: bad pmd ffff81004b0175f0(00007ffffffff76f).
> mm/memory.c:97: bad pmd ffff81004b0175f8(00007ffffffff770).
> mm/memory.c:97: bad pmd ffff81004b017600(00007ffffffff771).
> mm/memory.c:97: bad pmd ffff81004b017608(00007ffffffff772).
> mm/memory.c:97: bad pmd ffff81004b017610(00007ffffffff773).
> mm/memory.c:97: bad pmd ffff81004b017618(00007ffffffff774).
> mm/memory.c:97: bad pmd ffff81004b017628(0000000000000010).
> mm/memory.c:97: bad pmd ffff81004b017630(00000000078bfbff).
> mm/memory.c:97: bad pmd ffff81004b017638(0000000000000006).
> mm/memory.c:97: bad pmd ffff81004b017640(0000000000001000).
> mm/memory.c:97: bad pmd ffff81004b017648(0000000000000011).
> mm/memory.c:97: bad pmd ffff81004b017650(0000000000000064).
> mm/memory.c:97: bad pmd ffff81004b017658(0000000000000003).
> mm/memory.c:97: bad pmd ffff81004b017660(0000000000400040).
> mm/memory.c:97: bad pmd ffff81004b017668(0000000000000004).
> mm/memory.c:97: bad pmd ffff81004b017670(0000000000000038).
> mm/memory.c:97: bad pmd ffff81004b017678(0000000000000005).
> mm/memory.c:97: bad pmd ffff81004b017680(0000000000000008).
> mm/memory.c:97: bad pmd ffff81004b017688(0000000000000007).
> mm/memory.c:97: bad pmd ffff81004b017698(0000000000000008).
> mm/memory.c:97: bad pmd ffff81004b0176a8(0000000000000009).
> mm/memory.c:97: bad pmd ffff81004b0176b0(0000000000403840).
> mm/memory.c:97: bad pmd ffff81004b0176b8(000000000000000b).
> mm/memory.c:97: bad pmd ffff81004b0176c0(00000000000001f4).
> mm/memory.c:97: bad pmd ffff81004b0176c8(000000000000000c).
> mm/memory.c:97: bad pmd ffff81004b0176d0(00000000000001f4).
> mm/memory.c:97: bad pmd ffff81004b0176d8(000000000000000d).
> mm/memory.c:97: bad pmd ffff81004b0176e0(00000000000001f4).
> mm/memory.c:97: bad pmd ffff81004b0176e8(000000000000000e).
> mm/memory.c:97: bad pmd ffff81004b0176f0(00000000000001f4).
> mm/memory.c:97: bad pmd ffff81004b0176f8(0000000000000017).
> mm/memory.c:97: bad pmd ffff81004b017708(000000000000000f).
> mm/memory.c:97: bad pmd ffff81004b017710(00007ffffffff734).
> mm/memory.c:97: bad pmd ffff81004b017730(5f36387800000000).
> mm/memory.c:97: bad pmd ffff81004b017738(0000000000003436).
>
>
> I've not done a memtest86 run on this (yet), but I'll be very
> surprised if this is bad RAM, especially considering other
> folks also seem to have hit the same thing when they moved
> to 2.6.11. (My workstation ran 2.6.9/2.6.10 without incident
> previously).
>
> http://lkml.org/lkml/2005/3/11/42 for example lists a similar
> dump (though obviously differing addresses).
> Googling around reveals a bunch of other similar dumps.

Yes I saw them, but I suppose it is some driver going bad.
If you want you can collect hardware data and see if there is
a common driver.

-Andi

2005-03-31 21:52:23

by Dave Jones

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Thu, Mar 31, 2005 at 12:41:17PM +0200, Andi Kleen wrote:
> On Wed, Mar 30, 2005 at 04:44:55PM -0500, Dave Jones wrote:
> > [apologies to Andi for getting this twice, I goofed the l-k address
> > the first time]
> >
> >
> > I arrived at the office today to find my workstation had this spew
> > in its dmesg buffer..
>
> Looks like random memory corruption to me.
>
> Can you enable slab debugging etc.?

CONFIG_DEBUG_SLAB=y. Nothing in the logs.

> Yes I saw them, but I suppose it is some driver going bad.
> If you want you can collect hardware data and see if there is
> a common driver.

There's quite a bit in this box

00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:07.5 Multimedia audio controller: Advanced Micro Devices [AMD] AMD-8111 AC97 Audio (rev 03)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:07.0 USB Controller: NEC Corporation USB (rev 41)
02:07.1 USB Controller: NEC Corporation USB (rev 41)
02:07.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
02:08.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 66MHz Ultra3 SCSI Adapter (rev 01)
02:08.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 66MHz Ultra3 SCSI Adapter (rev 01)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:0a.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
03:0b.0 Unknown mass storage controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:0c.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
04:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-8151 System Controller (rev 13)
04:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8151 AGP Bridge (rev 13)
05:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G550 AGP (rev 01)

The SATA & SCSI controllers have no disks attached. Firewire can be ignored (there's
no actual connector for it on the board). The various USB controllers
are mostly unused. Only one of them is USB 2.0, so that sees occasional
usb-storage use. I've not noticed anything going bad there though.

Dave

2005-04-01 11:53:09

by Sergey S. Kostyliov

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Friday 01 April 2005 01:52, Dave Jones wrote:
> On Thu, Mar 31, 2005 at 12:41:17PM +0200, Andi Kleen wrote:
> > On Wed, Mar 30, 2005 at 04:44:55PM -0500, Dave Jones wrote:
> > > [apologies to Andi for getting this twice, I goofed the l-k address
> > > the first time]
> > >
> > >
> > > I arrived at the office today to find my workstation had this spew
> > > in its dmesg buffer..
> >
> > Looks like random memory corruption to me.
> >
> > Can you enable slab debugging etc.?
>
> CONFIG_DEBUG_SLAB=y. Nothing in the logs.
>
> > Yes I saw them, but I suppose it is some driver going bad.
> > If you want you can collect hardware data and see if there is
> > a common driver.
>
> There's quite a bit in this box
>
> 00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
> 00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
> 00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
> 00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
> 00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
> 00:07.5 Multimedia audio controller: Advanced Micro Devices [AMD] AMD-8111 AC97 Audio (rev 03)
> 00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
> 00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 02:07.0 USB Controller: NEC Corporation USB (rev 41)
> 02:07.1 USB Controller: NEC Corporation USB (rev 41)
> 02:07.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
> 02:08.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 66MHz Ultra3 SCSI Adapter (rev 01)
> 02:08.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 66MHz Ultra3 SCSI Adapter (rev 01)
> 02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
> 03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
> 03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
> 03:0a.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
> 03:0b.0 Unknown mass storage controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
> 03:0c.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
> 04:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-8151 System Controller (rev 13)
> 04:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8151 AGP Bridge (rev 13)
> 05:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G550 AGP (rev 01)
>
> The SATA & SCSI controllers have no disks attached. Firewire can be ignored (there's
> no actual connector for it on the board). The various USB controllers
> are mostly unused. Only one of them is USB 2.0, so that sees occasional
> usb-storage use. I've not noticed anything going bad there though.
>
> Dave

And here is my box (it looks like there are not many hardware drivers
in common).

rathamahata@lights rathamahata $ /sbin/lspci
0000:00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
0000:00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
0000:00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
0000:00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
0000:00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
0000:00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
0000:00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
0000:00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
0000:00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
0000:00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
0000:00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
0000:00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
0000:01:01.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 02)
0000:02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 02)
0000:03:03.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
0000:03:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
0000:04:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
0000:04:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
0000:04:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
rathamahata@lights rathamahata $

e1000 is handled by Intel's e1000 driver


USB is not compiled in:
rathamahata@lights linux-2.6.11 $ grep CONFIG_USB .config
# CONFIG_USB is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
# CONFIG_USB_GADGET is not set
rathamahata@lights linux-2.6.11 $

--
Sergey S. Kostyliov <[email protected]>
Jabber ID: [email protected]

2005-04-07 02:49:12

by Dave Jones

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Thu, Mar 31, 2005 at 12:41:17PM +0200, Andi Kleen wrote:
> On Wed, Mar 30, 2005 at 04:44:55PM -0500, Dave Jones wrote:
> > [apologies to Andi for getting this twice, I goofed the l-k address
> > the first time]
> >
> >
> > I arrived at the office today to find my workstation had this spew
> > in its dmesg buffer..
>
> Looks like random memory corruption to me.
>
> Can you enable slab debugging etc.?
>
> > mm/memory.c:97: bad pmd ffff81004b017438(00000038a5500a88).
> > mm/memory.c:97: bad pmd ffff81004b017440(0000000000000003).
> > mm/memory.c:97: bad pmd ffff81004b017448(00007ffffffff73b).
> > mm/memory.c:97: bad pmd ffff81004b017450(00007ffffffff73c).
> > etc..

I realised today that this happens every time X starts up for
the first time. I did some experiments, and found that with 2.6.12-rc1
it's gone. Either it got fixed accidentally, or it's hidden now
by one of the many changes in the 4-level patches.

I'll try and narrow this down a little more tomorrow, to see if I
can pinpoint the exact -bk snapshot (may be tricky given they were
broken for a while), as it'd be good to get this fixed in 2.6.11.x
if .12 isn't going to show up any time soon.

Dave

2005-04-07 06:29:34

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6

> I realised today that this happens every time X starts up for
> the first time. I did some experiments, and found that with 2.6.12-rc1
> it's gone. Either it got fixed accidentally, or it's hidden now
> by one of the many changes in the 4-level patches.
>
> I'll try and narrow this down a little more tomorrow, to see if I
> can pinpoint the exact -bk snapshot (may be tricky given they were
> broken for a while), as it'd be good to get this fixed in 2.6.11.x
> if .12 isn't going to show up any time soon.

Can you supply a strace of the /dev/mem, /dev/kmem accesses of
your X server? (including the mmaps or read/writes if available)

My X server doesn't seem to cause that.

-Andi

2005-04-08 16:33:32

by Clem Taylor

Subject: Re: x86-64 bad pmds in 2.6.11.6

Dave Jones reported seeing bad pmd messages in 2.6.11.6. I've been
seeing them with 2.6.11 and today with 2.6.11.6. When I first saw the
problem I ran memtest86 and it didn't catch anything after ~3 hours.
However, I don't see them when X starts. They tend to happen after a
program segfaults:

2.6.11:
Apr 3 23:23:33 klaatu kernel: sh[16361]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffffffff020 error 14
Apr 3 23:23:33 klaatu kernel: mm/memory.c:97: bad pmd ffff810027171010(00000000006b68b9).
.. many more ...

2.6.11.6:
Apr 8 12:03:17 klaatu kernel: grep[20971]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffffffff090 error 14
Apr 8 12:03:17 klaatu kernel: mm/memory.c:97: bad pmd ffff810095929010(0000000000000015).
.... many more ...
Apr 8 12:03:18 klaatu kernel: mm/memory.c:97: bad pmd ffff8100959299d0(000034365f363878).
Apr 8 12:03:18 klaatu kernel: grep[21116]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffffffff0a0 error 14
Apr 8 12:03:18 klaatu kernel: mm/memory.c:97: bad pmd ffff810095f5b000(000000000000000f).
...

At the time I was doing a
find ... -exec grep -H ...
over a linux kernel tree.

I repeated the find and didn't see segfaults on the second run.

--Clem

2005-04-14 13:53:54

by Hugh Dickins

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Thu, 7 Apr 2005, Andi Kleen wrote:
> Dave Jones wrote:
> > I realised today that this happens every time X starts up for
> > the first time. I did some experiments, and found that with 2.6.12-rc1
> > it's gone. Either it got fixed accidentally, or it's hidden now
> > by one of the many changes in the 4-level patches.
> >
> > I'll try and narrow this down a little more tomorrow, to see if I
> > can pinpoint the exact -bk snapshot (may be tricky given they were
> > broken for a while), as it'd be good to get this fixed in 2.6.11.x
> > if .12 isn't going to show up any time soon.
>
> Can you supply a strace of the /dev/mem, /dev/kmem accesses of
> your X server? (including the mmaps or read/writes if available)
>
> My X server doesn't seem to cause that.

I can't explain why it should appear fixed in 2.6.12-rc1 (probably
other complicating factors at work), but I do believe you've fixed
this in 2.6.12-rc2, and the patch which should go into -stable is
your load_cr3 patch below, which Linus took from Andrew on 28 March.

I say this because I was intrigued by the resemblance between Sergey's
and Dave's corruptions, and spent a while trying to work out where they
come from. The giveaway is the little ASCII string they share at the
end (seen also in Clem's extract)

mm/memory.c:97: bad pmd ffff81004b017730(5f36387800000000).
mm/memory.c:97: bad pmd ffff81004b017738(0000000000003436).

That says "x86_64", and a grep for that as a string shows ELF_PLATFORM,
and a grep for that shows create_elf_tables in fs/binfmt_elf.c. _All_
this pmd corruption (except for the first line, presumably pushing a
user address on stack) originates from create_elf_tables (the neatly
ascending stack addresses being the argv and envp pointers, incrementing
by 1 because only a NUL-string is found for each, the real strings being
off elsewhere in the intended new stack page, not in this pmd page).
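A quick way to see the string is to decode those two qwords as
little-endian bytes; a throwaway userspace snippet (hypothetical,
just to illustrate - not kernel code) prints it directly:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void)
{
	/* the last two "bad pmd" values from Dave's dump */
	unsigned long long v[2] = { 0x5f36387800000000ULL, 0x3436ULL };
	unsigned char buf[16];
	int i;

	memcpy(buf, v, sizeof(v));	/* x86-64 stores qwords little-endian */
	for (i = 0; i < 16; i++)
		putchar(isprint(buf[i]) ? buf[i] : '.');
	putchar('\n');			/* prints "....x86_64......" */
	return 0;
}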

It looks very much as if the mm being created has for pmd a page
which was used for user stack in the outgoing mm; but somehow exec's
exit_mmap TLB flushing hasn't taken effect. I only now noticed this
patch where you fix just such an issue.

Hugh

From: "Andi Kleen" <[email protected]>

Always reload CR3 completely when a lazy MM thread drops a MM. This avoids
keeping stale mappings around in the TLB that could be run into by the CPU by
itself (e.g. during prefetches).

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

25-akpm/arch/x86_64/kernel/smp.c | 3 ++-
25-akpm/include/asm-x86_64/mmu_context.h | 10 ++++++++--
2 files changed, 10 insertions(+), 3 deletions(-)

diff -puN arch/x86_64/kernel/smp.c~x86_64-always-reload-cr3-completely-when-a-lazy-mm arch/x86_64/kernel/smp.c
--- 25/arch/x86_64/kernel/smp.c~x86_64-always-reload-cr3-completely-when-a-lazy-mm Wed Mar 23 15:38:58 2005
+++ 25-akpm/arch/x86_64/kernel/smp.c Wed Mar 23 15:38:58 2005
@@ -25,6 +25,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/mach_apic.h>
+#include <asm/mmu_context.h>
#include <asm/proto.h>

/*
@@ -52,7 +53,7 @@ static inline void leave_mm (unsigned lo
if (read_pda(mmu_state) == TLBSTATE_OK)
BUG();
clear_bit(cpu, &read_pda(active_mm)->cpu_vm_mask);
- __flush_tlb();
+ load_cr3(swapper_pg_dir);
}

/*
diff -puN include/asm-x86_64/mmu_context.h~x86_64-always-reload-cr3-completely-when-a-lazy-mm include/asm-x86_64/mmu_context.h
--- 25/include/asm-x86_64/mmu_context.h~x86_64-always-reload-cr3-completely-when-a-lazy-mm Wed Mar 23 15:38:58 2005
+++ 25-akpm/include/asm-x86_64/mmu_context.h Wed Mar 23 15:38:58 2005
@@ -28,6 +28,11 @@ static inline void enter_lazy_tlb(struct
}
#endif

+static inline void load_cr3(pgd_t *pgd)
+{
+ asm volatile("movq %0,%%cr3" :: "r" (__pa(pgd)) : "memory");
+}
+
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
struct task_struct *tsk)
{
@@ -40,7 +45,8 @@ static inline void switch_mm(struct mm_s
write_pda(active_mm, next);
#endif
set_bit(cpu, &next->cpu_vm_mask);
- asm volatile("movq %0,%%cr3" :: "r" (__pa(next->pgd)) : "memory");
+ load_cr3(next->pgd);
+
if (unlikely(next->context.ldt != prev->context.ldt))
load_LDT_nolock(&next->context, cpu);
}
@@ -54,7 +60,7 @@ static inline void switch_mm(struct mm_s
* tlb flush IPI delivery. We must reload CR3
* to make sure to use no freed page tables.
*/
- asm volatile("movq %0,%%cr3" :: "r" (__pa(next->pgd)) : "memory");
+ load_cr3(next->pgd);
load_LDT_nolock(&next->context, cpu);
}
}

2005-04-14 17:01:36

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6

> It looks very much as if the mm being created has for pmd a page
> which was used for user stack in the outgoing mm; but somehow exec's
> exit_mmap TLB flushing hasn't taken effect. I only now noticed this
> patch where you fix just such an issue.

Thanks for the analysis. However I doubt the load_cr3 patch can fix
it. All it does is stop the CPU from prefetching mappings (which
can cause a different problem). But the Linux code that does the bad pmd checks
never looks at CR3 anyway; it always uses current->mm. If
bad pmd sees a bad page it must still be in the page tables of the mm,
not a stale TLB entry.

It must be something else. Somehow we get a freed page into
the page table hierarchy. After the initial 4-level implementation
I did not make many changes there; my suspicion would rather fall
on the recent memory.c changes.

-Andi

2005-04-14 17:35:01

by Hugh Dickins

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Thu, 14 Apr 2005, Andi Kleen wrote:
>
> Thanks for the analysis. However I doubt the load_cr3 patch can fix
> it. All it does is stop the CPU from prefetching mappings (which
> can cause a different problem).

I thought that the leave_mm code (before your patch) flushes the TLB, but
restores cr3 to the mm, while removing that cpu from the mm's cpu_vm_mask.

So any speculation, not just prefetching, on that cpu is in danger of
bringing address translations according to that mm back into the TLB.

But when the mm is torn down in exit_mmap, there's no longer any record
that the TLB on that cpu needs flushing, so stale translations remain.

As a rule, we always flush TLB _after_ invalidating, not just before,
for this kind of reason.

My paranoia of speculation may be excessive: I _think_ what I outline
above is a real possibility on Intel, but you and others know AMD much
better than I (and the reports I've seen are on AMD64, not EM64T).

> But the Linux code that does the bad pmd checks
> never looks at CR3 anyway; it always uses current->mm. If
> bad pmd sees a bad page it must still be in the page tables of the mm,
> not a stale TLB entry.

Sure, the "mm/memory.c:97: bad pmd" messages are coming from
clear_pmd_range, when the corrupted task exits later (but probably
not much later, since its user stack is oddly distributed across
two different pages: some mentioned SIGSEGVs I think).

The pmd really is bad, but it got to be bad because it had stack data
written into it by create_elf_tables, when the TLB mistakenly thought
it already knew what physical page 0x00007ffffffff000 was mapped to
(prior kernel accesses to that user stack are not by user address).

Hugh

2005-04-14 18:10:36

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6

On Thu, Apr 14, 2005 at 06:34:58PM +0100, Hugh Dickins wrote:
> On Thu, 14 Apr 2005, Andi Kleen wrote:
> >
> > Thanks for the analysis. However I doubt the load_cr3 patch can fix
> > it. All it does is stop the CPU from prefetching mappings (which
> > can cause a different problem).
>
> I thought that the leave_mm code (before your patch) flushes the TLB, but
> restores cr3 to the mm, while removing that cpu from the mm's cpu_vm_mask.
>
> So any speculation, not just prefetching, on that cpu is in danger of
> bringing address translations according to that mm back into the TLB.
>
> But when the mm is torn down in exit_mmap, there's no longer any record
> that the TLB on that cpu needs flushing, so stale translations remain.
>
> As a rule, we always flush TLB _after_ invalidating, not just before,
> for this kind of reason.

Yes this is all true. In fact I have several bug fixes for problems
in this area.

But this all cannot explain corruptions coming from the kernel;
you tend to only see problems with the CPU prefetching something.

Note that with the cr3 reload you end up with init_mm, which
is not a useful mm. So even if there were a store from the kernel
into a stale mapping it would cause -EFAULT now. But that is
not happening.

>
> My paranoia of speculation may be excessive: I _think_ what I outline
> above is a real possibility on Intel, but you and others know AMD much
> better than I (and the reports I've seen are on AMD64, not EM64T).

It is not excessive, on both Intel and AMD :) These CPUs do a lot of
prefetching behind your back; stale mappings left in the TLB at any time
eventually cause problems. But other ones than this.


> Sure, the "mm/memory.c:97: bad pmd" messages are coming from
> clear_pmd_range, when the corrupted task exits later (but probably
> not much later, since its user stack is oddly distributed across
> two different pages: some mentioned SIGSEGVs I think).
>
> The pmd really is bad, but it got to be bad because it had stack data
> written into it by create_elf_tables, when the TLB mistakenly thought
> it already knew what physical page 0x00007ffffffff000 was mapped to
> (prior kernel accesses to that user stack are not by user address).

What I meant is that the overwriting must be from Linux code
acting in the direct mapping, not due to stale TLBs for addresses < __PAGE_OFFSET.

I will take a closer look at the rc1/rc2 patches later this evening
and see if I can spot something. Can only report back tomorrow though.

-Andi

2005-04-14 18:12:46

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

> I will take a closer look at the rc1/rc2 patches later this evening
> and see if I can spot something. Can only report back tomorrow though.

Actually it started in .11 already - sigh - on rereading the thread.
That will make the code audit harder :/

-Andi

2005-04-14 18:27:44

by Chris Wright

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

* Andi Kleen ([email protected]) wrote:
> > I will take a closer look at the rc1/rc2 patches later this evening
> > and see if I can spot something. Can only report back tomorrow though.
>
> Actually it started in .11 already - sigh - on rereading the thread.
> That will make the code audit harder :/

Yes, I've seen it in .11 and earlier kernels. I happen to have the same
"x86_64" string in my bad pmd dumps, but can't reproduce it at all.
So, for now, I can hold off on adding the reload cr3 patch to -stable
unless you think it should be there anyway.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-04-15 17:24:20

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Thu, Apr 14, 2005 at 11:27:12AM -0700, Chris Wright wrote:
> * Andi Kleen ([email protected]) wrote:
> > > I will take a closer look at the rc1/rc2 patches later this evening
> > > and see if I can spot something. Can only report back tomorrow though.
> >
> > Actually it started in .11 already - sigh - on rereading the thread.
> > That will make the code audit harder :/
>
> Yes, I've seen it in .11 and earlier kernels. I happen to have the same
> "x86_64" string in my bad pmd dumps, but can't reproduce it at all.
> So, for now, I can hold off on adding the reload cr3 patch to -stable
> unless you think it should be there anyway.

It is a bug fix (actually there is another related patch that fixes
a similar bug), but we lived with the problems for years so I guess
they can wait for .12.

If there was a fix for the bad pmd problem it might be a candidate
for stable, but so far we don't know what causes it yet.

-Andi

2005-04-15 17:28:36

by Chris Wright

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

* Andi Kleen ([email protected]) wrote:
> On Thu, Apr 14, 2005 at 11:27:12AM -0700, Chris Wright wrote:
> > Yes, I've seen it in .11 and earlier kernels. I happen to have the same
> > "x86_64" string in my bad pmd dumps, but can't reproduce it at all.
> > So, for now, I can hold off on adding the reload cr3 patch to -stable
> > unless you think it should be there anyway.
>
> It is a bug fix (actually there is another related patch that fixes
> a similar bug), but we lived with the problems for years so I guess
> they can wait for .12.

Sounds good.

> If there was a fix for the bad pmd problem it might be a candidate
> for stable, but so far we don't know what causes it yet.

If I figure a way to trigger here, I'll report back.

thanks,
-chris

2005-04-15 17:58:25

by Hugh Dickins

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Fri, 15 Apr 2005, Chris Wright wrote:
> * Andi Kleen ([email protected]) wrote:
> > On Thu, Apr 14, 2005 at 11:27:12AM -0700, Chris Wright wrote:
> > > Yes, I've seen it in .11 and earlier kernels. I happen to have the same
> > > "x86_64" string in my bad pmd dumps, but can't reproduce it at all.
> > > So, for now, I can hold off on adding the reload cr3 patch to -stable
> > > unless you think it should be there anyway.
> >
> > It is a bug fix (actually there is another related patch that fixes
> > a similar bug), but we lived with the problems for years so I guess
> > they can wait for .12.
>
> Sounds good.

I must confess, with all due respect to Andi, that I don't understand his
dismissal of the possibility that load_cr3 in leave_mm might be the fix
(to create_elf_tables writing user stack data into the pmd).

My belief is that leaving any opening for unpredictable speculations to
pull stale translations into the TLB, is a recipe for strange trouble
down the line when those translations may get used in actuality.

I'd been hoping Andi would come to see it my way overnight,
since I'm clearly not up to arguing the case persuasively.

But I certainly don't expect Chris to add an unjustified patch to -stable.

> > If there was a fix for the bad pmd problem it might be a candidate
> > for stable, but so far we don't know what causes it yet.
>
> If I figure a way to trigger here, I'll report back.

Dave, earlier on you were quite able to reproduce the problem on 2.6.11,
finding it happened the first time you ran X. Do you have any time to
reverify that, then try to reproduce with the load_cr3 in leave_mm patch?

But please don't waste your time on this unless you think it's plausible.

Thanks,
Hugh

2005-04-15 18:07:38

by Dave Jones

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Fri, Apr 15, 2005 at 06:58:20PM +0100, Hugh Dickins wrote:

> > > If there was a fix for the bad pmd problem it might be a candidate
> > > for stable, but so far we don't know what causes it yet.
> > If I figure a way to trigger here, I'll report back.
>
> Dave, earlier on you were quite able to reproduce the problem on 2.6.11,
> finding it happened the first time you ran X. Do you have any time to
> reverify that, then try to reproduce with the load_cr3 in leave_mm patch?
>
> But please don't waste your time on this unless you think it's plausible.

I used to be able to reproduce it 100% by doing this on a vanilla
upstream kernel. Then it changed behaviour so I only saw it happening
on the Fedora kernel. For the latest Fedora update kernel I backported
this change:
- x86_64: Only free PMDs and PUDs after other CPUs have been flushed
as a 'try it and see'. At first I thought it had killed the bug, but
a day or so later, it started doing it again.

In the Fedora kernel we have a patch which restricts /dev/mem reading,
so I got suspicious about this interacting with any of the changes
that had happened to drivers/char/mem.c.
Out of curiosity, I backported the 3-4 patches from .12rc to
the Fedora .11 kernel, and haven't seen the problem since.

The bizarre thing is I can't explain why any of those patches would
make such a difference. Given the bug seems to be coming and going
for me, it's possible they've just masked the problem.

Dave

2005-04-19 13:35:28

by Andi Kleen

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Fri, Apr 15, 2005 at 06:58:20PM +0100, Hugh Dickins wrote:
> On Fri, 15 Apr 2005, Chris Wright wrote:
> > * Andi Kleen ([email protected]) wrote:
> > > On Thu, Apr 14, 2005 at 11:27:12AM -0700, Chris Wright wrote:
> > > > Yes, I've seen it in .11 and earlier kernels. I happen to have the same
> > > > "x86_64" string in my bad pmd dumps, but can't reproduce it at all.
> > > > So, for now, I can hold off on adding the reload cr3 patch to -stable
> > > > unless you think it should be there anyway.
> > >
> > > It is a bug fix (actually there is another related patch that fixes
> > > a similar bug), but we lived with the problems for years so I guess
> > > they can wait for .12.
> >
> > Sounds good.
>
> I must confess, with all due respect to Andi, that I don't understand his
> dismissal of the possibility that load_cr3 in leave_mm might be the fix
> (to create_elf_tables writing user stack data into the pmd).

Sorry for the late answer.

Ok, let's try again. The hole fixed by this patch only covers
the case of a kernel thread with a lazy mm doing some memory access
(or more likely the CPU doing a prefetch there). But ELF loading
never happens in lazy mm kernel threads. AFAIK in a "real" process
the TLB is always fully consistent.

Does that explanation satisfy you?

I agree that my earlier one was a bit dubious because I argued about
the direct mapping, but the argv setup actually uses user addresses.
But I still think it must be something else.

-Andi

2005-04-19 15:52:29

by Hugh Dickins

Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, 19 Apr 2005, Andi Kleen wrote:
> On Fri, Apr 15, 2005 at 06:58:20PM +0100, Hugh Dickins wrote:
> >
> > I must confess, with all due respect to Andi, that I don't understand his
> > dismissal of the possibility that load_cr3 in leave_mm might be the fix
> > (to create_elf_tables writing user stack data into the pmd).
>
> Sorry for the late answer.

Not at all. I didn't expect you to persist in trying to persuade me,
thank you for doing so, and I apologize for taking your time on this.

> Ok, let's try again. The hole fixed by this patch only covers
> the case of a kernel thread with a lazy mm doing some memory access
> (or more likely the CPU doing a prefetch there). But ELF loading
> never happens in lazy mm kernel threads. AFAIK in a "real" process
> the TLB is always fully consistent.
>
> Does that explanation satisfy you?

It does. Well, I needed to restudy exec_mmap and switch_mm in detail,
and having done so, I agree that the only way you can get through
exec_mmap's activate_mm without fully flushing the cpu's TLB, is if
the active_mm matches the newly allocated mm (itself impossible since
there's a reference on the active_mm), and the cpu bit is still set
in cpu_vm_mask - precisely not the case if we went through leave_mm.
Yet I was claiming your leave_mm fix could flush TLB for exec_mmap
where it wasn't already done.

Sorry for letting the neatness of my pmd/stack story blind me
to its impossibility, and for wasting your time.

Hugh

2005-04-22 17:37:21

by Andi Kleen

Subject: Debugging patch was Re: x86-64 bad pmds in 2.6.11.6 II


Can people who can reproduce the x86-64 2.6.11 bad pmd problem please apply
the following patch, see if it can still be reproduced with it,
and send the output generated. Also a strace of the program that showed
it (its pid and name should be dumped) would be useful if not too big.

After staring at the code for some time I can't find the problem, but
I somehow suspect it has to do with early page table frees. That is
why they are disabled here. This should not cause any memory leaks;
the page tables will always be freed at process exit, so it is
safe to apply even on production machines.

Thanks,

-Andi


diff -u linux-2.6.11/mm/memory.c-o linux-2.6.11/mm/memory.c
--- linux-2.6.11/mm/memory.c-o 2005-03-02 08:38:08.000000000 +0100
+++ linux-2.6.11/mm/memory.c 2005-04-22 19:32:30.305402456 +0200
@@ -94,6 +94,7 @@
if (pmd_none(*pmd))
return;
if (unlikely(pmd_bad(*pmd))) {
+ printk("%s:%d: ", current->comm, current->pid);
pmd_ERROR(*pmd);
pmd_clear(pmd);
return;
diff -u linux-2.6.11/mm/mmap.c-o linux-2.6.11/mm/mmap.c
--- linux-2.6.11/mm/mmap.c-o 2005-03-02 08:38:12.000000000 +0100
+++ linux-2.6.11/mm/mmap.c 2005-04-22 19:33:10.354580428 +0200
@@ -1645,11 +1645,13 @@
return;
if (first < FIRST_USER_PGD_NR * PGDIR_SIZE)
first = FIRST_USER_PGD_NR * PGDIR_SIZE;
+#if 0
/* No point trying to free anything if we're in the same pte page */
if ((first & PMD_MASK) < (last & PMD_MASK)) {
clear_page_range(tlb, first, last);
flush_tlb_pgtables(mm, first, last);
}
+#endif
}

/* Normal function to fix up a mapping

2005-04-27 14:26:25

by Andi Kleen

Subject: New debugging patch was Re: x86-64 bad pmds in 2.6.11.6 II


Could someone who reproduces this problem apply the following
patch and see if the WARN_ON triggers?


diff -u linux-2.6.11/mm/memory.c-o linux-2.6.11/mm/memory.c
--- linux-2.6.11/mm/memory.c-o 2005-03-02 08:38:08.000000000 +0100
+++ linux-2.6.11/mm/memory.c 2005-04-27 15:48:19.777104735 +0200
@@ -94,6 +94,7 @@
if (pmd_none(*pmd))
return;
if (unlikely(pmd_bad(*pmd))) {
+ printk("%s:%d: ", current->comm, current->pid);
pmd_ERROR(*pmd);
pmd_clear(pmd);
return;
@@ -113,6 +114,7 @@
unsigned long addr = start, next;
pmd_t *pmd, *__pmd;

+ WARN_ON(start == end);
if (pud_none(*pud))
return;
if (unlikely(pud_bad(*pud))) {

2005-04-27 17:38:44

by Dave Jones

Subject: Re: New debugging patch was Re: x86-64 bad pmds in 2.6.11.6 II

On Wed, Apr 27, 2005 at 04:23:44PM +0200, Andi Kleen wrote:
>
> Could someone who reproduces this problem apply the following
> patch and see if the WARN_ON triggers?
>
>
> diff -u linux-2.6.11/mm/memory.c-o linux-2.6.11/mm/memory.c
> --- linux-2.6.11/mm/memory.c-o 2005-03-02 08:38:08.000000000 +0100
> +++ linux-2.6.11/mm/memory.c 2005-04-27 15:48:19.777104735 +0200
> @@ -94,6 +94,7 @@
> if (pmd_none(*pmd))
> return;
> if (unlikely(pmd_bad(*pmd))) {
> + printk("%s:%d: ", current->comm, current->pid);
> pmd_ERROR(*pmd);
> pmd_clear(pmd);
> return;
> @@ -113,6 +114,7 @@
> unsigned long addr = start, next;
> pmd_t *pmd, *__pmd;
>
> + WARN_ON(start == end);
> if (pud_none(*pud))
> return;
> if (unlikely(pud_bad(*pud))) {

I'm up to my eyeballs in other stuff right now, so probably won't
get a chance to test this personally. I'll add it to the Fedora
testing rpm however, as 1-2 users are also hitting it.

I'll let you know if I hear anything back.

Dave

2005-04-29 11:08:36

by Hans Kristian Rosbach

[permalink] [raw]
Subject: Re: New debugging patch was Re: x86-64 bad pmds in 2.6.11.6 II

On Wed, 2005-04-27 at 19:37, Dave Jones wrote:
> On Wed, Apr 27, 2005 at 04:23:44PM +0200, Andi Kleen wrote:
> >
> > Could someone who reproduces this problem apply the following
> > patch and see if the WARN_ON triggers?
> >
*snip*
> I'm up to my eyeballs in other stuff right now, so probably won't
> get a chance to test this personally. I'll add it to the Fedora
> testing rpm however, as 1-2 users are also hitting it.
>
> I'll let you know if I hear anything back.
>
> Dave

I'm seeing this problem on 2.6.11-1.14_FC3smp; part of dmesg is
supplied below (these messages have flooded the buffer).
The kernel was updated from kernel-smp-2.6.9-1.667 yesterday, and no
similar messages were in dmesg at that time, after 8 days of uptime.

Fedora Core 3 w/ latest updates at the moment.
Running MySQL, exim, http + perl, clamav, spamassassin, courier.

Tyan motherboard, dual AMD 246, 4x512MB RAM (one on each bus),
two 3ware 9xx, dual Broadcom NICs. (lspci output below)

I have not tried the testing kernel you suggested, but will see what
time allows today. We already have a scheduled reboot on it, so
maybe I'll just sneak in a kernel update again.

More info available on request.

-HK


# uptime
12:57:50 up 4:03, 1 user, load average: 1.85, 1.87, 1.74

# lspci
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:03.0 RAID bus controller: 3ware Inc 3ware Inc 3ware 9xxx-series SATA-RAID
02:03.0 RAID bus controller: 3ware Inc 3ware Inc 3ware 9xxx-series SATA-RAID
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
03:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
03:08.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 10)


# lsmod
Module Size Used by
md5 5953 1
ipv6 297665 52
autofs4 24521 0
sunrpc 169017 1
xfs 563953 1
exportfs 8385 1 xfs
dm_mod 69761 0
video 20169 0
button 9185 0
battery 12233 0
ac 6857 0
i2c_amd8111 8129 0
i2c_core 28353 1 i2c_amd8111
hw_random 7393 0
e100 42305 0
mii 7105 1 e100
tg3 91717 0
reiserfs 276665 3
3w_9xxx 38725 6
sd_mod 20929 8
scsi_mod 155665 2 3w_9xxx,sd_mod


Part of dmesg:
mm/memory.c:97: bad pmd ffff810059861a80(000000000000000b).
mm/memory.c:97: bad pmd ffff810059861a90(000000000000000c).
mm/memory.c:97: bad pmd ffff810059861aa0(000000000000000d).
mm/memory.c:97: bad pmd ffff810059861ab0(000000000000000e).
mm/memory.c:97: bad pmd ffff810059861ac0(0000000000000017).
mm/memory.c:97: bad pmd ffff810059861ad0(000000000000000f).
mm/memory.c:97: bad pmd ffff810059861ad8(00007ffffffffaf8).
mm/memory.c:97: bad pmd ffff810059861af8(000034365f363878).
mm/memory.c:97: bad pmd ffff810074cb7878(0000003926400a88).
mm/memory.c:97: bad pmd ffff810074cb7880(0000000000000004).
mm/memory.c:97: bad pmd ffff810074cb7888(00007ffffffffb00).
mm/memory.c:97: bad pmd ffff810074cb7890(00007ffffffffb01).
mm/memory.c:97: bad pmd ffff810074cb7898(00007ffffffffb02).
mm/memory.c:97: bad pmd ffff810074cb78a0(00007ffffffffb03).
mm/memory.c:97: bad pmd ffff810074cb78b0(00007ffffffffb04).
mm/memory.c:97: bad pmd ffff810074cb78b8(00007ffffffffb05).
mm/memory.c:97: bad pmd ffff810074cb78c0(00007ffffffffb06).
mm/memory.c:97: bad pmd ffff810074cb78c8(00007ffffffffb07).
mm/memory.c:97: bad pmd ffff810074cb78d0(00007ffffffffb08).
mm/memory.c:97: bad pmd ffff810074cb78d8(00007ffffffffb09).
mm/memory.c:97: bad pmd ffff810074cb78e0(00007ffffffffb0a).
mm/memory.c:97: bad pmd ffff810074cb78e8(00007ffffffffb0b).
mm/memory.c:97: bad pmd ffff810074cb78f0(00007ffffffffb0c).
mm/memory.c:97: bad pmd ffff810074cb78f8(00007ffffffffb0d).
mm/memory.c:97: bad pmd ffff810074cb7900(00007ffffffffb0e).
mm/memory.c:97: bad pmd ffff810074cb7908(00007ffffffffb0f).
mm/memory.c:97: bad pmd ffff810074cb7910(00007ffffffffb10).
mm/memory.c:97: bad pmd ffff810074cb7918(00007ffffffffb11).
mm/memory.c:97: bad pmd ffff810074cb7920(00007ffffffffb12).
mm/memory.c:97: bad pmd ffff810074cb7928(00007ffffffffb13).
mm/memory.c:97: bad pmd ffff810074cb7930(00007ffffffffb14).
mm/memory.c:97: bad pmd ffff810074cb7938(00007ffffffffb15).
mm/memory.c:97: bad pmd ffff810074cb7940(00007ffffffffb16).
mm/memory.c:97: bad pmd ffff810074cb7948(00007ffffffffb17).
mm/memory.c:97: bad pmd ffff810074cb7950(00007ffffffffb18).
mm/memory.c:97: bad pmd ffff810074cb7958(00007ffffffffb19).
mm/memory.c:97: bad pmd ffff810074cb7960(00007ffffffffb1a).
mm/memory.c:97: bad pmd ffff810074cb7968(00007ffffffffb1b).
mm/memory.c:97: bad pmd ffff810074cb7970(00007ffffffffb1c).
mm/memory.c:97: bad pmd ffff810074cb7978(00007ffffffffb1d).
mm/memory.c:97: bad pmd ffff810074cb7980(00007ffffffffb1e).
mm/memory.c:97: bad pmd ffff810074cb7988(00007ffffffffb1f).
mm/memory.c:97: bad pmd ffff810074cb7990(00007ffffffffb20).
mm/memory.c:97: bad pmd ffff810074cb7998(00007ffffffffb21).
mm/memory.c:97: bad pmd ffff810074cb79a0(00007ffffffffb22).
mm/memory.c:97: bad pmd ffff810074cb79a8(00007ffffffffb23).
mm/memory.c:97: bad pmd ffff810074cb79b0(00007ffffffffb24).
mm/memory.c:97: bad pmd ffff810074cb79b8(00007ffffffffb25).
mm/memory.c:97: bad pmd ffff810074cb79c0(00007ffffffffb26).
mm/memory.c:97: bad pmd ffff810074cb79c8(00007ffffffffb27).
mm/memory.c:97: bad pmd ffff810074cb79d0(00007ffffffffb28).
mm/memory.c:97: bad pmd ffff810074cb79d8(00007ffffffffb29).
mm/memory.c:97: bad pmd ffff810074cb79e0(00007ffffffffb2a).
mm/memory.c:97: bad pmd ffff810074cb79f0(0000000000000010).
mm/memory.c:97: bad pmd ffff810074cb79f8(00000000078bfbff).
mm/memory.c:97: bad pmd ffff810074cb7a00(0000000000000006).
mm/memory.c:97: bad pmd ffff810074cb7a08(0000000000001000).
mm/memory.c:97: bad pmd ffff810074cb7a10(0000000000000011).
mm/memory.c:97: bad pmd ffff810074cb7a18(0000000000000064).
mm/memory.c:97: bad pmd ffff810074cb7a20(0000000000000003).
mm/memory.c:97: bad pmd ffff810074cb7a28(0000000000400040).
mm/memory.c:97: bad pmd ffff810074cb7a30(0000000000000004).
mm/memory.c:97: bad pmd ffff810074cb7a38(0000000000000038).
mm/memory.c:97: bad pmd ffff810074cb7a40(0000000000000005).
mm/memory.c:97: bad pmd ffff810074cb7a48(0000000000000008).
mm/memory.c:97: bad pmd ffff810074cb7a50(0000000000000007).
mm/memory.c:97: bad pmd ffff810074cb7a60(0000000000000008).
mm/memory.c:97: bad pmd ffff810074cb7a70(0000000000000009).
mm/memory.c:97: bad pmd ffff810074cb7a78(0000000000401480).
mm/memory.c:97: bad pmd ffff810074cb7a80(000000000000000b).
mm/memory.c:97: bad pmd ffff810074cb7a90(000000000000000c).
mm/memory.c:97: bad pmd ffff810074cb7aa0(000000000000000d).
mm/memory.c:97: bad pmd ffff810074cb7ab0(000000000000000e).
mm/memory.c:97: bad pmd ffff810074cb7ac0(0000000000000017).
mm/memory.c:97: bad pmd ffff810074cb7ad0(000000000000000f).
mm/memory.c:97: bad pmd ffff810074cb7ad8(00007ffffffffaf9).
mm/memory.c:97: bad pmd ffff810074cb7af8(0034365f36387800).


2005-04-29 16:00:20

by Christopher Warner

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II


> It does. Well, I needed to restudy exec_mmap and switch_mm in detail,
> and having done so, I agree that the only way you can get through
> exec_mmap's activate_mm without fully flushing the cpu's TLB is if
> the active_mm matches the newly allocated mm (itself impossible since
> there's a reference on the active_mm), and the cpu bit is still set
> in cpu_vm_mask - precisely not the case if we went through leave_mm.
> Yet I was claiming your leave_mm fix could flush TLB for exec_mmap
> where it wasn't already done.
>
> Sorry for letting the neatness of my pmd/stack story blind me
> to its impossibility, and for wasting your time.
>
> Hugh
> -
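
[For reference: the context-switch logic Hugh is reasoning about looks
roughly like this; a compressed paraphrase of 2.6.11's
include/asm-x86_64/mmu_context.h with helper names abbreviated, not the
verbatim kernel source.]

/*
 * leave_mm() clears this cpu's bit in cpu_vm_mask while the cpu idles
 * on a borrowed mm.  A later switch back to the same mm then takes the
 * test_and_set_bit() path below and reloads CR3, which flushes any
 * stale user TLB entries.
 */
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
			     struct task_struct *tsk)
{
	unsigned cpu = smp_processor_id();

	if (likely(prev != next)) {
		clear_bit(cpu, &prev->cpu_vm_mask);	/* stop flush IPIs for prev */
		set_bit(cpu, &next->cpu_vm_mask);
		load_cr3(next->pgd);			/* implicit full TLB flush */
	} else if (!test_and_set_bit(cpu, &next->cpu_vm_mask)) {
		/* We were lazy: reload CR3 so no freed page tables are used. */
		load_cr3(next->pgd);
	}
}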

Any updated information one should know about this before testing?

I'm getting bad pmds in 2.6.11.5; Tyan S2882/dual AMD 246 opterons. The
problem only occurs during some thread-intensive task. I'm going to
try to strace/bt and send some information as it occurs. The hardware is
almost identical to the setup above.

-Christopher Warner


2005-04-29 16:19:43

by Chris Wright

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

* Christopher Warner ([email protected]) wrote:
> Any updated information one should know about this before testing?

Andi would like to see output from this patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=111461231610952&w=2

thanks,
-chris

2005-04-29 17:33:20

by Dave Jones

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Fri, Apr 29, 2005 at 07:12:59AM -0400, Christopher Warner wrote:
>
> > It does. Well, I needed to restudy exec_mmap and switch_mm in detail,
> > and having done so, I agree that the only way you can get through
> > exec_mmap's activate_mm without fully flushing the cpu's TLB is if
> > the active_mm matches the newly allocated mm (itself impossible since
> > there's a reference on the active_mm), and the cpu bit is still set
> > in cpu_vm_mask - precisely not the case if we went through leave_mm.
> > Yet I was claiming your leave_mm fix could flush TLB for exec_mmap
> > where it wasn't already done.
> >
> > Sorry for letting the neatness of my pmd/stack story blind me
> > to its impossibility, and for wasting your time.
> >
> > Hugh
> > -
>
> Any updated information one should know about this before testing?
>
> I'm getting bad pmds in 2.6.11.5; Tyan S2882/dual AMD 246 opterons.

Datapoint: exactly the same model as my workstation which showed
this problem recently.

Dave

2005-05-02 20:13:25

by Christopher Warner

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

Actually I am testing your patches. It's just going to take some time.
The problem occurs under severe load, and I'm in the process of doing
load testing for an in-house app this week. As soon as I'm able to send
debug information, I will.

-Christopher Warner

On Mon, 2005-05-02 at 19:00 +0200, Andi Kleen wrote:
> > Datapoint: exactly the same model as my workstation which showed
> > this problem recently.
>
> FYI
>
> Since all the people who run into this are refusing to test my
> debugging/test patches (no feedback at all for them so far), I give up
> on this now.
>
> -Andi
>

2005-05-02 17:02:51

by Andi Kleen

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

> Datapoint: exactly the same model as my workstation which showed
> this problem recently.

FYI

Since all the people who run into this are refusing to test my
debugging/test patches (no feedback at all for them so far), I give up
on this now.

-Andi

2005-05-02 20:34:52

by Chris Wright

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

* Christopher Warner ([email protected]) wrote:
> Actually I am testing your patches. It's just going to take some time.
> The problem occurs under severe load, and I'm in the process of doing
> load testing for an in-house app this week. As soon as I'm able to send
> debug information, I will.

Same here. I've just never found a way to trigger it other than waiting.

thanks,
-chris

2005-05-02 21:09:06

by Dave Jones

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Mon, May 02, 2005 at 01:33:59PM -0700, Chris Wright wrote:
> * Christopher Warner ([email protected]) wrote:
> > Actually I am testing your patches. It's just going to take some time.
> > The problem occurs under severe load, and I'm in the process of doing
> > load testing for an in-house app this week. As soon as I'm able to send
> > debug information, I will.
>
> Same here. I've just never found a way to trigger it other than waiting.

*nod*, the current test-kernel update for Fedora also has your
debugging patches, but none of the users have hit them (or reported
them) yet.

Dave

2005-05-03 14:30:46

by Andi Kleen

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Mon, May 02, 2005 at 05:08:39PM -0400, Dave Jones wrote:
> On Mon, May 02, 2005 at 01:33:59PM -0700, Chris Wright wrote:
> > * Christopher Warner ([email protected]) wrote:
> > > Actually I am testing your patches. It's just going to take some time.
> > > The problem occurs under severe load, and I'm in the process of doing
> > > load testing for an in-house app this week. As soon as I'm able to send
> > > debug information, I will.
> >
> > Same here. I've just never found a way to trigger it other than waiting.
>
> *nod*, the current test-kernel update for Fedora also has your
> debugging patches, but none of the users have hit them (or reported
> them) yet.

The second version with the WARN_ON? If not, please update to that one.

-Andi

2005-05-03 15:18:37

by Dave Jones

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, May 03, 2005 at 04:28:58PM +0200, Andi Kleen wrote:
> On Mon, May 02, 2005 at 05:08:39PM -0400, Dave Jones wrote:
> > On Mon, May 02, 2005 at 01:33:59PM -0700, Chris Wright wrote:
> > > * Christopher Warner ([email protected]) wrote:
> > > > Actually I am testing your patches. It's just going to take some time.
> > > > The problem occurs under severe load, and I'm in the process of doing
> > > > load testing for an in-house app this week. As soon as I'm able to send
> > > > debug information, I will.
> > >
> > > Same here. I've just never found a way to trigger it other than waiting.
> >
> > *nod*, the current test-kernel update for Fedora also has your
> > debugging patches, but none of the users have hit them (or reported
> > them) yet.
>
> The second version with the WARN_ON? If not, please update to that one.

I lost track.. Here's what I included..

--- linux-2.6.11/mm/memory.c~	2005-04-27 13:37:20.000000000 -0400
+++ linux-2.6.11/mm/memory.c	2005-04-27 13:37:45.000000000 -0400
@@ -94,6 +94,7 @@ static inline void clear_pmd_range(struc
 	if (pmd_none(*pmd))
 		return;
 	if (unlikely(pmd_bad(*pmd))) {
+		printk("%s:%d: ", current->comm, current->pid);
 		pmd_ERROR(*pmd);
 		pmd_clear(pmd);
 		return;
@@ -113,6 +114,7 @@ static inline void clear_pud_range(struc
 	unsigned long addr = start, next;
 	pmd_t *pmd, *__pmd;
 
+	WARN_ON(start == end);
 	if (pud_none(*pud))
 		return;
 	if (unlikely(pud_bad(*pud))) {



Dave

2005-05-10 14:21:20

by Christopher Warner

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

2.6.11.5 kernel,
Tyan S2882/dual AMD 246 opterons

sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).
sh:18983: mm/memory.c:99: bad pmd ffff810005974cd8(00007ffffffffe48).
sh:18983: mm/memory.c:99: bad pmd ffff810005974ce0(00007ffffffffe49).
sh:18983: mm/memory.c:99: bad pmd ffff810005974ce8(00007ffffffffe4a).
sh:18983: mm/memory.c:99: bad pmd ffff810005974cf0(00007ffffffffe4b).
sh:18983: mm/memory.c:99: bad pmd ffff810005974cf8(00007ffffffffe4c).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d00(00007ffffffffe4d).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d08(00007ffffffffe4e).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d10(00007ffffffffe4f).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d18(00007ffffffffe50).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d20(00007ffffffffe51).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d30(0000000000000010).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d38(00000000078bfbff).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d40(0000000000000006).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d48(0000000000001000).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d50(0000000000000011).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d58(0000000000000064).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d60(0000000000000003).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d68(0000000000400040).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d70(0000000000000004).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d78(0000000000000038).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d80(0000000000000005).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d88(0000000000000008).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d90(0000000000000007).
sh:18983: mm/memory.c:99: bad pmd ffff810005974d98(00002aaaaaaab000).
sh:18983: mm/memory.c:99: bad pmd ffff810005974da0(0000000000000008).
sh:18983: mm/memory.c:99: bad pmd ffff810005974db0(0000000000000009).
sh:18983: mm/memory.c:99: bad pmd ffff810005974db8(0000000000413a00).
sh:18983: mm/memory.c:99: bad pmd ffff810005974dc0(000000000000000b).
sh:18983: mm/memory.c:99: bad pmd ffff810005974dc8(00000000000007d3).
sh:18983: mm/memory.c:99: bad pmd ffff810005974dd0(000000000000000c).
sh:18983: mm/memory.c:99: bad pmd ffff810005974dd8(00000000000007d3).
sh:18983: mm/memory.c:99: bad pmd ffff810005974de0(000000000000000d).
sh:18983: mm/memory.c:99: bad pmd ffff810005974df0(000000000000000e).
sh:18983: mm/memory.c:99: bad pmd ffff810005974e00(0000000000000017).
sh:18983: mm/memory.c:99: bad pmd ffff810005974e10(000000000000000f).
sh:18983: mm/memory.c:99: bad pmd ffff810005974e18(00007ffffffffe3a).
sh:18983: mm/memory.c:99: bad pmd ffff810005974e38(34365f3638780000)
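
[A hedged aside on the spew itself, not from the original posts: the
values counting up by one look like word-by-word stack contents, and the
tail matches an ELF auxiliary vector (0x10/0x078bfbff = AT_HWCAP,
6/0x1000 = AT_PAGESZ, 3/0x400040 = AT_PHDR, 0xf = AT_PLATFORM, ...), as
if a stack page were being walked as a page of pmds. The very last value
even decodes to the AT_PLATFORM string:]

/* Decode the final "bad pmd" value above as raw bytes; on little-endian
 * x86-64 it is the platform string from a process stack, not a pmd. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t v = 0x34365f3638780000ULL;	/* from the last line above */
	unsigned char *b = (unsigned char *)&v;

	for (int i = 0; i < 8; i++)
		putchar(b[i] ? b[i] : '.');	/* prints "..x86_64" */
	putchar('\n');
	return 0;
}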

2005-05-10 16:27:23

by Chris Wright

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

* Christopher Warner ([email protected]) wrote:
> 2.6.11.5 kernel,
> Tyan S2882/dual AMD 246 opterons

Got a time stamp by any chance (or a clue re: what was going on at the
time)?

thanks,
-chris

2005-05-10 16:39:18

by Dave Jones

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, May 10, 2005 at 05:36:54AM -0400, Christopher Warner wrote:
> 2.6.11.5 kernel,
> Tyan S2882/dual AMD 246 opterons
> sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
> sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).

That's the 3rd or 4th time I've seen this reported on this hardware.
It's not exclusive to it, but it does seem more susceptible
for some reason. Spooky.

Dave

2005-05-10 16:47:04

by Andi Kleen

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, May 10, 2005 at 12:38:51PM -0400, Dave Jones wrote:
> On Tue, May 10, 2005 at 05:36:54AM -0400, Christopher Warner wrote:
> > 2.6.11.5 kernel,
> > Tyan S2882/dual AMD 246 opterons
> > sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
> > sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).
>
> That's the 3rd or 4th time I've seen this reported on this hardware.
> It's not exclusive to it, but it does seem more susceptible
> for some reason. Spooky.

It seems clear now that it is hardware-independent.

I actually got it once now too, but only after a 24+h stress test :/

I have a better debugging patch now that I will be testing soon;
hopefully that turns something up.

-Andi

2005-05-10 16:48:11

by Christopher Warner

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

That's from a couple of days ago. The bug used to be easily reproducible,
and then it suddenly stopped.

I've pushed the loads of the server machines in question way into the
2000 range with lots of threads, I/O, and CPU usage. I'm really confused
as to what exactly it could be. I'm going to try a couple of different
machines this week.

I'm starting to suspect it has something to do with the mobo itself;
since I have two or three next to me, I'll try those other machines
and then try a completely different machine/motherboard.

It's really sneaky, so we'll see what happens.

On Tue, 2005-05-10 at 09:26 -0700, Chris Wright wrote:
> * Christopher Warner ([email protected]) wrote:
> > 2.6.11.5 kernel,
> > Tyan S2882/dual AMD 246 opterons
>
> Got a time stamp by any chance (or a clue re: what was going on at the
> time)?
>
> thanks,
> -chris

2005-05-10 17:00:10

by Dave Jones

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, May 10, 2005 at 06:46:49PM +0200, Andi Kleen wrote:
> On Tue, May 10, 2005 at 12:38:51PM -0400, Dave Jones wrote:
> > On Tue, May 10, 2005 at 05:36:54AM -0400, Christopher Warner wrote:
> > > 2.6.11.5 kernel,
> > > Tyan S2882/dual AMD 246 opterons
> > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
> > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).
> >
> > That's the 3rd or 4th time I've seen this reported on this hardware.
> > It's not exclusive to it, but it does seem more susceptible
> > for some reason. Spooky.
>
> It seems clear now that it is hardware-independent.
>
> I actually got it once now too, but only after a 24+h stress test :/
>
> I have a better debugging patch now that I will be testing soon;
> hopefully that turns something up.

Ok, I'm respinning the Fedora update kernel today for other
reasons; if you have that patch in time, I'll toss it in too.

Though as yet, no further reports from our users.

Dave

2005-05-10 20:33:12

by Andi Kleen

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, May 10, 2005 at 12:59:38PM -0400, Dave Jones wrote:
> On Tue, May 10, 2005 at 06:46:49PM +0200, Andi Kleen wrote:
> > On Tue, May 10, 2005 at 12:38:51PM -0400, Dave Jones wrote:
> > > On Tue, May 10, 2005 at 05:36:54AM -0400, Christopher Warner wrote:
> > > > 2.6.11.5 kernel,
> > > > Tyan S2882/dual AMD 246 opterons
> > > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
> > > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).
> > >
> > > That's the 3rd or 4th time I've seen this reported on this hardware.
> > > It's not exclusive to it, but it does seem more susceptible
> > > for some reason. Spooky.
> >
> > It seems clear now that it is hardware-independent.
> >
> > I actually got it once now too, but only after a 24+h stress test :/
> >
> > I have a better debugging patch now that I will be testing soon;
> > hopefully that turns something up.
>
> Ok, I'm respinning the Fedora update kernel today for other
> reasons; if you have that patch in time, I'll toss it in too.

The patch has considerable overhead, so it's probably not a good idea
for a production rpm.

-Andi

2005-05-10 20:50:24

by Chris Wright

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

* Andi Kleen ([email protected]) wrote:
> On Tue, May 10, 2005 at 12:59:38PM -0400, Dave Jones wrote:
> > On Tue, May 10, 2005 at 06:46:49PM +0200, Andi Kleen wrote:
> > > On Tue, May 10, 2005 at 12:38:51PM -0400, Dave Jones wrote:
> > > > On Tue, May 10, 2005 at 05:36:54AM -0400, Christopher Warner wrote:
> > > > > 2.6.11.5 kernel,
> > > > > Tyan S2882/dual AMD 246 opterons
> > > > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
> > > > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).
> > > >
> > > > That's the 3rd or 4th time I've seen this reported on this hardware.
> > > > It's not exclusive to it, but it does seem more susceptible
> > > > for some reason. Spooky.
> > >
> > > It seems clear now that it is hardware-independent.
> > >
> > > I actually got it once now too, but only after a 24+h stress test :/
> > >
> > > I have a better debugging patch now that I will be testing soon;
> > > hopefully that turns something up.
> >
> > Ok, I'm respinning the Fedora update kernel today for other
> > reasons; if you have that patch in time, I'll toss it in too.
>
> The patch has considerable overhead, so it's probably not a good idea
> for a production rpm.

I don't mind running it here. I've triggered it once and did not hit the
WARN_ON(start == end); current was "sh", not that helpful.

thanks,
-chris

2005-05-12 21:24:41

by Andi Kleen

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6 II

On Tue, May 10, 2005 at 12:59:38PM -0400, Dave Jones wrote:
> On Tue, May 10, 2005 at 06:46:49PM +0200, Andi Kleen wrote:
> > On Tue, May 10, 2005 at 12:38:51PM -0400, Dave Jones wrote:
> > > On Tue, May 10, 2005 at 05:36:54AM -0400, Christopher Warner wrote:
> > > > 2.6.11.5 kernel,
> > > > Tyan S2882/dual AMD 246 opterons
> > > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cc8(00007ffffffffe46).
> > > > sh:18983: mm/memory.c:99: bad pmd ffff810005974cd0(00007ffffffffe47).
> > >
> > > That's the 3rd or 4th time I've seen this reported on this hardware.
> > > It's not exclusive to it, but it does seem more susceptible
> > > for some reason. Spooky.
> >
> > It seems clear now that it is hardware-independent.
> >
> > I actually got it once now too, but only after a 24+h stress test :/
> >
> > I have a better debugging patch now that I will be testing soon;
> > hopefully that turns something up.
>
> Ok, I'm respinning the Fedora update kernel today for other
> reasons; if you have that patch in time, I'll toss it in too.
>
> Though as yet, no further reports from our users.

Here's the new patch. However, it costs some memory bloat because
I added a new field to struct page.

-Andi


Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -912,6 +912,7 @@ nopage:
 	return NULL;
 got_pg:
 	zone_statistics(zonelist, z);
+	page->freer = (void *)-1ULL;
 	return page;
 }
 
@@ -962,13 +963,16 @@ void __pagevec_free(struct pagevec *pvec
 {
 	int i = pagevec_count(pvec);
 
-	while (--i >= 0)
+	while (--i >= 0) {
+		pvec->pages[i]->freer = (void *)__builtin_return_address(0);
 		free_hot_cold_page(pvec->pages[i], pvec->cold);
+	}
 }
 
 fastcall void __free_pages(struct page *page, unsigned int order)
 {
 	if (!PageReserved(page) && put_page_testzero(page)) {
+		page->freer = (void *)__builtin_return_address(0);
 		if (order == 0)
 			free_hot_page(page);
 		else
@@ -1595,6 +1599,7 @@ void __init memmap_init_zone(unsigned lo
 	struct page *page;
 
 	for (page = start; page < (start + size); page++) {
+		page->freer = NULL;
 		set_page_zone(page, NODEZONE(nid, zone));
 		set_page_count(page, 0);
 		reset_page_mapcount(page);
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -261,6 +261,7 @@ struct page {
 	void *virtual;			/* Kernel virtual address (NULL if
 					   not kmapped, ie. highmem) */
 #endif /* WANT_PAGE_VIRTUAL */
+	void *freer;
 };
 
 /*
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c
+++ linux/mm/memory.c
@@ -48,6 +48,7 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/kallsyms.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -108,6 +109,14 @@ static inline void clear_pmd_range(struc
 
 	pmd = pmd_offset(pud, addr);
 
+	{
+		struct page *p = virt_to_page(pmd);
+		if (page_count(p) < 1) {
+			printk("%s:%d free pmd %lx ", current->comm, current->pid, addr);
+			print_symbol("freed by %s\n", (unsigned long)p->freer);
+		}
+	}
+
 	/* Only free fully aligned ranges */
 	if (!((addr | end) & ~PUD_MASK))
 		empty_pmd = pmd;
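
[In outline, the patch stamps each struct page with the return address of
whoever last freed it, so that a later walk over a freed pmd page can name
the freeing call site via print_symbol(). A minimal user-space analogue of
the technique, with hypothetical names, not kernel code:]

#include <stdio.h>

struct obj {
	int live;
	void *freer;		/* return address of the last free_obj() caller */
};

static void free_obj(struct obj *o)
{
	o->live = 0;
	o->freer = __builtin_return_address(0);	/* gcc builtin, as in the patch */
}

static void check_obj(struct obj *o)
{
	if (!o->live)
		printf("use after free, freed by caller at %p\n", o->freer);
}

int main(void)
{
	struct obj o = { .live = 1, .freer = NULL };

	free_obj(&o);
	check_obj(&o);		/* reports the address inside main() that freed it */
	return 0;
}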

2005-09-20 17:13:18

by Charles McCreary

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6

Another datapoint for this thread. The box spewing the bad pmd messages is a
dual opteron 246 on a TYAN S2885 Thunder K8W motherboard. Kernel is
2.6.11.4-20a-smp.

Approximately one hour after the bad pmds, the box was completely
unresponsive. This machine is either idle or heavily loaded: many threads,
lots of IO and NFS network traffic. I never see this when idle. When heavily
loaded, it will invariably become unresponsive within 24 hrs. It looks
reproducible. I'm willing to provide more information and test patches.

Output:
Sep 15 06:42:46 lakeport -- MARK --
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bc8(00002aaaaaaaba98).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bd0(0000000000000002).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bd8(00007ffffffffdcc).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680be0(00007ffffffffdcd).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bf0(00007ffffffffdce).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bf8(00007ffffffffdcf).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c00(00007ffffffffdd0).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c08(00007ffffffffdd1).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c10(00007ffffffffdd2).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c18(00007ffffffffdd3).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c20(00007ffffffffdd4).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c28(00007ffffffffdd5).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c30(00007ffffffffdd6).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c38(00007ffffffffdd7).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c40(00007ffffffffdd8).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c48(00007ffffffffdd9).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c50(00007ffffffffdda).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c58(00007ffffffffddb).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c60(00007ffffffffddc).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c68(00007ffffffffddd).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c70(00007ffffffffdde).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c78(00007ffffffffddf).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c80(00007ffffffffde0).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c88(00007ffffffffde1).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c90(00007ffffffffde2).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c98(00007ffffffffde3).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ca0(00007ffffffffde4).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ca8(00007ffffffffde5).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cb0(00007ffffffffde6).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cc0(0000000000000010).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cc8(00000000078bfbff).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cd0(0000000000000006).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cd8(0000000000001000).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ce0(0000000000000011).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ce8(0000000000000064).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cf0(0000000000000003).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cf8(0000000000400040).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d00(0000000000000004).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d08(0000000000000038).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d10(0000000000000005).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d18(0000000000000009).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d20(0000000000000007).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d28(00002aaaaaaab000).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d30(0000000000000008).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d40(0000000000000009).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d48(00000000004010f0).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d50(000000000000000b).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d60(000000000000000c).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d70(000000000000000d).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d80(000000000000000e).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d90(0000000000000017).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680da0(000000000000000f).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680da8(00007ffffffffdc5).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680dc0(3638780000000000).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680dc8(000000000034365f).
Sep 15 07:22:47 lakeport -- MARK --

2005-09-20 17:30:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6



On Tue, 20 Sep 2005, Charles McCreary wrote:
>
> Another datapoint for this thread. The box spewing the bad pmd messages is a
> dual opteron 246 on a TYAN S2885 Thunder K8W motherboard. Kernel is
> 2.6.11.4-20a-smp.

This is quite possibly the result of an Opteron erratum (tlb flush
filtering is broken on SMP) that we worked around as of 2.6.14-rc2.

So either just try 2.6.14-rc2, or try the appended patch (it has since
been confirmed by many more people).

Linus

---
diff-tree bc5e8fdfc622b03acf5ac974a1b8b26da6511c99 (from 61ffcafafb3d985e1ab8463be0187b421614775c)
Author: Linus Torvalds <[email protected]>
Date: Sat Sep 17 15:41:04 2005 -0700

x86-64/smp: fix random SIGSEGV issues

They seem to have been due to AMD errata 63/122; the fix is to disable
TLB flush filtering in SMP configurations.

Confirmed to fix the problem by Andrew Walrond <[email protected]>

[ Let's see if we'll have a better fix eventually, this is the Q&D
"let's get this fixed and out there" version ]

Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -831,11 +831,26 @@ static void __init amd_detect_cmp(struct
 #endif
 }
 
+#define HWCR 0xc0010015
+
 static int __init init_amd(struct cpuinfo_x86 *c)
 {
 	int r;
 	int level;
 
+#ifdef CONFIG_SMP
+	unsigned long value;
+
+	// Disable TLB flush filter by setting HWCR.FFDIS:
+	// bit 6 of msr C001_0015
+	//
+	// Errata 63 for SH-B3 steppings
+	// Errata 122 for all(?) steppings
+	rdmsrl(HWCR, value);
+	value |= 1 << 6;
+	wrmsrl(HWCR, value);
+#endif
+
 	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
 	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
 	clear_bit(0*32+31, &c->x86_capability);
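
[For anyone wanting to confirm the workaround took effect on a running
kernel: an illustrative sketch, assuming the msr driver is loaded and you
have root, which reads HWCR back through /dev/cpu/0/msr and tests the
FFDIS bit:]

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	uint64_t hwcr;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr");
		return 1;
	}
	/* the msr driver maps the file offset to the MSR number */
	if (pread(fd, &hwcr, sizeof(hwcr), 0xc0010015) != sizeof(hwcr)) {
		perror("read HWCR");
		return 1;
	}
	close(fd);
	printf("HWCR = %#llx, FFDIS (bit 6) is %s\n", (unsigned long long)hwcr,
	       (hwcr & (1ULL << 6)) ? "set (flush filter disabled)"
				    : "clear (flush filter active)");
	return 0;
}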

2005-09-20 22:15:08

by Chris Wedgwood

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6

On Tue, Sep 20, 2005 at 10:30:48AM -0700, Linus Torvalds wrote:

> This is quite possibly the result of an Opteron erratum (tlb flush
> filtering is broken on SMP) that we worked around as of 2.6.14-rc2.

It would be really interesting to know if this does help. I was told
em64t boxes also have the 'bad pmd' problem, but I can't make it happen
here on opteron or em64t.

2005-09-20 23:23:42

by Dave Jones

[permalink] [raw]
Subject: Re: x86-64 bad pmds in 2.6.11.6

On Tue, Sep 20, 2005 at 12:44:46PM -0700, Chris Wedgwood wrote:
> On Tue, Sep 20, 2005 at 10:30:48AM -0700, Linus Torvalds wrote:
>
> > This is quite possibly the result of an Opteron erratum (tlb flush
> > filtering is broken on SMP) that we worked around as of 2.6.14-rc2.
>
> It would be really interesting to know if this does help. I was told
> em64t boxes also have the 'bad pmd' problem, but I can't make it happen
> here on opteron or em64t.

In the dozens of reports of bad pmd that Fedora users filed, there
wasn't a single EM64T user. In fact, most of the hits were from
very similar product lines, from a handful of vendors (Tyan's seemed
especially susceptible). It may be that other vendors updated their
BIOSes to include this workaround already, and Tyan and a few others
lagged behind.

Dave