2008-08-23 10:55:24

by Jari Aalto

Subject: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory


Message from /etc/syslog:

[1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0

My AMD system running kernel 2.6.25 has experienced regular freezes, so
that only the power button can take the system down. This is alarming,
because the system can stay up only a few days.

I've spent countless hours reading related "Out of SW-IOMMU space"
(Google) documents. The suggestions have worked for some people and not
for others, and there has not been any clear explanation of which options
should be used with which chipsets/MBs and why.

I've gone through various combinations of kernel boot options, but
nothing seems to completely solve the problem:

iommu=soft swiotlb=65536

Freezing continued, but the disk corruption did not happen any more.
Increasing the swiotlb value has not helped.

iommu=soft,memaper=3 swiotlb=65536

Adding memaper did not help. "Out of SW-IOMMU space" messages [see
1] crept in, and I'm preparing to see another freeze eventually.

iommu=noaperture

Same as above. No progress.

iommu=noagp,noaperture swiotlb=512M

Current options that I use. They were giving hope for 2 days,
but then a single "Out of SW-IOMMU space" message appeared. I'm
afraid the freeze is about to come.

Should I try the following options next, or just "iommu=off"?

iommu=noagp,noaperture,off swiotlb=512M
===
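
For reference, the swiotlb= value appears to be a count of bounce-buffer slabs rather than a byte size. A minimal sketch of the arithmetic follows, assuming the 2 KB slab size (IO_TLB_SHIFT of 11) used in lib/swiotlb.c; the function name swiotlb_mb is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Convert a swiotlb= slab count into megabytes of bounce-buffer space.
# Assumption: one slab is 2048 bytes (IO_TLB_SHIFT = 11 in lib/swiotlb.c).
swiotlb_mb() {
    echo $(( $1 * 2048 / 1048576 ))
}

swiotlb_mb 32768    # slab count seen later in this thread
swiotlb_mb 65536    # slab count tried in the options above
```

Under that assumption, swiotlb=65536 would reserve twice as much space as swiotlb=32768. Because the value is a count, it may be worth double-checking how a suffixed value such as 512M is actually interpreted.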

I don't understand well enough what this means for the MCP55 SATA
controller, which seems to be the target [see 1; based on device id
"00:0d.0"] of these IOMMU messages. Only the plain SATA connectors, not
the onboard RAID SATA connectors, are in use for the hard disk.

To the best of my knowledge, this motherboard:

- has no IOMMU-related setting in the Asus Award BIOS. I'm using the
latest BIOS, 2001, from http://www.asus.com
- has no aperture setting in the BIOS.
- has no AGP, only PCI and PCIe slots.

My arsenal of knowledge is exhausted, so please, if you have any
insight into what could be examined further or what could be done to solve
the IOMMU problem, let me know.

Jari

Some of the links and threads I've read
---------------------------------------

"Appendix L. Known Issues" > The X86-64 platform (AMD64/EM64T) and 2.6 kernels
ftp://download.nvidia.com/XFree86/Linux-x86/1.0-8174/README/32bit_html/appendix-l.html

"What is AGP Aperture size?"
http://www.techpowerup.com/articles/overclocking/vidcard/43

"PCI-DMA: high address but no IOMMU"
http://article.gmane.org/gmane.linux.kernel/342411

"Out of IOMMU space"
http://www.x86-64.org/pipermail/discuss/2005-September/006490.html

"Your BIOS doesn't leave a aperture memory hole"
http://www.linuxquestions.org/questions/linux-hardware-18/your-bios-doesnt-leave-a-aperture-memory-hole-624088/

Hardware details
----------------
OS
$ cat /etc/debian_version
lenny/sid (pinning: that's 90% testing + 10% unstable packages)

Kernel
$ uname -a
2.6.25-2-amd64 #1 SMP Mon Jul 14 11:05:23 UTC 2008 x86_64 GNU/Linux

CPU
$ cat /proc/cpuinfo
model name : AMD Athlon(tm) X2 Dual Core Processor BE-2400
stepping : 2
cpu MHz : 2310.518
cache size : 512 KB
...

$ cat /proc/meminfo
MemTotal: 8266632 kB
MemFree: 110212 kB
Buffers: 237132 kB
Cached: 3803660 kB
SwapCached: 0 kB
...

HD
$ hdparm -I /dev/sda

ATA device, with non-removable media
Model Number: ST31000340AS
Serial Number: 5QJ01MS4
Firmware Revision: SD01

http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD

MB

Asus M2N32-SLI Deluxe/Wireless Edition
- nvidia nForce 590 SLI chipset MCP
- 2 x PCIe (SLI x16), 1 x PCIe (x4), 1 x PCIe (x1), 2 x PCI 2.2
- Socket AM2

http://www.asus.com/products.aspx?l1=3&l2=101&l3=300&model=1163&modelmenu=1

$ lspci -nn
00:0d.0 IDE interface [0101]: nVidia Corporation MCP55 SATA Controller [10de:037f] (rev a2)
01:00.0 VGA compatible controller [0300]: nVidia Corporation G70 [GeForce 7600 GS] [10de:0392] (rev a1)
02:0b.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) [104c:8023]
03:00.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller [1095:3132] (rev 01)
...

lspci -vv
----------------------------

00:16.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 00009000-00009fff
Memory behind bridge: fde00000-fdefffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Subsystem: nVidia Corporation Device 0000
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
Address: 00000000fee0300c Data: 4151
Capabilities: [60] HyperTransport: MSI Mapping Enable+ Fixed-
Mapping Address Base: 00000000fee00000
Capabilities: [80] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <4us
ClockPM- Suprise- LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
Slot # 0, PowerLimit 0.000000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd On, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [100] Virtual Channel <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp


[1] Full message from syslog
-----------------------------

Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0
Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: cmd 35/00:00:9f:b9:fd/00:04:71:00:00/e0 tag 0 dma 524288 out
Aug 21 11:01:19 jondo kernel: [174628.279020] res 50/00:00:96:b9:fd/00:00:71:00:00/e0 Emask 0x40 (internal error)
Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: status: { DRDY }
Aug 21 11:01:19 jondo kernel: [174628.322932] ata3.00: configured for UDMA/133
Aug 21 11:01:19 jondo kernel: [174628.322932] ata3: EH complete
Aug 21 11:01:19 jondo kernel: [174628.330761] sd 2:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Aug 21 11:01:19 jondo kernel: [174628.340876] sd 2:0:0:0: [sda] Write Protect is off
Aug 21 11:01:19 jondo kernel: [174628.340876] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 21 11:01:19 jondo kernel: [174628.351250] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
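
To see whether the same device is always involved, the messages can be tallied across a whole log. This is only a sketch: the function name count_iommu_errors is hypothetical, and it assumes the device address is the last field of the message, as in the excerpt above.

```shell
#!/bin/sh
# Count "Out of SW-IOMMU space" messages per PCI device.
# Reads syslog text on stdin and prints "<device> <count>" per device.
count_iommu_errors() {
    awk '/Out of SW-IOMMU space/ { n[$NF]++ }
         END { for (d in n) print d, n[d] }'
}

# Example use (the log path is an assumption and varies by distribution):
#   count_iommu_errors < /var/log/syslog
```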

dmesg
-------------------------------

[ 0.914265] Linux agpgart interface v0.103
...
[ 3.687719] ata1: SATA link down (SStatus 0 SControl 0)
[ 5.770299] ata2: SATA link down (SStatus 0 SControl 0)
[ 5.582800] ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
[ 5.582811] ACPI: PCI Interrupt 0000:02:08.1[A] -> Link [APC3] -> GSI 18 (level, low) -> IRQ 18
[ 5.584163] NFORCE-MCP55: 0000:00:0c.0 (rev a1) UDMA133 controller
[ 5.584167] NFORCE-MCP55: IDE controller (0x10de:0x036e rev 0xa1) at PCI slot 0000:00:0c.0
[ 5.584187] NFORCE-MCP55: not 100% native mode: will probe irqs later
[ 5.584194] NFORCE-MCP55: IDE port disabled
[ 5.584198] ide0: BM-DMA at 0xf400-0xf407, BIOS settings: hda:DMA, hdb:DMA
[ 5.584208] Probing IDE interface ide0...
[ 5.661667] firewire_ohci: Added fw-ohci device 0000:02:08.1, OHCI version 1.10
[ 5.661706] ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16
[ 5.661706] ACPI: PCI Interrupt 0000:02:0b.0[A] -> Link [APC1] -> GSI 16 (level, low) -> IRQ 16
[ 5.732701] firewire_ohci: Added fw-ohci device 0000:02:0b.0, OHCI version 1.10
[ 6.345280] ACPI: PCI Interrupt Link [APCL] enabled at IRQ 20
[ 6.345280] ACPI: PCI Interrupt 0000:00:0a.1[B] -> Link [APCL] -> GSI 20 (level, low) -> IRQ 20
[ 6.345280] PCI: Setting latency timer of device 0000:00:0a.1 to 64
[ 6.345280] ehci_hcd 0000:00:0a.1: EHCI Host Controller
[ 6.345280] ehci_hcd 0000:00:0a.1: new USB bus registered, assigned bus number 2
[ 6.345280] ehci_hcd 0000:00:0a.1: debug port 1
[ 6.345280] PCI: cache line size of 64 is not supported by device 0000:00:0a.1
[ 6.345280] ehci_hcd 0000:00:0a.1: irq 20, io mem 0xfe02e000


2008-08-23 12:31:19

by Krzysztof Halasa

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Jari Aalto <[email protected]> writes:

> [1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0
> model name : AMD Athlon(tm) X2 Dual Core Processor
> BE-2400

grep GART_IOMMU .config
--
Krzysztof Halasa

2008-08-23 13:58:52

by Alistair John Strachan

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

On Saturday 23 August 2008 13:31:03 Krzysztof Halasa wrote:
> Jari Aalto <[email protected]> writes:
> > [1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of
> > SW-IOMMU space for 65536 bytes at device 0000:00:0d.0 model name :
> > AMD Athlon(tm) X2 Dual Core Processor
> > BE-2400
>
> grep GART_IOMMU .config

Agreed, you shouldn't be using the SW-IOMMU on this processor.

That said, do you use the r8169 driver with jumbo frames enabled? Francois
Romieu just fixed a leak in it that affected Intel platforms (because some
have no hardware IOMMU).

--
Cheers,
Alistair.

2008-08-25 00:19:56

by Krzysztof Halasa

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Jari Aalto <[email protected]> writes:

> What, if after booting to 2.6.26 and without any 'iommu' boot parameters,
> there still appears "Out of Iommu space" messages? What kind of logs
> should I post?

You shouldn't be getting "Out of SW-IOMMU space" messages when not
using SW-IOMMU.

> I'm a bit nervous due to past experience of complete hard disk
> corruption. But if that was a problem with the old kernel and later
> versions are safer in this respect, I could try once more.

I'm not sure if the newer versions are safer. It may be a hardware/BIOS
problem and it may happen again. Make sure you have a usable backup
first.
--
Krzysztof Halasa

2008-08-25 09:50:17

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Alistair John Strachan <[email protected]> writes:

> On Saturday 23 August 2008 13:31:03 Krzysztof Halasa wrote:
>
>> Jari Aalto <[email protected]> writes:
>> > [1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of
>> > SW-IOMMU space for 65536 bytes at device 0000:00:0d.0 model name :
>> > AMD Athlon(tm) X2 Dual Core Processor
>> > BE-2400
>
> Agreed, you shouldn't be using the SW-IOMMU on this processor.

So the correct boot parameter is?

iommu=off,noagp,noaperture

with no swiotlb setting at all?

> That said, do you use the r8169 driver with jumbo frames enabled? Francois
> Romieu just fixed a leak in it that affected Intel platforms (because some
> have no hardware IOMMU).

I'm using the onboard WiFi:

$ lsmod | grep rtl
rtl8187 39424 0

$ dmesg ...

[ 2.758861] usb 1-9: new full speed USB device using ohci_hcd and address 4
[ 2.978258] usb 1-9: configuration #1 chosen from 1 choice
[ 2.985369] usb 1-9: New USB device found, idVendor=0bda, idProduct=8187
[ 2.985372] usb 1-9: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2.985374] usb 1-9: Product: RTL8187_Wireless
[ 2.985376] usb 1-9: Manufacturer: Manufacturer_Realtek_RTL8187_
[ 2.985378] usb 1-9: SerialNumber: 0015AF0B59A6

I'm not sure how it maps to lspci -nn:

00:04.0 PCI bridge [0604]: nVidia Corporation C51 PCI Express Bridge [10de:02fb] (rev a1)
00:09.0 ISA bridge [0601]: nVidia Corporation MCP55 LPC Bridge [10de:0360] (rev a2)
00:09.1 SMBus [0c05]: nVidia Corporation MCP55 SMBus [10de:0368] (rev a2)
00:0a.0 USB Controller [0c03]: nVidia Corporation MCP55 USB Controller [10de:036c] (rev a1)
00:0a.1 USB Controller [0c03]: nVidia Corporation MCP55 USB Controller [10de:036d] (rev a2)
00:0e.0 PCI bridge [0604]: nVidia Corporation MCP55 PCI bridge [10de:0370] (rev a2)
00:10.0 Bridge [0680]: nVidia Corporation MCP55 Ethernet [10de:0373] (rev a2)
00:11.0 Bridge [0680]: nVidia Corporation MCP55 Ethernet [10de:0373] (rev a2)
00:16.0 PCI bridge [0604]: nVidia Corporation MCP55 PCI Express bridge [10de:0375] (rev a2)

Thank you,
Jari

2008-08-24 22:16:34

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Robert Hancock <[email protected]> writes:

> Jari Aalto wrote:
>>>> $ grep -Ei 'iommu|agp' /boot/config-2.6.25-2-amd64
>>>> CONFIG_GART_IOMMU=y
>>>> CONFIG_CALGARY_IOMMU=y
>>>> CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
>>>> CONFIG_IOMMU_HELPER=y
>>>> CONFIG_AGP=y
>>>> CONFIG_AGP_AMD64=y
>>>> CONFIG_AGP_INTEL=m
>>>> CONFIG_AGP_SIS=m
>>>> CONFIG_AGP_VIA=m
>>> Do you have problems without "iommu=XXX"?
>>
>> ...whole harddisk to corrupt and I had to reinstall everything.
>> It might have been a little older kernel (2.6.23?), but I don't recall
>> it exactly.
>>
>> Would you suggest that "iommu=off" would be the best option?
>
> No, you can't use iommu=off. If you have memory located over 4GB and
> devices which can only do 32-bit DMA then you need some kind of IOMMU
> support, otherwise things will just blow up.
>
> If you use no options and it's enabled in the kernel config it should be
> using the GART IOMMU built into the CPU on this machine..

So, with the newest kernel I should just leave 'iommu' out of the boot
options. I could even try upgrading to 2.6.26, which Debian has in the
unstable repository.

What, if after booting to 2.6.26 and without any 'iommu' boot parameters,
there still appears "Out of Iommu space" messages? What kind of logs
should I post?

I'm a bit nervous due to past experience of complete hard disk
corruption. But if that was a problem with the old kernel and later
versions are safer in this respect, I could try once more.

Jari

2008-08-24 21:53:01

by Robert Hancock

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Jari Aalto wrote:
> Krzysztof Halasa <[email protected]> writes:
>
>> Jari Aalto <[email protected]> writes:
>>
>>> From Debian stock kernel:
>>>
>>> $ grep -Ei 'iommu|agp' /boot/config-2.6.25-2-amd64
>>> CONFIG_GART_IOMMU=y
>>> CONFIG_CALGARY_IOMMU=y
>>> CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
>>> CONFIG_IOMMU_HELPER=y
>>> CONFIG_AGP=y
>>> CONFIG_AGP_AMD64=y
>>> CONFIG_AGP_INTEL=m
>>> CONFIG_AGP_SIS=m
>>> CONFIG_AGP_VIA=m
>> Should be fine. SWIOTLB is there for a backup, most (?) Intel machines
>> don't have IOMMU (even the newest desktop boards).
>>
>>> Are these the correct boot options, without swiotlb?
>>>
>>> iommu=noagp,noaperture,off
>>>
>>> Or just:
>>>
>>> iommu=off
>> Do you have problems without "iommu=XXX"?
>
> Initially, when the PC was installed (and had no IOMMU options), that
> caused the whole hard disk to become corrupted and I had to reinstall
> everything. It might have been a slightly older kernel (2.6.23?), but I
> don't recall it exactly.
>
> According to messages in syslog I tried to track down similar incidents
> and found the iommu articles. But the information amounted to trial and
> error.
>
> Would you suggest that "iommu=off" would be the best option?

No, you can't use iommu=off. If you have memory located over 4GB and
devices which can only do 32-bit DMA then you need some kind of IOMMU
support, otherwise things will just blow up.

If you use no options and it's enabled in the kernel config it should be
using the GART IOMMU built into the CPU on this machine..
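
The 4GB limit mentioned above can be illustrated with a small check. This is a simplified sketch with a hypothetical function name, dma_path; it looks only at a buffer's starting physical address and ignores its length, which a real kernel check would also consider.

```shell
#!/bin/sh
# Decide whether a device limited to 32-bit DMA can reach a buffer directly.
# $1 is the buffer's physical address (decimal or 0x-prefixed hex).
dma_path() {
    addr=$(( $1 ))
    limit=$(( 0xFFFFFFFF ))   # highest address reachable with 32 address bits
    if [ "$addr" -gt "$limit" ]; then
        echo "needs-iommu"    # must be remapped (GART) or bounced (swiotlb)
    else
        echo "direct"
    fi
}

dma_path 0x7FFF0000     # below 4GB
dma_path 0x1F0000000    # above 4GB, possible on an 8GB machine
```

With iommu=off there is nothing to handle the second case, which is why that option is unsafe on a machine with memory above 4GB.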

2008-08-24 15:08:59

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Krzysztof Halasa <[email protected]> writes:

> Jari Aalto <[email protected]> writes:
>
>> From Debian stock kernel:
>>
>> $ grep -Ei 'iommu|agp' /boot/config-2.6.25-2-amd64
>> CONFIG_GART_IOMMU=y
>> CONFIG_CALGARY_IOMMU=y
>> CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
>> CONFIG_IOMMU_HELPER=y
>> CONFIG_AGP=y
>> CONFIG_AGP_AMD64=y
>> CONFIG_AGP_INTEL=m
>> CONFIG_AGP_SIS=m
>> CONFIG_AGP_VIA=m
>
> Should be fine. SWIOTLB is there for a backup, most (?) Intel machines
> don't have IOMMU (even the newest desktop boards).
>
>> Are these the correct boot options, without swiotlb?
>>
>> iommu=noagp,noaperture,off
>>
>> Or just:
>>
>> iommu=off
>
> Do you have problems without "iommu=XXX"?

Initially, when the PC was installed (and had no IOMMU options), that
caused the whole hard disk to become corrupted and I had to reinstall
everything. It might have been a slightly older kernel (2.6.23?), but I
don't recall it exactly.

According to messages in syslog I tried to track down similar incidents
and found the iommu articles. But the information amounted to trial and
error.

Would you suggest that "iommu=off" would be the best option?

Thanks,
Jari

2008-08-24 13:42:43

by Krzysztof Halasa

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Jari Aalto <[email protected]> writes:

> From Debian stock kernel:
>
> $ grep -Ei 'iommu|agp' /boot/config-2.6.25-2-amd64
> CONFIG_GART_IOMMU=y
> CONFIG_CALGARY_IOMMU=y
> CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
> CONFIG_IOMMU_HELPER=y
> CONFIG_AGP=y
> CONFIG_AGP_AMD64=y
> CONFIG_AGP_INTEL=m
> CONFIG_AGP_SIS=m
> CONFIG_AGP_VIA=m

Should be fine. SWIOTLB is there for a backup, most (?) Intel machines
don't have IOMMU (even the newest desktop boards).

> Are these the correct boot options, without swiotlb?
>
> iommu=noagp,noaperture,off
>
> Or just:
>
> iommu=off

Do you have problems without "iommu=XXX"?
--
Krzysztof Halasa

2008-08-24 08:34:52

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Krzysztof Halasa <[email protected]> writes:

> Jari Aalto <[email protected]> writes:
>
>> [1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0
>> model name : AMD Athlon(tm) X2 Dual Core Processor
>> BE-2400
>
> grep GART_IOMMU .config

From Debian stock kernel:

$ grep -Ei 'iommu|agp' /boot/config-2.6.25-2-amd64
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
CONFIG_IOMMU_HELPER=y
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=m
CONFIG_AGP_SIS=m
CONFIG_AGP_VIA=m
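
A single option can also be queried on its own, which is closer to the "grep GART_IOMMU .config" request. The helper below is a sketch with a hypothetical name, config_state; it assumes the usual "CONFIG_X=value" layout of a kernel config file.

```shell
#!/bin/sh
# Print the state of one kernel config option: y (built in), m (module),
# another assigned value, or "unset" when no assignment line is present.
# $1 is the option name, $2 is the config file path.
config_state() {
    val=$(sed -n "s/^$1=\(.*\)$/\1/p" "$2")
    echo "${val:-unset}"
}

# Example use:
#   config_state CONFIG_GART_IOMMU /boot/config-2.6.25-2-amd64
```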

Jari

2008-08-24 08:40:17

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Alistair John Strachan <[email protected]> writes:

> On Saturday 23 August 2008 13:31:03 Krzysztof Halasa wrote:
>
>> Jari Aalto <[email protected]> writes:
>> > [1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of
>> > SW-IOMMU space for 65536 bytes at device 0000:00:0d.0 model name :
>> > AMD Athlon(tm) X2 Dual Core Processor
>> > BE-2400
>>
>> grep GART_IOMMU .config
>
> Agreed, you shouldn't be using the SW-IOMMU on this processor.

Are these the correct boot options, without swiotlb?

iommu=noagp,noaperture,off

Or just:

iommu=off

> That said, do you use the r8169 driver with jumbo frames enabled? Francois
> Romieu just fixed a leak in it that affected Intel platforms (because some
> have no hardware IOMMU).

The onboard WiFi is in use:

$ lsmod | grep rtl
rtl8187 39424 0

$ dmesg

[ 2.758861] usb 1-9: new full speed USB device using ohci_hcd and address 4
[ 2.978258] usb 1-9: configuration #1 chosen from 1 choice
[ 2.985369] usb 1-9: New USB device found, idVendor=0bda, idProduct=8187
[ 2.985372] usb 1-9: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2.985374] usb 1-9: Product: RTL8187_Wireless
[ 2.985376] usb 1-9: Manufacturer: Manufacturer_Realtek_RTL8187_
[ 2.985378] usb 1-9: SerialNumber: 0015AF0B59A6

Jari

2008-08-28 20:49:27

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Krzysztof Halasa <[email protected]> writes:

> Jari Aalto <[email protected]> writes:
>
>> What, if after booting to 2.6.26 and without any 'iommu' boot parameters,
>> there still appears "Out of Iommu space" messages? What kind of logs
>> should I post?
>
> You shouldn't be getting "Out of SW-IOMMU space" messages when not
> using SW-IOMMU.

REF: http://article.gmane.org/gmane.linux.kernel/725293 (thread start)

I regret to report that:

- Upgraded to kernel 2.6.26-1-amd64 (Debian/unstable 2.6.26-3)

$ grep -Ei 'iommu|agp' /boot/config-2.6.26-1-amd64
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
CONFIG_IOMMU_HELPER=y
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=m
CONFIG_AGP_SIS=m
CONFIG_AGP_VIA=m
# CONFIG_IOMMU_DEBUG is not set

- Rebooted without 'iommu' parameter. dmesg:

[ 0.000000] Kernel command line: root=UUID=cb9d814f-d885-435b-8e6d-ac17c0ac5aa1 ro quiet vga=0x317 swiotlb=32768
[ 0.004000] Checking aperture...
[ 0.004000] Node 0: aperture @ 4000000 size 32 MB
[ 0.004000] Aperture pointing to e820 RAM. Ignoring.
[ 0.004000] No AGP bridge found
[ 0.004000] Your BIOS doesn't leave a aperture memory hole
[ 0.004000] Please enable the IOMMU option in the BIOS setup
[ 0.004000] This costs you 64 MB of RAM
[ 0.004000] Mapping aperture over 65536 KB of RAM @ 4000000
...
[ 0.374935] PCI-DMA: Disabling AGP.
[ 0.374935] PCI-DMA: aperture base @ 4000000 size 65536 KB
[ 0.374935] PCI-DMA: using GART IOMMU.
[ 0.374935] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture


- The syslog after boot reads:

Aug 28 20:18:28 jondo kernel: [972060.192696] DMA: Out of SW-IOMMU space for 24576 bytes at device 0000:00:0d.0
Aug 28 20:18:28 jondo kernel: [972060.192760] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Aug 28 20:18:28 jondo kernel: [972060.196705] ata3.00: cmd ca/00:60:c1:91:5f/00:00:00:00:00/e8 tag 0 dma 49152 out
Aug 28 20:18:28 jondo kernel: [972060.196707] res 50/00:00:bf:cf:b5/00:00:71:00:00/ea Emask 0x40 (internal error)
Aug 28 20:18:28 jondo kernel: [972060.200700] ata3.00: status: { DRDY }
Aug 28 20:18:28 jondo kernel: [972060.231128] ata3.00: configured for UDMA/133
Aug 28 20:18:28 jondo kernel: [972060.231137] ata3: EH complete
Aug 28 20:18:28 jondo kernel: [972060.231148] DMA: Out of SW-IOMMU space for 24576 bytes at device 0000:00:0d.0
Aug 28 20:18:28 jondo kernel: [972060.233247] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Aug 28 20:18:28 jondo kernel: [972060.237246] ata3.00: cmd ca/00:60:c1:91:5f/00:00:00:00:00/e8 tag 0 dma 49152 out
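
To confirm which DMA path a given boot actually selected, the boot log can be scanned for the relevant announcement. This sketch uses a hypothetical function name, iommu_mode. The GART string is taken from the dmesg excerpt above; the "software bounce buffering" string for the swiotlb case is an assumption about the kernel's wording and should be checked against a real log.

```shell
#!/bin/sh
# Report which DMA remapping path a boot log announces.
# Reads dmesg text on stdin; if both messages appear, the later one wins.
iommu_mode() {
    awk '/PCI-DMA: using GART IOMMU/   { mode = "gart" }
         /software bounce buffering/   { mode = "swiotlb" }   # assumed wording
         END { print (mode == "" ? "unknown" : mode) }'
}

# Example use:
#   dmesg | iommu_mode
```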

>> I'm a bit nervous due to past experience of complete hard disk
>> corruption. But if that was a problem with the old kernel and later
>> versions are safer in this respect, I could try once more.
>
> I'm not sure if the newer versions are safer. It may be a hardware/BIOS
> problem and it may happen again. Make sure you have a usable backup
> first.

If there is anything I can do, please let me know. This must be a kernel
issue somewhere.

My BE-2400 system still continues to freeze after a few days.

Jari

2008-08-28 20:59:43

by Yinghai Lu

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

On Thu, Aug 28, 2008 at 1:49 PM, Jari Aalto <[email protected]> wrote:
> Krzysztof Halasa <[email protected]> writes:
>
>> Jari Aalto <[email protected]> writes:
>>
>>> What, if after booting to 2.6.26 and without any 'iommu' boot parameters,
>>> there still appears "Out of Iommu space" messages? What kind of logs
>>> should I post?
>>
>> You shouldn't be getting "Out of SW-IOMMU space" messages when not
>> using SW-IOMMU.
>
> REF: http://article.gmane.org/gmane.linux.kernel/725293 (thread start)
>
> I regret to report that:
>
> - Upgraded to kernel 2.6.26-1-amd64 (Debian/unstable 2.6.26-3)

can you send out whole boot log? with "debug initcall_debug"

http://people.redhat.com/mingo/tip.git/readme.txt

YH

2008-08-29 03:48:37

by Gerardo Exequiel Pozzi

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Hi

(Sorry for my English.)

When the kernel is booted with no iommu parameter, it uses the GART IOMMU (from AMD), and the size of the aperture is 64MB.
You can try to grow this size with iommu=memaper=2 for 128MB, or 3 for 256MB. I don't have experience with this, but
I have read that it solves the problem.

I have similar hardware, but when booting in 64-bit mode I limit the memory with mem=4G (discarding the 0.5G that is remapped from PCI space to beyond 4G)
and use iommu=noaperture. (This is not an option for you, since it would discard 4G of RAM.)


Asus M2N32-SLI DELUXE (BIOS Phoenix ver 1603 [12/17/2007])
4 x 1GB OCZ DDR2 PC2-6400 Platinum Rev2 ( OCZ2P800R21G )
AMD Athlon 64 X2 5200+ ( ADA5200IAA6CS ) (stepping F2)

I ran some tests and posted the results here (sorry, in Spanish): http://www.pcmasmas.com/viewtopic.php?t=31445


Good luck.


Jari Aalto <jari.aalto <at> cante.net> writes:

> Initially, when the PC was installed (and had no IOMMU options), that
> caused the whole hard disk to become corrupted and I had to reinstall
> everything. It might have been a slightly older kernel (2.6.23?), but I
> don't recall it exactly.

> According to messages in syslog I tried to track down similar incidents
> and found the iommu articles. But the information amounted to trial and
> error.

> Would you suggest that "iommu=off" would be the best option?


--
Gerardo Exequiel Pozzi ( djgera )
http://www.djgera.com.ar
KeyID: 0x1B8C330D
Key fingerprint = 0CAA D5D4 CD85 4434 A219 76ED 39AB 221B 1B8C 330D

2008-08-29 06:25:56

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Gerardo Exequiel Pozzi <[email protected]> writes:

> When booted kernel with no iommu parameter, it uses the iommu gart
> (from amd) and the size of the aperture is 64MB You can try to grow
> this size with iommu=memaper=2 for 128MB or 3 for 256MB. I don't have
> experience with this, but i read that this, solves the problem.

See my original bug report, which included experiences with
memaper. Unfortunately it didn't help:

REF: http://article.gmane.org/gmane.linux.kernel/725293 (thread start)

but I'll try that once more with this 2.6.26 kernel to double check.

> I have the similar hardware, but when boot in 64 bits, limit the
> memory with mem=4G (discarded 0.5G that are (remaped from pci space)
> beyond 4G) and iommu=noaperture. (This is not option for you, will be
> discard 4G of RAM)

Interestingly, 'noaperture' together with 'noagp' has been the best
option so far. It gives a working system for about a week, but eventually
the "out of IOMMU space" messages start to appear.

The system has always seen the full 8 GB memory without problems.

> Asus M2N32-SLI DELUXE (BIOS Phoenix ver 1603 [12/17/2007])
> 4 x 1GB OCZ DDR2 PC2-6400 Platinum Rev2 ( OCZ2P800R21G )
> AMD Athlon 64 X2 5200+ ( ADA5200IAA6CS ) (stepping F2)

The only difference from your system is a more recent CPU:

$ cat /proc/cpuinfo

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) X2 Dual Core Processor BE-2400
stepping : 2
cpu MHz : 2310.516
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch
bogomips : 4625.06
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
...

> I have realiced some test an posted here (sorry in spanish): http://www.pcmasmas.com/viewtopic.php?t=31445

Could you explain this a bit more? Is the mentioned "tip-latest" the
current kernel development tree? Should I try to compile this kernel and
use it for testing?

Jari

2008-08-29 08:08:51

by Yinghai Lu

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

On Fri, Aug 29, 2008 at 12:49 AM, Jari Aalto <[email protected]> wrote:
>> On Thu, Aug 28, 2008 at 1:49 PM, Jari Aalto <[email protected]> wrote:
>>
>>> Krzysztof Halasa <[email protected]> writes:
>>>
>>>> Jari Aalto <[email protected]> writes:
>>>>
>>>>> What, if after booting to 2.6.26 and without any 'iommu' boot parameters,
>>>>> there still appears "Out of Iommu space" messages? What kind of logs
>>>>> should I post?
>>>>
>>>> You shouldn't be getting "Out of SW-IOMMU space" messages when not
>>>> using SW-IOMMU.
>>>
>>> REF: http://article.gmane.org/gmane.linux.kernel/725293 (thread start)
>>>
>>> I regret to report that:
>>>
>>> - Upgraded to kernel 2.6.26-1-amd64 (Debian/unstable 2.6.26-3)
>>
>> can you send out whole boot log? with "debug initcall_debug"
>
> See below.
>
>> http://people.redhat.com/mingo/tip.git/readme.txt
>
> Could you explain this document a bit? What git commands should I use to
> retrieve a test kernel to build?


mkdir linux.trees.git || exit -1
cd linux.trees.git

git init
git remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

git remote add tip git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git

git remote update

git checkout -b tip-latest tip/master

Can you fix your mail client? It stripped the Cc list automatically.

YH

2008-08-29 18:52:35

by Gerardo Exequiel Pozzi

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

(Sorry again for my very bad English!)

(not in mailing-list, please CC to me)


Jari Aalto <jari.aalto <at> cante.net> writes:

>> When booted kernel with no iommu parameter, it uses the iommu gart
>> (from amd) and the size of the aperture is 64MB You can try to grow
>> this size with iommu=memaper=2 for 128MB or 3 for 256MB. I don't have
>> experience with this, but i read that this, solves the problem.
>
>See my original bug report, which included experiences with
>memaper. Unofrtunately they didn't help:
>
> REF: http://article.gmane.org/gmane.linux.kernel/725293 (thread start)
>
>but I'll try that once more with this 2.6.26 kernel to double check.

No, your options are wrong.

You have mixed options for "swiotlb" and the "AMD GART"; see
Documentation/x86_64/boot-options.txt

iommu=soft,memaper=3 swiotlb=65536 BAD.

memaper=3 has no effect here; memaper applies only to the AMD GART.

The noagp option is redundant on this motherboard.

iommu=noagp,noaperture,off swiotlb=512M BAD combination of options!
OFF, OFF, OFF, ON: the result is unknown.

You need to use the hardware IOMMU (the AMD GART), which your options
never actually enabled.

When you boot with no iommu options, the kernel uses the AMD GART on this
system with an aperture of 64MB (too small). To fix that, you need only one
option: iommu=memaper=3
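
The aperture sizes quoted in this thread (64MB by default, 128MB for memaper=2, 256MB for memaper=3) are consistent with a size of 32MB shifted left by the memaper order. A small sketch of that arithmetic, with a hypothetical function name, aperture_mb:

```shell
#!/bin/sh
# Compute the GART aperture size in megabytes for iommu=memaper=<order>,
# using the 32MB << order relationship implied by the sizes quoted above.
aperture_mb() {
    echo $(( 32 << $1 ))
}

aperture_mb 1    # the 64MB default
aperture_mb 3    # the suggested setting
```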

Good luck!



>
>> I have the similar hardware, but when boot in 64 bits, limit the
>> memory with mem=4G (discarded 0.5G that are (remaped from pci space)
>> beyond 4G) and iommu=noaperture. (This is not option for you, will be
>> discard 4G of RAM)
>
>Interestingly 'noaperture' together with 'noagp' has been the best
>option so far. It gives workign system for about week, but eventually
>the "out of IOMMU space" messages start appear.

>The system has always seen the full 8 GB memory without problems.


--
Gerardo Exequiel Pozzi ( djgera )
http://www.djgera.com.ar
KeyID: 0x1B8C330D
Key fingerprint = 0CAA D5D4 CD85 4434 A219 76ED 39AB 221B 1B8C 330D

2008-08-29 21:40:42

by Jari Aalto

Subject: Re: 2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory

Gerardo Exequiel Pozzi <[email protected]> writes:

>> REF: http://article.gmane.org/gmane.linux.kernel/725293 (thread start)
>>
>>but I'll try that once more with this 2.6.26 kernel to double check.
>
> You have mixed options for "swiotlb" and "amd gart", see the
> Documentation/x86_64/boot-options.txt

It's difficult to find the pointer that could explain the correct use of
the options and their combinations.

> iommu=soft,memaper=3 swiotlb=65536 BAD.
>
> memaper=3 don't have any effects here, memaper is only for "AMD GART"
>
> The noagp option in this motherboard is redundant.
>
> iommu=noagp,noaperture,off swiotlb=512M BAD combination options!
> OFF, OFF, OFF, ON: The result are unknown.
>
> You need to use the IOMMU (from AMD GART) that with your options: you
> never used.
>
> When you boot with no iommu options, kernels uses the AMD GART in this
> system, with an aperture of 64MB (too small) to fix, you need only one
> option: iommu=memaper=3

This sort of advice would be wonderful if it were included in the spec,
which as of now is too technical for mere mortals:

http://www.mjmwired.net/kernel/Documentation/x86_64/boot-options.txt

I'm not surprised that an end user cannot know which combinations of
options do or do not make sense. At the next boot I'll try plain:

iommu=memaper=3

Thank you,
Jari