Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751803AbWIRSto (ORCPT ); Mon, 18 Sep 2006 14:49:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750761AbWIRSto (ORCPT ); Mon, 18 Sep 2006 14:49:44 -0400 Received: from mail.brumma.com ([213.147.174.48]:36794 "EHLO mail.brumma.com") by vger.kernel.org with ESMTP id S1751803AbWIRStl (ORCPT ); Mon, 18 Sep 2006 14:49:41 -0400 Message-ID: <450EEA1A.90806@kautzy.com> Date: Mon, 18 Sep 2006 20:48:58 +0200 From: kautzy User-Agent: Thunderbird 1.5.0.7 (Windows/20060909) MIME-Version: 1.0 To: Jon Mason CC: linux-kernel@vger.kernel.org Subject: Re: Dual Core Opteron hangs, iommu Entries (x86_64) References: <450E9B49.4030203@kautzy.com> <20060918183234.GA24163@kudzu.us> In-Reply-To: <20060918183234.GA24163@kudzu.us> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 15053 Lines: 343 Jon Mason wrote: > On Mon, Sep 18, 2006 at 03:12:41PM +0200, kautzy wrote: > >> Since this is my first post on this list, I would like to say hello to >> everyone! >> >> I am experiencing problems with a 2x dual core opteron servers. every >> 5-7 days the system hangs. while it still pings, it does not react on >> console inputs, i can't login via ssh either. when that happens, the >> only thing one can do is to reset the machine. there aren't any errors >> logged. >> >> i have checked the memory for errors, but it looks like it is ok. >> >> I found a post on this list describing a problem which looks similar to >> mine: >> >> http://www.gatago.com/linux/kernel/13699679.html >> >> as mentioned in the above post, a dmesg on my server also shows >> following entries: >> >> Allocating PCI resources starting at fb800000 (gap: fb000000:4780000) >> Checking aperture... >> CPU 0: aperture @ cc24000000 size 32 MB >> Aperture from northbridge cpu 0 too small (32 MB) >> No AGP bridge found >> Your BIOS doesn't leave a aperture memory hole >> Please enable the IOMMU option in the BIOS setup >> This costs you 64 MB of RAM >> Mapping aperture over 65536 KB of RAM @ 8000000 >> Built 1 zonelists >> >> can those entries have anything to do with the system crashes, and if, >> can booting with iommu=memaper=3 help to solve the problem? >> >> i am running kernel 2.6.17.11, sarge amd64 , the system has 6GB RAM >> >> i appreciate any suggestions :) >> > > Your problem is that you have more than 4GB of RAM and not enough room > in your IOMMU aperature to handle all of the pending DMA requests. > Dmesg suggests you go into your BIOS and increase your AGP aperature > from 32M to 64M, did you try that? > > Thanks, > Jon > > thanks alot for your reply jon! i should have mentioned that the mainboard has neither an agp slot nor AGP aperature settings in the bios :( the biggest problem i am facing is, that i always have to wait 5-7 days until i see if the changes i made since i had to reboot the computer had a positive effect (which unluckily never was the case until now) ;-) regards chris >> >> chris >> >> the full output of dmesg: >> >> Bootdata ok (command line is root=/dev/sda8 ro console=tty0 ) >> Linux version 2.6.17.11-mli1-opteron-v2 (root@mli1) (gcc version 3.3.5 >> (Debian 1:3.3.5-13)) #1 SMP Mon Sep 11 12:29:02 CEST 2006 >> BIOS-provided physical RAM map: >> BIOS-e820: 0000000000000000 - 000000000009f400 (usable) >> BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved) >> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) >> BIOS-e820: 0000000000100000 - 00000000faff0000 (usable) >> BIOS-e820: 00000000faff0000 - 00000000fafff000 (ACPI data) >> BIOS-e820: 00000000fafff000 - 00000000fb000000 (ACPI NVS) >> BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved) >> BIOS-e820: 0000000100000000 - 0000000180000000 (usable) >> DMI 2.3 present. >> On node 0 totalpages: 1529283 >> DMA zone: 2459 pages, LIFO batch:0 >> DMA32 zone: 1009704 pages, LIFO batch:31 >> Normal zone: 517120 pages, LIFO batch:31 >> Intel MultiProcessor Specification v1.1 >> Virtual Wire compatibility mode. >> OEM ID: TYAN Product ID: S2882 APIC at: 0xFEE00000 >> Processor #0 15:1 APIC version 16 >> Processor #1 15:1 APIC version 16 >> Processor #2 15:1 APIC version 16 >> Processor #3 15:1 APIC version 16 >> I/O APIC #4 Version 17 at 0xFEC00000. >> I/O APIC #5 Version 17 at 0xFEBFF000. >> I/O APIC #6 Version 17 at 0xFEBFE000. >> Setting APIC routing to flat >> Processors: 4 >> Allocating PCI resources starting at fb800000 (gap: fb000000:4780000) >> Checking aperture... >> CPU 0: aperture @ cc24000000 size 32 MB >> Aperture from northbridge cpu 0 too small (32 MB) >> No AGP bridge found >> Your BIOS doesn't leave a aperture memory hole >> Please enable the IOMMU option in the BIOS setup >> This costs you 64 MB of RAM >> Mapping aperture over 65536 KB of RAM @ 8000000 >> Built 1 zonelists >> Kernel command line: root=/dev/sda8 ro console=tty0 >> Initializing CPU#0 >> PID hash table entries: 4096 (order: 12, 32768 bytes) >> time.c: Using 1.193182 MHz WALL PIT GTOD PIT/TSC timer. >> time.c: Detected 2190.816 MHz processor. >> Console: colour VGA+ 80x25 >> Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) >> Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) >> Memory: 6038612k/6291456k available (3002k kernel code, 170092k >> reserved, 1269k data, 168k init) >> Calibrating delay using timer specific routine.. 4390.66 BogoMIPS >> (lpj=8781339) >> Mount-cache hash table entries: 256 >> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) >> CPU: L2 Cache: 1024K (64 bytes/line) >> Using IO-APIC 4 >> Using IO-APIC 5 >> Using IO-APIC 6 >> GSI 18 sharing vector 0x89 and IRQ 18 >> GSI 19 sharing vector 0x91 and IRQ 19 >> GSI 24 sharing vector 0x99 and IRQ 24 >> GSI 25 sharing vector 0xA1 and IRQ 25 >> GSI 29 sharing vector 0xA9 and IRQ 29 >> Using local APIC timer interrupts. >> result 12447820 >> Detected 12.447 MHz APIC timer. >> Booting processor 1/4 APIC 0x1 >> Initializing CPU#1 >> Calibrating delay using timer specific routine.. 4381.80 BogoMIPS >> (lpj=8763613) >> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) >> CPU: L2 Cache: 1024K (64 bytes/line) >> Dual Core AMD Opteron(tm) Processor 275 stepping 02 >> CPU 1: Syncing TSC to CPU 0. >> CPU 1: synchronized TSC with CPU 0 (last diff 6 cycles, maxerr 627 cycles) >> Booting processor 2/4 APIC 0x2 >> Initializing CPU#2 >> Calibrating delay using timer specific routine.. 4381.88 BogoMIPS >> (lpj=8763771) >> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) >> CPU: L2 Cache: 1024K (64 bytes/line) >> Dual Core AMD Opteron(tm) Processor 275 stepping 02 >> CPU 2: Syncing TSC to CPU 0. >> CPU 2: synchronized TSC with CPU 0 (last diff 1 cycles, maxerr 876 cycles) >> Booting processor 3/4 APIC 0x3 >> Initializing CPU#3 >> Calibrating delay using timer specific routine.. 4381.92 BogoMIPS >> (lpj=8763852) >> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) >> CPU: L2 Cache: 1024K (64 bytes/line) >> Dual Core AMD Opteron(tm) Processor 275 stepping 02 >> CPU 3: Syncing TSC to CPU 0. >> CPU 3: synchronized TSC with CPU 0 (last diff 7 cycles, maxerr 864 cycles) >> Brought up 4 CPUs >> testing NMI watchdog ... OK. >> migration_cost=460 >> NET: Registered protocol family 16 >> PCI: Using configuration type 1 >> SCSI subsystem initialized >> PCI: Probing PCI hardware >> PCI: Probing PCI hardware (bus 00) >> Boot video device is 0000:03:06.0 >> PCI: Using IRQ router default [1022/746b] at 0000:00:07.3 >> PCI->APIC IRQ transform: 0000:00:07.2[D] -> IRQ 19 >> PCI->APIC IRQ transform: 0000:03:06.0[A] -> IRQ 18 >> PCI->APIC IRQ transform: 0000:03:08.0[A] -> IRQ 18 >> PCI->APIC IRQ transform: 0000:02:09.0[A] -> IRQ 24 >> PCI->APIC IRQ transform: 0000:02:09.1[B] -> IRQ 25 >> PCI->APIC IRQ transform: 0000:01:04.0[A] -> IRQ 29 >> PCI-DMA: Disabling AGP. >> PCI-DMA: aperture base @ 8000000 size 65536 KB >> PCI-DMA: using GART IOMMU. >> PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture >> PCI: Bridge: 0000:00:06.0 >> IO window: 9000-bfff >> MEM window: fca00000-feafffff >> PREFETCH window: disabled. >> PCI: Bridge: 0000:00:0a.0 >> IO window: disabled. >> MEM window: fc900000-fc9fffff >> PREFETCH window: fc600000-fc6fffff >> PCI: Bridge: 0000:00:0b.0 >> IO window: 8000-8fff >> MEM window: fc800000-fc8fffff >> PREFETCH window: fb500000-fc5fffff >> NET: Registered protocol family 2 >> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) >> TCP established hash table entries: 262144 (order: 10, 4194304 bytes) >> TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) >> TCP: Hash tables configured (established 262144 bind 65536) >> TCP reno registered >> IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $ >> Installing knfsd (copyright (C) 1996 okir@monad.swb.de). >> Initializing Cryptographic API >> io scheduler noop registered >> io scheduler anticipatory registered >> io scheduler deadline registered (default) >> io scheduler cfq registered >> PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus. >> PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus. >> Real Time Clock Driver v1.12ac >> Linux agpgart interface v0.101 (c) Dave Jones >> Floppy drive(s): fd0 is 1.44M >> FDC 0 is a post-1991 82077 >> loop: loaded (max 8 devices) >> Intel(R) PRO/1000 Network Driver - version 7.0.33-k2 >> Copyright (c) 1999-2005 Intel Corporation. >> eepro100.c:v1.09j-t 9/29/99 Donald Becker >> http://www.scyld.com/network/eepro100.html >> eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin >> and others >> eth0: 0000:03:08.0, 00:E0:81:32:F6:36, IRQ 18. >> Board assembly 567812-052, Physical connectors present: RJ45 >> Primary interface chip i82555 PHY #1. >> General self-test: passed. >> Serial sub-system self-test: passed. >> Internal registers self-test: passed. >> ROM checksum self-test: passed (0xd0a6c714). >> e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI >> e100: Copyright(c) 1999-2005 Intel Corporation >> tg3.c:v3.59 (June 8, 2006) >> eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] >> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:32:f7:ac >> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] >> TSOcap[1] >> eth1: dma_rwctrl[769f4000] dma_mask[64-bit] >> eth2: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] >> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:32:f7:ad >> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] >> TSOcap[1] >> eth2: dma_rwctrl[769f4000] dma_mask[64-bit] >> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 >> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx >> 3ware 9000 Storage Controller device driver for Linux v2.26.02.007. >> 3w-9xxx: scsi0: AEN: INFO (0x04:0x0055): Battery charging started:. >> 3w-9xxx: scsi0: AEN: INFO (0x04:0x0053): Battery capacity test is overdue:. >> scsi0 : 3ware 9000 Storage Controller >> 3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xfc8ffc00, >> IRQ: 29. >> 3w-9xxx: scsi0: Firmware FE9X 2.08.00.005, BIOS BE9X 2.03.01.052, Ports: 8. >> Vendor: AMCC Model: 9500S-8 DISK Rev: 2.08 >> Type: Direct-Access ANSI SCSI revision: 03 >> SCSI device sda: 956884992 512-byte hdwr sectors (489925 MB) >> sda: Write Protect is off >> sda: Mode Sense: 23 00 00 00 >> SCSI device sda: drive cache: write back, no read (daft) >> SCSI device sda: 956884992 512-byte hdwr sectors (489925 MB) >> sda: Write Protect is off >> sda: Mode Sense: 23 00 00 00 >> SCSI device sda: drive cache: write back, no read (daft) >> sda: sda1 < sda5 sda6 sda7 sda8 sda9 sda10 > sda2 sda3 >> sd 0:0:0:0: Attached scsi disk sda >> serio: i8042 AUX port at 0x60,0x64 irq 12 >> serio: i8042 KBD port at 0x60,0x64 irq 1 >> mice: PS/2 mouse device common for all mice >> TCP bic registered >> NET: Registered protocol family 1 >> NET: Registered protocol family 10 >> IPv6 over IPv4 tunneling driver >> NET: Registered protocol family 17 >> NET: Registered protocol family 15 >> 802.1Q VLAN Support v1.8 Ben Greear >> All bugs added by David S. Miller >> ReiserFS: sda8: found reiserfs format "3.6" with standard journal >> ReiserFS: sda8: using ordered data mode >> ReiserFS: sda8: journal params: device sda8, size 8192, journal first >> block 18, max trans len 1024, max batch 900, max commit age 30, max >> trans age 30 >> ReiserFS: sda8: checking transaction log (sda8) >> input: AT Translated Set 2 keyboard as /class/input/input0 >> ReiserFS: sda8: replayed 15 transactions in 1 seconds >> ReiserFS: sda8: Using r5 hash to sort names >> VFS: Mounted root (reiserfs filesystem) readonly. >> Freeing unused kernel memory: 168k freed >> Adding 1951856k swap on /dev/sda5. Priority:-1 extents:1 across:1951856k >> Adding 1951856k swap on /dev/sda6. Priority:-2 extents:1 across:1951856k >> Adding 1951792k swap on /dev/sda7. Priority:-3 extents:1 across:1951792k >> ReiserFS: sda10: found reiserfs format "3.6" with standard journal >> ReiserFS: sda10: using ordered data mode >> ReiserFS: sda10: journal params: device sda10, size 8192, journal first >> block 18, max trans len 1024, max batch 900, max commit age 30, max >> trans age 30 >> ReiserFS: sda10: checking transaction log (sda10) >> ReiserFS: sda10: Using r5 hash to sort names >> ReiserFS: sda10: Removing [30 40588 0x0 SD]..done >> ReiserFS: sda10: Removing [3 40583 0x0 SD]..done >> ReiserFS: sda10: Removing [3 40582 0x0 SD]..done >> ReiserFS: sda10: Removing [3 40579 0x0 SD]..done >> ReiserFS: sda10: There were 4 uncompleted unlinks/truncates. Completed >> ReiserFS: sda2: found reiserfs format "3.6" with standard journal >> ReiserFS: sda2: using ordered data mode >> ReiserFS: sda2: journal params: device sda2, size 8192, journal first >> block 18, max trans len 1024, max batch 900, max commit age 30, max >> trans age 30 >> ReiserFS: sda2: checking transaction log (sda2) >> ReiserFS: sda2: Using r5 hash to sort names >> ReiserFS: sda2: Removing [1306 51393 0x0 SD]..done >> ReiserFS: sda2: Removing [1306 51193 0x0 SD]..done >> ReiserFS: sda2: There were 2 uncompleted unlinks/truncates. Completed >> ReiserFS: sda3: found reiserfs format "3.6" with standard journal >> ReiserFS: sda3: using ordered data mode >> ReiserFS: sda3: journal params: device sda3, size 8192, journal first >> block 18, max trans len 1024, max batch 900, max commit age 30, max >> trans age 30 >> ReiserFS: sda3: checking transaction log (sda3) >> ReiserFS: sda3: Using r5 hash to sort names >> PM: Writing back config space on device 0000:02:09.1 at offset b (was >> 164814e4, writing 164414e4) >> PM: Writing back config space on device 0000:02:09.1 at offset 3 (was >> 804000, writing 804010) >> PM: Writing back config space on device 0000:02:09.1 at offset 2 (was >> 2000000, writing 2000003) >> PM: Writing back config space on device 0000:02:09.1 at offset 1 (was >> 2b00000, writing 2b00106) >> ADDRCONF(NETDEV_UP): eth2: link is not ready >> tg3: eth2: Link is up at 1000 Mbps, full duplex. >> tg3: eth2: Flow control is off for TX and off for RX. >> ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready >> eth2: no IPv6 routers present >> 3w-9xxx: scsi0: AEN: INFO (0x04:0x0056): Battery charging completed:. >> >> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> >> > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/