2006-02-13 21:04:39

by Ryan Richter

Subject: Random reboots

Ever since we upgraded our file server from 2.6.11.3 to 2.6.14+, it has
been rebooting at random about every 2-3 weeks. I'm pretty certain it's
a kernel issue: the server shares a UPS with a few other machines, so
it's not the power. We had uptimes of ~6 months with 2.6.11.3, and I've
run memtest86 overnight since adding some memory a few months ago, so I
don't suspect hardware trouble. We've had 5 of these reboots now, so
it's a repeatable problem, albeit on an agonizing timescale for testing.

Does anyone have any thoughts at all as to what this could be? Any help
would be much appreciated - the .config and dmesg are below.

Thanks,
-ryan


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.15
# Tue Jan 31 01:33:18 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
# CONFIG_IKCONFIG is not set
# CONFIG_CPUSETS is not set
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_LBD=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=y
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_NR_CPUS=8
# CONFIG_HOTPLUG_CPU is not set
CONFIG_HPET_TIMER=y
# CONFIG_X86_PM_TIMER is not set
CONFIG_HPET_EMULATE_RTC=y
CONFIG_GART_IOMMU=y
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_PHYSICAL_START=0x100000
# CONFIG_KEXEC is not set
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y
CONFIG_GENERIC_PENDING_IRQ=y

#
# Power management options
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_VIDEO is not set
# CONFIG_ACPI_HOTKEY is not set
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
# CONFIG_ACPI_CONTAINER is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_UNORDERED_IO is not set
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y
# CONFIG_PCI_DEBUG is not set

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_UID16=y

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_BIC=y
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set

#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_FW_LOADER is not set
# CONFIG_DEBUG_DRIVER is not set

#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
# CONFIG_PARPORT_SERIAL is not set
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_1284 is not set

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set

#
# ATA/ATAPI/MFM/RLL support
#
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=y

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set

#
# SCSI Transport Attributes
#
CONFIG_SCSI_SPI_ATTRS=y
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_ATTRS is not set

#
# SCSI low-level drivers
#
# CONFIG_ISCSI_TCP is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_SATA is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
CONFIG_SCSI_SYM53C8XX_2=y
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
# CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA2XXX=y
# CONFIG_SCSI_QLA21XX is not set
# CONFIG_SCSI_QLA22XX is not set
# CONFIG_SCSI_QLA2300 is not set
# CONFIG_SCSI_QLA2322 is not set
# CONFIG_SCSI_QLA6312 is not set
# CONFIG_SCSI_QLA24XX is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
CONFIG_MD_RAID1=y
# CONFIG_MD_RAID10 is not set
CONFIG_MD_RAID5=y
# CONFIG_MD_RAID6 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_MIRROR is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set

#
# Fusion MPT device support
#
CONFIG_FUSION=y
CONFIG_FUSION_SPI=y
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Network device support
#
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
CONFIG_BONDING=m
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# PHY device support
#

#
# Ethernet (10 or 100Mbit)
#
# CONFIG_NET_ETHERNET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SK98LIN is not set
CONFIG_TIGON3=y
# CONFIG_BNX2 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_CHELSIO_T1 is not set
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1600
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1200
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_ACPI=y
CONFIG_SERIAL_8250_NR_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_PRINTER=y
# CONFIG_LP_CONSOLE is not set
# CONFIG_PPDEV is not set
# CONFIG_TIPAR is not set

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_HW_RANDOM=y
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
# CONFIG_AGP_INTEL is not set
# CONFIG_DRM is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_HPET is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# TPM devices
#
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y

#
# I2C Algorithms
#
# CONFIG_I2C_ALGOBIT is not set
# CONFIG_I2C_ALGOPCF is not set
# CONFIG_I2C_ALGOPCA is not set

#
# I2C Hardware Bus support
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
CONFIG_I2C_AMD8111=y
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_I810 is not set
# CONFIG_I2C_PIIX4 is not set
CONFIG_I2C_ISA=y
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_PROSAVAGE is not set
# CONFIG_I2C_SAVAGE4 is not set
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set
# CONFIG_I2C_VOODOO3 is not set
# CONFIG_I2C_PCA_ISA is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_SENSORS_DS1337 is not set
# CONFIG_SENSORS_DS1374 is not set
# CONFIG_SENSORS_EEPROM is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_SENSORS_PCA9539 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_RTC8564 is not set
# CONFIG_SENSORS_MAX6875 is not set
# CONFIG_RTC_X1205_I2C is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_FSCPOS is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
CONFIG_SENSORS_LM85=y
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_VIA686A is not set
CONFIG_SENSORS_W83781D=y
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set

#
# Multimedia Capabilities Port drivers
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_FB is not set
# CONFIG_VIDEO_SELECT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set

#
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_SPLIT_ISO is not set
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_OHCI_HCD is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
#

#
# may also be needed; see USB_STORAGE Help for more information
#
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_DPCM is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set

#
# USB Input Devices
#
# CONFIG_USB_HID is not set

#
# USB HID Boot Protocol drivers
#
# CONFIG_USB_KBD is not set
# CONFIG_USB_MOUSE is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_ACECAD is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_ITMTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_YEALINK is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set
# CONFIG_USB_KEYSPAN_REMOTE is not set
# CONFIG_USB_APPLETOUCH is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_USB_MON is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TEST is not set

#
# USB DSL modem support
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#
# CONFIG_INFINIBAND is not set

#
# SN Devices
#

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_INOTIFY is not set
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
# CONFIG_NFSD_V4 is not set
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y

#
# Instrumentation Support
#
# CONFIG_PROFILING is not set
# CONFIG_KPROBES is not set

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_DETECT_SOFTLOCKUP is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_FS is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_FRAME_POINTER is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_INIT_DEBUG is not set
# CONFIG_IOMMU_DEBUG is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
# CONFIG_CRYPTO is not set

#
# Hardware crypto devices
#

#
# Library routines
#
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC32 is not set
# CONFIG_LIBCRC32C is not set
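
For anyone skimming a listing this long, a small script can pull out the
options of interest instead of reading it by eye. What follows is only a
minimal sketch in Python, not part of the original report. The names
parse_kconfig, unset_options, WATCH and SAMPLE are hypothetical, and the
watch list is an arbitrary illustration built from option names that
appear in the listing above, not a recommendation of what to enable.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value}, with None for options marked 'is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            # Drop the surrounding quotes of string-valued options.
            options[m.group(1)] = m.group(2).strip('"')
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


def unset_options(options, wanted):
    """Return the names in `wanted` that are unset or absent, in order."""
    return [name for name in wanted if options.get(name) is None]


# Hypothetical watch list (an assumption for illustration only).
WATCH = ["CONFIG_DETECT_SOFTLOCKUP", "CONFIG_NETCONSOLE",
         "CONFIG_DEBUG_SLAB", "CONFIG_MAGIC_SYSRQ"]

# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_DETECT_SOFTLOCKUP is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_NETCONSOLE is not set
"""

if __name__ == "__main__":
    opts = parse_kconfig(SAMPLE)
    for name in unset_options(opts, WATCH):
        print(name, "is not set")
```

To run it against the full listing, read the saved file's contents and
pass them to parse_kconfig in place of SAMPLE.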

Bootdata ok (command line is BOOT_IMAGE=Linux-test ro root=854 console=ttyS0,19200n8)
Linux version 2.6.15 (root@xarello) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 SMP Fri Feb 10 10:02:11 EST 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000f6ff0000 (usable)
BIOS-e820: 00000000f6ff0000 - 00000000f6fff000 (ACPI data)
BIOS-e820: 00000000f6fff000 - 00000000f7000000 (ACPI NVS)
BIOS-e820: 00000000ff7c0000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
ACPI: RSDP (v002 ACPIAM ) @ 0x00000000000f4680
ACPI: XSDT (v001 A M I OEMXSDT 0x06000318 MSFT 0x00000097) @ 0x00000000f6ff0100
ACPI: FADT (v001 A M I OEMFACP 0x06000318 MSFT 0x00000097) @ 0x00000000f6ff0281
ACPI: MADT (v001 A M I OEMAPIC 0x06000318 MSFT 0x00000097) @ 0x00000000f6ff0380
ACPI: OEMB (v001 A M I OEMBIOS 0x06000318 MSFT 0x00000097) @ 0x00000000f6fff040
ACPI: ASF! (v001 AMIASF AMDSTRET 0x00000001 INTL 0x02002026) @ 0x00000000f6ff3530
ACPI: DSDT (v001 0ABCF 0ABCF007 0x00000007 INTL 0x02002026) @ 0x0000000000000000
Scanning NUMA topology in Northbridge 24
Number of nodes 2
Node 0 MemBase 0000000000000000 Limit 0000000100000000
Node 1 MemBase 0000000100000000 Limit 0000000180000000
Using 32 for the hash shift.
Using node hash shift of 32
Bootmem setup node 0 0000000000000000-0000000100000000
Bootmem setup node 1 0000000100000000-0000000180000000
On node 0 totalpages: 996245
DMA zone: 2925 pages, LIFO batch:0
DMA32 zone: 993320 pages, LIFO batch:31
Normal zone: 0 pages, LIFO batch:0
HighMem zone: 0 pages, LIFO batch:0
On node 1 totalpages: 517120
DMA zone: 0 pages, LIFO batch:0
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 517120 pages, LIFO batch:31
HighMem zone: 0 pages, LIFO batch:0
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xfebfe000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xfebfe000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xfebff000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xfebff000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at f8000000 (gap: f7000000:87c0000)
Checking aperture...
CPU 0: aperture @ 1ee0000000 size 64 MB
Aperture from northbridge cpu 0 beyond 4GB. Ignoring.
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 8000000
Built 2 zonelists
Kernel command line: BOOT_IMAGE=Linux-test ro root=854 console=ttyS0,19200n8
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 1393.747 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Memory: 5975252k/6291456k available (2336k kernel code, 168296k reserved, 1061k data, 228k init)
Calibrating delay using timer specific routine.. 2793.28 BogoMIPS (lpj=5586570)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
Using local APIC timer interrupts.
Detected 12.444 MHz APIC timer.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 2787.66 BogoMIPS (lpj=5575338)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(1) -> Node 1 -> Core 0
AMD Opteron(tm) Processor 240 stepping 01
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff 8 cycles, maxerr 937 cycles)
Brought up 2 CPUs
time.c: Using PIT/TSC based timekeeping.
testing NMI watchdog ... OK.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:03:06.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLB._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 8000000 size 65536 KB
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
PCI: Bridge: 0000:00:06.0
IO window: b000-bfff
MEM window: fca00000-feafffff
PREFETCH window: ff500000-ff5fffff
PCI: Bridge: 0000:00:0a.0
IO window: a000-afff
MEM window: fc400000-fc9fffff
PREFETCH window: ff400000-ff4fffff
PCI: Bridge: 0000:00:0b.0
IO window: disabled.
MEM window: fc300000-fc3fffff
PREFETCH window: ff300000-ff3fffff
IA-32 Microcode Update Driver: v1.14 <[email protected]>
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
Installing knfsd (copyright (C) 1996 [email protected]).
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Processor [CPU1] (supports 8 throttling states)
lp: driver loaded but no devices found
Real Time Clock Driver v1.12
hw_random: AMD768 system management I/O registers at 0x5000.
hw_random hardware driver 1.0.0 loaded
Linux agpgart interface v0.101 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport0: PC-style at 0x378 [PCSPP(,...)]
lp0: using parport0 (polling).
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
tg3.c:v3.47 (Dec 28, 2005)
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 24 (level, low) -> IRQ 16
eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:51:d6:c9
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[769f4000]
GSI 17 sharing vector 0xB1 and IRQ 17
ACPI: PCI Interrupt 0000:02:09.1[B] -> GSI 25 (level, low) -> IRQ 17
eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:51:d6:ca
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[769f4000]
GSI 18 sharing vector 0xB9 and IRQ 18
ACPI: PCI Interrupt 0000:03:04.0[A] -> GSI 16 (level, low) -> IRQ 18
sym0: <875> rev 0x26 at pci 0000:03:04.0 irq 18
sym0: Symbios NVRAM, ID 7, Fast-20, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.1
Vendor: SONY Model: SDX-400C Rev: 0702
Type: Sequential-Access ANSI SCSI revision: 02
target0:0:4: Beginning Domain Validation
target0:0:4: asynchronous.
target0:0:4: wide asynchronous.
target0:0:4: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 15)
target0:0:4: Domain Validation skipping write tests
target0:0:4: Ending Domain Validation
Vendor: OVERLAND Model: LIBRARYPRO Rev: 0417
Type: Medium Changer ANSI SCSI revision: 02
target0:0:6: Beginning Domain Validation
target0:0:6: asynchronous.
target0:0:6: wide asynchronous.
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:6: Domain Validation skipping write tests
target0:0:6: Ending Domain Validation
st: Version 20050830, fixed bufsize 32768, s/g segs 256
st 0:0:4:0: Attached scsi tape st0
st0: try direct i/o: yes (alignment 512 B), max page reachable by HBA 1572864
st 0:0:4:0: Attached scsi generic sg0 type 1
0:0:6:0: Attached scsi generic sg1 type 8
SCSI Media Changer driver v0.25
ch0: type #1 (mt): 0x0+1 [medium transport]
ch0: type #2 (st): 0x1+18 [storage]
ch0: type #3 (ie): 0xd0+1 [import/export]
ch0: type #4 (dt): 0xe0+1 [data transfer]
ch0: dt 0xe0: ID 4, LUN 0, name: SONY SDX-400C 0702
ch0: INITIALIZE ELEMENT STATUS, may take some time ...
ch0: ... finished
ch 0:0:6:0: Attached scsi changer ch0
Fusion MPT base driver 3.03.04
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.04
ACPI: PCI Interrupt 0000:02:0a.0[A] -> GSI 24 (level, low) -> IRQ 16
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi1 : ioc0: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=255, IRQ=16
Vendor: SUPER Model: GEM318 Rev: 0
Type: Processor ANSI SCSI revision: 02
1:0:8:0: Attached scsi generic sg2 type 3
Vendor: SEAGATE Model: ST3146807LC Rev: 0006
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sda: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sda: drive cache: write back
sda: sda1
sd 1:0:9:0: Attached scsi disk sda
sd 1:0:9:0: Attached scsi generic sg3 type 0
Vendor: SEAGATE Model: ST3146807LC Rev: 0006
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdb: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1
sd 1:0:10:0: Attached scsi disk sdb
sd 1:0:10:0: Attached scsi generic sg4 type 0
Vendor: SEAGATE Model: ST3146807LC Rev: 0006
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdc: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1
sd 1:0:11:0: Attached scsi disk sdc
sd 1:0:11:0: Attached scsi generic sg5 type 0
Vendor: SEAGATE Model: ST3146807LC Rev: 0007
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdd: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sdd: drive cache: write back
sdd: sdd1
sd 1:0:12:0: Attached scsi disk sdd
sd 1:0:12:0: Attached scsi generic sg6 type 0
Vendor: SEAGATE Model: ST3146807LC Rev: 0007
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sde: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sde: drive cache: write back
SCSI device sde: 286749488 512-byte hdwr sectors (146816 MB)
SCSI device sde: drive cache: write back
sde: sde1
sd 1:0:13:0: Attached scsi disk sde
sd 1:0:13:0: Attached scsi generic sg7 type 0
ACPI: PCI Interrupt 0000:02:0a.1[B] -> GSI 25 (level, low) -> IRQ 17
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi2 : ioc1: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=255, IRQ=17
Vendor: SEAGATE Model: ST336607LW Rev: 0007
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdf: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sdf: drive cache: write back
SCSI device sdf: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sdf: drive cache: write back
sdf: sdf1 sdf2 sdf3 sdf4
sd 2:0:0:0: Attached scsi disk sdf
sd 2:0:0:0: Attached scsi generic sg8 type 0
Fusion MPT misc device (ioctl) driver 3.03.04
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
USB Universal Host Controller Interface driver v2.3
Initializing USB Mass Storage driver...
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
mice: PS/2 mouse device common for all mice
i2c /dev entries driver
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: automatically using best checksumming function: generic_sse
generic_sse: 4294.000 MB/sec
raid5: using function: generic_sse (4294.000 MB/sec)
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: [email protected]
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sde1 ...
md: adding sde1 ...
md: adding sdd1 ...
md: adding sdc1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: created md0
md: bind<sda1>
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: bind<sde1>
md: running: <sde1><sdd1><sdc1><sdb1><sda1>
raid5: device sde1 operational as raid disk 4
raid5: device sdd1 operational as raid disk 3
raid5: device sdc1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: allocated 5312kB for md0
raid5: raid level 5 set md0 active with 5 out of 5 devices, algorithm 2
RAID5 conf printout:
--- rd:5 wd:5 fd:0
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:sdc1
disk 3, o:1, dev:sdd1
disk 4, o:1, dev:sde1
md: ... autorun DONE.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 228k freed
Adding 2097136k swap on /dev/sdf2. Priority:-1 extents:1 across:2097136k
EXT3 FS on sdf4, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sdf3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-32, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-34, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-33, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-35, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-36, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-38, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-39, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-40, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-41, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-9, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-10, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-11, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-12, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-13, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-14, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-15, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-16, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-17, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-18, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-19, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-20, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-21, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-22, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-23, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-24, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-25, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-26, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-27, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-28, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-29, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-30, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-31, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Ethernet Channel Bonding Driver: v2.6.5 (November 4, 2005)
bonding: MII link monitoring set to 100 ms
bonding: bond0: enslaving eth0 as an active interface with a down link.
bonding: bond0: enslaving eth1 as an active interface with a down link.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: link status definitely up for interface eth1.


2006-02-13 21:11:21

by anders

Subject: Re: Random reboots


[email protected] said:
> Ever since upgrading our file server from 2.6.11.3 to 2.6.14+, it has been
> experiencing random reboots about every 2-3 weeks. I'm pretty certain it's a
> kernel issue: it shares a UPS with a few other machines, so it's not the
> power. We had uptimes of ~6 months with 2.6.11.3, and I've run memtest86
> overnight since adding some memory a few months ago, so I don't suspect
> hardware trouble. We've had 5 of these reboots now, so it's a repeatable
> problem, albeit on an agonizing timescale for testing.

Any chance you're running spamassassin on that box? I've seen similar issues
lately and, comparing /var/log/messages with crontab, have concluded that
sa-learn sometimes kills the box when run by cron. It seems so odd that I
haven't found the guts to ask about it, figuring that I should do some more
testing myself, but I don't have much else to go on...
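
A rough sketch of that kind of comparison, in Python. Everything in it is hypothetical: the reboot times, the daily 03:15 job, and the 10-minute window are placeholders for illustration, not data from the logs mentioned above.

```python
from datetime import datetime, timedelta


def reboots_near_jobs(reboots, job_times, window=timedelta(minutes=10)):
    """Return (reboot, job) pairs where the reboot came within `window` after the job started."""
    hits = []
    for reboot in reboots:
        for job in job_times:
            if timedelta(0) <= reboot - job <= window:
                hits.append((reboot, job))
    return hits


# Hypothetical data: a job started daily at 03:15, and three reboot times taken from logs.
jobs = [datetime(2006, 2, day, 3, 15) for day in range(1, 14)]
reboots = [
    datetime(2006, 2, 3, 3, 19),   # 4 minutes after the job started
    datetime(2006, 2, 7, 14, 2),   # nowhere near a job
    datetime(2006, 2, 11, 3, 24),  # 9 minutes after the job started
]

for reboot, job in reboots_near_jobs(reboots, jobs):
    print(f"reboot at {reboot} followed job started at {job}")
```

If most reboots pair up with the same job, that job is worth a closer look; if none do, the cron theory is probably a coincidence.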

/Anders

2006-02-13 21:22:46

by Ryan Richter

Subject: Re: Random reboots

On Mon, Feb 13, 2006 at 10:10:43PM +0100, [email protected] wrote:
>
> [email protected] said:
> > Ever since upgrading our file server from 2.6.11.3 to 2.6.14+, it
> > has been experiencing random reboots about every 2-3 weeks. I'm
> > pretty certain it's a kernel issue: it shares a UPS with a few other
> > machines, so it's not the power. We had uptimes of ~6 months with
> > 2.6.11.3, and I've run memtest86 overnight since adding some memory
> > a few months ago, so I don't suspect hardware trouble. We've had 5
> > of these reboots now, so it's a repeatable problem, albeit on an
> > agonizing timescale for testing.
>
> Any chance you're running spamassassin on that box? I've seen similar
> issues lately and, comparing /var/log/messages with crontab, have
> concluded that sa-learn sometimes kills the box when run by cron. It
> seems so odd that I haven't found the guts to ask about it, figuring
> that I should do some more testing myself, but I don't have much else to
> go on...

Nope, no spamassassin. It doesn't seem to happen at any particular time
of day/week/month or in conjunction with any particular load.

-ryan

2006-02-13 21:39:32

by Ryan Richter

Subject: Re: Random reboots

On Mon, Feb 13, 2006 at 09:32:21PM +0000, Nick Warne wrote:
> > Nope, no spamassassin. It doesn't seem to happen at any particular time
> > of day/week/month or in conjunction with any particular load.
>
> What is your base system on these boxes? I have had 3 'reboots' on 2
> boxes at work over the last year running RHEL 3, and there is nothing
> *at all* in logs, nor in the test scripts RedHat guys got me to run
> (nor hardware issues) - it's as if someone just hit the power switch.

It runs Debian Sarge for AMD64. I have lots of other machines, but only
this one gets the reboots. None of the others have SCSI, and none are
dual-CPU with memory on both nodes, just to name two obvious things
different on this machine.

And my symptoms are the same - nothing in the logs, nothing sent to the
syslog server, nothing on the serial console. Just like a power cut.

-ryan

2006-02-13 21:49:59

by Ryan Richter

Subject: Re: Random reboots

On Mon, Feb 13, 2006 at 04:39:29PM -0500, ryan wrote:
> It runs Debian Sarge for AMD64. I have lots of other machines, but only
> this one gets the reboots. None of the others have SCSI, and none are
> dual-CPU with memory on both nodes, just to name two obvious things
> different on this machine.

Thinking about this some more... My home desktop also is a dual opteron
with memory on both nodes and SCSI, but it hasn't had any reboots. The
machine with the reboot trouble uses RAID5+LVM, unlike my desktop. Also
it's an NFS server, but I have another machine (single-cpu pentium 4, no
SCSI etc.) that's an NFS server without reboots. But none of the other
machines have RAID or LVM.

It's maddening because half of the evidence points to hardware trouble
with this one machine, but the other half contradicts this. It's
interesting to hear that other people are experiencing this, though.

-ryan

2006-02-14 08:54:51

by Erik Mouw

Subject: Re: Random reboots

On Mon, Feb 13, 2006 at 04:49:57PM -0500, Ryan Richter wrote:
> On Mon, Feb 13, 2006 at 04:39:29PM -0500, ryan wrote:
> > It runs Debian Sarge for AMD64. I have lots of other machines, but only
> > this one gets the reboots. None of the others have SCSI, and none are
> > dual-CPU with memory on both nodes, just to name two obvious things
> > different on this machine.
>
> Thinking about this some more... My home desktop also is a dual opteron
> with memory on both nodes and SCSI, but it hasn't had any reboots. The
> machine with the reboot trouble uses RAID5+LVM, unlike my desktop. Also
> it's an NFS server, but I have another machine (single-cpu pentium 4, no
> SCSI etc.) that's an NFS server without reboots. But none of the other
> machines have RAID or LVM.

We recently had such an issue with a dual AMD64 machine rebooting at
mke2fs. It turned out it was a faulty power supply. After we changed
the power supply, everything ran smoothly again.

You could start to test by powering your drives from an old AT-style
power supply leaving more "juice" for the main board and CPUs.


Erik

--
+-- Erik Mouw -- http://www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

2006-02-14 13:29:13

by Ryan Richter

Subject: Re: Random reboots

On Tue, Feb 14, 2006 at 09:54:47AM +0100, Erik Mouw wrote:
> On Mon, Feb 13, 2006 at 04:49:57PM -0500, Ryan Richter wrote:
> > On Mon, Feb 13, 2006 at 04:39:29PM -0500, ryan wrote:
> > > It runs Debian Sarge for AMD64. I have lots of other machines, but only
> > > this one gets the reboots. None of the others have SCSI, and none are
> > > dual-CPU with memory on both nodes, just to name two obvious things
> > > different on this machine.
> >
> > Thinking about this some more... My home desktop also is a dual opteron
> > with memory on both nodes and SCSI, but it hasn't had any reboots. The
> > machine with the reboot trouble uses RAID5+LVM, unlike my desktop. Also
> > it's an NFS server, but I have another machine (single-cpu pentium 4, no
> > SCSI etc.) that's an NFS server without reboots. But none of the other
> > machines have RAID or LVM.
>
> We recently had such an issue with a dual AMD64 machine rebooting at
> mke2fs. It turned out it was a faulty power supply. After we changed
> the power supply, everything ran smoothly again.
>
> You could start to test by powering your drives from an old AT-style
> power supply leaving more "juice" for the main board and CPUs.

It's possible, but I doubt it. More often than not, the reboot happens
when the machine is completely idle - in fact I can't remember a single
time when it wasn't idle. I just spent a couple months debugging a
SCSI-tape crash, and I ran the backups a lot and had lots of RAID
resyncs and it *never* rebooted during either of these events. Anyway
it has quite a large 2+1 redundant power supply, and, like I said, we
routinely had 3+ months of uptime with older kernels.

During the years I've had this machine, I've experienced at least 10-15
strange kernel bugs that only happened on this machine. Each and every
time I was *convinced* that the hardware was at fault (and people on the
mailing list suggested it) until either a kernel came out that fixed the
problem or a kernel developer positively identified it as a kernel
problem and eventually fixed it. This machine just seems to be a magnet
for kernel bugs.

Thanks,
-ryan

2006-02-14 14:47:11

by Nick Warne

Subject: Re: Random reboots

> During the years I've had this machine, I've experienced at least 10-15
> strange kernel bugs that only happened on this machine. Each and every
> time I was *convinced* that the hardware was at fault (and people on the
> mailing list suggested it) until either a kernel came out that fixed the
> problem or a kernel developer positively identified it as a kernel
> problem and eventually fixed it. This machine just seems to be a magnet
> for kernel bugs.

OK, in reference to the similar issues I have seen on 2 boxes here at
work (ukqip01 & 02), I have dug out an old mail I sent to a colleague
after the third time it happened to me:



"In fact it has happened to both of them now. UKQIP01 restarted once
(about 3 years ago) in the middle of the night, then about 6 months
later UKQIP02 did the same. If you remember I then replaced memory
from 512 to 1024.

Now UKQIP02 has done it again.

I am not too concerned, as twice in 3 years is not really a problem
(yet), but I am more curious as to why because I want to fix it.

It also only happens in the dead twilight hours too.

I will look at running memtest, but as you know that takes hours and
the box will have to be taken down for the duration.

I will keep you informed."



This was last September 2005 - the time it happened before that was
January 2005.

Very strange.

Nick

2006-02-14 22:22:07

by Jean Delvare

Subject: Re: Random reboots

Hi Ryan,

> > We recently had such an issue with a dual AMD64 machine rebooting at
> > mke2fs. It turned out it was a faulty power supply. After we changed
> > the power supply, everything ran smoothly again.
> >
> > You could start to test by powering your drives from an old AT-style
> > power supply leaving more "juice" for the main board and CPUs.
>
> It's possible, but I doubt it. More often than not, the reboot happens
> when the machine is completely idle - in fact I can't remember a single
> time when it wasn't idle. I just spent a couple months debugging a
> SCSI-tape crash, and I ran the backups a lot and had lots of RAID
> resyncs and it *never* rebooted during either of these events. Anyway
> it has quite a large 2+1 redundant power supply, and, like I said, we
> routinely had 3+ months of uptime with older kernels.

You seem to have hardware monitoring drivers loaded on the system, so
I'd suggest that you watch the returned values over time. If the
hardware is going wrong it might show there. Your system could be
overheating for some reason (stuck fan...)
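
A minimal sketch of what watching the values over time could look like, in Python. It assumes the monitoring drivers expose their readings as `*_input` files under /sys/class/hwmon, which is the layout on current kernels and may not match 2.6.15; the log path and the 60-second interval are hypothetical choices.

```python
import glob
import os
import time


def read_hwmon(root="/sys/class/hwmon"):
    """Return {"chip/file": raw integer} for every readable *_input file under root."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "*_input"))):
        chip = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                readings[f"{chip}/{os.path.basename(path)}"] = int(f.read().strip())
        except (OSError, ValueError):
            # Some sensors fail to read or return junk; skip them.
            continue
    return readings


def log_forever(logfile="/var/log/hwmon-watch.log", interval=60):
    """Append one timestamped line of readings every `interval` seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        fields = " ".join(f"{k}={v}" for k, v in read_hwmon().items())
        with open(logfile, "a") as f:
            f.write(f"{stamp} {fields}\n")
            f.flush()
            os.fsync(f.fileno())  # try to get the line to disk before a sudden reboot
        time.sleep(interval)
```

Leaving `log_forever()` running in the background and reading the tail of the log after the next reboot would show whether a temperature, voltage, or fan reading drifted beforehand. The hwmon files report temperatures in millidegrees Celsius, voltages in millivolts, and fan speeds in RPM.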

The fact that older kernels were seemingly working better doesn't prove
much. You were running these kernels before, not now, and hardware
*does* age, contrary to what people seem to think. If you want to make
certain that older kernels were indeed working better for purely
software reasons, you should switch back to such an old kernel and see
if things actually improve or not.

A wild guess while I'm at it... Is the machine behind a KVM switch by
any chance? I have a fun (old) motherboard here which reboots when I
unplug the keyboard and plug it again. Never seen that before...

> During the years I've had this machine, I've experienced at least 10-15
> strange kernel bugs that only happened on this machine. Each and every
> time I was *convinced* that the hardware was at fault (and people on the
> mailing list suggested it) until either a kernel came out that fixed the
> problem or a kernel developer positively identified it as a kernel
> problem and eventually fixed it. This machine just seems to be a magnet
> for kernel bugs.

Note that the first case ("a kernel came out that fixed the problem")
doesn't mean that the hardware was not at fault. There are quite a few
quirks in the Linux kernel code which are just there to work around
known hardware or BIOS bugs.

--
Jean Delvare

2006-02-15 14:28:18

by Ryan Richter

Subject: Re: Random reboots

On Tue, Feb 14, 2006 at 11:22:22PM +0100, Jean Delvare wrote:
> You seem to have hardware monitoring drivers loaded on the system, so
> I'd suggest that you watch the returned values over time. If the
> hardware is going wrong it might show there. Your system could be
> overheating for some reason (stuck fan...)
>
> The fact that older kernels were seemingly working better doesn't prove
> much. You were running these kernels before, not now, and hardware
> *does* age, contrary to what people seem to think. If you want to make
> certain that older kernels were indeed working better for purely
> software reasons, you should switch back to such an old kernel and see
> if things actually improve or not.
>
> Note that the first case ("a kernel came out that fixed the problem")
> doesn't mean that the hardware was not at fault. There are quite a few
> quirks in the Linux kernel code which are just there to work around
> known hardware or BIOS bugs.

No, the old kernels still have all the bugs they ever did (of course).
I tested it during the st-iommu-doublefree debugging. I do not plan on
running the old kernel again, mainly because it has so many irritating
bugs (df doesn't work, the serial console stalls on boot, so it won't
boot without handholding, etc. etc.). I'd have to run it for at least a
month to verify, and the old kernel has security vulnerabilities and so
on.

The sensors report a bunch of obvious nonsense as always... I keep them
configured in with the hope that one day they'll report useful
information, but that day hasn't come yet. I just checked, and all the
fans are still fine. It's in a huge case with lots of fans and it's
hardly warmer than room temp. The opteron 240s don't put out much heat.

I'm still thoroughly convinced it's a kernel bug.

-ryan

2006-02-15 15:11:49

by linux-os (Dick Johnson)

Subject: Re: Random reboots


On Wed, 15 Feb 2006, Ryan Richter wrote:

> On Tue, Feb 14, 2006 at 11:22:22PM +0100, Jean Delvare wrote:
>> You seem to have hardware monitoring drivers loaded on the system, so
>> I'd suggest that you watch the returned values over time. If the
>> hardware is going wrong it might show there. Your system could be
>> overheating for some reason (stuck fan...)
>>
>> The fact that older kernels were seemingly working better doesn't prove
>> much. You were running these kernels before, not now, and hardware
>> *does* age, contrary to what people seem to think. If you want to make
>> certain that older kernels were indeed working better for purely
>> software reasons, you should switch back to such an old kernel and see
>> if things actually improve or not.
>>
>> Note that the first case ("a kernel came out that fixed the problem")
>> doesn't mean that the hardware was not at fault. There are quite a few
>> quirks in the Linux kernel code which are just there to work around
>> known hardware or BIOS bugs.
>
> No, the old kernels still have all the bugs they ever did (of course).
> I tested it during the st-iommu-doublefree debugging. I do not plan on
> running the old kernel again, mainly because it has so many irritating
> bugs (df doesn't work, the serial console stalls on boot, so it won't
> boot without handholding, etc. etc.). I'd have to run it for at least a
> month to verify, and the old kernel has security vulnerabilities and so
> on.
>
> The sensors report a bunch of obvious nonsense as always... I keep them
                                ^^^^^^^^^^^^^^^^^^^_________ Hint?

I have a "new" machine with a "Thunder" board. It started to re-boot
for no good reason at all. It turns out that the plastic catch in
the fan/heatsink hold-down mechanism broke so the heatsink was
not tight against the CPU. I "fixed" it by tying it down with
some wire. The reboot problems, and some other "strange" problems
went away. One of the strange problems was that my 'C' runtime
library got corrupted, as well as some other read-only files,
even though e2fsck never found any problems.

> configured in with the hope that one day they'll report useful
> information, but that day hasn't come yet. I just checked, and all the
> fans are still fine. It's in a huge case with lots of fans and it's
> hardly warmer than room temp. The opteron 240s don't put out much heat.
>

It's the CPU that counts, not the air temperature. Check its hardware.

> I'm still thoroughly convinced it's a kernel bug.
>
> -ryan
> -

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
_



2006-02-15 15:14:01

by Ryan Richter

Subject: Re: Random reboots

On Wed, Feb 15, 2006 at 10:11:45AM -0500, linux-os (Dick Johnson) wrote:
> > The sensors report a bunch of obvious nonsense as always... I keep them
>                                 ^^^^^^^^^^^^^^^^^^^_________ Hint?
>
> I have a "new" machine with a "Thunder" board. It started to re-boot
> for no good reason at all. It turns out that the plastic catch in
> the fan/heatsink hold-down mechanism broke so the heatsink was
> not tight against the CPU. I "fixed" it by tying it down with
> some wire. The reboot problems, and some other "strange" problems
> went away. One of the strange problems was that my 'C' runtime
> library got corrupted, as well as some other read-only files,
> even though e2fsck never found any problems.

All the temps have always reported 77C. I felt the heatsinks this
morning, and they're fine.

-ryan

2006-02-15 15:44:38

by Jean Delvare

Subject: Re: Random reboots


Hi Ryan,

On 2006-02-15, Ryan Richter wrote:
> The sensors report a bunch of obvious nonsense as always... I keep them
> configured in with the hope that one day they'll report useful
> information, but that day hasn't come yet. I just checked, and all the
> fans are still fine. It's in a huge case with lots of fans and it's
> hardly warmer than room temp. The opteron 240s don't put out much heat.

The sensors might just need some board-specific configuration. May I ask
which motherboard this is?

I can help you (in private) set up your sensors. If you're interested,
send the output of "sensors-detect" and "sensors" to me and I'll
see what can be done to improve the reported values.

Two more random thoughts:

Any reason why you run 2.6.15 rather than 2.6.15.4? That's where I would
start if I was suspecting a kernel bug.

Did you already update the BIOS to the latest version available? There
are a few kernel complaints in your dmesg which might be solved by a
newer BIOS (and/or parameter changes in the BIOS setup).

--
Jean Delvare

2006-02-15 16:00:38

by Ryan Richter

Subject: Re: Random reboots

On Wed, Feb 15, 2006 at 04:41:39PM +0100, Jean Delvare wrote:
>
> Hi Ryan,
>
> On 2006-02-15, Ryan Richter wrote:
> > The sensors report a bunch of obvious nonsense as always... I keep them
> > configured in with the hope that one day they'll report useful
> > information, but that day hasn't come yet. I just checked, and all the
> > fans are still fine. It's in a huge case with lots of fans and it's
> > hardly warmer than room temp. The opteron 240s don't put out much heat.
>
> The sensors might just need some board-specific configuration. May I ask
> which motherboard this is?
>
> I can help you (in private) set up your sensors. If you're interested,
> send the output of "sensors-detect" and "sensors" to me and I'll
> see what can be done to improve the reported values.

It's a Tyan S2880, and I'm using their sensors.conf:

ftp://ftp.tyan.com/software/lms/lms_s2880.tgz

Here's what sensors reports:

w83627hf-isa-0290
Adapter: ISA adapter
VCore 1: +1.54 V (min = +1.47 V, max = +1.62 V) ALARM
VCore 2: +1.54 V (min = +1.47 V, max = +1.62 V) ALARM
+3.3V: +3.33 V (min = +3.14 V, max = +3.46 V)
+5V: +4.97 V (min = +4.73 V, max = +5.24 V)
+12V: +4.56 V (min = +10.82 V, max = +13.19 V)
-12V: -2.25 V (min = -13.18 V, max = -10.88 V)
-5V: -3.94 V (min = -5.25 V, max = -4.75 V)
V5SB: +5.51 V (min = +4.73 V, max = +5.24 V)
VBat: +1.28 V (min = +2.40 V, max = +3.60 V)
fan1: 4354 RPM (min = -1 RPM, div = 2)
fan2: 3479 RPM (min = 5273 RPM, div = 2)
fan3: 0 RPM (min = 30681 RPM, div = 2)
temp1: +77°C (high = -128°C, hyst = -128°C) sensor = thermistor
temp2: +77.5°C (high = +80°C, hyst = +75°C) sensor = thermistor
temp3: +77.5°C (high = +80°C, hyst = +75°C) sensor = thermistor
vid: +1.550 V (VRM Version 2.4)

The temps and +/-12V readings are obviously wrong, and always have been
AFAIR. I've run the machine with 6 more old full-height 10k rpm drives
than it currently has, so the power supply has already handled a heavier
load than this. I checked the max 5V and 12V current draw of the drives
and specced the power supply carefully when we bought it a couple of
years ago, and it has lots of headroom on both of those rails.
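
For anyone who wants to check readings like these mechanically, here is a
minimal Python sketch that compares each voltage line against the min/max
limits printed next to it. The regular expression and the function name
out_of_range are my own assumptions, written against the output format
shown above; this is not part of lm_sensors. On the readings above it
would flag -5V, V5SB and VBat in addition to the +/-12V lines.

```python
import re

# Matches lines such as "+12V: +4.56 V (min = +10.82 V, max = +13.19 V)".
# Only voltage lines that carry both a min and a max are considered;
# fan and temperature lines are ignored by this sketch.
NUM = r"[+-]?\d+(?:\.\d+)?"
LINE_RE = re.compile(
    rf"^(?P<label>[^:]+):\s*(?P<value>{NUM}) V\s*"
    rf"\(min = (?P<min>{NUM}) V, max = (?P<max>{NUM}) V\)"
)


def out_of_range(sensors_text):
    """Return (label, value, min, max) for each voltage outside its limits."""
    flagged = []
    for line in sensors_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        value, lo, hi = (float(m.group(k)) for k in ("value", "min", "max"))
        if not lo <= value <= hi:
            flagged.append((m.group("label"), value, lo, hi))
    return flagged


# Voltage lines copied from the sensors output quoted in this message.
SAMPLE = """\
VCore 1: +1.54 V (min = +1.47 V, max = +1.62 V) ALARM
+3.3V: +3.33 V (min = +3.14 V, max = +3.46 V)
+5V: +4.97 V (min = +4.73 V, max = +5.24 V)
+12V: +4.56 V (min = +10.82 V, max = +13.19 V)
-12V: -2.25 V (min = -13.18 V, max = -10.88 V)
-5V: -3.94 V (min = -5.25 V, max = -4.75 V)
V5SB: +5.51 V (min = +4.73 V, max = +5.24 V)
VBat: +1.28 V (min = +2.40 V, max = +3.60 V)
"""

for label, value, lo, hi in out_of_range(SAMPLE):
    print(f"{label}: {value:+.2f} V outside [{lo:+.2f}, {hi:+.2f}]")
```

An out-of-range value here only says the reading disagrees with the
configured limits; it cannot tell a bad rail from a bad scaling formula in
sensors.conf.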

> Two more random thoughts:
>
> Any reason why you run 2.6.15 rather than 2.6.15.4? That's where I would
> start if I was suspecting a kernel bug.
>
> Did you already update the BIOS to the latest version available? There
> are a few kernel complaints in your dmesg which might be solved by a
> newer BIOS (and/or parameter changes in the BIOS setup).

I'll be booting 2.6.15.4 this weekend. The BIOS is indeed old, and I
see there's a newer one that came out a year ago. It'll be a while
before I can try it, though: I need to scare up a keyboard, video card,
and monitor, not to mention a DOS disk. You can tell why I haven't
flashed the BIOS in years...

Still, I don't see why the new kernel shouldn't be stable if 2.6.11.3
was.

Thanks,
-ryan

2006-02-15 16:23:22

by Jean Delvare

Subject: Re: Random reboots


Ryan,

On 2006-02-15, Ryan Richter wrote:
> It's a Tyan S2880, and I'm using their sensors.conf:
>
> ftp://ftp.tyan.com/software/lms/lms_s2880.tgz
>
> Here's what sensors reports:
>
> w83627hf-isa-0290
> Adapter: ISA adapter
> VCore 1: +1.54 V (min = +1.47 V, max = +1.62 V) ALARM
> VCore 2: +1.54 V (min = +1.47 V, max = +1.62 V) ALARM
> +3.3V: +3.33 V (min = +3.14 V, max = +3.46 V)
> +5V: +4.97 V (min = +4.73 V, max = +5.24 V)
> +12V: +4.56 V (min = +10.82 V, max = +13.19 V)
> -12V: -2.25 V (min = -13.18 V, max = -10.88 V)
> -5V: -3.94 V (min = -5.25 V, max = -4.75 V)
> V5SB: +5.51 V (min = +4.73 V, max = +5.24 V)
> VBat: +1.28 V (min = +2.40 V, max = +3.60 V)
> fan1: 4354 RPM (min = -1 RPM, div = 2)
> fan2: 3479 RPM (min = 5273 RPM, div = 2)
> fan3: 0 RPM (min = 30681 RPM, div = 2)
> temp1: +77°C (high = -128°C, hyst = -128°C) sensor = thermistor
> temp2: +77.5°C (high = +80°C, hyst = +75°C) sensor = thermistor
> temp3: +77.5°C (high = +80°C, hyst = +75°C) sensor = thermistor
> vid: +1.550 V (VRM Version 2.4)

There's one chip missing. If memory serves, this board has two hardware
monitoring chips: one Winbond Super-I/O and one LM85-compatible SMBus
chip. You are missing the i2c-amd756 driver in your kernel build
(CONFIG_I2C_AMD756) which prevents you from accessing that second chip.

Additionally, the Winbond Super-I/O chips are better supported by the
newer w83627hf driver than by the w83781d you are using.

So, you should change your kernel configuration to:

CONFIG_I2C_AMD756=y
# CONFIG_SENSORS_W83781D is not set
CONFIG_SENSORS_W83627HF=y

Then you'll probably have much better results - even if the
configuration file might need additional tweaking.
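
A minimal Python sketch of how the three options above could be checked
against a kernel .config before rebuilding. The helper names parse_config
and mismatches, and the EXAMPLE fragment, are made up for illustration;
the fragment is not Ryan's actual configuration. An option that does not
appear in the file at all is treated the same as "is not set".

```python
import re


def parse_config(text):
    """Map option name -> value string, or None for '# CONFIG_X is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        # Accept both "# CONFIG_X is not set" and "#CONFIG_X is not set".
        m = re.match(r"^# ?(CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = None
    return options


# The settings suggested above; None means "is not set".
WANTED = {
    "CONFIG_I2C_AMD756": "y",
    "CONFIG_SENSORS_W83781D": None,
    "CONFIG_SENSORS_W83627HF": "y",
}


def mismatches(config_text, wanted=WANTED):
    """Return {option: current value} for every option that differs."""
    have = parse_config(config_text)
    return {
        name: have.get(name)
        for name, want in wanted.items()
        if have.get(name) != want
    }


# Hypothetical fragment, for illustration only.
EXAMPLE = """\
# CONFIG_I2C_AMD756 is not set
CONFIG_SENSORS_W83781D=y
# CONFIG_SENSORS_W83627HF is not set
"""

print(mismatches(EXAMPLE))
```

The check is deliberately strict: an option built as a module (=m) is
reported as a mismatch, since the suggestion above uses =y.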

> Still, I don't see why the new kernel shouldn't be stable if 2.6.11.3
> was.

If it's not a software regression, the aging of your hardware might have
caused it, as I mentioned earlier. But you are free to believe in the
hypothesis you prefer, given that we are not currently able to
demonstrate it anyway ;)

--
Jean Delvare

2006-02-15 16:31:03

by Ryan Richter

Subject: Re: Random reboots

On Wed, Feb 15, 2006 at 05:20:37PM +0100, Jean Delvare wrote:
> There's one chip missing. If memory serves, this board has two hardware
> monitoring chips: one Winbond Super-I/O and one LM85-compatible SMBus
> chip. You are missing the i2c-amd756 driver in your kernel build
> (CONFIG_I2C_AMD756) which prevents you from accessing that second chip.
>
> Additionally, the Winbond Super-I/O chips are better supported by the
> newer w83627hf driver than by the w83781d you are using.
>
> So, you should change your kernel configuration to:
>
> CONFIG_I2C_AMD756=y
> # CONFIG_SENSORS_W83781D is not set
> CONFIG_SENSORS_W83627HF=y
>
> Then you'll probably have much better results - even if the
> configuration file might need additional tweaking.

Aha, thanks. I probably configured out the AMD756 when we switched to
this board from an actual AMD 7xx board, thinking it was no longer
appropriate. I'll make the change this weekend.

> > Still, I don't see why the new kernel shouldn't be stable if 2.6.11.3
> > was.
>
> If it's not a software regression, the aging of your hardware might have
> caused it, as I mentioned earlier. But you are free to believe in the
> hypothesis you prefer, given that we are not currently able to
> demonstrate it anyway ;)

It could certainly be hardware, but it seems awfully unlikely that that
would occur exactly when I upgraded the kernel. A kernel bug just seems
the most parsimonious explanation, to me.

-ryan

2006-02-15 18:46:48

by Ryan Richter

Subject: Re: Random reboots

I just remembered something that might be related. Another thing that's
unique about this machine is that it uses ethernet bonding. When I
first set this up (on some old kernel, I don't remember which) I of
course tried to see if I could saturate both gigabit ethernet
interfaces. I set up two UDP streams to different machines with 64-bit
PCI busses, and that didn't quite do it. So I started up a third, and
that did it, but caused the machine to instantly reboot itself after a
few seconds. I tried this a few more times, and it was repeatable.
Some later kernel version fixed this - probably 2.6.11.3.

I just tried again, and I can still saturate both interfaces with
outbound UDP traffic, but no reboot.
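
For reference, a rough Python sketch of the kind of outbound UDP stream
described above. The tool actually used for the test isn't named in this
thread, so the function name send_udp_burst, the 1400-byte payload, and
the host and port in the usage note are all assumptions for illustration,
not a reconstruction of the original test.

```python
import socket


def send_udp_burst(host, port, count, payload_size=1400):
    """Send `count` UDP datagrams of `payload_size` bytes; return bytes sent."""
    payload = b"\x00" * payload_size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            # sendto() returns the number of bytes handed to the kernel.
            sent += sock.sendto(payload, (host, port))
    return sent
```

Running one such sender per destination machine, e.g.
send_udp_burst("192.0.2.10", 9000, 1000000) with a placeholder address and
port, gives one stream per target. This is an illustration of the setup,
not a benchmarking tool.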

Just a thought.

-ryan

2006-02-27 20:35:14

by Ryan Richter

Subject: Re: Random reboots

On Wed, Feb 15, 2006 at 05:20:37PM +0100, Jean Delvare wrote:
> There's one chip missing. If memory serves, this board has two hardware
> monitoring chips: one Winbond Super-I/O and one LM85-compatible SMBus
> chip. You are missing the i2c-amd756 driver in your kernel build
> (CONFIG_I2C_AMD756) which prevents you from accessing that second chip.

The second chip now reports the 12V line is nominal, and its temps are
all at or below 40C. Reboots still occur with 2.6.15.4.

-ryan