2002-10-04 20:10:03

by Paul Erkkila

Subject: oops in bk pull (oct 03)

ksymoops 2.4.5 on i686 2.4.19-crypto-r7. Options used
-v ./vmlinux (specified)
-K (specified)
-L (specified)
-o /lib/modules/2.4.19-crypto-r7/ (default)
-m System.map (specified)

No modules in ksyms, skipping objects
Unable to handle kernel paging request at virtual address f8000008
c01c9d10
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0060:[<c01c9d10>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: f8000008 ebx: 00000010 ecx: 00000000 edx: f8000008
esi: c1523c00 edi: c1523d38 ebp: 00000001 esp: dffcdb60
ds: 0068 es: 0068 ss: 0068
Stack: c01c9e91 f8000008 fffffff0 00000010 dffcdb78 c1523ef8 f8000008 00000008
d0000008 c1523c00 c1523f48 00000000 00000000 c01ca2a6 c1523c00 00000006
00000030 dffcdbac 00000000 00000600 c1523c00 dffcdc20 c03a1351 c1523c00
[<c01c9e91>] pci_read_bases+0x161/0x340
[<c01ca2a6>] pci_setup_device+0x1b6/0x3d0
[<c0105109>] init+0x79/0x200
[<c0105090>] init+0x0/0x200
[<c01073e5>] kernel_thread_helper+0x5/0x10
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 c3 8d b4


>>EIP; c01c9d10 <pci_size+0/20> <=====

>>eax; f8000008 <END_OF_CODE+37aa9717/????>
>>edx; f8000008 <END_OF_CODE+37aa9717/????>
>>esi; c1523c00 <END_OF_CODE+fcd30f/????>
>>edi; c1523d38 <END_OF_CODE+fcd447/????>
>>esp; dffcdb60 <END_OF_CODE+1fa7726f/????>

Code; c01c9d10 <pci_size+0/20> <=====
00000000 <_EIP>: <=====
Code; c01c9d20 <pci_size+10/20>
10: 48 dec %eax
Code; c01c9d21 <pci_size+11/20>
11: c3 ret
Code; c01c9d22 <pci_size+12/20>
12: 8d b4 00 00 00 00 00 lea 0x0(%eax,%eax,1),%esi

<0>Kernel panic: Attempted to kill init!
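Each call-trace entry above has the form [<address>] symbol+0xoffset/0xsize, so the start and end of every function in the trace can be recovered by subtraction. The following is a minimal sketch of that arithmetic, not part of ksymoops; the names TRACE, ENTRY, and function_ranges are made up for illustration.

```python
import re

# Trace lines copied from the oops report above.
TRACE = [
    "[<c01c9e91>] pci_read_bases+0x161/0x340",
    "[<c01ca2a6>] pci_setup_device+0x1b6/0x3d0",
    "[<c0105109>] init+0x79/0x200",
]

# [<address>] symbol+0xoffset/0xsize
ENTRY = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def function_ranges(lines):
    """Map each symbol to (start, end) where start = address - offset."""
    ranges = {}
    for line in lines:
        match = ENTRY.search(line)
        if match is None:
            continue
        addr, name, offset, size = match.groups()
        start = int(addr, 16) - int(offset, 16)
        ranges[name] = (start, start + int(size, 16))
    return ranges


for name, (start, end) in function_ranges(TRACE).items():
    print(f"{name}: {start:08x}-{end:08x}")
```

By this arithmetic pci_read_bases starts at c01c9d30, which is exactly where ksymoops places the end of pci_size (c01c9d10 plus 0x20), so the faulting EIP sits in the function immediately before the caller.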

===lspci -vv ===
00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M]
Capabilities: [a0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2,x4
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: da000000-dcffffff
Prefetchable memory behind bridge: d8000000-d9ffffff
BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 22)
Subsystem: VIA Technologies, Inc. VT82C686/A PCI to ISA Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:07.1 IDE interface: VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE (rev 10) (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 4: I/O ports at c000 [size=16]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 10) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Interrupt: pin D routed to IRQ 5
Region 4: I/O ports at c400 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 10) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Interrupt: pin D routed to IRQ 5
Region 4: I/O ports at c800 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin ? routed to IRQ 9
Capabilities: [68] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 20)
Subsystem: VIA Technologies, Inc. VT82C686 AC97 Audio Controller
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin C routed to IRQ 15
Region 0: I/O ports at cc00 [size=256]
Region 1: I/O ports at d000 [size=4]
Region 2: I/O ports at d400 [size=4]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0c.0 RAID bus controller: Promise Technology, Inc. 20265 (rev 02)
Subsystem: Promise Technology, Inc. Ultra100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Interrupt: pin A routed to IRQ 15
Region 0: I/O ports at d800 [size=8]
Region 1: I/O ports at dc00 [size=4]
Region 2: I/O ports at e000 [size=8]
Region 3: I/O ports at e400 [size=4]
Region 4: I/O ports at e800 [size=64]
Region 5: Memory at de000000 (32-bit, non-prefetchable) [size=128K]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: [58] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:12.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2500ns min, 2500ns max), cache line size 08
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at ec00 [size=128]
Region 1: Memory at de020000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 82) (prog-if 00 [VGA])
Subsystem: Matrox Graphics, Inc. Millennium G450 Dual Head
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (4000ns min, 8000ns max), cache line size 08
Interrupt: pin A routed to IRQ 10
Region 0: Memory at d8000000 (32-bit, prefetchable) [size=32M]
Region 1: Memory at da000000 (32-bit, non-prefetchable) [size=16K]
Region 2: Memory at db000000 (32-bit, non-prefetchable) [size=8M]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [f0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2,x4
Command: RQ=31 SBA+ AGP+ 64bit- FW- Rate=x1


Attachments:
koops.out (8.78 kB)

2002-10-04 21:53:05

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)

In article <[email protected]>,
Paul E. Erkkila <[email protected]> wrote:
>
>Oops in drivers/pci/probe.c
>
>Oops (copied), ksymoops, and lspci -vv attached
>
>No modules in ksyms, skipping objects
>Unable to handle kernel paging request at virtual address f8000008
>c01c9d10
>*pde = 00000000
>Oops: 0002
>CPU: 0
>EIP: 0060:[<c01c9d10>] Not tainted
>Using defaults from ksymoops -t elf32-i386 -a i386
>EFLAGS: 00010202
>eax: f8000008 ebx: 00000010 ecx: 00000000 edx: f8000008
>esi: c1523c00 edi: c1523d38 ebp: 00000001 esp: dffcdb60
>ds: 0068 es: 0068 ss: 0068
>Stack: c01c9e91 f8000008 fffffff0 00000010 dffcdb78 c1523ef8 f8000008 00000008
> d0000008 c1523c00 c1523f48 00000000 00000000 c01ca2a6 c1523c00 00000006
> 00000030 dffcdbac 00000000 00000600 c1523c00 dffcdc20 c03a1351 c1523c00
> [<c01c9e91>] pci_read_bases+0x161/0x340
> [<c01ca2a6>] pci_setup_device+0x1b6/0x3d0
> [<c0105109>] init+0x79/0x200
> [<c0105090>] init+0x0/0x200
> [<c01073e5>] kernel_thread_helper+0x5/0x10
>Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 c3 8d b4

Something has corrupted your kernel image. Those 16 0x00 bytes are
definitely not the right code, looks like an errant memset() through a
wild pointer cleared it or something.

Is this repeatable? Does it happen with current BK?

Linus
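The observation about the zeroed code bytes can be checked mechanically against the report. The sketch below decodes the low three bits of the i386 page-fault error code ("Oops: 0002") and counts the leading 0x00 bytes in the "Code:" line. The helper names are hypothetical and this is only an illustration, not a tool used in the thread.

```python
def decode_i386_oops_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "protection_fault": bool(code & 0x1),  # 0 means the page was not present
        "write": bool(code & 0x2),             # 0 means the access was a read
        "user_mode": bool(code & 0x4),         # 0 means the fault was in kernel mode
    }


def leading_zero_bytes(code_line: str) -> int:
    """Count how many bytes at the start of an oops 'Code:' dump are 0x00."""
    count = 0
    for token in code_line.split():
        if int(token, 16) != 0:
            break
        count += 1
    return count


OOPS_CODE = 0x0002
CODE_BYTES = "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 c3 8d b4"

fault = decode_i386_oops_code(OOPS_CODE)
zeros = leading_zero_bytes(CODE_BYTES)
print(fault, zeros)
```

On i386 the byte pair 00 00 decodes as "add %al,(%eax)", a write through %eax. With eax = f8000008 that is consistent with both the write bit in the error code and the reported fault address, which suggests the CPU really was executing the zeroed bytes at the start of pci_size.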

2002-10-04 23:58:16

by Paul Erkkila

Subject: Re: oops in bk pull (oct 03)

#
# Automatically generated by make menuconfig: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# General setup
#
CONFIG_NET=y
CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y

#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
CONFIG_MPENTIUMIII=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MELAN is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
# CONFIG_HUGETLB_PAGE is not set
CONFIG_SMP=y
CONFIG_PREEMPT=y
# CONFIG_X86_NUMA is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
# CONFIG_X86_MCE_P4THERMAL is not set
# CONFIG_CPU_FREQ is not set
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y

#
# Power management options (ACPI, APM)
#

#
# ACPI Support
#
# CONFIG_ACPI is not set
# CONFIG_PM is not set
# CONFIG_APM is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats
#
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play configuration
#
CONFIG_PNP=y
CONFIG_ISAPNP=y
# CONFIG_PNPBIOS is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_CISS_SCSI_TAPE is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_LOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_BLK_DEV_INITRD is not set

#
# ATA/ATAPI/MFM/RLL device support
#
CONFIG_IDE=y

#
# IDE, ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_IDEDISK_STROKE is not set
# CONFIG_BLK_DEV_IDECS is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
# CONFIG_BLK_DEV_ISAPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_PCI_WIP is not set
# CONFIG_IDEDMA_NEW_DRIVE_LISTINGS is not set
CONFIG_BLK_DEV_ADMA=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_WDC_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_AMD74XX_OVERRIDE is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_HPT34X_AUTODMA is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NFORCE is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
CONFIG_BLK_DEV_PDC202XX_OLD=y
CONFIG_PDC202XX_BURST=y
CONFIG_BLK_DEV_PDC202XX_NEW=y
CONFIG_PDC202XX_FORCE=y
# CONFIG_BLK_DEV_RZ1000 is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set
# CONFIG_DMA_NONPCI is not set
CONFIG_BLK_DEV_PDC202XX=y
CONFIG_BLK_DEV_IDE_MODES=y

#
# SCSI device support
#
# CONFIG_SCSI is not set

#
# Old non-SCSI/ATAPI CD-ROM drives
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_BLK_DEV_MD is not set
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_BLK_DEV_LVM is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
# CONFIG_FUSION_BOOT is not set
# CONFIG_FUSION_ISENSE is not set
# CONFIG_FUSION_CTL is not set
# CONFIG_FUSION_LAN is not set

#
# IEEE 1394 (FireWire) support (EXPERIMENTAL)
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set
# CONFIG_I2O_PCI is not set
# CONFIG_I2O_BLOCK is not set
# CONFIG_I2O_LAN is not set
# CONFIG_I2O_SCSI is not set
# CONFIG_I2O_PROC is not set

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
# CONFIG_NETLINK_DEV is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_IPV6 is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
CONFIG_IPV6_SCTP__=y
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_LLC is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DEV_APPLETALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Network device support
#
CONFIG_NETDEVICES=y

#
# ARCnet devices
#
# CONFIG_ARCNET is not set
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_ETHERTAP is not set
# CONFIG_NET_SB1000 is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
# CONFIG_SUNLANCE is not set
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNBMAC is not set
# CONFIG_SUNQE is not set
# CONFIG_SUNGEM is not set
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL1 is not set
# CONFIG_EL2 is not set
# CONFIG_ELPLUS is not set
# CONFIG_EL16 is not set
# CONFIG_EL3 is not set
# CONFIG_3C515 is not set
# CONFIG_ELMC is not set
# CONFIG_ELMC_II is not set
CONFIG_VORTEX=y
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set

#
# Tulip family network device support
#
# CONFIG_NET_TULIP is not set
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
# CONFIG_NET_PCI is not set
# CONFIG_NET_POCKET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_E1000_NAPI is not set
# CONFIG_MYRI_SBUS is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set
# CONFIG_NET_FC is not set
# CONFIG_RCPCI is not set
# CONFIG_SHAPER is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set

#
# Amateur Radio support
#
# CONFIG_HAMRADIO is not set

#
# IrDA (infrared) support
#
# CONFIG_IRDA is not set

#
# ISDN subsystem
#
# CONFIG_ISDN_BOOL is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set
# CONFIG_PHONE_IXJ is not set
# CONFIG_PHONE_IXJ_PCMCIA is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set
# CONFIG_GAMEPORT is not set
CONFIG_SOUND_GAMEPORT=y
# CONFIG_GAMEPORT_NS558 is not set
# CONFIG_GAMEPORT_L4 is not set
# CONFIG_GAMEPORT_EMU10K1 is not set
# CONFIG_GAMEPORT_VORTEX is not set
# CONFIG_GAMEPORT_FM801 is not set
# CONFIG_GAMEPORT_CS461x is not set
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_JOYSTICK_ANALOG is not set
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
# CONFIG_JOYSTICK_GRIP is not set
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
# CONFIG_JOYSTICK_TWIDDLER is not set
# CONFIG_JOYSTICK_DB9 is not set
# CONFIG_JOYSTICK_GAMECON is not set
# CONFIG_JOYSTICK_TURBOGRAFX is not set
# CONFIG_INPUT_JOYDUMP is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_TOUCHSCREEN_GUNZE is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_UINPUT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_CONSOLE is not set
# CONFIG_SERIAL_8250_CS is not set
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_MANY_PORTS is not set
# CONFIG_SERIAL_8250_SHARE_IRQ is not set
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
# CONFIG_SERIAL_8250_MULTIPORT is not set
# CONFIG_SERIAL_8250_RSA is not set
CONFIG_SERIAL_CORE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256

#
# I2C support
#
# CONFIG_I2C is not set

#
# Mice
#
# CONFIG_BUSMOUSE is not set
# CONFIG_QIC02_TAPE is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_INTEL_RNG=y
# CONFIG_AMD_RNG is not set
# CONFIG_NVRAM is not set
# CONFIG_RTC is not set
# CONFIG_GEN_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_I810=y
CONFIG_AGP_VIA=y
CONFIG_AGP_AMD=y
CONFIG_AGP_SIS=y
CONFIG_AGP_ALI=y
CONFIG_AGP_SWORKS=y
# CONFIG_AGP_AMD_8151 is not set
CONFIG_DRM=y
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
CONFIG_DRM_RADEON=y
# CONFIG_DRM_I810 is not set
# CONFIG_DRM_I830 is not set
# CONFIG_DRM_MGA is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# File systems
#
# CONFIG_QUOTA is not set
# CONFIG_QFMT_V1 is not set
# CONFIG_QFMT_V2 is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_ADFS_FS is not set
# CONFIG_ADFS_FS_RW is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_BFS_FS is not set
CONFIG_EXT3_FS=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
# CONFIG_UMSDOS_FS is not set
CONFIG_VFAT_FS=y
# CONFIG_EFS_FS is not set
# CONFIG_JFFS_FS is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_CRAMFS is not set
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_JFS_FS is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS_DEBUG is not set
# CONFIG_NTFS_RW is not set
# CONFIG_HPFS_FS is not set
CONFIG_PROC_FS=y
CONFIG_DEVFS_FS=y
# CONFIG_DEVFS_MOUNT is not set
# CONFIG_DEVFS_DEBUG is not set
CONFIG_DEVPTS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX4FS_RW is not set
# CONFIG_ROMFS_FS is not set
CONFIG_EXT2_FS=y
# CONFIG_SYSV_FS is not set
# CONFIG_UDF_FS is not set
# CONFIG_UDF_RW is not set
# CONFIG_UFS_FS is not set
# CONFIG_UFS_FS_WRITE is not set
# CONFIG_XFS_FS is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_QUOTA is not set

#
# Network File Systems
#
# CONFIG_CODA_FS is not set
# CONFIG_INTERMEZZO_FS is not set
# CONFIG_NFS_FS is not set
# CONFIG_NFS_V3 is not set
# CONFIG_ROOT_NFS is not set
# CONFIG_NFSD is not set
# CONFIG_NFSD_V3 is not set
# CONFIG_NFSD_TCP is not set
# CONFIG_SUNRPC is not set
# CONFIG_LOCKD is not set
# CONFIG_EXPORTFS is not set
# CONFIG_SMB_FS is not set
# CONFIG_NCP_FS is not set
# CONFIG_NCPFS_PACKET_SIGNING is not set
# CONFIG_NCPFS_IOCTL_LOCKING is not set
# CONFIG_NCPFS_STRONG is not set
# CONFIG_NCPFS_NFS_NS is not set
# CONFIG_NCPFS_OS2_NS is not set
# CONFIG_NCPFS_SMALLDOS is not set
# CONFIG_NCPFS_NLS is not set
# CONFIG_NCPFS_EXTRAS is not set
# CONFIG_ZISOFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_SMB_NLS is not set
CONFIG_NLS=y

#
# Native Language Support
#
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Console drivers
#
CONFIG_VGA_CONSOLE=y
# CONFIG_VIDEO_SELECT is not set
# CONFIG_MDA_CONSOLE is not set

#
# Frame-buffer support
#
# CONFIG_FB is not set

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
# CONFIG_USB is not set

#
# Bluetooth support
#
# CONFIG_BLUEZ is not set

#
# Kernel hacking
#
# CONFIG_SOFTWARE_SUSPEND is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_KALLSYMS=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y

#
# Security options
#
CONFIG_SECURITY_CAPABILITIES=y

#
# Library routines
#
CONFIG_CRC32=y
# CONFIG_ZLIB_INFLATE is not set
# CONFIG_ZLIB_DEFLATE is not set
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y


Attachments:
.config (16.38 kB)

2002-10-05 00:27:44

by Alexander Viro

Subject: Re: oops in bk pull (oct 03)



On Fri, 4 Oct 2002, Linus Torvalds wrote:

> > [<c01c9e91>] pci_read_bases+0x161/0x340
> > [<c01ca2a6>] pci_setup_device+0x1b6/0x3d0
> > [<c0105109>] init+0x79/0x200
> > [<c0105090>] init+0x0/0x200
> > [<c01073e5>] kernel_thread_helper+0x5/0x10
> >Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 c3 8d b4
>
> Something has corrupted your kernel image. Those 16 0x00 bytes are
> definitely not the right code, looks like an errant memset() through a
> wild pointer cleared it or something.
>
> Is this repeatable? Does it happen with current BK?

It is repeatable, it does happen with current BK (well, as of couple
of hours ago) and reverting pci/probe.c change apparently cures it.

2002-10-05 00:32:47

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)


On Fri, 4 Oct 2002, Alexander Viro wrote:
>
> It is repeatable, it does happen with current BK (well, as of couple
> of hours ago) and reverting pci/probe.c change apparently cures it.

Really? That probe.c diff is _really_ small, and looks truly obvious. In
particular, I don't see how it could possibly cause that kind of
behaviour. What am I missing?

Linus

2002-10-05 00:39:06

by Alexander Viro

Subject: Re: oops in bk pull (oct 03)



On Fri, 4 Oct 2002, Alexander Viro wrote:

> On Fri, 4 Oct 2002, Linus Torvalds wrote:
>
> > > [<c01c9e91>] pci_read_bases+0x161/0x340
> > > [<c01ca2a6>] pci_setup_device+0x1b6/0x3d0
> > > [<c0105109>] init+0x79/0x200
> > > [<c0105090>] init+0x0/0x200
> > > [<c01073e5>] kernel_thread_helper+0x5/0x10
> > >Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 48 c3 8d b4
> >
> > Something has corrupted your kernel image. Those 16 0x00 bytes are
> > definitely not the right code, looks like an errant memset() through a
> > wild pointer cleared it or something.
> >
> > Is this repeatable? Does it happen with current BK?
>
> It is repeatable, it does happen with current BK (well, as of couple
> of hours ago) and reverting pci/probe.c change apparently cures it.

PS: on my testbox it happens without apparent corruption of (printed) code.
However, %eip it prints _is_ odd - it's in the middle of pushing arguments
for second pci_read_config_dword() in pci_read_bases(). And AFAICS there's
no way in hell it could be legitimate - what I'm seeing is

(from pci_write_config_dword(dev, reg, 0); )
pushl $0
pushl %edi
movl 32(%esi),%eax
pushl %eax
movl 16(%esi),%eax
pushl %eax
call pci_bus_write_config_dword
addl $16,%esp
(from pci_read_config_dword(dev, reg, &l0); )
leal 24(%esp),%eax
pushl %eax
pushl %edi
movl 32(%esi),%eax
pushl %eax
movl 16(%esi),%eax
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pushl %eax
call pci_bus_read_config_dword
addl $16,%esp

and we die on the underlined (%eip points to push %eax). %esp is reasonable,
so is %esi (and we had just dereferenced both).

I'm at a loss on that one - if somebody has bright (heck, any) ideas, you
are welcome.

2002-10-05 00:45:37

by Alexander Viro

Subject: Re: oops in bk pull (oct 03)



On Fri, 4 Oct 2002, Linus Torvalds wrote:

>
> On Fri, 4 Oct 2002, Alexander Viro wrote:
> >
> > It is repeatable, it does happen with current BK (well, as of couple
> > of hours ago) and reverting pci/probe.c change apparently cures it.
>
> Really? That probe.c diff is _really_ small, and looks truly obvious. In
> particular, I don't see how it could possibly cause that kind of
> behaviour. What am I missing?

Hell knows. The only explanation I see (and that's not worth much) is that
we somehow confuse the chipset and get crapped on something like next cache
miss.

I'm out of ideas on that one - if you have any suggestions / questions on
details of behaviour I'll be glad to try and see what I can do, but for
now I'm reverting the probe.c patch in my tree so that I could return to
initramfs work. Originally I thought it was a bug in my own code, but oops
is present in 2.5.40-BK and disappears in 2.5.40-BK minus probe.c changeset...

2002-10-05 00:55:00

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)


On Fri, 4 Oct 2002, Alexander Viro wrote:
>
> Hell knows. The only explanation I see (and that's not worth much) is that
> we somehow confuse the chipset and get crapped on something like next cache
> miss.

I don't see any better explanation right now, so I guess we just revert
that thing.

The only other notion I might come up with is stack corruption, ie the
code in pci_read_bases() might corrupt the return stack subtly (it does
add another local variable whose address is taken), causing a jump to a
random address on return. Compiler bug?

Linus

2002-10-05 01:08:54

by Alexander Viro

Subject: Re: oops in bk pull (oct 03)



On Fri, 4 Oct 2002, Linus Torvalds wrote:

>
> On Fri, 4 Oct 2002, Alexander Viro wrote:
> >
> > Hell knows. The only explanation I see (and that's not worth much) is that
> > we somehow confuse the chipset and get crapped on something like next cache
> > miss.
>
> I don't see any better explanation right now, so I guess we just revert
> that thing.
>
> The only other notion I might come up with is stack corruption, ie the
> code in pci_read_bases() might corrupt the return stack subtly (it does
> add another local variable whose address is taken), causing a jump to a
> random address on return. Compiler bug?

I doubt it. I've read through the objdump output and code looks OK.
Diff between old and new _definitely_ looks sane. FWIW, chipset is
Via 686A, gcc is from debian-stable (2.95.4-11woody1). I'll try to
find some RH box and build with the same .config, but I would be
surprised if it changes anything.

2002-10-05 01:15:10

by David Miller

Subject: Re: oops in bk pull (oct 03)

   From: Linus Torvalds <[email protected]>
   Date: Fri, 4 Oct 2002 18:02:15 -0700 (PDT)

   On Fri, 4 Oct 2002, Alexander Viro wrote:
   > Hell knows. The only explanation I see (and that's not worth much) is that
   > we somehow confuse the chipset and get crapped on something like next cache
   > miss.

   I don't see any better explanation right now, so I guess we just revert
   that thing.

The people seeing this don't happen to be on Serverworks chipsets
are they?

I've seen a bug on serverworks where back to back PCI config
space operations can cause some to be lost or corrupted.

Another theory is that some device just dislikes being given
a 0 in one of its base registers, but somehow ~0 is ok :-)

2002-10-05 01:21:51

by Alexander Viro

Subject: Re: oops in bk pull (oct 03)



On Fri, 4 Oct 2002, David S. Miller wrote:

> The people seeing this don't happen to be on Serverworks chipsets
> are they?
>
> I've seen a bug on serverworks where back to back PCI config
> space operations can cause some to be lost or corrupted.

No serverwanks here. Abit-KT7 (no RAID), VIA chipset.

> Another theory is that some device just dislikes being given
> a 0 in one of its base registers, but somehow ~0 is ok :-)

... FVO some device equal to host bridge, I'm afraid.

2002-10-05 01:34:27

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)


On Fri, 4 Oct 2002, David S. Miller wrote:
>
> Another theory is that some device just dislikes being given
> a 0 in one of its base registers, but somehow ~0 is ok :-)

I think that is the real issue. We're mapping something - probably a host
bridge - at address 0, and then accessing RAM (which is also mapped at
PCI address 0) and the host bridge is unhappy.

So excluding the change is probably the right thing to do - it's just
fundamentally buggy to blindly put a base register at zero.

Linus

2002-10-05 01:54:04

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)


On Fri, 4 Oct 2002, Linus Torvalds wrote:
>
> I think that is the real issue. We're mapping something - probably a host
> bridge - at address 0, and then accessing RAM (which is also mapped at
> PCI address 0) and the host bridge is unhappy.
>
> So excluding the change is probably the right thing to do - it's just
> fundamentally buggy to blindly put a base register at zero.

The more I think about this, the more convinced I am this is the case. We
just _mustn't_ set up a live PCI window at address 0, and expect it to not
cause confusion.

Also, we've seen before that we must not blindly disable a PCI window
either, since that will kill the system when the host bridge is disabled
and there is any pending DMA, for example (*). We saw that earlier in the
2.4.x tree - some host bridges will just ignore the disable (which means
that then we'd trigger the zero-base bug), and others will honour the
disable (which in turn will cause the DMA and other random problems).

This is all probably dependent on host bridge / MCH behaviour, so it
probably works fine on 90%+ of all machines, but clearly breaks enough to
not be a viable approach in general.

Ergo, the patch that looked so simple at first glance was really broken
for a number of really subtle reasons.

Linus

(*) And pending DMA is actually _normal_ on PC's at early bootup when we
enumerate the PCI system - it's how USB keyboard and mouse emulation is
done, together with SMI support in the BIOS.

2002-10-05 02:02:52

by David Miller

Subject: Re: oops in bk pull (oct 03)

   From: Linus Torvalds <[email protected]>
   Date: Fri, 4 Oct 2002 18:41:25 -0700 (PDT)

   On Fri, 4 Oct 2002, David S. Miller wrote:
   > Another theory is that some device just dislikes being given
   > a 0 in one of its base registers, but somehow ~0 is ok :-)

   I think that is the real issue. We're mapping something - probably a host
   bridge - at address 0, and then accessing RAM (which is also mapped at
   PCI address 0) and the host bridge is unhappy.

We're currently blindly putting ~0 in there, how can that be any
better? :-)

   So excluding the change is probably the right thing to do - it's just
   fundamentally buggy to blindly put a base register at zero.

And putting ~0 there is ok?

From what you're saying, that whole routine is fundamentally broken.

2002-10-05 02:15:44

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)


On Fri, 4 Oct 2002, David S. Miller wrote:
>
> I think that is the real issue. We're mapping something - probably a host
> bridge - at address 0, and then accessing RAM (which is also mapped at
> PCI address 0) and the host bridge is unhappy.
>
> We're currently blindly putting ~0 in there, how can that be any
> better? :-)

Oh, ~0 is better for a lot of reasons:

- it doesn't clash with RAM on any normal platform, so there is no
confusion in the host bridge where a regular RAM access should go.

Even PC's with 4GB+ of RAM always leave the top of the 32-bit
address space for PCI mappings (ie they explicitly leave a hole in the
RAM mapping there, exactly so that 32-bit PCI cards can work)

- it doesn't clash with ISA mappings either

- it is the standard way of probing sizes, so unlike writing 0, this is
stuff that BIOS writers and system designers have actually seen in real
life, and tested against (since Windows also has to be doing it this
way, and it's in all the example books about PCI programming)

- it is - partly for the same previous reason - pretty much guaranteed to
be one of the few areas that won't even clash with other PCI mappings.

- Finally, on regular PC's the high 32-bit region is almost always
reserved for other things anyway, ie the APICs are mapped there (and
that mapping won't conflict with a host bridge, since APIC mappings
will be resolved on the CPU and never even hit the hostbridge world,
unlike RAM accesses).

So 0 and ~0 are quite fundamentally different here.

> And putting ~0 there is ok?

See above.

> From what you're saying, that whole routine is fundamentally broken.

No, the "write ~0, read it back, write the old value" is part of standard
PCI probing (there isn't any other way to figure out the size of these
ranges).

It's just that Ivan tried to extend it, and that _extension_ doesn't work.

Linus
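
The "write ~0, read it back, write the old value" sequence described above can be sketched in user space against a simulated register. This is only an illustration under stated assumptions: `struct fake_bar` and its helpers are hypothetical names invented here, not kernel structures or the code in pci/probe.c, and the model covers a single 32-bit memory BAR whose low four bits are read-only flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit memory BAR (not a kernel structure). */
struct fake_bar {
    uint32_t value; /* current register contents, flags in the low 4 bits */
    uint32_t size;  /* decoded size in bytes: 0, or a power of two >= 16 */
};

/* A write only sticks in the address bits at or above the decoded size;
 * the low 4 flag bits are read-only. */
static void fake_bar_write(struct fake_bar *bar, uint32_t v)
{
    uint32_t flags = bar->value & 0xfu;

    assert((bar->size & (bar->size - 1)) == 0); /* power of two or zero */
    bar->value = (v & ~(bar->size - 1)) | flags;
}

static uint32_t fake_bar_read(const struct fake_bar *bar)
{
    return bar->value;
}

/* Sizing sequence: save the old value, write ~0, read back, restore. */
static uint32_t probe_bar_size(struct fake_bar *bar)
{
    uint32_t saved = fake_bar_read(bar);
    uint32_t readback, mask;

    fake_bar_write(bar, ~0u);
    readback = fake_bar_read(bar);
    fake_bar_write(bar, saved);

    mask = readback & ~0xfu; /* drop the memory BAR flag bits */
    if (mask == 0)
        return 0;            /* no writable address bits: unimplemented BAR */
    return ~mask + 1;        /* lowest writable address bit gives the size */
}
```

With `value = 0xd0000008` and `size = 0x08000000` (a 128M prefetchable region like the one in the lspci output at the top of the thread), the read-back after writing ~0 is 0xf8000008, which is consistent with the faulting address in the original oops, and the computed size is 0x08000000.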

2002-10-05 02:24:23

by David Miller

Subject: Re: oops in bk pull (oct 03)

   From: Linus Torvalds <[email protected]>
   Date: Fri, 4 Oct 2002 19:22:50 -0700 (PDT)

   Oh, ~0 is better for a lot of reasons:

So the bug really is making the host bridge mapping alias
potentially with normal memory mappings.

Ok I buy that.
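
The aliasing argument can be made concrete with a small range check. The sketch below is an assumption-laden illustration, not code from the kernel: `struct range`, `ranges_overlap()` and `decoded_base()` are hypothetical helpers, and the window is modelled as a 32-bit memory BAR whose size is a power of two of at least 16 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range [start, start + len) in bus address space.
 * 64-bit fields avoid wraparound at the top of the 32-bit space. */
struct range {
    uint64_t start;
    uint64_t len;
};

static bool ranges_overlap(struct range a, struct range b)
{
    return a.len != 0 && b.len != 0 &&
           a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Base address a BAR of 'size' bytes decodes at once 'written' is stored:
 * the bits below the size (and the 4 flag bits) are hardwired to zero. */
static uint64_t decoded_base(uint32_t written, uint32_t size)
{
    return written & ~(size - 1) & ~0xfu;
}
```

For a 128M window and an assumed 512M of RAM starting at address 0, writing 0 gives a window at [0, 0x08000000), which overlaps RAM, while writing ~0 gives [0xf8000000, 0x100000000), which does not.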

2002-10-05 03:18:54

by Linus Torvalds

Subject: Re: oops in bk pull (oct 03)


On Fri, 4 Oct 2002, Alexander Viro wrote:
>
> It's getting better. The thing _does_ survive if there is no cacheline
> boundary between the calls of pci_write_config_dword(); otherwise it
> dies on that boundary.

Ok, that definitely clinches it - it's the cache miss coupled with host
bridge confusion that causes it to start fetching from PCI space instead
of RAM (or, more likely just get really confused about it and maybe
fetch from both).

It's always good to understand why something doesn't work, rather than just
revert it because it breaks inexplicably.

Linus

2002-10-05 03:14:40

by Alexander Viro

Subject: Re: oops in bk pull (oct 03)



On Fri, 4 Oct 2002, Linus Torvalds wrote:

> The more I think about this, the more convinced I am this is the case. We
> just _mustn't_ set up a live PCI window at address 0, and expect it to not
> cause confusion.
>
> Also, we've seen before that we must not blindly disable a PCI window
> either, since that will kill the system when the host bridge is disabled
> and there is any pending DMA, for example (*). We saw that earlier in the
> 2.4.x tree - some host bridges will just ignore the disable (which means
> that then we'd trigger the zero-base bug), and others will honour the
> disable (which in turn will cause the DMA and other random problems).
>
> This is all probably dependent on host bridge / MCH behaviour, so it
> probably works fine on 90%+ of all machines, but clearly breaks enough to
> not be a viable approach in general.

It's getting better. The thing _does_ survive if there is no cacheline
boundary between the calls of pci_write_config_dword(); otherwise it
dies on that boundary. So it depends not only on machine and compiler,
but on kernel config, and in a pretty random way (functions are aligned,
indeed, but not cacheline-aligned, so change of length in a function can
shift the rest of image relative to cachelines).
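
Why the failure moves around with the kernel config can be seen from a simple address check: whether the code between the two config writes runs into a new cache line depends only on where the image happens to place it. A minimal sketch, assuming a 64-byte line size purely for illustration (the real value depends on the CPU) and using hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed line size for illustration only; the real value is CPU-specific. */
#define CACHELINE_BYTES 64u

static uint32_t cacheline_index(uint32_t addr)
{
    return addr / CACHELINE_BYTES;
}

/* True if code running from 'first' to 'second' has to enter a new cache
 * line, i.e. may need a fresh fetch from memory in between. */
static bool crosses_cacheline(uint32_t first, uint32_t second)
{
    return cacheline_index(first) != cacheline_index(second);
}
```

Shifting the same code by a few dozen bytes, as a length change in an earlier function would, is enough to flip the result: 0xc01c9e80 to 0xc01c9e91 stays inside one line, while 0xc01c9ebc to 0xc01c9ec4 crosses into the next.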

2002-10-05 09:30:43

by Ivan Kokshaysky

Subject: Re: oops in bk pull (oct 03)

On Fri, Oct 04, 2002 at 08:26:20PM -0700, Linus Torvalds wrote:
> Ok, that definitely clinches it - it's the cache miss coupled with host
> bridge confusion that causes it to start fetching from PCI space instead
> of RAM (or, more likely just get really confused about it and maybe
> fetch from both).
>
> It's always good to understand why something doesn't work, rather than just
> revert it because it breaks inexplicably.

Ugh. I'm 99.9% sure that it was an AGP GART window. Being mapped at 0, it
immediately caused all sorts of havoc.

Sorry for that breakage.

Ivan.