2001-02-01 22:11:01

by Anders S. Buch

Subject: Bug report

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

It seems that the ide/cdrom/amd756 code can cause some bad lockups, at
least on my system. I have an Athlon 500 system running the 2.4.1 kernel
with Redhat 6.1 + updated modutils, etc.

I just managed to make the kernel lock up as follows:

I mounted my cdrom drive (a ZipCD 4x650 ATAPI CD-RW), and did a
rpm --nodeps -i /mnt/cdrom/RedHat/RPMS/xgammon-bla-bla.rpm

If I instead just did a "cp /mnt/cdrom/RedHat/RPMS/xgammon-bla-bla.rpm ."
things seemed to work fine.

The lockup was very "complete": even the power button on the front of
my box didn't do anything. I had to use the BIG BLACK power switch on
the back side to get it back to life. I have never seen this before.
(My computer doesn't have a reset button; it must be made for Linux!!)

Also, when I reproduced the lockup in text mode, I didn't even get the
normal oops dump; the system just froze.

My motherboard has an AMD 756 chipset; I wonder whether this is
relevant. I did enable the 7409 support in my kernel
(CONFIG_BLK_DEV_AMD7409=y).

Here is some stuff from dmesg (the full dmesg is below):

AMD7409: IDE controller on PCI bus 00 dev 39
AMD7409: chipset revision 3
AMD7409: not 100% native mode: will probe irqs later
AMD7409: disabling single-word DMA support (revision < C4)
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:DMA

And some stuff from /var/log/messages:

Jan 31 22:09:09 localhost kernel: usb-ohci.c: bogus NDP=96 for OHCI usb-00:07.4
Jan 31 22:09:09 localhost kernel: usb-ohci.c: rereads as NDP=4
Jan 31 22:16:32 localhost kernel: usb-ohci.c: bogus NDP=96 for OHCI usb-00:07.4
Jan 31 22:16:32 localhost kernel: usb-ohci.c: rereads as NDP=4

and more:

Jan 31 22:37:31 localhost kernel: usb-ohci.c: bogus NDP=128 for OHCI usb-00:07.4
Jan 31 22:37:31 localhost kernel: usb-ohci.c: rereads as NDP=4


Regarding error output from the actual lockups: I usually run my CD-RW
drive as a SCSI device using ide-scsi. When I did this, I got some
error output on the screen, which also appeared in /var/log/messages.
During one lockup I got lots of the following:

Feb 1 15:16:10 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0x00 00 00 00 00 00
Feb 1 15:16:20 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0x00 00 00 00 00 00
Feb 1 15:16:20 localhost kernel: SCSI host 0 abort (pid 0) timed out - resetting
Feb 1 15:16:20 localhost kernel: SCSI bus is being reset for host 0 channel 0.
Feb 1 15:16:20 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0x00 00 00 00 00 00
Feb 1 15:16:20 localhost kernel: SCSI host 0 abort (pid 0) timed out - resetting
Feb 1 15:16:20 localhost kernel: SCSI bus is being reset for host 0 channel 0.
Feb 1 15:16:21 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0x00 00 00 00 00 00
Feb 1 15:16:21 localhost kernel: SCSI host 0 abort (pid 0) timed out - resetting
Feb 1 15:16:21 localhost kernel: SCSI bus is being reset for host 0 channel 0.
...
Feb 1 15:16:29 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0x00 00 00 00 00 00
Feb 1 15:16:29 localhost kernel: SCSI host 0 abort (pid 0) timed out - resetting
Feb 1 15:16:29 localhost kernel: SCSI bus is being reset for host 0 channel 0.
Feb 1 15:20:32 localhost syslogd 1.3-3: restart.

At other times, I just got some error messages when using the above rpm
command:

Feb 1 15:27:18 localhost kernel: Detected scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
Feb 1 15:27:18 localhost kernel: sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
Feb 1 15:27:18 localhost kernel: Uniform CD-ROM driver Revision: 3.12
Feb 1 15:27:25 localhost modprobe: modprobe: Can't locate module nls_iso8859-1
Feb 1 15:27:25 localhost modprobe: modprobe: Can't locate module nls_cp437
Feb 1 15:29:09 localhost kernel: ide-scsi: CoD != 0 in idescsi_pc_intr
Feb 1 15:29:09 localhost kernel: hdd: DMA disabled
Feb 1 15:29:12 localhost kernel: hdd: ATAPI reset complete
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303276
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
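
The I/O error lines above can be tallied mechanically. What follows is
only a minimal sketch in Python, not a proper diagnostic tool. It assumes
that the "dev 0b:00" field is a hexadecimal major:minor pair and that the
sector numbers are in 512-byte units; the name summarize_io_errors is made
up for illustration.

```python
import re
from collections import Counter

# Lines copied from the /var/log/messages excerpt above.
LOG = """\
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303276
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
Feb 1 15:29:12 localhost kernel: I/O error: dev 0b:00, sector 1303020
"""

# Assumption: the device field is "major:minor" written in hexadecimal.
IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f]{2}):([0-9a-f]{2}), sector (\d+)")


def summarize_io_errors(text):
    """Count I/O errors per (major, minor, sector) triple."""
    counts = Counter()
    for match in IO_ERROR.finditer(text):
        major = int(match.group(1), 16)
        minor = int(match.group(2), 16)
        sector = int(match.group(3))
        counts[(major, minor, sector)] += 1
    return counts


counts = summarize_io_errors(LOG)
for (major, minor, sector), hits in sorted(counts.items()):
    # Assumption: 512-byte sectors, so this offset is only an estimate.
    offset_mib = sector * 512 / (1024 * 1024)
    print(f"major {major} minor {minor} sector {sector}: "
          f"{hits} error(s), about {offset_mib:.0f} MiB into the device")
```

Block major 11 is the SCSI CD-ROM driver, so under these assumptions all
of the errors are on sr0 (the ide-scsi view of hdd) and only two distinct
sectors are involved.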


Finally, I have also had lockup problems with my USB port, which
seems to be run by the same AMD 756 chip. I will attach a bug report
about this issue. However, I can also reproduce the cdrom lockups when
no USB modules are loaded.

Well, thanks for the great hacking!! I hope this can help make the
kernel even better! If I can supply you with more information, please
let me know.

Sincerely, Anders Buch


Full /var/log/dmesg:

Linux version 2.4.1 ([email protected]) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Tue Jan 30 11:12:07 EST 2001
BIOS-provided physical RAM map:
BIOS-e820: 000000000009fc00 @ 0000000000000000 (usable)
BIOS-e820: 0000000000000400 @ 000000000009fc00 (reserved)
BIOS-e820: 0000000000010000 @ 00000000000f0000 (reserved)
BIOS-e820: 0000000000010000 @ 00000000ffff0000 (reserved)
BIOS-e820: 0000000007ef0000 @ 0000000000100000 (usable)
BIOS-e820: 000000000000d000 @ 0000000007ff3000 (ACPI data)
BIOS-e820: 0000000000003000 @ 0000000007ff0000 (ACPI NVS)
On node 0 totalpages: 32752
zone(0): 4096 pages.
zone(1): 28656 pages.
zone(2): 0 pages.
Kernel command line: BOOT_IMAGE=241 ro root=306 ramdisk=0
Initializing CPU#0
Detected 499.045 MHz processor.
Console: colour VGA+ 80x50
Calibrating delay loop... 996.14 BogoMIPS
Memory: 127272k/131008k available (681k kernel code, 3348k reserved, 225k data, 56k init, 0k highmem)
Dentry-cache hash table entries: 16384 (order: 5, 131072 bytes)
Buffer-cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
CPU: Before vendor init, caps: 0081f9ff c0c1f9ff 00000000, vendor = 2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After vendor init, caps: 0081f9ff c0c1f9ff 00000000 00000000
CPU: After generic, caps: 0081f9ff c0c1f9ff 00000000 00000000
CPU: Common caps: 0081f9ff c0c1f9ff 00000000 00000000
CPU: AMD-K7(tm) Processor stepping 02
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfb460, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
Unknown bridge resource 0: assuming transparent
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
DMI 2.1 present.
29 structures occupying 743 bytes.
DMI table at 0x000F0800.
BIOS Vendor: Award Software International, Inc.
BIOS Version: 4.51 PG
BIOS Release: 10/01/99
Starting kswapd v1.8
pty: 256 Unix98 ptys configured
block: queued sectors max/low 84552kB/28184kB, 256 slots per queue
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD7409: IDE controller on PCI bus 00 dev 39
AMD7409: chipset revision 3
AMD7409: not 100% native mode: will probe irqs later
AMD7409: disabling single-word DMA support (revision < C4)
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:DMA
hda: IBM-DJNA-371350, ATA DISK drive
hdd: ZIPCD 4x650, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 26520480 sectors (13578 MB) w/1966KiB Cache, CHS=26310/16/63, UDMA(33)
Partition check:
hda: hda1 hda2 < hda5 hda6 hda7 > hda3
Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
ttyS02 at 0x03e8 (irq = 4) is a 16550A
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 56k freed
Adding Swap: 102776k swap-space (priority -1)
es1371: version v0.27 time 11:16:28 Jan 30 2001
es1371: found chip, vendor id 0x1274 device id 0x1371 revision 0x06
es1371: found es1371 rev 6 at io 0xe400 irq 11
es1371: features: joystick 0x0
ac97_codec: AC97 codec, id: 0x5452:0x4103 (TriTech TR?????)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub


- ----

Anders Skovsted Buch Phone: (617) 253-4399
MIT Room 2-275 Fax: (617) 253-4358
77 Massachusetts Ave E-mail: [email protected]
Cambridge, MA 02139 http://www-math.mit.edu/~abuch/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.2 (SunOS)
Comment: For info see http://www.gnupg.org

iD8DBQE6ed61WyfD6jrb5n4RAhz+AJ9hu5GiH6lh6DnquQKgp83GDQhPGgCfdNpJ
Pxv1jCClXB7tket99Auua0M=
=5TNj
-----END PGP SIGNATURE-----


Attachments:
bug-report.txt (28.78 kB)

2001-02-03 01:45:33

by Jens Axboe

Subject: Re: Bug report

On Thu, Feb 01 2001, Anders S. Buch wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> It seems that the ide/cdrom/amd756 code can cause some bad lockups, at
> least on my system. I have an Athlon 500 system running the 2.4.1 kernel
> with Redhat 6.1 + updated modutils, etc.

Have you tried disabling DMA on the ATAPI drive? Not all drives do
ATAPI DMA in an orderly fashion (yet).

--
Jens Axboe