Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1161120AbWJXRuM (ORCPT );
        Tue, 24 Oct 2006 13:50:12 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1161121AbWJXRuM (ORCPT );
        Tue, 24 Oct 2006 13:50:12 -0400
Received: from relay5.ptmail.sapo.pt ([212.55.154.25]:44778 "HELO sapo.pt")
        by vger.kernel.org with SMTP id S1161120AbWJXRuK (ORCPT );
        Tue, 24 Oct 2006 13:50:10 -0400
X-AntiVirus: PTMail-AV 0.3-0.88.4
Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O
From: Sergio Monteiro Basto
To: Michael Sallaway
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <9f916e540610240856p263d5s7098e4e2edd0ed25@mail.gmail.com>
References: <9f916e540610240856p263d5s7098e4e2edd0ed25@mail.gmail.com>
Content-Type: text/plain
Date: Tue, 24 Oct 2006 18:26:41 +0100
Message-Id: <1161710801.1242.6.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.3 (2.6.3-1.fc5.5)
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 8396
Lines: 182

Hi,

please boot with "report_lost_ticks", as in
http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
and see if you get
time.c: Lost n timer tick(s)!
messages. Also:
cat /sys/devices/system/clocksource/clocksource0/
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Thanks,

On Wed, 2006-10-25 at 01:56 +1000, Michael Sallaway wrote:
> [1.] One line summary of the problem:
> Kernel Oopses when doing I/O to a disk (using dd).
>
> [2.] Full description of the problem/report:
> When writing to [any] disk (IDE or SCSI), the kernel will Oops after
> short periods of time ranging from 30 seconds to 5-10 minutes.
> Sometimes this is a complete crash (with "Aieee, killing interrupt
> handler!"), sometimes it's just an oops but the system doesn't crash
> completely (isn't very usable, though), and sometimes it just gives a
> "general protection fault: 0000 [1] SMP".
>
> It's worth mentioning that I've managed to set up the entire system
> without incident -- a debian netinstall, downloading new packages,
> changing things, etc. The only reason I discovered this was when
> copying large amounts of data off another machine, and it died
> reproducibly after a few gigabytes of data. (I originally thought it
> was an XFS issue, but had the same problem with EXT3, and all other
> combinations I tried - I can now reproduce it by using dd if=/dev/zero
> of=/dev/hda6.)
>
> Ultimately, I've tried it with different (known good) devices and hard
> drives. The only common things between all failures are the CPU
> (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533).
> (Memtest x86 shows the memory to be fine.) As such, I'm suspecting
> it's something to do with the motherboard -- It's using an x86_64
> kernel (although it does the same with an i386), on an nforce 570
> motherboard.
>
> Other things I have tried:
> - SATA, SCSI and IDE drives -- all do the same thing
> - removing *all* drives and cards and devices -- it does it with a
> single IDE drive connected and no PCI cards
> - kernels 2.6.16, 18, 18.1, 19-rc3.
> - the patch suggested in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
> - booting with noapic and/or acpi=off (as suggested http://tinyurl.com/yn97woby)
> - with and without md devices
>
>
> [3.] Keywords (i.e., modules, networking, kernel):
> kernel disk I/O nforce
>
> [4.] Kernel version (from /proc/version):
> Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007
> (prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006
>
> [5.] Output of Oops.. message (if applicable) with symbolic
> information resolved (see Documentation/oops-tracing.txt)
> (as I understand it from Documentation/oops-tracing.txt, ksymoops
> doesn't apply anymore? If that's not the case, I apologise -- could
> someone tell me what I need to do with the below?)
>
> Oct 25 00:23:11 barbossa kernel: Unable to handle kernel NULL pointer
> dereference at 0000000000000000 RIP:
> Oct 25 00:23:11 barbossa kernel: []
> __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: PGD 2b93a067 PUD 7c7cc067 PMD 0
> Oct 25 00:23:11 barbossa kernel: Oops: 0000 [1] SMP
> Oct 25 00:23:11 barbossa kernel: CPU 0
> Oct 25 00:23:11 barbossa kernel: Modules linked in:
> Oct 25 00:23:11 barbossa kernel: Pid: 2442, comm: dd Not tainted 2.6.19-rc3 #1
> Oct 25 00:23:11 barbossa kernel: RIP: 0010:[]
> [] __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: RSP: 0018:ffff81003fa358f8 EFLAGS: 00010283
> Oct 25 00:23:11 barbossa kernel: RAX: 0000000000000000 RBX:
> 0000000000000000 RCX: 0000000000000002
> Oct 25 00:23:11 barbossa kernel: RDX: 000000000000000a RSI:
> 000000000019eea9 RDI: ffff810037cc8440
> Oct 25 00:23:11 barbossa kernel: RBP: ffff810001602550 R08:
> ffff810037cc8440 R09: ffff81003fa35b48
> Oct 25 00:23:11 barbossa kernel: R10: ffffffff802bee4c R11:
> ffffffff80440b8b R12: ffff81001b3426e0
> Oct 25 00:23:11 barbossa kernel: R13: 000000000067baa6 R14:
> ffff810037cc8440 R15: 0000000001c42574
> Oct 25 00:23:11 barbossa kernel: FS: 00002b64b70eb6d0(0000)
> GS:ffffffff807d6000(0000) knlGS:0000000000000000
> Oct 25 00:23:11 barbossa kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 CR3:
> 000000007b6b7000 CR4: 00000000000006e0
> Oct 25 00:23:11 barbossa kernel: Process dd (pid: 2442, threadinfo
> ffff81003fa34000, task ffff810037e44140)
> Oct 25 00:23:11 barbossa kernel: Stack: ffff81003fa35b48
> ffffffff802bee4c 0000040001602550 ffff810001602550
> Oct 25 00:23:11 barbossa kernel: ffff81003fa35b48 ffff810037cc8550
> 000000000000000b ffff81007efe9b08
> Oct 25 00:23:11 barbossa kernel: 0000000000000000 ffffffff8029db1e
> 000000000000000e ffffffff802bdff3
> Oct 25 00:23:11 barbossa kernel: Call Trace:
> Oct 25 00:23:11 barbossa kernel: [] blkdev_get_block+0x0/0x46
> Oct 25 00:23:11 barbossa kernel: []
> generic_writepages+0x18e/0x2d8
> Oct 25 00:23:11 barbossa kernel: [] blkdev_writepage+0x0/0xf
> Oct 25 00:23:11 barbossa kernel: [] do_writepages+0x20/0x2d
> Oct 25 00:23:11 barbossa kernel: []
> __writeback_single_inode+0x1b4/0x38b
> Oct 25 00:23:11 barbossa kernel: []
> sync_sb_inodes+0x1d1/0x2b5
> Oct 25 00:23:11 barbossa kernel: []
> writeback_inodes+0x82/0xd8
> Oct 25 00:23:11 barbossa kernel: []
> balance_dirty_pages_ratelimited_nr+0x115/0x1f6
> Oct 25 00:23:11 barbossa kernel: []
> generic_file_buffered_write+0x516/0x64b
> Oct 25 00:23:11 barbossa kernel: [] remove_suid+0x1/0x1c
> Oct 25 00:23:11 barbossa kernel: []
> __generic_file_aio_write_nolock+0x375/0x3e8
> Oct 25 00:23:11 barbossa kernel: [] unmap_vmas+0x372/0x716
> Oct 25 00:23:11 barbossa kernel: []
> generic_file_aio_write_nolock+0x3a/0x86
> Oct 25 00:23:11 barbossa kernel: [] do_sync_write+0xc9/0x10c
> Oct 25 00:23:11 barbossa kernel: []
> autoremove_wake_function+0x0/0x2e
> Oct 25 00:23:11 barbossa kernel: [] __clear_user+0x12/0x50
> Oct 25 00:23:11 barbossa kernel: [] read_zero+0x1d1/0x23c
> Oct 25 00:23:11 barbossa kernel: [] vfs_write+0xce/0x174
> Oct 25 00:23:11 barbossa kernel: [] fget_light+0x18/0x7c
> Oct 25 00:23:11 barbossa kernel: [] sys_write+0x45/0x6e
> Oct 25 00:23:11 barbossa kernel: [] system_call+0x7e/0x83
> Oct 25 00:23:11 barbossa kernel:
> Oct 25 00:23:11 barbossa kernel:
> Oct 25 00:23:11 barbossa kernel: Code: 8b 03 a8 20 75 6c 8b 03 a8 02
> 74 66 8b 44 24 14 48 39 43 20
> Oct 25 00:23:11 barbossa kernel: RIP []
> __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: RSP
> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000
>
>
> [6.] A small shell script or example program which triggers the
> problem (if possible)
> dd if=/dev/zero of=/dev/hda6 bs=512K
>
> (note that it will also happen without the bs argument, however
> usually takes longer.
> It's not related to a particular point in the
> disk, or anything, though, just seems to last longer.)
>
>
> [7.] Environment
>
> Please see the below output for more environment information -- I
> didn't want to dump too much info in here. :-)
>
> http://sallaway.org/lkml/output.txt
>
> also, I've seen mention of this, but I'm not sure if it would be useful:
>
> http://sallaway.org/lkml/System.map-2.6.19-rc3
>
>
> Please let me know if you need any more info. This is my first bug
> report, so apologies if this should have gone elsewhere. :-)
>
> Cheers,
> Michael
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
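
The check asked for in the reply at the top of this message (look for "Lost n timer tick(s)!" lines after booting with report_lost_ticks) could be scripted roughly as below. This is only a sketch: the helper name count_lost_ticks is made up for illustration, the match pattern is taken from the message text quoted in the reply, and it assumes the messages end up in whatever log is piped in (for example the output of dmesg).

```shell
#!/bin/sh
# Hypothetical helper (not part of any kernel tooling): read kernel log
# text on stdin and print the sum of all N values found in
# "Lost N timer tick(s)!" lines. Prints 0 when no such line is present.
count_lost_ticks() {
    # Keep only the number from each matching line, then add them up.
    sed -n 's/.*Lost \([0-9][0-9]*\) timer tick.*/\1/p' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Possible use on the affected machine, assuming it was booted with
# report_lost_ticks and the messages are still in the kernel ring buffer:
#   dmesg | count_lost_ticks
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
```

A result of 0 would only mean that no such message was found in the text that was piped in; it says nothing about messages that have already rotated out of the log.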