LinuxLists.cc - PROBLEM: Oops when doing disk heavy disk I/O

2006-10-24 15:56:42

Subject: PROBLEM: Oops when doing disk heavy disk I/O

[1.] One line summary of the problem:
Kernel Oopses when doing I/O to a disk (using dd).

[2.] Full description of the problem/report:
When writing to [any] disk (IDE or SCSI), the kernel will Oops after
short periods of time ranging from 30 seconds to 5-10 minutes.
Sometimes this is a complete crash (with "Aieee, killing interrupt
handler!"), sometimes it's just an oops but the system doesn't crash
comepletely (isn't very usable, though), and sometimes it just gives a
"general protection fault: 0000 [1] SMP".

It's worth mentioning that I've managed to set up the entire system
without incident -- a debian netinstall, downloading new packages,
changing things, etc. The only reason I discovered this was when
copying large amounts of data off another machine, and it died
reproducably after a few gigabytes of data. (I originally thought it
was an XFS issue, but had the same problem with EXT3, and all other
combinations I tried - I can now reproduce it by using dd if=/dev/zero
of=/dev/hda6.)

Ultimately, I've tried it with different (known good) devices and hard
drives. The only common things between all failures are the CPU
(Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533).
(Memtest x86 shows the memory to be fine.) As such, I'm suspecting
it's something to do with the motherboard -- It's using an x86_64
kernel (although it does the same with an i386), on an nforce 570
motherboard.

Other things I have tried:
- SATA, SCSI and IDE drives -- all do the same thing
- removing *all* drives and cards and devices -- it does it with a
single IDE drive connected and no PCI cards
- kernels 2.6.16, 18, 18.1, 19-rc3.
- the patch suggested in
http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
- booting with noapic and/or acpi=off (as suggested http://tinyurl.com/yn97woby)
- with and without md devices

[3.] Keywords (i.e., modules, networking, kernel):
kernel disk I/O nforce

[4.] Kernel version (from /proc/version):
Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007
(prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006

[5.] Output of Oops.. message (if applicable) with symbolic
information resolved (see Documentation/oops-tracing.txt)
(as I understand it from Documentation/oops-tracing.txt, ksymoops
doesn't apply anymore? If that's not the case, I apologise -- could
someone tell me what I need to do with the below?)

Oct 25 00:23:11 barbossa kernel: Unable to handle kernel NULL pointer
dereference at 0000000000000000 RIP:
Oct 25 00:23:11 barbossa kernel: [<ffffffff8021a6c5>]
__block_write_full_page+0xa4/0x2df
Oct 25 00:23:11 barbossa kernel: PGD 2b93a067 PUD 7c7cc067 PMD 0
Oct 25 00:23:11 barbossa kernel: Oops: 0000 [1] SMP
Oct 25 00:23:11 barbossa kernel: CPU 0
Oct 25 00:23:11 barbossa kernel: Modules linked in:
Oct 25 00:23:11 barbossa kernel: Pid: 2442, comm: dd Not tainted 2.6.19-rc3 #1
Oct 25 00:23:11 barbossa kernel: RIP: 0010:[<ffffffff8021a6c5>]
[<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df
Oct 25 00:23:11 barbossa kernel: RSP: 0018:ffff81003fa358f8 EFLAGS: 00010283
Oct 25 00:23:11 barbossa kernel: RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000000000002
Oct 25 00:23:11 barbossa kernel: RDX: 000000000000000a RSI:
000000000019eea9 RDI: ffff810037cc8440
Oct 25 00:23:11 barbossa kernel: RBP: ffff810001602550 R08:
ffff810037cc8440 R09: ffff81003fa35b48
Oct 25 00:23:11 barbossa kernel: R10: ffffffff802bee4c R11:
ffffffff80440b8b R12: ffff81001b3426e0
Oct 25 00:23:11 barbossa kernel: R13: 000000000067baa6 R14:
ffff810037cc8440 R15: 0000000001c42574
Oct 25 00:23:11 barbossa kernel: FS: 00002b64b70eb6d0(0000)
GS:ffffffff807d6000(0000) knlGS:0000000000000000
Oct 25 00:23:11 barbossa kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 CR3:
000000007b6b7000 CR4: 00000000000006e0
Oct 25 00:23:11 barbossa kernel: Process dd (pid: 2442, threadinfo
ffff81003fa34000, task ffff810037e44140)
Oct 25 00:23:11 barbossa kernel: Stack: ffff81003fa35b48
ffffffff802bee4c 0000040001602550 ffff810001602550
Oct 25 00:23:11 barbossa kernel: ffff81003fa35b48 ffff810037cc8550
000000000000000b ffff81007efe9b08
Oct 25 00:23:11 barbossa kernel: 0000000000000000 ffffffff8029db1e
000000000000000e ffffffff802bdff3
Oct 25 00:23:11 barbossa kernel: Call Trace:
Oct 25 00:23:11 barbossa kernel: [<ffffffff802bee4c>] blkdev_get_block+0x0/0x46
Oct 25 00:23:11 barbossa kernel: [<ffffffff8029db1e>]
generic_writepages+0x18e/0x2d8
Oct 25 00:23:11 barbossa kernel: [<ffffffff802bdff3>] blkdev_writepage+0x0/0xf
Oct 25 00:23:11 barbossa kernel: [<ffffffff8025591f>] do_writepages+0x20/0x2d
Oct 25 00:23:11 barbossa kernel: [<ffffffff8022cf8c>]
__writeback_single_inode+0x1b4/0x38b
Oct 25 00:23:11 barbossa kernel: [<ffffffff8021f44f>]
sync_sb_inodes+0x1d1/0x2b5
Oct 25 00:23:11 barbossa kernel: [<ffffffff8024b3b7>]
writeback_inodes+0x82/0xd8
Oct 25 00:23:11 barbossa kernel: [<ffffffff8029de51>]
balance_dirty_pages_ratelimited_nr+0x115/0x1f6
Oct 25 00:23:11 barbossa kernel: [<ffffffff8020f4b1>]
generic_file_buffered_write+0x516/0x64b
Oct 25 00:23:11 barbossa kernel: [<ffffffff802295b6>] remove_suid+0x1/0x1c
Oct 25 00:23:11 barbossa kernel: [<ffffffff80215018>]
__generic_file_aio_write_nolock+0x375/0x3e8
Oct 25 00:23:11 barbossa kernel: [<ffffffff802078bb>] unmap_vmas+0x372/0x716
Oct 25 00:23:11 barbossa kernel: [<ffffffff8029bdea>]
generic_file_aio_write_nolock+0x3a/0x86
Oct 25 00:23:11 barbossa kernel: [<ffffffff802166a9>] do_sync_write+0xc9/0x10c
Oct 25 00:23:11 barbossa kernel: [<ffffffff802899a3>]
autoremove_wake_function+0x0/0x2e
Oct 25 00:23:11 barbossa kernel: [<ffffffff8022c1e9>] __clear_user+0x12/0x50
Oct 25 00:23:11 barbossa kernel: [<ffffffff8048c584>] read_zero+0x1d1/0x23c
Oct 25 00:23:11 barbossa kernel: [<ffffffff802153ee>] vfs_write+0xce/0x174
Oct 25 00:23:11 barbossa kernel: [<ffffffff8020afee>] fget_light+0x18/0x7c
Oct 25 00:23:11 barbossa kernel: [<ffffffff80215d2f>] sys_write+0x45/0x6e
Oct 25 00:23:11 barbossa kernel: [<ffffffff8025811e>] system_call+0x7e/0x83
Oct 25 00:23:11 barbossa kernel:
Oct 25 00:23:11 barbossa kernel:
Oct 25 00:23:11 barbossa kernel: Code: 8b 03 a8 20 75 6c 8b 03 a8 02
74 66 8b 44 24 14 48 39 43 20
Oct 25 00:23:11 barbossa kernel: RIP [<ffffffff8021a6c5>]
__block_write_full_page+0xa4/0x2df
Oct 25 00:23:11 barbossa kernel: RSP <ffff81003fa358f8>
Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000

[6.] A small shell script or example program which triggers the
problem (if possible)
dd if=/dev/zero of=/dev/hda6 bs=512K

(note that it will also happen without the bs argument, however
usually takes longer. It's not related to a particular point in the
disk, or anything, though, just seems to last longer.)

[7.] Environment

Please see the below output for more environment information -- I
didn't want to dump too much info in here. :-)

http://sallaway.org/lkml/output.txt

also, I've seen mention of this, but I'm not sure if it would be useful:

http://sallaway.org/lkml/System.map-2.6.19-rc3

Please let me know if you need any more info. This is my first bug
report, so apologies if this should have gone elsewhere. :-)

Cheers,
Michael

2006-10-24 17:43:19

by Badari Pulavarty

[permalink] [raw]

Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O

On Wed, 2006-10-25 at 01:56 +1000, Michael Sallaway wrote:
> [1.] One line summary of the problem:
> Kernel Oopses when doing I/O to a disk (using dd).
>
> [2.] Full description of the problem/report:
> When writing to [any] disk (IDE or SCSI), the kernel will Oops after
> short periods of time ranging from 30 seconds to 5-10 minutes.
> Sometimes this is a complete crash (with "Aieee, killing interrupt
> handler!"), sometimes it's just an oops but the system doesn't crash
> comepletely (isn't very usable, though), and sometimes it just gives a
> "general protection fault: 0000 [1] SMP".
>
> It's worth mentioning that I've managed to set up the entire system
> without incident -- a debian netinstall, downloading new packages,
> changing things, etc. The only reason I discovered this was when
> copying large amounts of data off another machine, and it died
> reproducably after a few gigabytes of data. (I originally thought it
> was an XFS issue, but had the same problem with EXT3, and all other
> combinations I tried - I can now reproduce it by using dd if=/dev/zero
> of=/dev/hda6.)
>
> Ultimately, I've tried it with different (known good) devices and hard
> drives. The only common things between all failures are the CPU
> (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533).
> (Memtest x86 shows the memory to be fine.) As such, I'm suspecting
> it's something to do with the motherboard -- It's using an x86_64
> kernel (although it does the same with an i386), on an nforce 570
> motherboard.
>
> Other things I have tried:
> - SATA, SCSI and IDE drives -- all do the same thing
> - removing *all* drives and cards and devices -- it does it with a
> single IDE drive connected and no PCI cards
> - kernels 2.6.16, 18, 18.1, 19-rc3.

All of these kernels are having the same problem ? Or just noticed
it only in 19-rc3 ?

Thanks,
Badari

2006-10-24 17:50:12

by Sergio Monteiro Basto

[permalink] [raw]

Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O

Hi,
please boot with "report_lost_ticks" like
http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
and see if you have
time.c: Lost n timer tick(s)!

and
cat /sys/devices/system/clocksource/clocksource0/

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Thanks,

On Wed, 2006-10-25 at 01:56 +1000, Michael Sallaway wrote:
> [1.] One line summary of the problem:
> Kernel Oopses when doing I/O to a disk (using dd).
>
> [2.] Full description of the problem/report:
> When writing to [any] disk (IDE or SCSI), the kernel will Oops after
> short periods of time ranging from 30 seconds to 5-10 minutes.
> Sometimes this is a complete crash (with "Aieee, killing interrupt
> handler!"), sometimes it's just an oops but the system doesn't crash
> comepletely (isn't very usable, though), and sometimes it just gives a
> "general protection fault: 0000 [1] SMP".
>
> It's worth mentioning that I've managed to set up the entire system
> without incident -- a debian netinstall, downloading new packages,
> changing things, etc. The only reason I discovered this was when
> copying large amounts of data off another machine, and it died
> reproducably after a few gigabytes of data. (I originally thought it
> was an XFS issue, but had the same problem with EXT3, and all other
> combinations I tried - I can now reproduce it by using dd if=/dev/zero
> of=/dev/hda6.)
>
> Ultimately, I've tried it with different (known good) devices and hard
> drives. The only common things between all failures are the CPU
> (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533).
> (Memtest x86 shows the memory to be fine.) As such, I'm suspecting
> it's something to do with the motherboard -- It's using an x86_64
> kernel (although it does the same with an i386), on an nforce 570
> motherboard.
>
> Other things I have tried:
> - SATA, SCSI and IDE drives -- all do the same thing
> - removing *all* drives and cards and devices -- it does it with a
> single IDE drive connected and no PCI cards
> - kernels 2.6.16, 18, 18.1, 19-rc3.
> - the patch suggested in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
> - booting with noapic and/or acpi=off (as suggested http://tinyurl.com/yn97woby)
> - with and without md devices
>
>
> [3.] Keywords (i.e., modules, networking, kernel):
> kernel disk I/O nforce
>
> [4.] Kernel version (from /proc/version):
> Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007
> (prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006
>
> [5.] Output of Oops.. message (if applicable) with symbolic
> information resolved (see Documentation/oops-tracing.txt)
> (as I understand it from Documentation/oops-tracing.txt, ksymoops
> doesn't apply anymore? If that's not the case, I apologise -- could
> someone tell me what I need to do with the below?)
>
> Oct 25 00:23:11 barbossa kernel: Unable to handle kernel NULL pointer
> dereference at 0000000000000000 RIP:
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8021a6c5>]
> __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: PGD 2b93a067 PUD 7c7cc067 PMD 0
> Oct 25 00:23:11 barbossa kernel: Oops: 0000 [1] SMP
> Oct 25 00:23:11 barbossa kernel: CPU 0
> Oct 25 00:23:11 barbossa kernel: Modules linked in:
> Oct 25 00:23:11 barbossa kernel: Pid: 2442, comm: dd Not tainted 2.6.19-rc3 #1
> Oct 25 00:23:11 barbossa kernel: RIP: 0010:[<ffffffff8021a6c5>]
> [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: RSP: 0018:ffff81003fa358f8 EFLAGS: 00010283
> Oct 25 00:23:11 barbossa kernel: RAX: 0000000000000000 RBX:
> 0000000000000000 RCX: 0000000000000002
> Oct 25 00:23:11 barbossa kernel: RDX: 000000000000000a RSI:
> 000000000019eea9 RDI: ffff810037cc8440
> Oct 25 00:23:11 barbossa kernel: RBP: ffff810001602550 R08:
> ffff810037cc8440 R09: ffff81003fa35b48
> Oct 25 00:23:11 barbossa kernel: R10: ffffffff802bee4c R11:
> ffffffff80440b8b R12: ffff81001b3426e0
> Oct 25 00:23:11 barbossa kernel: R13: 000000000067baa6 R14:
> ffff810037cc8440 R15: 0000000001c42574
> Oct 25 00:23:11 barbossa kernel: FS: 00002b64b70eb6d0(0000)
> GS:ffffffff807d6000(0000) knlGS:0000000000000000
> Oct 25 00:23:11 barbossa kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 CR3:
> 000000007b6b7000 CR4: 00000000000006e0
> Oct 25 00:23:11 barbossa kernel: Process dd (pid: 2442, threadinfo
> ffff81003fa34000, task ffff810037e44140)
> Oct 25 00:23:11 barbossa kernel: Stack: ffff81003fa35b48
> ffffffff802bee4c 0000040001602550 ffff810001602550
> Oct 25 00:23:11 barbossa kernel: ffff81003fa35b48 ffff810037cc8550
> 000000000000000b ffff81007efe9b08
> Oct 25 00:23:11 barbossa kernel: 0000000000000000 ffffffff8029db1e
> 000000000000000e ffffffff802bdff3
> Oct 25 00:23:11 barbossa kernel: Call Trace:
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802bee4c>] blkdev_get_block+0x0/0x46
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8029db1e>]
> generic_writepages+0x18e/0x2d8
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802bdff3>] blkdev_writepage+0x0/0xf
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8025591f>] do_writepages+0x20/0x2d
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8022cf8c>]
> __writeback_single_inode+0x1b4/0x38b
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8021f44f>]
> sync_sb_inodes+0x1d1/0x2b5
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8024b3b7>]
> writeback_inodes+0x82/0xd8
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8029de51>]
> balance_dirty_pages_ratelimited_nr+0x115/0x1f6
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8020f4b1>]
> generic_file_buffered_write+0x516/0x64b
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802295b6>] remove_suid+0x1/0x1c
> Oct 25 00:23:11 barbossa kernel: [<ffffffff80215018>]
> __generic_file_aio_write_nolock+0x375/0x3e8
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802078bb>] unmap_vmas+0x372/0x716
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8029bdea>]
> generic_file_aio_write_nolock+0x3a/0x86
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802166a9>] do_sync_write+0xc9/0x10c
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802899a3>]
> autoremove_wake_function+0x0/0x2e
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8022c1e9>] __clear_user+0x12/0x50
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8048c584>] read_zero+0x1d1/0x23c
> Oct 25 00:23:11 barbossa kernel: [<ffffffff802153ee>] vfs_write+0xce/0x174
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8020afee>] fget_light+0x18/0x7c
> Oct 25 00:23:11 barbossa kernel: [<ffffffff80215d2f>] sys_write+0x45/0x6e
> Oct 25 00:23:11 barbossa kernel: [<ffffffff8025811e>] system_call+0x7e/0x83
> Oct 25 00:23:11 barbossa kernel:
> Oct 25 00:23:11 barbossa kernel:
> Oct 25 00:23:11 barbossa kernel: Code: 8b 03 a8 20 75 6c 8b 03 a8 02
> 74 66 8b 44 24 14 48 39 43 20
> Oct 25 00:23:11 barbossa kernel: RIP [<ffffffff8021a6c5>]
> __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: RSP <ffff81003fa358f8>
> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000
>
>
> [6.] A small shell script or example program which triggers the
> problem (if possible)
> dd if=/dev/zero of=/dev/hda6 bs=512K
>
> (note that it will also happen without the bs argument, however
> usually takes longer. It's not related to a particular point in the
> disk, or anything, though, just seems to last longer.)
>
>
> [7.] Environment
>
> Please see the below output for more environment information -- I
> didn't want to dump too much info in here. :-)
>
> http://sallaway.org/lkml/output.txt
>
> also, I've seen mention of this, but I'm not sure if it would be useful:
>
> http://sallaway.org/lkml/System.map-2.6.19-rc3
>
>
> Please let me know if you need any more info. This is my first bug
> report, so apologies if this should have gone elsewhere. :-)
>
> Cheers,
> Michael
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-10-25 00:09:20

by Michael

[permalink] [raw]

Subject: RE: PROBLEM: Oops when doing disk heavy disk I/O

> From: Badari Pulavarty [mailto:[email protected]]
> Sent: Wednesday, 25 October 2006 3:43 AM
>
> > Other things I have tried:
> > - SATA, SCSI and IDE drives -- all do the same thing
> > - removing *all* drives and cards and devices -- it does it with a
> > single IDE drive connected and no PCI cards
> > - kernels 2.6.16, 18, 18.1, 19-rc3.
>
> All of these kernels are having the same problem ? Or just noticed
> it only in 19-rc3 ?
>

All of them. I was going to try earlier kernels, but from my googling it
sounds like nforce 5xx support wasn't incorporated until recently (could be
wrong, though.)

Thanks,
Michael

2006-10-25 00:21:47

by Michael

[permalink] [raw]

Subject: RE: PROBLEM: Oops when doing disk heavy disk I/O

> -----Original Message-----
> From: Sergio Monteiro Basto [mailto:[email protected]]
> Sent: Wednesday, 25 October 2006 3:27 AM
>
> Hi,
> please boot with "report_lost_ticks" like
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
> and see if you have
> time.c: Lost n timer tick(s)!
>
> and
> cat /sys/devices/system/clocksource/clocksource0/
>
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>
>

Hi,

Yep, seems to be losing some ticks -- I'm not sure what that means, but is
it bad? :-)

[ root @ barbossa : ~ ] # dmesg | grep tick
Command line: root=/dev/md1 ro report_lost_ticks
Kernel command line: root=/dev/md1 ro report_lost_ticks
time.c: Lost 1 timer tick(s)! rip release_console_sem+0x196/0x20c)
time.c: Lost 12 timer tick(s)! rip setup_boot_APIC_clock+0x11f/0x121)
time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4)

(I've put a full dmesg below).

Also, for the clocksource:

[ root @ barbossa : ~ ] # ls /sys/devices/system/clocksource/clocksource0/
available_clocksource current_clocksource

[ root @ barbossa : ~ ] # cat /sys/devices/system/clocksource/clocksource0/
cat: /sys/devices/system/clocksource/clocksource0/: Is a directory

[ root @ barbossa : ~ ] # cat
/sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies

[ root @ barbossa : ~ ] # cat
/sys/devices/system/clocksource/clocksource0/current_clocksource
jiffies

Thanks for your help.

Cheers,
Michael

Full dmesg:

[ root @ barbossa : ~ ] # dmesg
Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007
(prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006
Command line: root=/dev/md1 ro report_lost_ticks
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fee0000 (usable)
BIOS-e820: 000000007fee0000 - 000000007fee3000 (ACPI NVS)
BIOS-e820: 000000007fee3000 - 000000007fef0000 (ACPI data)
BIOS-e820: 000000007fef0000 - 000000007ff00000 (reserved)
BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524000) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP (v002 Nvidia ) @
0x00000000000f7760
ACPI: XSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @
0x000000007fee30c0
ACPI: FADT (v003 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @
0x000000007feec380
ACPI: HPET (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000098) @
0x000000007feec580
ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @
0x000000007feec600
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @
0x000000007feec4c0
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @
0x0000000000000000
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524000) 1 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
DMA32 4096 -> 1048576
Normal 1048576 -> 1048576
early_node_map[2] active PFN ranges
0: 0 -> 159
0: 256 -> 524000
On node 0 totalpages: 523903
DMA zone: 56 pages used for memmap
DMA zone: 1737 pages reserved
DMA zone: 2206 pages, LIFO batch:0
DMA32 zone: 7108 pages used for memmap
DMA32 zone: 512796 pages, LIFO batch:31
Normal zone: 0 pages used for memmap
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
Setting APIC routing to flat
ACPI: HPET id: 0x10de8201 base: 0xfefff000
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009f000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 00000000000f0000
Nosave address range: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 80000000 (gap: 7ff00000:70100000)
PERCPU: Allocating 33408 bytes of per cpu data
Built 1 zonelists. Total pages: 515002
Kernel command line: root=/dev/md1 ro report_lost_ticks
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
time.c: Lost 1 timer tick(s)! rip release_console_sem+0x196/0x20c)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
CPU 0: aperture @ c8f4000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Memory: 2045020k/2096000k available (4241k kernel code, 50228k reserved,
1733k data, 272k init)
Calibrating delay using timer specific routine.. 4022.98 BogoMIPS
(lpj=8045974)
Security Framework v1.0.0 initialized
SELinux: Disabled at boot.
Capability LSM initialized
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
SMP alternatives: switching to UP code
Freeing SMP alternatives: 48k freed
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 12559460
Detected 12.559 MHz APIC timer.
time.c: Lost 12 timer tick(s)! rip setup_boot_APIC_clock+0x11f/0x121)
Brought up 1 CPUs
testing NMI watchdog ... OK.
time.c: Using 25.000000 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2009.511 MHz processor.
checking if image is initramfs... it is
Freeing initrd memory: 3210k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:01:07.0
PCI: Transparent bridge - 0000:00:06.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK2] (IRQs 5 7 9 *10 11 14 15)
ACPI: PCI Interrupt Link [LNK3] (IRQs 5 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LNK4] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK5] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK6] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK7] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK8] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LP2P] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LUBA] (IRQs 5 7 9 *10 11 14 15)
ACPI: PCI Interrupt Link [LMAC] (IRQs 5 7 9 *10 11 14 15)
ACPI: PCI Interrupt Link [LAZA] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LPMU] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LSMB] (IRQs *5 7 9 10 11 14 15)
ACPI: PCI Interrupt Link [LUB2] (IRQs 5 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LIDE] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LSID] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LFID] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LSA2] (IRQs 5 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [APC1] (IRQs 16) *0, disabled.
ACPI: PCI Interrupt Link [APC2] (IRQs 17) *0, disabled.
ACPI: PCI Interrupt Link [APC3] (IRQs 18) *0, disabled.
ACPI: PCI Interrupt Link [APC4] (IRQs 19) *0, disabled.
ACPI: PCI Interrupt Link [APC5] (IRQs 16) *0, disabled.
ACPI: PCI Interrupt Link [APC6] (IRQs 16) *0, disabled.
ACPI: PCI Interrupt Link [APC7] (IRQs 16) *0, disabled.
ACPI: PCI Interrupt Link [APC8] (IRQs 16) *0, disabled.
ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APMU] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [AAZA] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APCS] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APSI] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APSJ] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [ASA2] (IRQs 20 21 22 23) *0, disabled.
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a
report
PCI-DMA: Disabling IOMMU.
PCI: Bridge: 0000:00:06.0
IO window: d000-dfff
MEM window: fc800000-fd7fffff
PREFETCH window: 80000000-800fffff
PCI: Bridge: 0000:00:0a.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0b.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0c.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0d.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0e.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0f.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Setting latency timer of device 0000:00:06.0 to 64
PCI: Setting latency timer of device 0000:00:0a.0 to 64
PCI: Setting latency timer of device 0000:00:0b.0 to 64
PCI: Setting latency timer of device 0000:00:0c.0 to 64
PCI: Setting latency timer of device 0000:00:0d.0 to 64
PCI: Setting latency timer of device 0000:00:0e.0 to 64
PCI: Setting latency timer of device 0000:00:0f.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
audit: initializing netlink socket (disabled)
audit(1161771160.632:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Installing knfsd (copyright (C) 1996 [email protected]).
fuse init (API version 7.7)
JFS: nTxBlock = 8192, nTxLock = 65536
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers,
no debug enabled
SGI XFS Quota Management subsystem
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4)
PCI: Setting latency timer of device 0000:00:0a.0 to 64
pcie_portdrv_probe->Dev[0376:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0a.0:pcie00]
PCI: Setting latency timer of device 0000:00:0b.0 to 64
pcie_portdrv_probe->Dev[0374:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0b.0:pcie00]
PCI: Setting latency timer of device 0000:00:0c.0 to 64
pcie_portdrv_probe->Dev[0374:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0c.0:pcie00]
PCI: Setting latency timer of device 0000:00:0d.0 to 64
pcie_portdrv_probe->Dev[0378:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0d.0:pcie00]
PCI: Setting latency timer of device 0000:00:0e.0 to 64
pcie_portdrv_probe->Dev[0375:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0e.0:pcie00]
PCI: Setting latency timer of device 0000:00:0f.0 to 64
pcie_portdrv_probe->Dev[0377:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0f.0:pcie00]
vga16fb: initializing
vga16fb: mapped to 0xffff8100000a0000
fb0: VGA16 VGA frame buffer device
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Fan [FAN] (on)
ACPI: Getting cpuindex for acpiid 0x1
ACPI: Thermal Zone [THRM] (40 C)
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize
loop: loaded (max 8 devices)
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.57.
ACPI: PCI Interrupt Link [APCH] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [APCH] -> GSI 23 (level, low) ->
IRQ 23
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:8239 bound to 0000:00:08.0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-MCP55: IDE controller at PCI slot 0000:00:04.0
NFORCE-MCP55: chipset revision 161
NFORCE-MCP55: not 100% native mode: will probe irqs later
NFORCE-MCP55: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE-MCP55: 0000:00:04.0 (rev a1) UDMA133 controller
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
Probing IDE interface ide0...
hda: IC35L040AVVN07-0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hda: max request size: 128KiB
hda: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
ACPI: PCI Interrupt 0000:01:08.0[A] -> Link [APC3] -> GSI 18 (level, low) ->
IRQ 18
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
<Adaptec 29160B Ultra160 SCSI adapter>
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

usbmon: debugfs is not available
ACPI: PCI Interrupt Link [APCL] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:02.1[B] -> Link [APCL] -> GSI 22 (level, low) ->
IRQ 22
PCI: Setting latency timer of device 0000:00:02.1 to 64
ehci_hcd 0000:00:02.1: EHCI Host Controller
ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:02.1: debug port 1
PCI: cache line size of 64 is not supported by device 0000:00:02.1
ehci_hcd 0000:00:02.1: irq 22, io mem 0xfe02e000
ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 10 ports detected
116x: driver isp116x-hcd, 03 Nov 2005
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt Link [APCF] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [APCF] -> GSI 21 (level, low) ->
IRQ 21
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:02.0: irq 21, io mem 0xfe02f000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 10 ports detected
USB Universal Host Controller Interface driver v3.0
sl811: driver sl811-hcd, 19 May 2005
usbcore: registered new interface driver cdc_acm
drivers/usb/class/cdc-acm.c: v0.25:USB Abstract Control Model driver for USB
modems and ISDN adapters
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: PC Speaker as /class/input/input0
input: AT Translated Set 2 keyboard as /class/input/input1
I2O subsystem v1.325
i2o: max drivers = 8
I2O Configuration OSM v1.323
I2O Bus Adapter OSM v1.317
I2O Block Device OSM v1.325
I2O SCSI Peripheral OSM v1.316
I2O ProcFS OSM v1.316
i2c /dev entries driver
i2c_adapter i2c-0: nForce2 SMBus adapter at 0x1c00
i2c_adapter i2c-1: nForce2 SMBus adapter at 0x1c40
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid6: int64x1 1847 MB/s
raid6: int64x2 2357 MB/s
raid6: int64x4 2379 MB/s
raid6: int64x8 1679 MB/s
raid6: sse2x1 2697 MB/s
raid6: sse2x2 3699 MB/s
raid6: sse2x4 3815 MB/s
raid6: using algorithm sse2x4 (3815 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: automatically using best checksumming function: generic_sse
generic_sse: 6186.000 MB/sec
raid5: using function: generic_sse (6186.000 MB/sec)
md: multipath personality registered for level -4
md: faulty personality registered for level -5
device-mapper: ioctl: 4.10.0-ioctl (2006-09-14) initialised:
[email protected]
device-mapper: multipath: version 1.0.5 loaded
device-mapper: multipath round-robin: version 1.0.0 loaded
device-mapper: multipath emc: version 0.0.3 loaded
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (8192 buckets, 65536 max) - 312 bytes per conntrack
ctnetlink v0.90: registering with nfnetlink.
ip_conntrack_pptp version 3.1 loaded
ip_nat_pptp version 3.0 loaded
ip_tables: (C) 2000-2006 Netfilter Core Team
ClusterIP Version 0.8 loaded successfully
arp_tables: (C) 2002 David S. Miller
TCP bic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
ip6_tables: (C) 2000-2006 Netfilter Core Team
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
NET: Registered protocol family 15
NET: Registered protocol family 8
NET: Registered protocol family 20
Freeing unused kernel memory: 272k freed
md: md0 stopped.
md: bind<hda1>
raid1: raid set md0 active with 1 out of 2 mirrors
md: md1 stopped.
md: bind<hda2>
raid1: raid set md1 active with 1 out of 2 mirrors
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
(fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0,
recovered transactions 15992 to 16718
(fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 6324 and revoked
1/4 blocks
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1052248k swap on /dev/hda3. Priority:-1 extents:1 across:1052248k
EXT3 FS on md1, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
eth0: no IPv6 routers present

2006-10-25 00:42:00

by Sergio Monteiro Basto

[permalink] [raw]

Subject: RE: PROBLEM: Oops when doing disk heavy disk I/O

On Wed, 2006-10-25 at 10:21 +1000, Michael wrote:
>
> > -----Original Message-----
> > From: Sergio Monteiro Basto [mailto:[email protected]]
> > Sent: Wednesday, 25 October 2006 3:27 AM
> >
> > Hi,
> > please boot with "report_lost_ticks" like
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
> > and see if you have
> > time.c: Lost n timer tick(s)!
> >
> > and
> > cat /sys/devices/system/clocksource/clocksource0/
> >
> > cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> >
> >
>
> Hi,
>
> Yep, seems to be losing some ticks -- I'm not sure what that means, but is
> it bad? :-)
>

well I don't know for sure , but after this initials Lost Timer tick ,
it appears more after a few time of uptime ?

> [ root @ barbossa : ~ ] # dmesg | grep tick
> Command line: root=/dev/md1 ro report_lost_ticks
> Kernel command line: root=/dev/md1 ro report_lost_ticks
> time.c: Lost 1 timer tick(s)! rip release_console_sem+0x196/0x20c)
> time.c: Lost 12 timer tick(s)! rip setup_boot_APIC_clock+0x11f/0x121)
> time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4)
>

>
> Also, for the clocksource:
> [ root @ barbossa : ~ ] # cat
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> jiffies
>
> [ root @ barbossa : ~ ] # cat
> /sys/devices/system/clocksource/clocksource0/current_clocksource
> jiffies
>
>
> Thanks for your help.
>
> Cheers,
> Michael
>

--
S?rgio M.B.

Attachments:

smime.p7s (2.12 kB)

2006-10-25 01:25:21

by Michael

[permalink] [raw]

Subject: RE: PROBLEM: Oops when doing disk heavy disk I/O

> -----Original Message-----
> From: Sergio Monteiro Basto [mailto:[email protected]]
> Sent: Wednesday, 25 October 2006 10:42 AM
>
> well I don't know for sure , but after this initials Lost Timer tick ,
> it appears more after a few time of uptime ?
>

Well, when the machine was just sitting there, I don't get any lost timer
ticks -- it's perfectly happy. However, I started to `dd if=/dev/zero
of=/dev/hda6` again, and after a few minutes, it oops'ed again (below),
losing 2 timer ticks: <4>time.c: Lost 2 timer tick(s)! rip
__do_softirq+0x4a/0xc4)

Also, I rebooted and did the same command, but with bs=512K. It Oops'ed
again (after about 30 seconds?), but this time, with:
<4>time.c: Lost 1 timer tick(s)! rip _spin_unlock_irqrestore+0x8/0x9)
Kernel panic - not syncing: Aiee, killing interrupt handler!

It seems to be related to interrupts, somehow... since I'm not too familiar
with what's happening, would anyone have any suggestions about what I could
do next to troubleshoot?

Thanks!
Michael

Full Oops:

Unable to handle kernel NULL pointer dereference at 000000000000002b RIP:
[<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df
PGD 7a312067 PUD 7a374067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 263, comm: pdflush Not tainted 2.6.19-rc3 #1
RIP: 0010:[<ffffffff8021a6c5>] [<ffffffff8021a6c5>]
__block_write_full_page+0xa4/0x2df
RSP: 0018:ffff810037ebbc30 EFLAGS: 00010287
RAX: 0000000000000000 RBX: 000000000000002b RCX: 0000000000000002
RDX: 000000000000000a RSI: 00000000000554a9 RDI: ffff810037cc8440
RBP: ffff8100015f1c30 R08: ffff810037cc8440 R09: ffff810037ebbe70
R10: ffffffff802bee4c R11: ffffffff80440b8b R12: ffff81001b2c9dc8
R13: 00000000001552a7 R14: ffff810037cc8440 R15: 0000000001c42574
FS: 00002b1c6f9536d0(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000002b CR3: 000000007a315000 CR4: 00000000000006e0
Process pdflush (pid: 263, threadinfo ffff810037eba000, task
ffff810037f250e0)
Stack: ffff810037ebbe70 ffffffff802bee4c 00000400015f1c30 ffff8100015f1c30
ffff810037ebbe70 ffff810037cc8550 000000000000000b ffff81007efe9b08
0000000000000000 ffffffff8029db1e 000000000000000e ffffffff802bdff3
Call Trace:
[<ffffffff802bee4c>] blkdev_get_block+0x0/0x46
[<ffffffff8029db1e>] generic_writepages+0x18e/0x2d8
[<ffffffff802bdff3>] blkdev_writepage+0x0/0xf
[<ffffffff8025591f>] do_writepages+0x20/0x2d
[<ffffffff8022cf8c>] __writeback_single_inode+0x1b4/0x38b
[<ffffffff8021f44f>] sync_sb_inodes+0x1d1/0x2b5
[<ffffffff802897e0>] keventd_create_kthread+0x0/0x61
[<ffffffff8024b3b7>] writeback_inodes+0x82/0xd8
[<ffffffff8029dfb4>] background_writeout+0x82/0xb5
[<ffffffff8025108f>] pdflush+0x0/0x1e3
[<ffffffff802511c8>] pdflush+0x139/0x1e3
[<ffffffff8029df32>] background_writeout+0x0/0xb5
[<ffffffff8022f78a>] kthread+0xd1/0x101
[<ffffffff80258ed8>] child_rip+0xa/0x12
[<ffffffff802897e0>] keventd_create_kthread+0x0/0x61
[<ffffffff8022f6b9>] kthread+0x0/0x101
[<ffffffff80258ece>] child_rip+0x0/0x12

Code: 8b 03 a8 20 75 6c 8b 03 a8 02 74 66 8b 44 24 14 48 39 43 20
RIP [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df
RSP <ffff810037ebbc30>
CR2: 000000000000002b
<4>time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4)

2006-10-25 04:14:31

by Ray Lee

[permalink] [raw]

Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O

> It seems to be related to interrupts, somehow... since I'm not too familiar
> with what's happening, would anyone have any suggestions about what I could
> do next to troubleshoot?

Try swapping out the RAM (or getting it down to 1Gig). Try a really
old kernel, such as debian's 2.6.8 package.

Ray

2006-10-25 12:28:17

by Michael

[permalink] [raw]

Subject: RE: PROBLEM: Oops when doing disk heavy disk I/O

> -----Original Message-----
> From: Ray Lee [mailto:[email protected]]
> Sent: Wednesday, 25 October 2006 2:13 PM
>
> Try swapping out the RAM (or getting it down to 1Gig). Try a really
> old kernel, such as debian's 2.6.8 package.
>
> Ray

Well, what do you know -- that seems to have fixed it! I took out one stick
of RAM (so it's down to 1 gig) and it seems to work fine, now, without any
boot parameters or anything. (mind you, murphy's law will dictate that it'll
crash about 30 seconds after I send this...)

I'm amazed at that -- but I'm not going to look a gift horse in the mouth,
this has been frustrating me for far too long. :-)

Although, having said that, I'm curious... It is working because there's
only 1 gig of RAM in there, or because it's only a single stick (ie. not
dual-channel)? It works fine with both sticks, individually, just not both
together... I wonder what the cause of it actually is...

Thanks heaps for the suggestion!

Cheers,
Michael

2006-10-25 15:08:47

by Ray Lee

[permalink] [raw]

Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O

On 10/25/06, Michael <[email protected]> wrote:
> > Try swapping out the RAM (or getting it down to 1Gig). Try a really
> > old kernel, such as debian's 2.6.8 package.
>
> Well, what do you know -- that seems to have fixed it! I took out one stick
> of RAM (so it's down to 1 gig) and it seems to work fine, now, without any
> boot parameters or anything. (mind you, murphy's law will dictate that it'll
> crash about 30 seconds after I send this...)
>
> I'm amazed at that -- but I'm not going to look a gift horse in the mouth,
> this has been frustrating me for far too long. :-)
>
> Although, having said that, I'm curious... It is working because there's
> only 1 gig of RAM in there, or because it's only a single stick (ie. not
> dual-channel)? It works fine with both sticks, individually, just not both
> together... I wonder what the cause of it actually is...

That I don't know. Marginal power supplies can cause very strange
problems, or there can be motherboard/memory timing issues with lots
of RAM, or just plain old more-than-one-gig-of-RAM-tickles-software
bugs. Though the latter, especially on an x86-64 kernel, really
shouldn't be very common if present at all.

What I'd suggest is resending to the list the first oops after boot
(several of them would be good) while using 2 gigs of RAM, with a new
subject line to attract those who are especially good at reading these
things. (I can barely find the stack trace.)

Ray

2006-10-25 15:24:44

by Paulo Marques

[permalink] [raw]

Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O

Michael wrote:
>> From: Ray Lee [mailto:[email protected]]
>> Try swapping out the RAM (or getting it down to 1Gig). Try a really
>> old kernel, such as debian's 2.6.8 package.
>
> [...]
> Although, having said that, I'm curious... It is working because there's
> only 1 gig of RAM in there, or because it's only a single stick (ie. not
> dual-channel)? It works fine with both sticks, individually, just not both
> together... I wonder what the cause of it actually is...

Another thing that I would try is to tweak the /proc/sys/vm/dirty_ratio
and /proc/sys/vm/dirty_background_ratio settings.

pdflush appears in your trace and with twice the RAM there is twice the
dirty data to write out. Maybe choosing half the ratio in both settings
with 2Gb of RAM would produce the same amount of dirty data as using
just 1Gb of RAM with the original settings.

This is still a bug, though. This test would just give more debug
information.

--
Paulo "grasping at straws" Marques - http://www.grupopie.com

"The face of a child can say it all, especially the
mouth part of the face."