2006-11-21 09:28:04

by Jesper Juhl

[permalink] [raw]
Subject: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

Hi,

I have a server that has long suffered from spontaneous reboots and random
crashes.
The problems seem to be partly SMP related since the machine is rock solid
with a UP version of 2.6.11.11, but the same kernel compiled for SMP has
issues.

The server initially had 1 Intel Xeon CPU with HT and was very recently
upgraded with an additional one (in the blind hope that the issues would be
fixed). The kernel *seems* to die faster with 2 CPU's and a SMP kernel than
it previously did with just one (HT) CPU and a SMP kernel.

I've been trying newer kernels, such as 2.6.17.x, 2.6.18.x, 2.6.19-rc*
(all SMP), hoping that the problem(s) would be fixed, but that does not seem
to be the case.

Recently I've been using netconsole and have lots of debug options enabled
in the hope that I could capture some relevant info. Unfortunately nothing
ever really made it to the remote log - except one little incomplete bit I
got the other day (with 2.6.19-rc6) :

do_IRQ: stack overflow: 492

That is all that made it to the log, but it does indicate that the problem
might be stack-usage related.
Since the kernel was compiled with 4K stacks, perhaps if it was changed to
use 8K stacks it would stay up long enough for a complete dump to reach the
logs. But, if 8K stacks really did help, it would be nice if the dumps still
happened at the same point where they would have with 4K stacks.
So, I changed STACK_WARN in include/asm/thread_info.h from (THREAD_SIZE/8)
to(4608). This way I should get stack traces at the point where the kernel
would be in trouble with a 4K stack but since it's actually using a 8K stack
it should survive and let me capture the trace.

I got more than I could ever have hoped for. I still got spontaneous
reboots, but this time my remote log server captured tons of stack dumps.
I've got far too many to send here (more than 2G) and most of them are
identical anyway, so I'll just submit a few representative samples
initially.

Most of the traces include XFS functions and some also involve scsi and/or
networking. This is the reason I'm submitting this to the XFS & netdev lists
in addition to LKML.

All of these traces were collected with 2.6.19-rc6 with the modification
mentioned above.

Hardware details and software environment info is at the end of the email.


This is the most often captured trace :

do_IRQ: stack overflow: 4416
[<c01039a7>] dump_trace+0x1e7/0x1fd
[<c0103a67>] show_trace_log_lvl+0x1c/0x33
[<c0103a90>] show_trace+0x12/0x16
[<c0103b8f>] dump_stack+0x19/0x1d
[<c0105133>] do_IRQ+0xaf/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c011ee20>] __do_softirq+0x59/0xd0
[<c011eece>] do_softirq+0x37/0x39
[<c011ef09>] irq_exit+0x39/0x3b
[<c01050f8>] do_IRQ+0x74/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c02ee313>] make_request+0x320/0x426
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c02fbac7>] __map_bio+0x4c/0x93
[<c02fbcfd>] __clone_and_map+0xdb/0x30a
[<c02fbfd0>] __split_bio+0xa4/0xc7
[<c02fc093>] dm_request+0xa0/0xbf
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c020992a>] xlog_bdstrat_cb+0x19/0x41
[<c020a212>] xlog_sync+0x24e/0x457
[<c020b7ce>] xlog_state_release_iclog+0x75/0xd0
[<c020bc8d>] xlog_state_sync+0x175/0x269
[<c0208ef6>] _xfs_log_force+0x7f/0x88
[<c01d1786>] xfs_alloc_search_busy+0xdf/0xe1
[<c01d0d1c>] xfs_alloc_get_freelist+0xe7/0xf5
[<c01d2d9b>] xfs_alloc_newroot+0x21/0x34f
[<c01d25c4>] xfs_alloc_insrec+0x3b0/0x3ce
[<c01d3cdf>] xfs_alloc_insert+0x5a/0xc3
[<c01d071a>] xfs_free_ag_extent+0x57f/0x5f2
[<c01d09f9>] xfs_alloc_fix_freelist+0x220/0x45c
[<c01d1285>] xfs_alloc_vextent+0x24e/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
[<c0173278>] writeback_inodes+0xb2/0xbe
[<c0141395>] background_writeout+0x66/0x9a
[<c0141f37>] __pdflush+0xcf/0x184
[<c014201e>] pdflush+0x32/0x36
[<c012c918>] kthread+0xa9/0xae
[<c010373b>] kernel_thread_helper+0x7/0x10

another very common one is this one :

do_IRQ: stack overflow: 4532
[<c01039a7>] dump_trace+0x1e7/0x1fd
[<c0103a67>] show_trace_log_lvl+0x1c/0x33
[<c0103a90>] show_trace+0x12/0x16
[<c0103b8f>] dump_stack+0x19/0x1d
[<c0105133>] do_IRQ+0xaf/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c02255a7>] xfs_buf_bio_end_io+0xd9/0x11f
[<c017a546>] bio_endio+0x55/0x7a
[<c02fb914>] dec_pending+0x3d/0x6b
[<c02fb9c7>] clone_endio+0x85/0xb1
[<c017a546>] bio_endio+0x55/0x7a
[<c0237435>] __end_that_request_first+0x1df/0x271
[<c02374dc>] end_that_request_chunk+0x8/0xa
[<c02adab9>] scsi_end_request+0x25/0xcb
[<c02adca4>] scsi_io_completion+0x82/0x301
[<c02b6b6d>] sd_rw_intr+0x76/0x20f
[<c02a9bf5>] scsi_finish_command+0x43/0x5e
[<c02ae3df>] scsi_softirq_done+0x70/0xd5
[<c0237540>] blk_done_softirq+0x62/0x6b
[<c011ee82>] __do_softirq+0xbb/0xd0
[<c011eece>] do_softirq+0x37/0x39
[<c011ef09>] irq_exit+0x39/0x3b
[<c01050f8>] do_IRQ+0x74/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c013edf8>] mempool_alloc+0x21/0xce
[<c01796dd>] bio_alloc_bioset+0x79/0x13f
[<c02fbbdb>] clone_bio+0x36/0x7d
[<c02fbcf0>] __clone_and_map+0xce/0x30a
[<c02fbfd0>] __split_bio+0xa4/0xc7
[<c02fc093>] dm_request+0xa0/0xbf
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c02254a4>] xfs_buf_iostart+0x6d/0x97
[<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
[<c021768e>] xfs_trans_read_buf+0x153/0x2fc
[<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
[<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
[<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
[<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
[<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
[<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
[<c0173278>] writeback_inodes+0xb2/0xbe
[<c0141478>] wb_kupdate+0x80/0xe9
[<c0141f37>] __pdflush+0xcf/0x184
[<c014201e>] pdflush+0x32/0x36
[<c012c918>] kthread+0xa9/0xae
[<c010373b>] kernel_thread_helper+0x7/0x10

This one seems to involve scsi :

do_IRQ: stack overflow: 4568
[<c01039a7>] dump_trace+0x1e7/0x1fd
[<c0103a67>] show_trace_log_lvl+0x1c/0x33
[<c0103a90>] show_trace+0x12/0x16
[<c0103b8f>] dump_stack+0x19/0x1d
[<c0105133>] do_IRQ+0xaf/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c03872bf>] _spin_unlock_irq+0xa/0xb
[<c0235974>] blk_run_queue+0x42/0x77
[<c02ad99e>] scsi_run_queue+0xc9/0xf1
[<c02ada4f>] scsi_next_command+0x33/0x49
[<c02adb44>] scsi_end_request+0xb0/0xcb
[<c02adca4>] scsi_io_completion+0x82/0x301
[<c02b6b6d>] sd_rw_intr+0x76/0x20f
[<c02a9bf5>] scsi_finish_command+0x43/0x5e
[<c02ae3df>] scsi_softirq_done+0x70/0xd5
[<c0237540>] blk_done_softirq+0x62/0x6b
[<c011ee82>] __do_softirq+0xbb/0xd0
[<c011eece>] do_softirq+0x37/0x39
[<c011ef09>] irq_exit+0x39/0x3b
[<c01050f8>] do_IRQ+0x74/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c03872bf>] _spin_unlock_irq+0xa/0xb
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c02fbac7>] __map_bio+0x4c/0x93
[<c02fbcfd>] __clone_and_map+0xdb/0x30a
[<c02fbfd0>] __split_bio+0xa4/0xc7
[<c02fc093>] dm_request+0xa0/0xbf
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c02254a4>] xfs_buf_iostart+0x6d/0x97
[<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
[<c021768e>] xfs_trans_read_buf+0x153/0x2fc
[<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
[<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
[<c01d3d5c>] xfs_alloc_lookup_eq+0x14/0x17
[<c01ceef8>] xfs_alloc_fixup_trees+0x252/0x2a9
[<c01cff24>] xfs_alloc_ag_vextent_size+0x318/0x405
[<c01cf0e5>] xfs_alloc_ag_vextent+0xe2/0x106
[<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
[<c0173278>] writeback_inodes+0xb2/0xbe
[<c0141395>] background_writeout+0x66/0x9a
[<c0141f37>] __pdflush+0xcf/0x184
[<c014201e>] pdflush+0x32/0x36
[<c012c918>] kthread+0xa9/0xae
[<c010373b>] kernel_thread_helper+0x7/0x10

And then there are some where stack space is really low, which would
certainly have killed us if running with 4K stacks :

First this :

do_IRQ: stack overflow: 3376
[<c01039a7>] dump_trace+0x1e7/0x1fd
[<c0103a67>] show_trace_log_lvl+0x1c/0x33
[<c0103a90>] show_trace+0x12/0x16
[<c0103b8f>] dump_stack+0x19/0x1d
[<c0105133>] do_IRQ+0xaf/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c03872b2>] _spin_unlock_irqrestore+0xd/0x10
[<c029ab47>] e1000_xmit_frame+0x269/0x3b1
[<c0315f1f>] dev_hard_start_xmit+0x5a/0xd3
[<c03224da>] __qdisc_run+0x95/0x1d7
[<c03161b8>] dev_queue_xmit+0x220/0x285
[<c03841c1>] vlan_dev_hwaccel_hard_start_xmit+0x8a/0x92
[<c0315f1f>] dev_hard_start_xmit+0x5a/0xd3
[<c03160f7>] dev_queue_xmit+0x15f/0x285
[<c031b63e>] neigh_connected_output+0x93/0xba
[<c0330869>] ip_output+0x170/0x250
[<c0330d21>] ip_queue_xmit+0x3d8/0x4e1
[<c03415da>] tcp_transmit_skb+0x29e/0x45d
[<c0344511>] tcp_send_ack+0xb3/0xf4
[<c033e343>] tcp_send_dupack+0x28/0x7f
[<c033fa7f>] tcp_rcv_established+0x141/0x6c8
[<c03478c9>] tcp_v4_do_rcv+0xcb/0xcd
[<c0347f3e>] tcp_v4_rcv+0x673/0x7e3
[<c032d351>] ip_local_deliver+0xf8/0x22d
[<c032d6ca>] ip_rcv+0x244/0x4e4
[<c031667f>] netif_receive_skb+0x1f9/0x26a
[<c029bbea>] e1000_clean_rx_irq+0x17f/0x4b9
[<c029b6f9>] e1000_clean+0x66/0xfb
[<c0316890>] net_rx_action+0x96/0x174
[<c011ee82>] __do_softirq+0xbb/0xd0
[<c011eece>] do_softirq+0x37/0x39
[<c011ef09>] irq_exit+0x39/0x3b
[<c01050f8>] do_IRQ+0x74/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c02fba08>] max_io_len+0x15/0x88
[<c02fbc66>] __clone_and_map+0x44/0x30a
[<c02fbfd0>] __split_bio+0xa4/0xc7
[<c02fc093>] dm_request+0xa0/0xbf
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c02254a4>] xfs_buf_iostart+0x6d/0x97
[<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
[<c021768e>] xfs_trans_read_buf+0x153/0x2fc
[<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
[<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
[<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
[<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
[<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
[<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
[<c0173278>] writeback_inodes+0xb2/0xbe
[<c014117f>] balance_dirty_pages+0xa6/0x15c
[<c01412ca>] balance_dirty_pages_ratelimited_nr+0x59/0x5b
[<c013dc7f>] generic_file_buffered_write+0x2ef/0x61f
[<c022ae09>] xfs_write+0x96f/0xb1c
[<c0226774>] xfs_file_aio_write+0x78/0x8a
[<c01586d9>] do_sync_write+0xc1/0x100
[<c01587a9>] vfs_write+0x91/0x137
[<c01588fb>] sys_write+0x41/0x6b
[<c0102b93>] syscall_call+0x7/0xb
[<b7f6b95e>] 0xb7f6b95e

and this :

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <d49>
TDT <d49>
next_to_use <d49>
next_to_clean <d8e>
buffer_info[next_to_clean]
time_stamp <aaa1fc>
next_to_watch <d8e>
jiffies <aaae6f>
next_to_watch.status <1>
do_IRQ: stack overflow: 3836
[<c01039a7>] dump_trace+0x1e7/0x1fd
[<c0103a67>] show_trace_log_lvl+0x1c/0x33
[<c0103a90>] show_trace+0x12/0x16
[<c0103b8f>] dump_stack+0x19/0x1d
[<c0105133>] do_IRQ+0xaf/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c02443d0>] csum_partial+0xb8/0x120
DWARF2 unwinder stuck at csum_partial+0xb8/0x120
Leftover inexact backtrace:
[<c0313833>] __skb_checksum_complete+0x20/0x67
[<c035c397>] nf_ip_checksum+0xe0/0x125
[<c035fb91>] udp_error+0x105/0x184
[<c035da4c>] ip_conntrack_in+0x7d/0x294
[<c0326add>] nf_iterate+0x62/0x7c
[<c0326b4f>] nf_hook_slow+0x58/0xbf
[<c032d892>] ip_rcv+0x40c/0x4e4
[<c031667f>] netif_receive_skb+0x1f9/0x26a
[<c029bbea>] e1000_clean_rx_irq+0x17f/0x4b9
[<c029b6f9>] e1000_clean+0x66/0xfb
[<c0316890>] net_rx_action+0x96/0x174
[<c011ee82>] __do_softirq+0xbb/0xd0
[<c011eece>] do_softirq+0x37/0x39
[<c011ef09>] irq_exit+0x39/0x3b
[<c01050f8>] do_IRQ+0x74/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c02fbc66>] __clone_and_map+0x44/0x30a
[<c02fbfd0>] __split_bio+0xa4/0xc7
[<c02fc093>] dm_request+0xa0/0xbf
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c02254a4>] xfs_buf_iostart+0x6d/0x97
[<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
[<c021768e>] xfs_trans_read_buf+0x153/0x2fc
[<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
[<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
[<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
[<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
[<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
[<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
[<c0173278>] writeback_inodes+0xb2/0xbe
[<c014117f>] balance_dirty_pages+0xa6/0x15c
[<c01412ca>] balance_dirty_pages_ratelimited_nr+0x59/0x5b
[<c013dc7f>] generic_file_buffered_write+0x2ef/0x61f
[<c022ae09>] xfs_write+0x96f/0xb1c
[<c0226774>] xfs_file_aio_write+0x78/0x8a
[<c01586d9>] do_sync_write+0xc1/0x100
[<c01587a9>] vfs_write+0x91/0x137
[<c01588fb>] sys_write+0x41/0x6b
[<c0102b93>] syscall_call+0x7/0xb

and finally this one :

do_IRQ: stack overflow: 3916
[<c01039a7>] dump_trace+0x1e7/0x1fd
[<c0103a67>] show_trace_log_lvl+0x1c/0x33
[<c0103a90>] show_trace+0x12/0x16
[<c0103b8f>] dump_stack+0x19/0x1d
[<c0105133>] do_IRQ+0xaf/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c0342148>] tcp_init_tso_segs+0x17/0x4c
[<c0342ac2>] tcp_write_xmit+0x5d/0x266
[<c0342cf4>] __tcp_push_pending_frames+0x29/0x81
[<c033fb46>] tcp_rcv_established+0x208/0x6c8
[<c03478c9>] tcp_v4_do_rcv+0xcb/0xcd
[<c0347f3e>] tcp_v4_rcv+0x673/0x7e3
[<c032d351>] ip_local_deliver+0xf8/0x22d
[<c032d6ca>] ip_rcv+0x244/0x4e4
[<c031667f>] netif_receive_skb+0x1f9/0x26a
[<c029bbea>] e1000_clean_rx_irq+0x17f/0x4b9
[<c029b6f9>] e1000_clean+0x66/0xfb
[<c0316890>] net_rx_action+0x96/0x174
[<c011ee82>] __do_softirq+0xbb/0xd0
[<c011eece>] do_softirq+0x37/0x39
[<c011ef09>] irq_exit+0x39/0x3b
[<c01050f8>] do_IRQ+0x74/0xd6
[<c01034fe>] common_interrupt+0x1a/0x20
[<c02fba08>] max_io_len+0x15/0x88
[<c02fbc66>] __clone_and_map+0x44/0x30a
[<c02fbfd0>] __split_bio+0xa4/0xc7
[<c02fc093>] dm_request+0xa0/0xbf
[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c02254a4>] xfs_buf_iostart+0x6d/0x97
[<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
[<c021768e>] xfs_trans_read_buf+0x153/0x2fc
[<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
[<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
[<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
[<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
[<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
[<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
[<c0173278>] writeback_inodes+0xb2/0xbe
[<c014117f>] balance_dirty_pages+0xa6/0x15c
[<c01412ca>] balance_dirty_pages_ratelimited_nr+0x59/0x5b
[<c013dc7f>] generic_file_buffered_write+0x2ef/0x61f
[<c022ae09>] xfs_write+0x96f/0xb1c
[<c0226774>] xfs_file_aio_write+0x78/0x8a
[<c01586d9>] do_sync_write+0xc1/0x100
[<c01587a9>] vfs_write+0x91/0x137
[<c01588fb>] sys_write+0x41/0x6b
[<c0102b93>] syscall_call+0x7/0xb
[<b7f6b95e>] 0xb7f6b95e

And there are lots of other ones as well that differ slightly from the ones
above.


Some hardware/software details :

# scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux server.mydomain.net 2.6.19-rc6 #1 SMP Mon Nov 20 14:33:26 CET 2006 i686
GNU/Linux

Gnu C 3.3.5
Gnu make 3.80
binutils 2.15
util-linux 2.12p
mount 2.12p
module-init-tools 3.2-pre1
e2fsprogs 1.37
xfsprogs 2.6.20
nfs-utils 1.0.6
Linux C Library 2.3.2
Dynamic linker (ldd) 2.3.2
Procps 3.2.1
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.2.1
udev 056
Modules Loaded sky2 piix ide_core eeprom

# lspci -vvx
0000:00:00.0 Host bridge: Intel Corp. Server Memory Controller Hub (rev 0c)
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Capabilities: [40] #09 [4105]
00: 86 80 90 35 46 01 90 00 0c 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00

0000:00:00.1 ff00: Intel Corp. Memory Controller Hub Error Reporting Register
(rev 0c)
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
00: 86 80 91 35 00 01 00 00 0c 00 00 ff 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA
Controller (rev 0c)
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 10
Region 0: Memory at fcbff000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [b0] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-
Address: fee00000 Data: 0000
00: 86 80 94 35 02 01 10 00 0c 00 80 08 00 00 00 00
10: 00 f0 bf fc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 b0 00 00 00 00 00 00 00 0a 01 00 00

0000:00:02.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A0
(rev 0c) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
I/O behind bridge: 0000b000-0000cfff
Memory behind bridge: fcc00000-fcefffff
Prefetchable memory behind bridge: 00000000fa000000-00000000fb700000
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0041]
00: 86 80 95 35 47 01 10 00 0c 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 03 00 b0 c0 00 00
20: c0 fc e0 fc 01 fa 71 fb 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 06 00

0000:00:04.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B0
(rev 0c) (prog-if 00 [Normal decode])
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0041]
00: 86 80 97 35 44 01 10 00 0c 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 04 04 00 f0 00 00 20
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 06 00

0000:00:05.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B1
(rev 0c) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fcf00000-fcffffff
BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0041]
00: 86 80 98 35 47 01 18 00 0c 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 05 05 00 d0 d0 00 00
20: f0 fc f0 fc f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 07 00

0000:00:06.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port C0
(rev 0c) (prog-if 00 [Normal decode])
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0041]
00: 86 80 99 35 44 01 10 00 0c 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 06 06 00 f0 00 00 20
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 06 00

0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1
(rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 4: I/O ports at a880 [size=32]
00: 86 80 d2 24 05 00 80 02 02 00 03 0c 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 81 a8 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2
(rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin B routed to IRQ 7
Region 4: I/O ports at ac00 [size=32]
00: 86 80 d4 24 05 00 80 02 02 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 ac 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 07 02 00 00

0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3
(rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin C routed to IRQ 15
Region 4: I/O ports at ac80 [size=32]
00: 86 80 d7 24 05 00 80 02 02 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 81 ac 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 0f 03 00 00

0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI
Controller (rev 02) (prog-if 20 [EHCI])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin D routed to IRQ 21
Region 0: Memory at fcbfec00 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] #0a [20a0]
00: 86 80 dd 24 06 01 90 02 02 20 03 0c 00 00 00 00
10: 00 ec bf fc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 04 00 00

0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2) (prog-if 00
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=07, subordinate=07, sec-latency=32
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: fd000000-febfffff
Prefetchable memory behind bridge: fb800000-fbffffff
BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00: 86 80 4e 24 47 01 80 00 c2 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 07 07 20 e0 e0 80 02
20: 00 fd b0 fe 80 fb f0 fb 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b 00

0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev
02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
00: 86 80 d0 24 4f 01 80 02 02 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:1f.1 IDE interface: Intel Corp. 82801EB/ER (ICH5/ICH5R) Ultra ATA 100
Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 20
Region 0: I/O ports at <unassigned>
Region 1: I/O ports at <unassigned>
Region 2: I/O ports at <unassigned>
Region 3: I/O ports at <unassigned>
Region 4: I/O ports at fc00 [size=16]
Region 5: Memory at 88000000 (32-bit, non-prefetchable) [size=1K]
00: 86 80 db 24 07 00 88 02 02 8a 01 01 00 00 00 00
10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
20: 01 fc 00 00 00 00 00 88 00 00 00 00 86 80 39 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00

0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage
Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: Intel Corp.: Unknown device 3460
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR- FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 20
Region 0: I/O ports at a800 [size=8]
Region 1: I/O ports at a480 [size=4]
Region 2: I/O ports at a400 [size=8]
Region 3: I/O ports at a080 [size=4]
Region 4: I/O ports at a000 [size=16]
00: 86 80 d1 24 45 00 a0 02 02 8f 01 01 00 00 00 00
10: 01 a8 00 00 81 a4 00 00 01 a4 00 00 81 a0 00 00
20: 01 a0 00 00 00 00 00 00 00 00 00 00 86 80 60 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 0f 01 00 00

0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev
02)
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 22
Region 4: I/O ports at 0540 [size=32]
00: 86 80 d3 24 01 00 80 02 02 00 05 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 41 05 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 02 00 00

0000:01:00.0 PCI bridge: Intel Corp. PCI Bridge Hub A (rev 09) (prog-if 00
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Bus: primary=01, secondary=02, subordinate=02, sec-latency=48
I/O behind bridge: 0000b000-0000bfff
Memory behind bridge: fcd00000-fcdfffff
Prefetchable memory behind bridge: 00000000fa000000-00000000faf00000
BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d8] 00: 86 80 29 03 47 01 10 00 09 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 01 02 02 30 b0 b0 a0 02
20: d0 fc d0 fc 01 fa f1 fa 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 07 00

0000:01:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A
(rev 09) (prog-if 20 [IO(X)-APIC])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at fccfe000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 86 80 26 03 46 01 10 00 09 20 00 08 00 00 80 00
10: 00 e0 cf fc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00

0000:01:00.2 PCI bridge: Intel Corp. PCI Bridge Hub B (rev 09) (prog-if 00
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Bus: primary=01, secondary=03, subordinate=03, sec-latency=48
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: fce00000-fcefffff
Prefetchable memory behind bridge: 00000000fb000000-00000000fb700000
BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d8] 00: 86 80 2a 03 47 01 10 00 09 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 01 03 03 30 c0 c0 a0 02
20: e0 fc e0 fc 01 fb 71 fb 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 07 00

0000:01:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller B
(rev 09) (prog-if 20 [IO(X)-APIC])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at fccff000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 86 80 27 03 46 01 10 00 09 20 00 08 00 00 80 00
10: 00 f0 cf fc 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00

0000:02:02.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
Subsystem: 3ware Inc 3ware ATA-RAID
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at bc00 [size=256]
Region 1: Memory at fcdffc00 (64-bit, non-prefetchable) [size=256]
Region 3: Memory at fa800000 (64-bit, prefetchable) [size=8M]
Expansion ROM at fcde0000 [disabled] [size=64K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
10: 01 bc 00 00 04 fc df fc 00 00 00 00 0c 00 80 fa
20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
30: 00 00 de fc 48 00 00 00 00 00 00 00 0a 01 09 00

0000:02:03.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
Subsystem: 3ware Inc 3ware ATA-RAID
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at b800 [size=256]
Region 1: Memory at fcdff800 (64-bit, non-prefetchable) [size=256]
Region 3: Memory at fa000000 (64-bit, prefetchable) [size=8M]
Expansion ROM at fcdd0000 [disabled] [size=64K]
Capabilities: [40] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=0
Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-,
DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
10: 01 b8 00 00 04 f8 df fc 00 00 00 00 0c 00 00 fa
20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
30: 00 00 dd fc 40 00 00 00 00 00 00 00 0a 01 09 00

0000:03:01.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
Subsystem: 3ware Inc 3ware ATA-RAID
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 19
Region 0: I/O ports at cc00 [size=256]
Region 1: Memory at fceffc00 (64-bit, non-prefetchable) [size=256]
Region 3: Memory at fb000000 (64-bit, prefetchable) [size=8M]
Expansion ROM at fcee0000 [disabled] [size=64K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
10: 01 cc 00 00 04 fc ef fc 00 00 00 00 0c 00 00 fb
20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
30: 00 00 ee fc 48 00 00 00 00 00 00 00 0a 01 09 00

0000:05:00.0 Ethernet controller: Marvell Technology Group Ltd.: Unknown
device 4361 (rev 18)
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 223
Region 0: Memory at fcffc000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at dc00 [size=256]
Expansion ROM at fcfc0000 [disabled] [size=128K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1
Enable+
Address: 00000000fee0f00c Data: 41e1
Capabilities: [e0] #10 [0011]
00: ab 11 61 43 47 05 10 00 18 00 00 02 10 00 00 00
10: 04 c0 ff fc 00 00 00 00 01 dc 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 fc fc 48 00 00 00 00 00 00 00 0a 01 00 00

0000:07:04.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet
Controller (rev 05)
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (63750ns min), Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 16
Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at ec80 [size=64]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-,
DC=simple, DMMRBC=2, DMOST=0, DMCRS=0, RSCEM-
00: 86 80 76 10 57 01 30 02 05 00 00 02 10 20 00 00
10: 00 00 be fe 00 00 00 00 81 ec 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0a 01 ff 00

0000:07:06.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
Subsystem: 3ware Inc 3ware ATA-RAID
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 20
Region 0: I/O ports at e800 [size=256]
Region 1: Memory at febdbc00 (64-bit, non-prefetchable) [size=256]
Region 3: Memory at fb800000 (64-bit, prefetchable) [size=8M]
Expansion ROM at fe020000 [disabled] [size=64K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
10: 01 e8 00 00 04 bc bd fe 00 00 00 00 0c 00 80 fb
20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
30: 00 00 bc fe 48 00 00 00 00 00 00 00 0f 01 09 00

0000:07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
(prog-if 00 [VGA])
Subsystem: Intel Corp.: Unknown device 3439
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping+ SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2000ns min), Cache Line Size: 0x10 (64 bytes)
Interrupt: pin A routed to IRQ 11
Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Region 1: I/O ports at e400 [size=256]
Region 2: Memory at febda000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at fe000000 [disabled] [size=128K]
Capabilities: [5c] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 02 10 52 47 87 00 90 02 27 00 00 03 10 20 00 00
10: 00 00 00 fd 01 e4 00 00 00 a0 bd fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
30: 00 00 ba fe 5c 00 00 00 00 00 00 00 0b 01 08 00

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 10
cpu MHz : 3192.358
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
bogomips : 6388.75

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 10
cpu MHz : 3192.358
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
bogomips : 6384.53

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 10
cpu MHz : 3192.358
cache size : 2048 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
bogomips : 6384.50

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 10
cpu MHz : 3192.358
cache size : 2048 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
bogomips : 6384.54

# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 17246519 2140 2140 2148 IO-APIC-edge timer
1: 3 3 2 2 IO-APIC-edge i8042
8: 3 0 1 0 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 1 0 2 IO-APIC-edge i8042
14: 0 0 0 0 IO-APIC-edge libata
15: 0 0 0 0 IO-APIC-edge libata
16: 31000929 7811302 39894575 15870447 IO-APIC-fasteoi eth0
17: 1078416 3044826 2489404 1780707 IO-APIC-fasteoi 3w-9xxx
18: 7107793 931865 5531801 862511 IO-APIC-fasteoi 3w-9xxx
19: 494962 141885 25640 282908 IO-APIC-fasteoi 3w-9xxx
20: 2130674 3511229 1293435 2256288 IO-APIC-fasteoi 3w-9xxx,
libata
21: 0 0 0 0 IO-APIC-fasteoi
ehci_hcd:usb1
223: 0 0 0 1 PCI-MSI-edge eth1
NMI: 17252842 17252814 17252813 17252812
LOC: 17234121 17234121 17231585 17231584
ERR: 0
MIS: 0

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 02 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 04 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 05 Lun: 00
Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: AMCC Model: 9500S-8MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi3 Channel: 00 Id: 00 Lun: 00
Vendor: AMCC Model: 9500S-8MI DISK Rev: 2.08
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi6 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3200822AS Rev: 3.01
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3200822AS Rev: 3.01
Type: Direct-Access ANSI SCSI revision: 05



Any advice on how to go about fixing this would be appreciated.



Kind regards,

Jesper Juhl <[email protected]>


PS. Please keep me on Cc: when replying.




2006-11-21 21:54:04

by David Chatterton

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

Jesper,

In the short term, the best workaround is to use 8K stacks. We do not see stack
overflow problems with NFS + XFS + volume managers + disk devices.

Audits have been done in the past and will again be done in the future to try to
identify areas where XFS could use less stack space by reducing/avoid large
local variables. Reducing the code path is far more difficult.

There is active discussion about reducing inlining:
http://bugzilla.kernel.org/show_bug.cgi?id=7364

I can't speak for the scsi stack usage.

Thanks for traces, I've captured this information.

Thanks,

David


Jesper Juhl wrote:
> Hi,
>
> I have a server that has long suffered from spontaneous reboots and random
> crashes.
> The problems seem to be partly SMP related since the machine is rock solid
> with a UP version of 2.6.11.11, but the same kernel compiled for SMP has
> issues.
>
> The server initially had 1 Intel Xeon CPU with HT and was very recently
> upgraded with an additional one (in the blind hope that the issues would be
> fixed). The kernel *seems* to die faster with 2 CPU's and a SMP kernel than
> it previously did with just one (HT) CPU and a SMP kernel.
>
> I've been trying newer kernels, such as 2.6.17.x, 2.6.18.x, 2.6.19-rc*
> (all SMP), hoping that the problem(s) would be fixed, but that does not seem
> to be the case.
>
> Recently I've been using netconsole and have lots of debug options enabled
> in the hope that I could capture some relevant info. Unfortunately nothing
> ever really made it to the remote log - except one little incomplete bit I
> got the other day (with 2.6.19-rc6) :
>
> do_IRQ: stack overflow: 492
>
> That is all that made it to the log, but it does indicate that the problem
> might be stack-usage related.
> Since the kernel was compiled with 4K stacks, perhaps if it was changed to
> use 8K stacks it would stay up long enough for a complete dump to reach the
> logs. But, if 8K stacks really did help, it would be nice if the dumps still
> happened at the same point where they would have with 4K stacks.
> So, I changed STACK_WARN in include/asm/thread_info.h from (THREAD_SIZE/8)
> to(4608). This way I should get stack traces at the point where the kernel
> would be in trouble with a 4K stack but since it's actually using a 8K stack
> it should survive and let me capture the trace.
>
> I got more than I could ever have hoped for. I still got spontaneous
> reboots, but this time my remote log server captured tons of stack dumps.
> I've got far too many to send here (more than 2G) and most of them are
> identical anyway, so I'll just submit a few representative samples
> initially.
>
> Most of the traces include XFS functions and some also involve scsi and/or
> networking. This is the reason I'm submitting this to the XFS & netdev lists
> in addition to LKML.
>
> All of these traces were collected with 2.6.19-rc6 with the modification
> mentioned above.
>
> Hardware details and software environment info is at the end of the email.
>
>
> This is the most often captured trace :
>
> do_IRQ: stack overflow: 4416
> [<c01039a7>] dump_trace+0x1e7/0x1fd
> [<c0103a67>] show_trace_log_lvl+0x1c/0x33
> [<c0103a90>] show_trace+0x12/0x16
> [<c0103b8f>] dump_stack+0x19/0x1d
> [<c0105133>] do_IRQ+0xaf/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c011ee20>] __do_softirq+0x59/0xd0
> [<c011eece>] do_softirq+0x37/0x39
> [<c011ef09>] irq_exit+0x39/0x3b
> [<c01050f8>] do_IRQ+0x74/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c02ee313>] make_request+0x320/0x426
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c02fbac7>] __map_bio+0x4c/0x93
> [<c02fbcfd>] __clone_and_map+0xdb/0x30a
> [<c02fbfd0>] __split_bio+0xa4/0xc7
> [<c02fc093>] dm_request+0xa0/0xbf
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c0236fbc>] submit_bio+0x68/0x109
> [<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
> [<c02258a3>] xfs_buf_iorequest+0x29/0x6e
> [<c020992a>] xlog_bdstrat_cb+0x19/0x41
> [<c020a212>] xlog_sync+0x24e/0x457
> [<c020b7ce>] xlog_state_release_iclog+0x75/0xd0
> [<c020bc8d>] xlog_state_sync+0x175/0x269
> [<c0208ef6>] _xfs_log_force+0x7f/0x88
> [<c01d1786>] xfs_alloc_search_busy+0xdf/0xe1
> [<c01d0d1c>] xfs_alloc_get_freelist+0xe7/0xf5
> [<c01d2d9b>] xfs_alloc_newroot+0x21/0x34f
> [<c01d25c4>] xfs_alloc_insrec+0x3b0/0x3ce
> [<c01d3cdf>] xfs_alloc_insert+0x5a/0xc3
> [<c01d071a>] xfs_free_ag_extent+0x57f/0x5f2
> [<c01d09f9>] xfs_alloc_fix_freelist+0x220/0x45c
> [<c01d1285>] xfs_alloc_vextent+0x24e/0x47a
> [<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
> [<c01e031e>] xfs_bmap_alloc+0x1e/0x29
> [<c01e3b92>] xfs_bmapi+0x1134/0x1545
> [<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
> [<c020576b>] xfs_iomap+0x357/0x459
> [<c022b02f>] xfs_bmap+0x2e/0x35
> [<c0222bbb>] xfs_map_blocks+0x3c/0x70
> [<c0223b24>] xfs_page_state_convert+0x3cc/0x629
> [<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
> [<c01417a4>] generic_writepages+0x1b9/0x2d5
> [<c0223e78>] xfs_vm_writepages+0x24/0x4a
> [<c01418ea>] do_writepages+0x2a/0x46
> [<c0172c03>] __sync_single_inode+0x5c/0x1de
> [<c0172e0a>] __writeback_single_inode+0x85/0x18f
> [<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
> [<c0173278>] writeback_inodes+0xb2/0xbe
> [<c0141395>] background_writeout+0x66/0x9a
> [<c0141f37>] __pdflush+0xcf/0x184
> [<c014201e>] pdflush+0x32/0x36
> [<c012c918>] kthread+0xa9/0xae
> [<c010373b>] kernel_thread_helper+0x7/0x10
>
> another very common one is this one :
>
> do_IRQ: stack overflow: 4532
> [<c01039a7>] dump_trace+0x1e7/0x1fd
> [<c0103a67>] show_trace_log_lvl+0x1c/0x33
> [<c0103a90>] show_trace+0x12/0x16
> [<c0103b8f>] dump_stack+0x19/0x1d
> [<c0105133>] do_IRQ+0xaf/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c02255a7>] xfs_buf_bio_end_io+0xd9/0x11f
> [<c017a546>] bio_endio+0x55/0x7a
> [<c02fb914>] dec_pending+0x3d/0x6b
> [<c02fb9c7>] clone_endio+0x85/0xb1
> [<c017a546>] bio_endio+0x55/0x7a
> [<c0237435>] __end_that_request_first+0x1df/0x271
> [<c02374dc>] end_that_request_chunk+0x8/0xa
> [<c02adab9>] scsi_end_request+0x25/0xcb
> [<c02adca4>] scsi_io_completion+0x82/0x301
> [<c02b6b6d>] sd_rw_intr+0x76/0x20f
> [<c02a9bf5>] scsi_finish_command+0x43/0x5e
> [<c02ae3df>] scsi_softirq_done+0x70/0xd5
> [<c0237540>] blk_done_softirq+0x62/0x6b
> [<c011ee82>] __do_softirq+0xbb/0xd0
> [<c011eece>] do_softirq+0x37/0x39
> [<c011ef09>] irq_exit+0x39/0x3b
> [<c01050f8>] do_IRQ+0x74/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c013edf8>] mempool_alloc+0x21/0xce
> [<c01796dd>] bio_alloc_bioset+0x79/0x13f
> [<c02fbbdb>] clone_bio+0x36/0x7d
> [<c02fbcf0>] __clone_and_map+0xce/0x30a
> [<c02fbfd0>] __split_bio+0xa4/0xc7
> [<c02fc093>] dm_request+0xa0/0xbf
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c0236fbc>] submit_bio+0x68/0x109
> [<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
> [<c02258a3>] xfs_buf_iorequest+0x29/0x6e
> [<c02254a4>] xfs_buf_iostart+0x6d/0x97
> [<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
> [<c021768e>] xfs_trans_read_buf+0x153/0x2fc
> [<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
> [<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
> [<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
> [<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
> [<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
> [<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
> [<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
> [<c01e031e>] xfs_bmap_alloc+0x1e/0x29
> [<c01e3b92>] xfs_bmapi+0x1134/0x1545
> [<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
> [<c020576b>] xfs_iomap+0x357/0x459
> [<c022b02f>] xfs_bmap+0x2e/0x35
> [<c0222bbb>] xfs_map_blocks+0x3c/0x70
> [<c0223b24>] xfs_page_state_convert+0x3cc/0x629
> [<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
> [<c01417a4>] generic_writepages+0x1b9/0x2d5
> [<c0223e78>] xfs_vm_writepages+0x24/0x4a
> [<c01418ea>] do_writepages+0x2a/0x46
> [<c0172c03>] __sync_single_inode+0x5c/0x1de
> [<c0172e0a>] __writeback_single_inode+0x85/0x18f
> [<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
> [<c0173278>] writeback_inodes+0xb2/0xbe
> [<c0141478>] wb_kupdate+0x80/0xe9
> [<c0141f37>] __pdflush+0xcf/0x184
> [<c014201e>] pdflush+0x32/0x36
> [<c012c918>] kthread+0xa9/0xae
> [<c010373b>] kernel_thread_helper+0x7/0x10
>
> This one seems to involve scsi :
>
> do_IRQ: stack overflow: 4568
> [<c01039a7>] dump_trace+0x1e7/0x1fd
> [<c0103a67>] show_trace_log_lvl+0x1c/0x33
> [<c0103a90>] show_trace+0x12/0x16
> [<c0103b8f>] dump_stack+0x19/0x1d
> [<c0105133>] do_IRQ+0xaf/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c03872bf>] _spin_unlock_irq+0xa/0xb
> [<c0235974>] blk_run_queue+0x42/0x77
> [<c02ad99e>] scsi_run_queue+0xc9/0xf1
> [<c02ada4f>] scsi_next_command+0x33/0x49
> [<c02adb44>] scsi_end_request+0xb0/0xcb
> [<c02adca4>] scsi_io_completion+0x82/0x301
> [<c02b6b6d>] sd_rw_intr+0x76/0x20f
> [<c02a9bf5>] scsi_finish_command+0x43/0x5e
> [<c02ae3df>] scsi_softirq_done+0x70/0xd5
> [<c0237540>] blk_done_softirq+0x62/0x6b
> [<c011ee82>] __do_softirq+0xbb/0xd0
> [<c011eece>] do_softirq+0x37/0x39
> [<c011ef09>] irq_exit+0x39/0x3b
> [<c01050f8>] do_IRQ+0x74/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c03872bf>] _spin_unlock_irq+0xa/0xb
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c02fbac7>] __map_bio+0x4c/0x93
> [<c02fbcfd>] __clone_and_map+0xdb/0x30a
> [<c02fbfd0>] __split_bio+0xa4/0xc7
> [<c02fc093>] dm_request+0xa0/0xbf
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c0236fbc>] submit_bio+0x68/0x109
> [<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
> [<c02258a3>] xfs_buf_iorequest+0x29/0x6e
> [<c02254a4>] xfs_buf_iostart+0x6d/0x97
> [<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
> [<c021768e>] xfs_trans_read_buf+0x153/0x2fc
> [<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
> [<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
> [<c01d3d5c>] xfs_alloc_lookup_eq+0x14/0x17
> [<c01ceef8>] xfs_alloc_fixup_trees+0x252/0x2a9
> [<c01cff24>] xfs_alloc_ag_vextent_size+0x318/0x405
> [<c01cf0e5>] xfs_alloc_ag_vextent+0xe2/0x106
> [<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
> [<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
> [<c01e031e>] xfs_bmap_alloc+0x1e/0x29
> [<c01e3b92>] xfs_bmapi+0x1134/0x1545
> [<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
> [<c020576b>] xfs_iomap+0x357/0x459
> [<c022b02f>] xfs_bmap+0x2e/0x35
> [<c0222bbb>] xfs_map_blocks+0x3c/0x70
> [<c0223b24>] xfs_page_state_convert+0x3cc/0x629
> [<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
> [<c01417a4>] generic_writepages+0x1b9/0x2d5
> [<c0223e78>] xfs_vm_writepages+0x24/0x4a
> [<c01418ea>] do_writepages+0x2a/0x46
> [<c0172c03>] __sync_single_inode+0x5c/0x1de
> [<c0172e0a>] __writeback_single_inode+0x85/0x18f
> [<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
> [<c0173278>] writeback_inodes+0xb2/0xbe
> [<c0141395>] background_writeout+0x66/0x9a
> [<c0141f37>] __pdflush+0xcf/0x184
> [<c014201e>] pdflush+0x32/0x36
> [<c012c918>] kthread+0xa9/0xae
> [<c010373b>] kernel_thread_helper+0x7/0x10
>
> And then there are some where stack space is really low, which would
> certainly have killed us if running with 4K stacks :
>
> First this :
>
> do_IRQ: stack overflow: 3376
> [<c01039a7>] dump_trace+0x1e7/0x1fd
> [<c0103a67>] show_trace_log_lvl+0x1c/0x33
> [<c0103a90>] show_trace+0x12/0x16
> [<c0103b8f>] dump_stack+0x19/0x1d
> [<c0105133>] do_IRQ+0xaf/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c03872b2>] _spin_unlock_irqrestore+0xd/0x10
> [<c029ab47>] e1000_xmit_frame+0x269/0x3b1
> [<c0315f1f>] dev_hard_start_xmit+0x5a/0xd3
> [<c03224da>] __qdisc_run+0x95/0x1d7
> [<c03161b8>] dev_queue_xmit+0x220/0x285
> [<c03841c1>] vlan_dev_hwaccel_hard_start_xmit+0x8a/0x92
> [<c0315f1f>] dev_hard_start_xmit+0x5a/0xd3
> [<c03160f7>] dev_queue_xmit+0x15f/0x285
> [<c031b63e>] neigh_connected_output+0x93/0xba
> [<c0330869>] ip_output+0x170/0x250
> [<c0330d21>] ip_queue_xmit+0x3d8/0x4e1
> [<c03415da>] tcp_transmit_skb+0x29e/0x45d
> [<c0344511>] tcp_send_ack+0xb3/0xf4
> [<c033e343>] tcp_send_dupack+0x28/0x7f
> [<c033fa7f>] tcp_rcv_established+0x141/0x6c8
> [<c03478c9>] tcp_v4_do_rcv+0xcb/0xcd
> [<c0347f3e>] tcp_v4_rcv+0x673/0x7e3
> [<c032d351>] ip_local_deliver+0xf8/0x22d
> [<c032d6ca>] ip_rcv+0x244/0x4e4
> [<c031667f>] netif_receive_skb+0x1f9/0x26a
> [<c029bbea>] e1000_clean_rx_irq+0x17f/0x4b9
> [<c029b6f9>] e1000_clean+0x66/0xfb
> [<c0316890>] net_rx_action+0x96/0x174
> [<c011ee82>] __do_softirq+0xbb/0xd0
> [<c011eece>] do_softirq+0x37/0x39
> [<c011ef09>] irq_exit+0x39/0x3b
> [<c01050f8>] do_IRQ+0x74/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c02fba08>] max_io_len+0x15/0x88
> [<c02fbc66>] __clone_and_map+0x44/0x30a
> [<c02fbfd0>] __split_bio+0xa4/0xc7
> [<c02fc093>] dm_request+0xa0/0xbf
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c0236fbc>] submit_bio+0x68/0x109
> [<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
> [<c02258a3>] xfs_buf_iorequest+0x29/0x6e
> [<c02254a4>] xfs_buf_iostart+0x6d/0x97
> [<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
> [<c021768e>] xfs_trans_read_buf+0x153/0x2fc
> [<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
> [<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
> [<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
> [<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
> [<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
> [<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
> [<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
> [<c01e031e>] xfs_bmap_alloc+0x1e/0x29
> [<c01e3b92>] xfs_bmapi+0x1134/0x1545
> [<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
> [<c020576b>] xfs_iomap+0x357/0x459
> [<c022b02f>] xfs_bmap+0x2e/0x35
> [<c0222bbb>] xfs_map_blocks+0x3c/0x70
> [<c0223b24>] xfs_page_state_convert+0x3cc/0x629
> [<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
> [<c01417a4>] generic_writepages+0x1b9/0x2d5
> [<c0223e78>] xfs_vm_writepages+0x24/0x4a
> [<c01418ea>] do_writepages+0x2a/0x46
> [<c0172c03>] __sync_single_inode+0x5c/0x1de
> [<c0172e0a>] __writeback_single_inode+0x85/0x18f
> [<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
> [<c0173278>] writeback_inodes+0xb2/0xbe
> [<c014117f>] balance_dirty_pages+0xa6/0x15c
> [<c01412ca>] balance_dirty_pages_ratelimited_nr+0x59/0x5b
> [<c013dc7f>] generic_file_buffered_write+0x2ef/0x61f
> [<c022ae09>] xfs_write+0x96f/0xb1c
> [<c0226774>] xfs_file_aio_write+0x78/0x8a
> [<c01586d9>] do_sync_write+0xc1/0x100
> [<c01587a9>] vfs_write+0x91/0x137
> [<c01588fb>] sys_write+0x41/0x6b
> [<c0102b93>] syscall_call+0x7/0xb
> [<b7f6b95e>] 0xb7f6b95e
>
> and this :
>
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> Tx Queue <0>
> TDH <d49>
> TDT <d49>
> next_to_use <d49>
> next_to_clean <d8e>
> buffer_info[next_to_clean]
> time_stamp <aaa1fc>
> next_to_watch <d8e>
> jiffies <aaae6f>
> next_to_watch.status <1>
> do_IRQ: stack overflow: 3836
> [<c01039a7>] dump_trace+0x1e7/0x1fd
> [<c0103a67>] show_trace_log_lvl+0x1c/0x33
> [<c0103a90>] show_trace+0x12/0x16
> [<c0103b8f>] dump_stack+0x19/0x1d
> [<c0105133>] do_IRQ+0xaf/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c02443d0>] csum_partial+0xb8/0x120
> DWARF2 unwinder stuck at csum_partial+0xb8/0x120
> Leftover inexact backtrace:
> [<c0313833>] __skb_checksum_complete+0x20/0x67
> [<c035c397>] nf_ip_checksum+0xe0/0x125
> [<c035fb91>] udp_error+0x105/0x184
> [<c035da4c>] ip_conntrack_in+0x7d/0x294
> [<c0326add>] nf_iterate+0x62/0x7c
> [<c0326b4f>] nf_hook_slow+0x58/0xbf
> [<c032d892>] ip_rcv+0x40c/0x4e4
> [<c031667f>] netif_receive_skb+0x1f9/0x26a
> [<c029bbea>] e1000_clean_rx_irq+0x17f/0x4b9
> [<c029b6f9>] e1000_clean+0x66/0xfb
> [<c0316890>] net_rx_action+0x96/0x174
> [<c011ee82>] __do_softirq+0xbb/0xd0
> [<c011eece>] do_softirq+0x37/0x39
> [<c011ef09>] irq_exit+0x39/0x3b
> [<c01050f8>] do_IRQ+0x74/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c02fbc66>] __clone_and_map+0x44/0x30a
> [<c02fbfd0>] __split_bio+0xa4/0xc7
> [<c02fc093>] dm_request+0xa0/0xbf
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c0236fbc>] submit_bio+0x68/0x109
> [<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
> [<c02258a3>] xfs_buf_iorequest+0x29/0x6e
> [<c02254a4>] xfs_buf_iostart+0x6d/0x97
> [<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
> [<c021768e>] xfs_trans_read_buf+0x153/0x2fc
> [<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
> [<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
> [<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
> [<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
> [<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
> [<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
> [<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
> [<c01e031e>] xfs_bmap_alloc+0x1e/0x29
> [<c01e3b92>] xfs_bmapi+0x1134/0x1545
> [<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
> [<c020576b>] xfs_iomap+0x357/0x459
> [<c022b02f>] xfs_bmap+0x2e/0x35
> [<c0222bbb>] xfs_map_blocks+0x3c/0x70
> [<c0223b24>] xfs_page_state_convert+0x3cc/0x629
> [<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
> [<c01417a4>] generic_writepages+0x1b9/0x2d5
> [<c0223e78>] xfs_vm_writepages+0x24/0x4a
> [<c01418ea>] do_writepages+0x2a/0x46
> [<c0172c03>] __sync_single_inode+0x5c/0x1de
> [<c0172e0a>] __writeback_single_inode+0x85/0x18f
> [<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
> [<c0173278>] writeback_inodes+0xb2/0xbe
> [<c014117f>] balance_dirty_pages+0xa6/0x15c
> [<c01412ca>] balance_dirty_pages_ratelimited_nr+0x59/0x5b
> [<c013dc7f>] generic_file_buffered_write+0x2ef/0x61f
> [<c022ae09>] xfs_write+0x96f/0xb1c
> [<c0226774>] xfs_file_aio_write+0x78/0x8a
> [<c01586d9>] do_sync_write+0xc1/0x100
> [<c01587a9>] vfs_write+0x91/0x137
> [<c01588fb>] sys_write+0x41/0x6b
> [<c0102b93>] syscall_call+0x7/0xb
>
> and finally this one :
>
> do_IRQ: stack overflow: 3916
> [<c01039a7>] dump_trace+0x1e7/0x1fd
> [<c0103a67>] show_trace_log_lvl+0x1c/0x33
> [<c0103a90>] show_trace+0x12/0x16
> [<c0103b8f>] dump_stack+0x19/0x1d
> [<c0105133>] do_IRQ+0xaf/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c0342148>] tcp_init_tso_segs+0x17/0x4c
> [<c0342ac2>] tcp_write_xmit+0x5d/0x266
> [<c0342cf4>] __tcp_push_pending_frames+0x29/0x81
> [<c033fb46>] tcp_rcv_established+0x208/0x6c8
> [<c03478c9>] tcp_v4_do_rcv+0xcb/0xcd
> [<c0347f3e>] tcp_v4_rcv+0x673/0x7e3
> [<c032d351>] ip_local_deliver+0xf8/0x22d
> [<c032d6ca>] ip_rcv+0x244/0x4e4
> [<c031667f>] netif_receive_skb+0x1f9/0x26a
> [<c029bbea>] e1000_clean_rx_irq+0x17f/0x4b9
> [<c029b6f9>] e1000_clean+0x66/0xfb
> [<c0316890>] net_rx_action+0x96/0x174
> [<c011ee82>] __do_softirq+0xbb/0xd0
> [<c011eece>] do_softirq+0x37/0x39
> [<c011ef09>] irq_exit+0x39/0x3b
> [<c01050f8>] do_IRQ+0x74/0xd6
> [<c01034fe>] common_interrupt+0x1a/0x20
> [<c02fba08>] max_io_len+0x15/0x88
> [<c02fbc66>] __clone_and_map+0x44/0x30a
> [<c02fbfd0>] __split_bio+0xa4/0xc7
> [<c02fc093>] dm_request+0xa0/0xbf
> [<c0236eec>] generic_make_request+0x14f/0x1b7
> [<c0236fbc>] submit_bio+0x68/0x109
> [<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
> [<c02258a3>] xfs_buf_iorequest+0x29/0x6e
> [<c02254a4>] xfs_buf_iostart+0x6d/0x97
> [<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
> [<c021768e>] xfs_trans_read_buf+0x153/0x2fc
> [<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
> [<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
> [<c01d3d76>] xfs_alloc_lookup_ge+0x17/0x1a
> [<c01cf314>] xfs_alloc_ag_vextent_near+0x5f/0x957
> [<c01cf107>] xfs_alloc_ag_vextent+0x104/0x106
> [<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
> [<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
> [<c01e031e>] xfs_bmap_alloc+0x1e/0x29
> [<c01e3b92>] xfs_bmapi+0x1134/0x1545
> [<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
> [<c020576b>] xfs_iomap+0x357/0x459
> [<c022b02f>] xfs_bmap+0x2e/0x35
> [<c0222bbb>] xfs_map_blocks+0x3c/0x70
> [<c0223b24>] xfs_page_state_convert+0x3cc/0x629
> [<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
> [<c01417a4>] generic_writepages+0x1b9/0x2d5
> [<c0223e78>] xfs_vm_writepages+0x24/0x4a
> [<c01418ea>] do_writepages+0x2a/0x46
> [<c0172c03>] __sync_single_inode+0x5c/0x1de
> [<c0172e0a>] __writeback_single_inode+0x85/0x18f
> [<c01730c7>] sync_sb_inodes+0x1b3/0x2b2
> [<c0173278>] writeback_inodes+0xb2/0xbe
> [<c014117f>] balance_dirty_pages+0xa6/0x15c
> [<c01412ca>] balance_dirty_pages_ratelimited_nr+0x59/0x5b
> [<c013dc7f>] generic_file_buffered_write+0x2ef/0x61f
> [<c022ae09>] xfs_write+0x96f/0xb1c
> [<c0226774>] xfs_file_aio_write+0x78/0x8a
> [<c01586d9>] do_sync_write+0xc1/0x100
> [<c01587a9>] vfs_write+0x91/0x137
> [<c01588fb>] sys_write+0x41/0x6b
> [<c0102b93>] syscall_call+0x7/0xb
> [<b7f6b95e>] 0xb7f6b95e
>
> And there are lots of other ones as well that differ slightly from the ones
> above.
>
>
> Some hardware/software details :
>
> # scripts/ver_linux
> If some fields are empty or look unusual you may have an old version.
> Compare to the current minimal requirements in Documentation/Changes.
>
> Linux server.mydomain.net 2.6.19-rc6 #1 SMP Mon Nov 20 14:33:26 CET 2006 i686
> GNU/Linux
>
> Gnu C 3.3.5
> Gnu make 3.80
> binutils 2.15
> util-linux 2.12p
> mount 2.12p
> module-init-tools 3.2-pre1
> e2fsprogs 1.37
> xfsprogs 2.6.20
> nfs-utils 1.0.6
> Linux C Library 2.3.2
> Dynamic linker (ldd) 2.3.2
> Procps 3.2.1
> Net-tools 1.60
> Console-tools 0.2.3
> Sh-utils 5.2.1
> udev 056
> Modules Loaded sky2 piix ide_core eeprom
>
> # lspci -vvx
> 0000:00:00.0 Host bridge: Intel Corp. Server Memory Controller Hub (rev 0c)
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Capabilities: [40] #09 [4105]
> 00: 86 80 90 35 46 01 90 00 0c 00 00 06 00 00 80 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00
>
> 0000:00:00.1 ff00: Intel Corp. Memory Controller Hub Error Reporting Register
> (rev 0c)
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> 00: 86 80 91 35 00 01 00 00 0c 00 00 ff 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA
> Controller (rev 0c)
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Interrupt: pin A routed to IRQ 10
> Region 0: Memory at fcbff000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [b0] Message Signalled Interrupts: 64bit- Queue=0/1
> Enable-
> Address: fee00000 Data: 0000
> 00: 86 80 94 35 02 01 10 00 0c 00 80 08 00 00 00 00
> 10: 00 f0 bf fc 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 b0 00 00 00 00 00 00 00 0a 01 00 00
>
> 0000:00:02.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A0
> (rev 0c) (prog-if 00 [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
> I/O behind bridge: 0000b000-0000cfff
> Memory behind bridge: fcc00000-fcefffff
> Prefetchable memory behind bridge: 00000000fa000000-00000000fb700000
> BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
> Enable-
> Address: fee00000 Data: 0000
> Capabilities: [64] #10 [0041]
> 00: 86 80 95 35 47 01 10 00 0c 00 04 06 10 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 01 03 00 b0 c0 00 00
> 20: c0 fc e0 fc 01 fa 71 fb 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 06 00
>
> 0000:00:04.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B0
> (rev 0c) (prog-if 00 [Normal decode])
> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
> BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
> Enable-
> Address: fee00000 Data: 0000
> Capabilities: [64] #10 [0041]
> 00: 86 80 97 35 44 01 10 00 0c 00 04 06 10 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 04 04 00 f0 00 00 20
> 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 06 00
>
> 0000:00:05.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B1
> (rev 0c) (prog-if 00 [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
> I/O behind bridge: 0000d000-0000dfff
> Memory behind bridge: fcf00000-fcffffff
> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
> Enable-
> Address: fee00000 Data: 0000
> Capabilities: [64] #10 [0041]
> 00: 86 80 98 35 47 01 18 00 0c 00 04 06 10 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 05 05 00 d0 d0 00 00
> 20: f0 fc f0 fc f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 07 00
>
> 0000:00:06.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port C0
> (rev 0c) (prog-if 00 [Normal decode])
> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
> BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1
> Enable-
> Address: fee00000 Data: 0000
> Capabilities: [64] #10 [0041]
> 00: 86 80 99 35 44 01 10 00 0c 00 04 06 10 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 06 06 00 f0 00 00 20
> 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 06 00
>
> 0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1
> (rev 02) (prog-if 00 [UHCI])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Interrupt: pin A routed to IRQ 10
> Region 4: I/O ports at a880 [size=32]
> 00: 86 80 d2 24 05 00 80 02 02 00 03 0c 00 00 80 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 81 a8 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00
>
> 0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2
> (rev 02) (prog-if 00 [UHCI])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Interrupt: pin B routed to IRQ 7
> Region 4: I/O ports at ac00 [size=32]
> 00: 86 80 d4 24 05 00 80 02 02 00 03 0c 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 01 ac 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 07 02 00 00
>
> 0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3
> (rev 02) (prog-if 00 [UHCI])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Interrupt: pin C routed to IRQ 15
> Region 4: I/O ports at ac80 [size=32]
> 00: 86 80 d7 24 05 00 80 02 02 00 03 0c 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 81 ac 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0f 03 00 00
>
> 0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI
> Controller (rev 02) (prog-if 20 [EHCI])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Interrupt: pin D routed to IRQ 21
> Region 0: Memory at fcbfec00 (32-bit, non-prefetchable) [size=1K]
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] #0a [20a0]
> 00: 86 80 dd 24 06 01 90 02 02 20 03 0c 00 00 00 00
> 10: 00 ec bf fc 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 05 04 00 00
>
> 0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2) (prog-if 00
> [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Bus: primary=00, secondary=07, subordinate=07, sec-latency=32
> I/O behind bridge: 0000e000-0000efff
> Memory behind bridge: fd000000-febfffff
> Prefetchable memory behind bridge: fb800000-fbffffff
> BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
> 00: 86 80 4e 24 47 01 80 00 c2 00 04 06 00 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 07 07 20 e0 e0 80 02
> 20: 00 fd b0 fe 80 fb f0 fb 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b 00
>
> 0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev
> 02)
> Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> 00: 86 80 d0 24 4f 01 80 02 02 00 01 06 00 00 80 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 0000:00:1f.1 IDE interface: Intel Corp. 82801EB/ER (ICH5/ICH5R) Ultra ATA 100
> Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Interrupt: pin A routed to IRQ 20
> Region 0: I/O ports at <unassigned>
> Region 1: I/O ports at <unassigned>
> Region 2: I/O ports at <unassigned>
> Region 3: I/O ports at <unassigned>
> Region 4: I/O ports at fc00 [size=16]
> Region 5: Memory at 88000000 (32-bit, non-prefetchable) [size=1K]
> 00: 86 80 db 24 07 00 88 02 02 8a 01 01 00 00 00 00
> 10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
> 20: 01 fc 00 00 00 00 00 88 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
>
> 0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage
> Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
> Subsystem: Intel Corp.: Unknown device 3460
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR- FastB2B-
> Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Interrupt: pin A routed to IRQ 20
> Region 0: I/O ports at a800 [size=8]
> Region 1: I/O ports at a480 [size=4]
> Region 2: I/O ports at a400 [size=8]
> Region 3: I/O ports at a080 [size=4]
> Region 4: I/O ports at a000 [size=16]
> 00: 86 80 d1 24 45 00 a0 02 02 8f 01 01 00 00 00 00
> 10: 01 a8 00 00 81 a4 00 00 01 a4 00 00 81 a0 00 00
> 20: 01 a0 00 00 00 00 00 00 00 00 00 00 86 80 60 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0f 01 00 00
>
> 0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev
> 02)
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Interrupt: pin B routed to IRQ 22
> Region 4: I/O ports at 0540 [size=32]
> 00: 86 80 d3 24 01 00 80 02 02 00 05 0c 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 41 05 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 02 00 00
>
> 0000:01:00.0 PCI bridge: Intel Corp. PCI Bridge Hub A (rev 09) (prog-if 00
> [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Bus: primary=01, secondary=02, subordinate=02, sec-latency=48
> I/O behind bridge: 0000b000-0000bfff
> Memory behind bridge: fcd00000-fcdfffff
> Prefetchable memory behind bridge: 00000000fa000000-00000000faf00000
> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> Capabilities: [44] #10 [0071]
> Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0
> Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [6c] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [d8] 00: 86 80 29 03 47 01 10 00 09 00 04 06 10 00 81 00
> 10: 00 00 00 00 00 00 00 00 01 02 02 30 b0 b0 a0 02
> 20: d0 fc d0 fc 01 fa f1 fa 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 07 00
>
> 0000:01:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A
> (rev 09) (prog-if 20 [IO(X)-APIC])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Region 0: Memory at fccfe000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [44] #10 [0001]
> Capabilities: [6c] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: 86 80 26 03 46 01 10 00 09 20 00 08 00 00 80 00
> 10: 00 e0 cf fc 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00
>
> 0000:01:00.2 PCI bridge: Intel Corp. PCI Bridge Hub B (rev 09) (prog-if 00
> [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Bus: primary=01, secondary=03, subordinate=03, sec-latency=48
> I/O behind bridge: 0000c000-0000cfff
> Memory behind bridge: fce00000-fcefffff
> Prefetchable memory behind bridge: 00000000fb000000-00000000fb700000
> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> Capabilities: [44] #10 [0071]
> Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0
> Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [6c] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [d8] 00: 86 80 2a 03 47 01 10 00 09 00 04 06 10 00 81 00
> 10: 00 00 00 00 00 00 00 00 01 03 03 30 c0 c0 a0 02
> 20: e0 fc e0 fc 01 fb 71 fb 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 07 00
>
> 0000:01:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller B
> (rev 09) (prog-if 20 [IO(X)-APIC])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0
> Region 0: Memory at fccff000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [44] #10 [0001]
> Capabilities: [6c] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: 86 80 27 03 46 01 10 00 09 20 00 08 00 00 80 00
> 10: 00 f0 cf fc 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00
>
> 0000:02:02.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
> Subsystem: 3ware Inc 3ware ATA-RAID
> Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 17
> Region 0: I/O ports at bc00 [size=256]
> Region 1: Memory at fcdffc00 (64-bit, non-prefetchable) [size=256]
> Region 3: Memory at fa800000 (64-bit, prefetchable) [size=8M]
> Expansion ROM at fcde0000 [disabled] [size=64K]
> Capabilities: [48] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
> PME(D0+,D1+,D2-,D3hot+,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
> 10: 01 bc 00 00 04 fc df fc 00 00 00 00 0c 00 80 fa
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
> 30: 00 00 de fc 48 00 00 00 00 00 00 00 0a 01 09 00
>
> 0000:02:03.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
> Subsystem: 3ware Inc 3ware ATA-RAID
> Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 18
> Region 0: I/O ports at b800 [size=256]
> Region 1: Memory at fcdff800 (64-bit, non-prefetchable) [size=256]
> Region 3: Memory at fa000000 (64-bit, prefetchable) [size=8M]
> Expansion ROM at fcdd0000 [disabled] [size=64K]
> Capabilities: [40] PCI-X non-bridge device.
> Command: DPERE- ERO- RBC=0 OST=0
> Status: Bus=255 Dev=31 Func=0 64bit+ 133MHz+ SCD- USC-,
> DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
> Capabilities: [48] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
> PME(D0+,D1+,D2-,D3hot+,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
> 10: 01 b8 00 00 04 f8 df fc 00 00 00 00 0c 00 00 fa
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
> 30: 00 00 dd fc 40 00 00 00 00 00 00 00 0a 01 09 00
>
> 0000:03:01.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
> Subsystem: 3ware Inc 3ware ATA-RAID
> Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 19
> Region 0: I/O ports at cc00 [size=256]
> Region 1: Memory at fceffc00 (64-bit, non-prefetchable) [size=256]
> Region 3: Memory at fb000000 (64-bit, prefetchable) [size=8M]
> Expansion ROM at fcee0000 [disabled] [size=64K]
> Capabilities: [48] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA
> PME(D0+,D1+,D2-,D3hot+,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
> 10: 01 cc 00 00 04 fc ef fc 00 00 00 00 0c 00 00 fb
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
> 30: 00 00 ee fc 48 00 00 00 00 00 00 00 0a 01 09 00
>
> 0000:05:00.0 Ethernet controller: Marvell Technology Group Ltd.: Unknown
> device 4361 (rev 18)
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 223
> Region 0: Memory at fcffc000 (64-bit, non-prefetchable) [size=16K]
> Region 2: I/O ports at dc00 [size=256]
> Expansion ROM at fcfc0000 [disabled] [size=128K]
> Capabilities: [48] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [50] Vital Product Data
> Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1
> Enable+
> Address: 00000000fee0f00c Data: 41e1
> Capabilities: [e0] #10 [0011]
> 00: ab 11 61 43 47 05 10 00 18 00 00 02 10 00 00 00
> 10: 04 c0 ff fc 00 00 00 00 01 dc 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 fc fc 48 00 00 00 00 00 00 00 0a 01 00 00
>
> 0000:07:04.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet
> Controller (rev 05)
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (63750ns min), Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 16
> Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
> Region 2: I/O ports at ec80 [size=64]
> Capabilities: [dc] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [e4] PCI-X non-bridge device.
> Command: DPERE- ERO+ RBC=0 OST=0
> Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-,
> DC=simple, DMMRBC=2, DMOST=0, DMCRS=0, RSCEM-
> 00: 86 80 76 10 57 01 30 02 05 00 00 02 10 20 00 00
> 10: 00 00 be fe 00 00 00 00 81 ec 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 00 00 dc 00 00 00 00 00 00 00 0a 01 ff 00
>
> 0000:07:06.0 RAID bus controller: 3ware Inc 3ware ATA-RAID
> Subsystem: 3ware Inc 3ware ATA-RAID
> Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr+
> Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (2250ns min), Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 20
> Region 0: I/O ports at e800 [size=256]
> Region 1: Memory at febdbc00 (64-bit, non-prefetchable) [size=256]
> Region 3: Memory at fb800000 (64-bit, prefetchable) [size=8M]
> Expansion ROM at fe020000 [disabled] [size=64K]
> Capabilities: [48] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0+,D1+,D2+,D3hot+,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: c1 13 02 10 5f 01 30 02 00 00 04 01 10 20 00 00
> 10: 01 e8 00 00 04 bc bd fe 00 00 00 00 0c 00 80 fb
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 c1 13 02 10
> 30: 00 00 bc fe 48 00 00 00 00 00 00 00 0f 01 09 00
>
> 0000:07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
> (prog-if 00 [VGA])
> Subsystem: Intel Corp.: Unknown device 3439
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping+ SERR- FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (2000ns min), Cache Line Size: 0x10 (64 bytes)
> Interrupt: pin A routed to IRQ 11
> Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
> Region 1: I/O ports at e400 [size=256]
> Region 2: Memory at febda000 (32-bit, non-prefetchable) [size=4K]
> Expansion ROM at fe000000 [disabled] [size=128K]
> Capabilities: [5c] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: 02 10 52 47 87 00 90 02 27 00 00 03 10 20 00 00
> 10: 00 00 00 fd 01 e4 00 00 00 a0 bd fe 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 39 34
> 30: 00 00 ba fe 5c 00 00 00 00 00 00 00 0b 01 08 00
>
> # cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 4
> model name : Intel(R) Xeon(TM) CPU 3.20GHz
> stepping : 10
> cpu MHz : 3192.358
> cache size : 2048 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 1
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
> bogomips : 6388.75
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 15
> model : 4
> model name : Intel(R) Xeon(TM) CPU 3.20GHz
> stepping : 10
> cpu MHz : 3192.358
> cache size : 2048 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 1
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
> bogomips : 6384.53
>
> processor : 2
> vendor_id : GenuineIntel
> cpu family : 15
> model : 4
> model name : Intel(R) Xeon(TM) CPU 3.20GHz
> stepping : 10
> cpu MHz : 3192.358
> cache size : 2048 KB
> physical id : 3
> siblings : 2
> core id : 0
> cpu cores : 1
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
> bogomips : 6384.50
>
> processor : 3
> vendor_id : GenuineIntel
> cpu family : 15
> model : 4
> model name : Intel(R) Xeon(TM) CPU 3.20GHz
> stepping : 10
> cpu MHz : 3192.358
> cache size : 2048 KB
> physical id : 3
> siblings : 2
> core id : 0
> cpu cores : 1
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc pni monitor ds_cpl cid cx16 xtpr lahf_lm
> bogomips : 6384.54
>
> # cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3
> 0: 17246519 2140 2140 2148 IO-APIC-edge timer
> 1: 3 3 2 2 IO-APIC-edge i8042
> 8: 3 0 1 0 IO-APIC-edge rtc
> 9: 0 0 0 0 IO-APIC-fasteoi acpi
> 12: 0 1 0 2 IO-APIC-edge i8042
> 14: 0 0 0 0 IO-APIC-edge libata
> 15: 0 0 0 0 IO-APIC-edge libata
> 16: 31000929 7811302 39894575 15870447 IO-APIC-fasteoi eth0
> 17: 1078416 3044826 2489404 1780707 IO-APIC-fasteoi 3w-9xxx
> 18: 7107793 931865 5531801 862511 IO-APIC-fasteoi 3w-9xxx
> 19: 494962 141885 25640 282908 IO-APIC-fasteoi 3w-9xxx
> 20: 2130674 3511229 1293435 2256288 IO-APIC-fasteoi 3w-9xxx,
> libata
> 21: 0 0 0 0 IO-APIC-fasteoi
> ehci_hcd:usb1
> 223: 0 0 0 1 PCI-MSI-edge eth1
> NMI: 17252842 17252814 17252813 17252812
> LOC: 17234121 17234121 17231585 17231584
> ERR: 0
> MIS: 0
>
> # cat /proc/scsi/scsi
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi0 Channel: 00 Id: 01 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi0 Channel: 00 Id: 02 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi0 Channel: 00 Id: 03 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi0 Channel: 00 Id: 04 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 00 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 01 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 02 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 03 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 04 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi1 Channel: 00 Id: 05 Lun: 00
> Vendor: AMCC Model: 9500S-12MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi2 Channel: 00 Id: 00 Lun: 00
> Vendor: AMCC Model: 9500S-8MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi3 Channel: 00 Id: 00 Lun: 00
> Vendor: AMCC Model: 9500S-8MI DISK Rev: 2.08
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi6 Channel: 00 Id: 00 Lun: 00
> Vendor: ATA Model: ST3200822AS Rev: 3.01
> Type: Direct-Access ANSI SCSI revision: 05
> Host: scsi7 Channel: 00 Id: 00 Lun: 00
> Vendor: ATA Model: ST3200822AS Rev: 3.01
> Type: Direct-Access ANSI SCSI revision: 05
>
>
>
> Any advice on how to go about fixing this would be appreciated.
>
>
>
> Kind regards,
>
> Jesper Juhl <[email protected]>
>
>
> PS. Please keep me on Cc: when replying.
>
>
>
>

--
David Chatterton
XFS Engineering Manager
SGI Australia

2006-11-21 22:02:27

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On 21/11/06, David Chatterton <[email protected]> wrote:
> Jesper,
>
> In the short term, the best workaround is to use 8K stacks.

Yeah, that's what I'm currently doing and the box seems more stable
(at least it has not crashed yet, but with 4K stacks it usually would
have by now).

> We do not see stack
> overflow problems with NFS + XFS + volume managers + disk devices.
>
Could the size of my devices be part of the cause? some of the logical
volumes I have mounted are multiple TB in size?


> Audits have been done in the past and will again be done in the future to try to
> identify areas where XFS could use less stack space by reducing/avoid large
> local variables. Reducing the code path is far more difficult.
>
I realize that fixing the problem may be difficult. I just wanted to
make sure that people were informed that there is an actual problem
and provide as much info as possible so that perhaps in the future it
can be fixed... :)
I'm reading through the XFS code myself at the moment and I'll be sure
to submit patches if I spot something that could help reduce stack
usage.


> There is active discussion about reducing inlining:
> http://bugzilla.kernel.org/show_bug.cgi?id=7364
>

Thanks, I'll check that out.


> I can't speak for the scsi stack usage.
>
> Thanks for traces, I've captured this information.
>
You are welcome. If you want/need more traces then I've got ~2.1G
worth of traces that you can have :)


Thank you for your reply.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-11-21 23:32:00

by David Chinner

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
> On 21/11/06, David Chatterton <[email protected]> wrote:
> >Jesper,
> >
> >In the short term, the best workaround is to use 8K stacks.
>
> Yeah, that's what I'm currently doing and the box seems more stable
> (at least it has not crashed yet, but with 4K stacks it usually would
> have by now).
>
> >We do not see stack
> >overflow problems with NFS + XFS + volume managers + disk devices.
> >
> Could the size of my devices be part of the cause? some of the logical
> volumes I have mounted are multiple TB in size?

No.

> >Audits have been done in the past and will again be done in the future to
> >try to
> >identify areas where XFS could use less stack space by reducing/avoid large
> >local variables. Reducing the code path is far more difficult.
> >
> I realize that fixing the problem may be difficult. I just wanted to
> make sure that people were informed that there is an actual problem
> and provide as much info as possible so that perhaps in the future it
> can be fixed... :)

I've got one that prevents gcc from inlining single use functions in XFS
that I need to finish off, and that results in some significant stack
usage reductions in some XFS functions.

However, XFS is only one part of the picture - when you put NFS on top,
DM+md then scsi/FC below and then you nest a soft irq that might go
20 functions deep as well - then 4k stacks simply aren't big enough.

> I'm reading through the XFS code myself at the moment and I'll be sure
> to submit patches if I spot something that could help reduce stack
> usage.

Most of the low hanging fruit is already gone. The problem we are
facing now for further reductions in stack usage is the fact that we
need to factor code. That is a major undertaking and has a _lot_ of
risk associated with it....

> >There is active discussion about reducing inlining:
> >http://bugzilla.kernel.org/show_bug.cgi?id=7364
>
> Thanks, I'll check that out.

That's one of the few remaining low hanging fruit, and that's fixed
in the patches I already have.

> >Thanks for traces, I've captured this information.
> >
> You are welcome. If you want/need more traces then I've got ~2.1G
> worth of traces that you can have :)

Well, we don't need that many, but it would be nice to have a
set of unique traces that lead to overflows - could you process
them in some way just to extract just the unique XFS traces that
occur?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-21 23:51:46

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On 22/11/06, David Chinner <[email protected]> wrote:
> On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
> > On 21/11/06, David Chatterton <[email protected]> wrote:
...
>
> > >Audits have been done in the past and will again be done in the future to
> > >try to
> > >identify areas where XFS could use less stack space by reducing/avoid large
> > >local variables. Reducing the code path is far more difficult.
> > >
> > I realize that fixing the problem may be difficult. I just wanted to
> > make sure that people were informed that there is an actual problem
> > and provide as much info as possible so that perhaps in the future it
> > can be fixed... :)
>
> I've got one that prevents gcc from inlining single use functions in XFS
> that I need to finish off, and that results in some significant stack
> usage reductions in some XFS functions.
>
That sounds good. I'll be keeping an eye out for that one :)

> However, XFS is only one part of the picture - when you put NFS on top,
> DM+md then scsi/FC below and then you nest a soft irq that might go
> 20 functions deep as well - then 4k stacks simply aren't big enough.
>
True, there are a lot of players involved here, although XFS seems (to
me) to be the biggest one.

> > I'm reading through the XFS code myself at the moment and I'll be sure
> > to submit patches if I spot something that could help reduce stack
> > usage.
>
> Most of the low hanging fruit is already gone. The problem we are
> facing now for further reductions in stack usage is the fact that we
> need to factor code. That is a major undertaking and has a _lot_ of
> risk associated with it....
>
I'll try to spot some of the remaining low hanging fruit ;)


> > >There is active discussion about reducing inlining:
> > >http://bugzilla.kernel.org/show_bug.cgi?id=7364
> >
> > Thanks, I'll check that out.
>
> That's one of the few remaining low hanging fruit, and that's fixed
> in the patches I already have.
>
Nice. Will be good to get that in.


> > >Thanks for traces, I've captured this information.
> > >
> > You are welcome. If you want/need more traces then I've got ~2.1G
> > worth of traces that you can have :)
>
> Well, we don't need that many, but it would be nice to have a
> set of unique traces that lead to overflows - could you process
> them in some way just to extract just the unique XFS traces that
> occur?
>
I'll try to extract a copy of each unique trace that involves xfs,
sometime tomorrow or the day after, and then send you the result.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-11-22 12:58:13

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On 22/11/06, Jesper Juhl <[email protected]> wrote:
> On 22/11/06, David Chinner <[email protected]> wrote:
> > On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
> > > On 21/11/06, David Chatterton <[email protected]> wrote:
...
> > > >Thanks for traces, I've captured this information.
> > > >
> > > You are welcome. If you want/need more traces then I've got ~2.1G
> > > worth of traces that you can have :)
> >
> > Well, we don't need that many, but it would be nice to have a
> > set of unique traces that lead to overflows - could you process
> > them in some way just to extract just the unique XFS traces that
> > occur?
> >
> I'll try to extract a copy of each unique trace that involves xfs,
> sometime tomorrow or the day after, and then send you the result.
>

Attached are two files. The one named stack_overflows.txt.gz contains
one instance of each unique stack overflow + trace that I've got. The
other file named kernel_BUG.txt.gz contains a few BUG() messages that
were also in the logs.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html


Attachments:
(No filename) (1.17 kB)
stack_overflows.txt.gz (6.16 kB)
kernel_BUG.txt.gz (2.83 kB)
Download all attachments

2006-11-22 20:01:30

by Stephen Hemminger

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Wed, 22 Nov 2006 13:58:11 +0100
"Jesper Juhl" <[email protected]> wrote:

> On 22/11/06, Jesper Juhl <[email protected]> wrote:
> > On 22/11/06, David Chinner <[email protected]> wrote:
> > > On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
> > > > On 21/11/06, David Chatterton <[email protected]> wrote:
> ...
> > > > >Thanks for traces, I've captured this information.
> > > > >
> > > > You are welcome. If you want/need more traces then I've got ~2.1G
> > > > worth of traces that you can have :)
> > >
> > > Well, we don't need that many, but it would be nice to have a
> > > set of unique traces that lead to overflows - could you process
> > > them in some way just to extract just the unique XFS traces that
> > > occur?
> > >
> > I'll try to extract a copy of each unique trace that involves xfs,
> > sometime tomorrow or the day after, and then send you the result.
> >
>
> Attached are two files. The one named stack_overflows.txt.gz contains
> one instance of each unique stack overflow + trace that I've got. The
> other file named kernel_BUG.txt.gz contains a few BUG() messages that
> were also in the logs.
>

You have a kind of worst case scenario there:
XFS + Block layer
TCP receive/transmit
VLAN

It is hard to know who to blame, there is no information about stack
level at each call. Since it doesn't show up for filesystems other than
XFS, I would pick on that. Perhaps the following:


--- 2.6.19-rc6.orig/arch/i386/Kconfig.debug 2006-11-22 11:59:32.000000000 -0800
+++ 2.6.19-rc6/arch/i386/Kconfig.debug 2006-11-22 12:00:28.000000000 -0800
@@ -58,7 +58,7 @@

config 4KSTACKS
bool "Use 4Kb for kernel stacks instead of 8Kb"
- depends on DEBUG_KERNEL
+ depends on DEBUG_KERNEL && !XFS_FS
help
If you say Y here the kernel will use a 4Kb stacksize for the
kernel stack attached to each process/thread. This facilitates

2006-11-23 01:18:28

by David Chinner

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Wed, Nov 22, 2006 at 01:58:11PM +0100, Jesper Juhl wrote:
> On 22/11/06, Jesper Juhl <[email protected]> wrote:
> >On 22/11/06, David Chinner <[email protected]> wrote:
> >> On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
> >> > On 21/11/06, David Chatterton <[email protected]> wrote:
> ...
> >> > >Thanks for traces, I've captured this information.
> >> > >
> >> > You are welcome. If you want/need more traces then I've got ~2.1G
> >> > worth of traces that you can have :)
> >>
> >> Well, we don't need that many, but it would be nice to have a
> >> set of unique traces that lead to overflows - could you process
> >> them in some way just to extract just the unique XFS traces that
> >> occur?
> >>
> >I'll try to extract a copy of each unique trace that involves xfs,
> >sometime tomorrow or the day after, and then send you the result.
> >
>
> Attached are two files. The one named stack_overflows.txt.gz contains
> one instance of each unique stack overflow + trace that I've got. The
> other file named kernel_BUG.txt.gz contains a few BUG() messages that
> were also in the logs.

Thank you, Jesper. The common paths through XFS here are:

[<c0236eec>] generic_make_request+0x14f/0x1b7
[<c0236fbc>] submit_bio+0x68/0x109
[<c02257bc>] _xfs_buf_ioapply+0x1cf/0x28d
[<c02258a3>] xfs_buf_iorequest+0x29/0x6e
[<c02254a4>] xfs_buf_iostart+0x6d/0x97
[<c0224e32>] xfs_buf_read_flags+0x8a/0x8c
[<c021768e>] xfs_trans_read_buf+0x153/0x2fc
[<c01eb559>] xfs_btree_read_bufs+0x6e/0x84
[<c01d27d9>] xfs_alloc_lookup+0x10a/0x39e
[<c01d3d5c>] xfs_alloc_lookup_eq+0x14/0x17
[<c01ceef8>] xfs_alloc_fixup_trees+0x252/0x2a9
[<c01cff24>] xfs_alloc_ag_vextent_size+0x318/0x405
[<c01cf0e5>] xfs_alloc_ag_vextent+0xe2/0x106
[<c01d13a9>] xfs_alloc_vextent+0x372/0x47a
[<c01dfcb9>] xfs_bmap_btalloc+0x31f/0x966
[<c01e031e>] xfs_bmap_alloc+0x1e/0x29
[<c01e3b92>] xfs_bmapi+0x1134/0x1545
[<c0206a20>] xfs_iomap_write_allocate+0x2bb/0x509
[<c020576b>] xfs_iomap+0x357/0x459
[<c022b02f>] xfs_bmap+0x2e/0x35
[<c0222bbb>] xfs_map_blocks+0x3c/0x70
[<c0223b24>] xfs_page_state_convert+0x3cc/0x629
[<c0223ddd>] xfs_vm_writepage+0x5c/0xd3
[<c01417a4>] generic_writepages+0x1b9/0x2d5
[<c0223e78>] xfs_vm_writepages+0x24/0x4a
[<c01418ea>] do_writepages+0x2a/0x46
[<c0172c03>] __sync_single_inode+0x5c/0x1de
[<c0172e0a>] __writeback_single_inode+0x85/0x18f
[<c01730c7>] sync_sb_inodes+0x1b3/0x2b2


i.e. through the allocator. This is delayed allocation occurring
here during background writeback and we are having to read a
free space btree block while preparing enough free single blocks
to allow btree splits to occur (on top of the extent needed
for the delalloc write).

There are several variations on this (e.g. via write throttling,
from nfsds, inode allocation, etc) which increase the stack usage
before we get to XFS, and the subsequent stack overflow is almost
always during softirq processing when we are deep down in that
stack:

% grep "stack overflow" stack_overflows.txt |wc -l
35
% grep __do_softirq stack_overflows.txt | wc -l
29

So part of the problem is softirqs that use a fair bit of stack
space running on stacks that don't have a lot of space free to
begin with.

I've just checked on a 2.6.17 build on i386 how much stack we
are using (from checkstack.pl with min size reported set to 32 bytes)
here in XFS:

32 _xfs_buf_ioapply
<32 xfs_buf_iorequest
<32 xfs_buf_iostart
<32 xfs_buf_read_flags
<32 xfs_trans_read_buf
<32 xfs_btree_read_bufs
44 xfs_alloc_lookup
<32 xfs_alloc_lookup_eq
<32 xfs_alloc_fixup_trees
36 xfs_alloc_ag_vextent_size
<32 xfs_alloc_ag_vextent
44 xfs_alloc_vextent
148 xfs_bmap_btalloc
<32 xfs_bmap_alloc
272 xfs_bmapi
160 xfs_iomap_write_allocate
68 xfs_iomap
<32 xfs_bmap
<32 xfs_map_blocks
128 xfs_page_state_convert
<32 xfs_vm_writepage
<32 generic_writepages
<32 xfs_vm_writepages

So, assuming the stacks less than 32 bytes are 32 bytes, we've got
1380 bytes in the XFS stack there, and very few functions where it
can be reduced further. Still, 1380 bytes is way, way short of 4KB,
so unless there is extra stack usage that checkstack doesn't tell us
about I'm not sure why this amount of usage is causing repeated
stack overflows with very little stack usage on either side of it.

Can someone enlighten me as to where all the rest of the stack
is being used up here?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-23 04:10:13

by David Miller

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

From: David Chinner <[email protected]>
Date: Thu, 23 Nov 2006 12:18:09 +1100

> So, assuming the stacks less than 32 bytes are 32 bytes, we've got
> 1380 bytes in the XFS stack there,

On sparc64 just the XFS parts of the backtrace would be a minimum of
2816 bytes (each function has a minimum 8 * 16 byte stack frame, and
there are about 22 calls in that trace). It's probably a lot more
with local variables and such.

It's way too much. You guys have to fix this stuff.

If TCP's full send and receive path can be done in less function
calls, XFS can allocate blocks in less too.

I would even say 10 function calls deep to allocate file blocks
is overkill, but 22 it just astronomically bad.

2006-11-23 04:35:50

by Al Viro

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Wed, Nov 22, 2006 at 08:10:13PM -0800, David Miller wrote:
> From: David Chinner <[email protected]>
> Date: Thu, 23 Nov 2006 12:18:09 +1100
>
> > So, assuming the stacks less than 32 bytes are 32 bytes, we've got
> > 1380 bytes in the XFS stack there,
>
> On sparc64 just the XFS parts of the backtrace would be a minimum of
> 2816 bytes (each function has a minimum 8 * 16 byte stack frame, and
> there are about 22 calls in that trace). It's probably a lot more
> with local variables and such.
>
> It's way too much. You guys have to fix this stuff.
>
> If TCP's full send and receive path can be done in less function
> calls, XFS can allocate blocks in less too.
>
> I would even say 10 function calls deep to allocate file blocks
> is overkill, but 22 it just astronomically bad.

Especially since a large part is due to cxfs...

2006-11-23 06:47:34

by Matthew Wilcox

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Thu, Nov 23, 2006 at 04:35:43AM +0000, Al Viro wrote:
> On Wed, Nov 22, 2006 at 08:10:13PM -0800, David Miller wrote:
> > I would even say 10 function calls deep to allocate file blocks
> > is overkill, but 22 it just astronomically bad.
>
> Especially since a large part is due to cxfs...

... and patches sent in the past to remove layers from XFS have been
NAKed due to CXFS

http://oss.sgi.com/archives/xfs/2003-08/msg00166.html
http://oss.sgi.com/archives/xfs/2003-08/msg00167.html
http://oss.sgi.com/archives/xfs/2003-08/msg00168.html
http://oss.sgi.com/archives/xfs/2003-08/msg00171.html

Maybe IRIX is now sufficiently dead that the last argument doesn't
matter any more.

2006-11-23 07:08:57

by David Chinner

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Wed, Nov 22, 2006 at 08:10:13PM -0800, David Miller wrote:
> From: David Chinner <[email protected]>
> Date: Thu, 23 Nov 2006 12:18:09 +1100
>
> > So, assuming the stacks less than 32 bytes are 32 bytes, we've got
> > 1380 bytes in the XFS stack there,
>
> On sparc64 just the XFS parts of the backtrace would be a minimum of
> 2816 bytes (each function has a minimum 8 * 16 byte stack frame, and
> there are about 22 calls in that trace).

Ok, I didn't think of stack frames - I thought they tiny on x86 but
I'm not intimately familiar with x86 which is why I'm asking....

> It's probably a lot more
> with local variables and such.

Not much more than above - the same call path on ia64 and x86_64
using the stack checker addition method I used showed about a 35%
increase in stack usage compared to ia32. I'd say about 4.5k stack
usage on sparc64 for that call path, then.

FWIW, I've never heard of XFS related stack overflows on the
sparc64 platform - have you heard of any reports of this
being a problem?

> It's way too much. You guys have to fix this stuff.

Sure, we're trying to but it takes time. However, if it's such a
problem for you now and you want this process sped up, then _please,
please, please_ submit patches to help fix the problem.....

Realistically, XFS is only part of the problem here - the other part
of the problem is the amount of stack that softirqs are using (e.g.
the entire tcp send and receive path) which means we really have
much, much less than 4k of stack space to play with.

If the softirqs were run on a different stack, then a lot of these
problems would go away (29 of the 35 reported overflows had softirqs
running) and we'd be much more likely to get XFS to run reliably on
4k stacks...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-23 08:12:32

by Arjan van de Ven

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Thu, 2006-11-23 at 04:35 +0000, Al Viro wrote:
> > I would even say 10 function calls deep to allocate file blocks
> > is overkill, but 22 it just astronomically bad.
>
> Especially since a large part is due to cxfs...
> -

it's a bit sad to see XFS this crippled in linux due to an external,
proprietary module ;(

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-23 10:27:29

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On 22/11/06, Stephen Hemminger <[email protected]> wrote:
> On Wed, 22 Nov 2006 13:58:11 +0100
> "Jesper Juhl" <[email protected]> wrote:
>
> > On 22/11/06, Jesper Juhl <[email protected]> wrote:
> > > On 22/11/06, David Chinner <[email protected]> wrote:
> > > > On Tue, Nov 21, 2006 at 11:02:23PM +0100, Jesper Juhl wrote:
> > > > > On 21/11/06, David Chatterton <[email protected]> wrote:
> > ...
> > > > > >Thanks for traces, I've captured this information.
> > > > > >
> > > > > You are welcome. If you want/need more traces then I've got ~2.1G
> > > > > worth of traces that you can have :)
> > > >
> > > > Well, we don't need that many, but it would be nice to have a
> > > > set of unique traces that lead to overflows - could you process
> > > > them in some way just to extract just the unique XFS traces that
> > > > occur?
> > > >
> > > I'll try to extract a copy of each unique trace that involves xfs,
> > > sometime tomorrow or the day after, and then send you the result.
> > >
> >
> > Attached are two files. The one named stack_overflows.txt.gz contains
> > one instance of each unique stack overflow + trace that I've got. The
> > other file named kernel_BUG.txt.gz contains a few BUG() messages that
> > were also in the logs.
> >
>
> You have a kind of worst case scenario there:
> XFS + Block layer
> TCP receive/transmit
> VLAN
>
> It is hard to know who to blame, there is no information about stack
> level at each call. Since it doesn't show up for filesystems other than
> XFS, I would pick on that. Perhaps the following:
>
Well, there's a very good explanation for that. The server has nothing
but XFS file systems (well, /boot is ext3, but that doesn't get used
for anything but the kernel image and System.map file).

>
> --- 2.6.19-rc6.orig/arch/i386/Kconfig.debug 2006-11-22 11:59:32.000000000 -0800
> +++ 2.6.19-rc6/arch/i386/Kconfig.debug 2006-11-22 12:00:28.000000000 -0800
> @@ -58,7 +58,7 @@
>
> config 4KSTACKS
> bool "Use 4Kb for kernel stacks instead of 8Kb"
> - depends on DEBUG_KERNEL
> + depends on DEBUG_KERNEL && !XFS_FS
> help
> If you say Y here the kernel will use a 4Kb stacksize for the
> kernel stack attached to each process/thread. This facilitates
>
Using 8K stacks works arround the problem, so for now the above patch
may well make sense. But, it would be better to get XFS fixed rather
than start adding dependencies for 4KSTACKS - it might be troublesome
getting rid of it again.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-11-23 13:16:22

by Ingo Oeser

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

Hi there,

David Chinner schrieb:
> If the softirqs were run on a different stack, then a lot of these
> problems would go away (29 of the 35 reported overflows had softirqs
> running) and we'd be much more likely to get XFS to run reliably on
> 4k stacks...

Ok, that seem like another approach. What about putting your allocation slowpath
in a kernel thread. They might have more stack available.

This is inferior to the complexity reduction suggested from the kernel people,
but if you cannot reduce complexity anymore this might work, too.


Regards

Ingo Oeser

2006-11-23 18:37:11

by Arjan van de Ven

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Thu, 2006-11-23 at 14:16 +0100, Ingo Oeser wrote:
> Hi there,
>
> David Chinner schrieb:
> > If the softirqs were run on a different stack, then a lot of these

softirqs DO run on their own stack!


2006-11-23 19:42:43

by David Miller

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

From: David Chinner <[email protected]>
Date: Thu, 23 Nov 2006 18:08:37 +1100

> Sure, we're trying to but it takes time. However, if it's such a
> problem for you now and you want this process sped up, then _please,
> please, please_ submit patches to help fix the problem.....

As noted in other replies, such patch submissions have been rejected
by the XFS maintainers in the past. So don't give the false
impression that people aren't trying to submit patches to fix this
problem.

The problem is the XFS maintainers rejecting the layering removal
patches.

2006-11-23 19:54:00

by David Miller

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

From: Arjan van de Ven <[email protected]>
Date: Thu, 23 Nov 2006 19:37:00 +0100

> On Thu, 2006-11-23 at 14:16 +0100, Ingo Oeser wrote:
> > Hi there,
> >
> > David Chinner schrieb:
> > > If the softirqs were run on a different stack, then a lot of these
>
> softirqs DO run on their own stack!

This is a platform specific assertion :-)

2006-11-23 22:07:53

by Nathan Scott

[permalink] [raw]
Subject: Re: [xfs-masters] Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Thu, 2006-11-23 at 09:12 +0100, Arjan van de Ven wrote:
> On Thu, 2006-11-23 at 04:35 +0000, Al Viro wrote:
> > > I would even say 10 function calls deep to allocate file blocks
> > > is overkill, but 22 it just astronomically bad.
> >
> > Especially since a large part is due to cxfs...
> > -
>
> it's a bit sad to see XFS this crippled in linux due to an external,
> proprietary module ;(

Heh, never let reality get in the way of a good conspiracy theory.
The stack depth in XFS is more a factor of the complexity of the
XFS space allocation algorithms, and is unrelated to CXFS.

I'm sure if people would point to specific stack issues they would
(continue to) get addressed. Its just so much easier to speculate
randomly though...

cheers.

--
Nathan

2006-11-24 00:56:07

by David Chinner

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Thu, Nov 23, 2006 at 07:37:00PM +0100, Arjan van de Ven wrote:
> On Thu, 2006-11-23 at 14:16 +0100, Ingo Oeser wrote:
> > Hi there,
> >
> > David Chinner schrieb:
> > > If the softirqs were run on a different stack, then a lot of these
>
> softirqs DO run on their own stack!

So they run on a separate stack for 4k stacks on x86?

They don't run on a separate stack for 8k stacks on x86 -
Jesper's traces show that - so this may indicate an issue
with the methodology used to generate the stack overflow
traces inteh first place. i.e. if 4k stacks use a separate
stack, then most of the reported overflows are spurious
and would not normally occur on 4k stack systems..

Can you confirm this, Arjan?

Also, that means that while XFS is apparently only using <1500 bytes
of stack through this path according to the static stack checker
tool, there's more than 2k of extra stack usage that the tool is not
telling me about. i.e. XFS and whatever is above/below it should
have a full 4k to work with. I'd really like to know where that
extra stack space is being used....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-24 01:08:58

by Jesper Juhl

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On 24/11/06, David Chinner <[email protected]> wrote:
> On Thu, Nov 23, 2006 at 07:37:00PM +0100, Arjan van de Ven wrote:
> > On Thu, 2006-11-23 at 14:16 +0100, Ingo Oeser wrote:
> > > Hi there,
> > >
> > > David Chinner schrieb:
> > > > If the softirqs were run on a different stack, then a lot of these
> >
> > softirqs DO run on their own stack!
>
> So they run on a separate stack for 4k stacks on x86?
>

Yes, with 4K stacks there's sepperate IRQ stack.

>From the help text for CONFIG_4KSTACKS :

"If you say Y here the kernel will use a 4Kb stacksize for the
kernel stack attached to each process/thread. This facilitates
running more threads on a system and also reduces the pressure
on the VM subsystem for higher order allocations. This option
will also use IRQ stacks to compensate for the reduced stackspace."


> They don't run on a separate stack for 8k stacks on x86 -
> Jesper's traces show that - so this may indicate an issue
> with the methodology used to generate the stack overflow
> traces inteh first place. i.e. if 4k stacks use a separate
> stack, then most of the reported overflows are spurious
> and would not normally occur on 4k stack systems..
>

Well, some of the traces show that we were down to ~3K stack free with
8K stacks, so ~5K used. Even with 4K stacks and sepperate stack for
IRQs we will still be uncomfortably close to the edge in those cases.
Also, I did manage to capture a single line via netconsole while
running with 4K stacks :
do_IRQ: stack overflow: 492
Unfortunately that was the only line that made it to the remote log
server, so I don't have the actual trace for that one. But it does
show that there really is an issue when running with 4K stacks, IRQ
stacks or no.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-11-24 02:05:45

by David Chinner

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Fri, Nov 24, 2006 at 02:08:53AM +0100, Jesper Juhl wrote:
> On 24/11/06, David Chinner <[email protected]> wrote:
> >On Thu, Nov 23, 2006 at 07:37:00PM +0100, Arjan van de Ven wrote:
> >> On Thu, 2006-11-23 at 14:16 +0100, Ingo Oeser wrote:
> >> > Hi there,
> >> >
> >> > David Chinner schrieb:
> >> > > If the softirqs were run on a different stack, then a lot of these
> >>
> >> softirqs DO run on their own stack!
> >
> >So they run on a separate stack for 4k stacks on x86?
>
> Yes, with 4K stacks there's sepperate IRQ stack.

Ok, thanks.

> >They don't run on a separate stack for 8k stacks on x86 -
> >Jesper's traces show that - so this may indicate an issue
> >with the methodology used to generate the stack overflow
> >traces inteh first place. i.e. if 4k stacks use a separate
> >stack, then most of the reported overflows are spurious
> >and would not normally occur on 4k stack systems..
> >
>
> Well, some of the traces show that we were down to ~3K stack free with
> 8K stacks, so ~5K used. Even with 4K stacks and sepperate stack for
> IRQs we will still be uncomfortably close to the edge in those cases.

Sure - i didn't say there wasn't a problem - more just indicating
that most of the traces would not have happened on a 4k stack box so
it's harder to tell which of the traces you posted would actually
lead to an overflow.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-24 07:52:47

by Arjan van de Ven

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Fri, 2006-11-24 at 11:55 +1100, David Chinner wrote:
> On Thu, Nov 23, 2006 at 07:37:00PM +0100, Arjan van de Ven wrote:
> > On Thu, 2006-11-23 at 14:16 +0100, Ingo Oeser wrote:
> > > Hi there,
> > >
> > > David Chinner schrieb:
> > > > If the softirqs were run on a different stack, then a lot of these
> >
> > softirqs DO run on their own stack!
>
> So they run on a separate stack for 4k stacks on x86?
>
> They don't run on a separate stack for 8k stacks on x86 -
> Jesper's traces show that - so this may indicate an issue
> with the methodology used to generate the stack overflow
> traces inteh first place. i.e. if 4k stacks use a separate
> stack, then most of the reported overflows are spurious
> and would not normally occur on 4k stack systems..
>
> Can you confirm this, Arjan?

yes there are separate stacks for softirq and hardirq context with 4K
stacks, but not for 8K stacks.


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-26 14:31:36

by Eric Sandeen

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

Arjan van de Ven wrote:
> On Thu, 2006-11-23 at 04:35 +0000, Al Viro wrote:
>>> I would even say 10 function calls deep to allocate file blocks
>>> is overkill, but 22 it just astronomically bad.
>> Especially since a large part is due to cxfs...
>> -
>
> it's a bit sad to see XFS this crippled in linux due to an external,
> proprietary module ;(
>

I understand that cxfs is a bit of a whipping-boy, but the stacks in
question in this thread really don't have much if anything to do with
the filesystem layering in xfs. They are deep callchains & large
functions in core xfs code, it seems.

-Eric

2006-11-29 01:56:26

by David Chinner

[permalink] [raw]
Subject: Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP

On Thu, Nov 23, 2006 at 12:18:09PM +1100, David Chinner wrote:
> On Wed, Nov 22, 2006 at 01:58:11PM +0100, Jesper Juhl wrote:
> >
> > Attached are two files. The one named stack_overflows.txt.gz contains
> > one instance of each unique stack overflow + trace that I've got. The
> > other file named kernel_BUG.txt.gz contains a few BUG() messages that
> > were also in the logs.
....
> I've just checked on a 2.6.17 build on i386 how much stack we
> are using (from checkstack.pl with min size reported set to 32 bytes)
> here in XFS:
....
> So, assuming the stacks less than 32 bytes are 32 bytes, we've got
> 1380 bytes in the XFS stack there, and very few functions where it
> can be reduced further. Still, 1380 bytes is way, way short of 4KB,
> so unless there is extra stack usage that checkstack doesn't tell us
> about I'm not sure why this amount of usage is causing repeated
> stack overflows with very little stack usage on either side of it.
>
> Can someone enlighten me as to where all the rest of the stack
> is being used up here?

FYI.

With some help from Keith Owens, we've determined that gcc 3.3.5
resulted in XFS stack usage of about 1.9KB through the writeback and
allocation path with another ~800 bytes of stack usage in generic
code in this path.

The big difference between the numbers I was getting from checkstack
and reality was CONFIG_CC_OPTIMISE_FOR_SIZE=y being set on the
kernels I was stack checking. IOWs, CONFIG_CC_OPTIMISE_FOR_SIZE=y
appears to reduce XFS stack usage by at least 20% and so probably
should be used with XFS on 4k stacks.

Keith also confirmed that gcc-4.1's aggressive inlining of static
functions substantially increases stack usage (by ~15%) through this
call chain. Given that many of the inlined static functions are not
required by the critical path (i.e. they'd previously been factored
out to reduce stack usage), gcc is effectively undoing past mods
that had substantially reduced XFS's stack usage.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group