Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758658AbXKAH5x (ORCPT ); Thu, 1 Nov 2007 03:57:53 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756615AbXKAH5f (ORCPT ); Thu, 1 Nov 2007 03:57:35 -0400 Received: from smtp.ustc.edu.cn ([202.38.64.16]:34594 "HELO ustc.edu.cn" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with SMTP id S1755858AbXKAH5e (ORCPT ); Thu, 1 Nov 2007 03:57:34 -0400 Message-ID: <393903856.06449@ustc.edu.cn> X-EYOUMAIL-SMTPAUTH: wfg@mail.ustc.edu.cn Date: Thu, 1 Nov 2007 15:57:30 +0800 From: Fengguang Wu To: Torsten Kaiser Cc: Maxim Levitsky , Peter Zijlstra , linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: 100% iowait on one of cpus in current -git References: <200710220822.52370.maximlevitsky@gmail.com> <200710221421.21439.maximlevitsky@gmail.com> <393056632.00561@ustc.edu.cn> <200710221505.35397.maximlevitsky@gmail.com> <20071022131045.GA5357@mail.ustc.edu.cn> <393060478.03650@ustc.edu.cn> <64bb37e0710310822r5ca6b793p8fd97db2f72a8655@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="fUYQa+Pmc3FrFX/N" Content-Disposition: inline In-Reply-To: <64bb37e0710310822r5ca6b793p8fd97db2f72a8655@mail.gmail.com> X-GPG-Fingerprint: 53D2 DDCE AB5C 8DC6 188B 1CB1 F766 DA34 8D8B 1C6D User-Agent: Mutt/1.5.16 (2007-06-11) Message-Id: Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9968 Lines: 257 --fUYQa+Pmc3FrFX/N Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Oct 31, 2007 at 04:22:10PM +0100, Torsten Kaiser wrote: > On 10/22/07, Fengguang Wu wrote: > > On Mon, Oct 22, 2007 at 09:10:45PM +0800, Fengguang Wu wrote: > > > Hmm, Maybe it's an reiserfs related issue. Do you have the full log file? > > > > Bingo! It can be reproduced in -mm on reiserfs: > > > > # mkfs.reiserfs /dev/sdb1 > > # mount /dev/sdb1 /test > > # cp bin /test > > > > # dmesg > > [...] > > [ 418.346113] requeue_io 308: inode 6 size 302 at 08:11(sdb1) > > [ 418.346119] requeue_io 308: inode 7 size 196 at 08:11(sdb1) > > [ 418.346125] requeue_io 308: inode 8 size 85 at 08:11(sdb1) > > Since 2.6.23-mm1 I also experience strange hangs during heavy writeouts. > Each time I noticed this I was using emerge (package util from the > gentoo distribution) to install/upgrade a package. The last step, > where this hang occurred, is moving the prepared files from a tmpfs > partion to the main xfs filesystem. > The hangs where not fatal, after a few second everything resumed > normal, so I was not able to capture a good image of what was > happening. Thank you for the detailed report. How severe was the hangs? Only writeouts stalled, all apps stalled, or cannot type and run new commands? > Today it happend again, but a little more obvious. During the moving > process the writeout stalled completly for several minutes until I hit > SysRq+W. > > /proc/meminfo: > MemTotal: 4061808 kB > MemFree: 881332 kB > Buffers: 0 kB > Cached: 2566628 kB > SwapCached: 64 kB > Active: 926612 kB > Inactive: 1959136 kB > SwapTotal: 9775416 kB > SwapFree: 9775296 kB > Dirty: 44948 kB > Writeback: 0 kB > AnonPages: 319068 kB > Mapped: 52844 kB > Slab: 235572 kB > SReclaimable: 164408 kB > SUnreclaim: 71164 kB > PageTables: 9576 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > CommitLimit: 11806320 kB > Committed_AS: 544520 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 35004 kB > VmallocChunk: 34359702447 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > > The 'Dirty' count did not decrease during this time and 'Writeback' stayed at 0. > I also have /proc/pagetypeinfo ,but I see nothing interessing in > there. (But will send it, if needed) > > The output from SysRq+W: > SysRq : Show Blocked State > task PC stack pid father > pdflush D ffff81001fcc2a88 0 285 2 > ffff810005d55580 0000000000000046 0000000000000800 0000007000000001 > 0000000000000400 ffffffff8022d61c ffffffff80817b00 ffffffff80817b00 > ffffffff80813f40 ffffffff80817b00 ffff810100893b18 0000000000000000 > Call Trace: > [] task_rq_lock+0x4c/0x90 > [] __wake_up_common+0x5a/0x90 > [] __down+0xa7/0x11e > [] default_wake_function+0x0/0x10 > [] __down_failed+0x35/0x3a > [] xfs_buf_lock+0x3e/0x40 > [] _xfs_buf_find+0x13e/0x240 > [] xfs_buf_get_flags+0x6f/0x190 > [] xfs_buf_read_flags+0x12/0xa0 > [] xfs_trans_read_buf+0x64/0x340 > [] xfs_itobp+0x81/0x1e0 > [] xfs_buf_rele+0x2e/0xd0 > [] xfs_iflush+0xfe/0x520 > [] __down_read_trylock+0x42/0x60 > [] xfs_inode_item_push+0x12/0x20 > [] xfs_trans_push_ail+0x267/0x2b0 > [] xlog_ticket_get+0xfb/0x140 > [] xfs_log_reserve+0xee/0x120 > [] xfs_trans_reserve+0xa8/0x210 > [] xfs_iomap_write_allocate+0xfa/0x410 > [] __split_bio+0x38a/0x3c0 > [] xfs_start_page_writeback+0x27/0x60 > [] xfs_iomap+0x26c/0x310 > [] xfs_map_blocks+0x38/0x90 > [] xfs_page_state_convert+0x2b8/0x630 > [] xfs_vm_writepage+0x6f/0x120 > [] __writepage+0xa/0x30 > [] write_cache_pages+0x23e/0x330 > [] __writepage+0x0/0x30 > [] xfs_iflush+0x3b7/0x520 > [] _xfs_buf_ioapply+0x222/0x320 > [] __up_read+0x21/0xb0 > [] xfs_iunlock+0x5c/0xc0 > [] do_writepages+0x20/0x40 > [] __writeback_single_inode+0xb0/0x380 > [] dm_table_any_congested+0x2e/0x80 > [] generic_sync_sb_inodes+0x20d/0x330 > [] writeback_inodes+0xa2/0xe0 > [] wb_kupdate+0xa6/0x120 > [] pdflush+0x0/0x1e0 > [] pdflush+0x110/0x1e0 > [] wb_kupdate+0x0/0x120 > [] kthread+0x4b/0x80 > [] child_rip+0xa/0x12 > [] kthread+0x0/0x80 > [] child_rip+0x0/0x12 > > emerge D ffff81001fcc2a88 0 3221 8163 > ffff81008c0679f8 0000000000000086 ffff81008c067988 ffffffff8024a719 > ffff8100060fb008 ffffffff8022c8ea ffffffff80817b00 ffffffff80817b00 > ffffffff80813f40 ffffffff80817b00 ffff810100893b18 0000000000000000 > Call Trace: > [] autoremove_wake_function+0x9/0x30 > [] __wake_up_common+0x5a/0x90 > [] __wake_up_common+0x5a/0x90 > [] __down+0xa7/0x11e > [] default_wake_function+0x0/0x10 > [] __down_failed+0x35/0x3a > [] xfs_buf_lock+0x3e/0x40 > [] _xfs_buf_find+0x13e/0x240 > [] xfs_buf_get_flags+0x6f/0x190 > [] xfs_buf_read_flags+0x12/0xa0 > [] xfs_trans_read_buf+0x64/0x340 > [] xfs_itobp+0x81/0x1e0 > [] xfs_buf_rele+0x2e/0xd0 > [] xfs_iflush+0xfe/0x520 > [] __down_read_trylock+0x42/0x60 > [] xfs_inode_item_push+0x12/0x20 > [] xfs_trans_push_ail+0x267/0x2b0 > [] xfs_log_reserve+0x72/0x120 > [] xfs_trans_reserve+0xa8/0x210 > [] kmem_zone_zalloc+0x32/0x50 > [] xfs_itruncate_finish+0xfb/0x310 > [] xfs_free_eofblocks+0x23b/0x280 > [] xfs_release+0x153/0x200 > [] xfs_file_release+0x10/0x20 > [] __fput+0xb1/0x220 > [] filp_close+0x54/0x90 > [] sys_close+0x9f/0x100 > [] system_call+0x7e/0x83 > > > After this SysRq+W writeback resumed again. Possible that writing > above into the syslog triggered that. Maybe. Are the log files on another disk/partition? > The source tmpfs is mounted with any special parameters, but the > target xfs filesystem resides on a dm-crypt device that is on top a 3 > disk RAID5 md. > During the hang all CPUs where idle. No iowaits? ;-) > The system is x86_64 with CONFIG_NO_HZ=y, but was still receiving ~330 > interrupts per second because of the bttv driver. (But I was not using > that device at this time.) > > I'm willing to test patches or more provide more information, but lack > a good testcase to trigger this on demand. Thank you. Maybe we can start by the applied debug patch :-) Fengguang --fUYQa+Pmc3FrFX/N Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="writeback-debug.patch" --- mm/page-writeback.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) --- linux-2.6.23-rc8-mm2.orig/mm/page-writeback.c +++ linux-2.6.23-rc8-mm2/mm/page-writeback.c @@ -98,6 +98,26 @@ EXPORT_SYMBOL(laptop_mode); /* End of sysctl-exported parameters */ +#define writeback_debug_report(n, wbc) do { \ + __writeback_debug_report(n, wbc, __FILE__, __LINE__, __FUNCTION__); \ +} while (0) + +void __writeback_debug_report(long n, struct writeback_control *wbc, + const char *file, int line, const char *func) +{ + printk("%s %d %s: %s(%d) %ld " + "global %lu %lu %lu " + "wc %c%c tw %ld sk %ld\n", + file, line, func, + current->comm, current->pid, n, + global_page_state(NR_FILE_DIRTY), + global_page_state(NR_WRITEBACK), + global_page_state(NR_UNSTABLE_NFS), + wbc->encountered_congestion ? 'C':'_', + wbc->more_io ? 'M':'_', + wbc->nr_to_write, + wbc->pages_skipped); +} static void background_writeout(unsigned long _min_pages); @@ -404,6 +424,7 @@ static void balance_dirty_pages(struct a pages_written += write_chunk - wbc.nr_to_write; get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi); + writeback_debug_report(pages_written, &wbc); } /* @@ -568,6 +589,7 @@ static void background_writeout(unsigned wbc.pages_skipped = 0; writeback_inodes(&wbc); min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write; + writeback_debug_report(min_pages, &wbc); if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) { /* Wrote less than expected */ if (wbc.encountered_congestion) @@ -643,6 +665,7 @@ static void wb_kupdate(unsigned long arg wbc.encountered_congestion = 0; wbc.nr_to_write = MAX_WRITEBACK_PAGES; writeback_inodes(&wbc); + writeback_debug_report(nr_to_write, &wbc); if (wbc.nr_to_write > 0) { if (wbc.encountered_congestion) congestion_wait(WRITE, HZ/10); --fUYQa+Pmc3FrFX/N-- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/