Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262200AbUKQECt (ORCPT ); Tue, 16 Nov 2004 23:02:49 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262204AbUKQECt (ORCPT ); Tue, 16 Nov 2004 23:02:49 -0500 Received: from fw.osdl.org ([65.172.181.6]:10206 "EHLO mail.osdl.org") by vger.kernel.org with ESMTP id S262200AbUKQECN (ORCPT ); Tue, 16 Nov 2004 23:02:13 -0500 Date: Tue, 16 Nov 2004 20:01:56 -0800 From: Andrew Morton To: Brad Fitzpatrick Cc: linux-kernel@vger.kernel.org, lisa@danga.com, marksmith@danga.com, linux-xfs@oss.sgi.com Subject: Re: 2.6.9: unkillable processes during heavy IO Message-Id: <20041116200156.2b2526e5.akpm@osdl.org> In-Reply-To: References: X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.10; i386-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 46622 Lines: 696 Brad Fitzpatrick wrote: > > Here is the information requested regarding the regular lockup we're > seeing (details in original post at bottom) yup, that looks like a locking tangle in the XFS direct-io code. Nathan, this is 2.6.9. > # killall -9 mysqld > # killall -9 mysqld > # killall -9 mysqld > > PID WIDE-WCHAN-COLUMN COMMAND > 1 - init [2] > 2 migration_thread [migration/0] > 3 ksoftirqd [ksoftirqd/0] > 4 migration_thread [migration/1] > 5 ksoftirqd [ksoftirqd/1] > 6 worker_thread [events/0] > 8 worker_thread \_ [khelper] > 9 worker_thread \_ [kacpid] > 39 worker_thread \_ [kblockd/0] > 40 worker_thread \_ [kblockd/1] > 52 pdflush \_ [pdflush] > 53 pdflush \_ [pdflush] > 56 worker_thread \_ [aio/0] > 57 worker_thread \_ [aio/1] > 7 worker_thread [events/1] > 1154 worker_thread \_ [xfslogd/0] > 1155 worker_thread \_ [xfslogd/1] > 1156 worker_thread \_ [xfsdatad/0] > 1157 worker_thread \_ [xfsdatad/1] > 41 hub_thread [khubd] > 54 kswapd [kswapd1] > 55 kswapd [kswapd0] > 639 serio_thread [kseriod] > 726 - [scsi_eh_0] > 733 - [scsi_eh_1] > 1158 - [xfsbufd] > 1159 1109149712136 [xfssyncd] > 1160 1109149728520 [xfssyncd] > 1161 kjournald [kjournald] > 1162 1109157093128 [xfssyncd] > 1163 1109157166856 [xfssyncd] > 1164 1109172895496 [xfssyncd] > 2931 - /sbin/syslogd > 2934 syslog /sbin/klogd > 2942 - /usr/sbin/inetd > 3035 - /usr/lib/postfix/master > 3040 - \_ qmgr -l -t fifo -u -c > 8534 - \_ pickup -l -t fifo -u -c > 3045 - /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.pid > 3051 - /usr/sbin/sshd > 6952 - \_ sshd: root@pts/1 > 6954 - | \_ -bash > 8569 - \_ sshd: root@pts/0 > 8571 wait \_ -bash > 8671 - \_ ps afx -o pid,wchan=WIDE-WCHAN-COLUMN,command > 3057 - /usr/sbin/atd > 3060 - /usr/sbin/cron > 3074 - /sbin/getty 38400 tty1 > 3076 - /sbin/getty 38400 tty2 > 3077 - /sbin/getty 38400 tty3 > 3078 - /sbin/getty 38400 tty4 > 3079 - /sbin/getty 38400 tty5 > 3080 - /sbin/getty 38400 tty6 > 5390 - /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mydb --user=root --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock > 7663 - /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mydb --user=root --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock > 8657 - /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mydb --user=root --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock > 8658 exit \_ [mysqld] > > > And sysrq t: > > SysRq : Show State > > sibling > task PC pid father child younger older > init S 0000000000000400 0 1 0 2 (NOTLB) > 00000101450bdd88 0000000000000006 0000000000000256 0000010140000700 > 0000010140021480 0000000000000000 0000000000000007 ffffffff8015ae20 > 0000000000000216 0000000100000246 > Call Trace:{buffered_rmqueue+432} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {do_select+819} {__pollwait+0} > {sys_select+960} {system_call+126} > > migration/0 S 000001000c004720 0 2 1 3 (L-TLB) > 000001000c03bee8 0000000000000046 0000010143860760 0000000000000012 > 0000007300000002 ffffffff801335c0 0000010211120810 0000000000000073 > 000001000c004740 0000000000000001 > Call Trace:{try_to_wake_up+528} {migration_thread+202} > {migration_thread+0} {kthread+146} > {child_rip+8} {migration_thread+0} > {kthread+0} {child_rip+0} > > ksoftirqd/0 S 0000000000000000 0 3 1 4 2 (L-TLB) > 000001023ff93f08 0000000000000046 0000000000000009 ffffffff804079e0 > 000000000000000a 0000000000000212 0000000000000212 0000000000000000 > 000001000c006940 00000000802f4e80 > Call Trace:{__do_softirq+113} {ksoftirqd+0} > {ksoftirqd+63} {kthread+146} > {child_rip+8} {ksoftirqd+0} > {kthread+0} {child_rip+0} > > migration/1 S 00000101438619a0 0 4 1 5 3 (L-TLB) > 000001023ff95ee8 0000000000000046 000001000c0034e0 0000000000000016 > 0000007300000002 ffffffff801335c0 0000010211120810 0000000000000073 > 00000101438619c0 0000000100000001 > Call Trace:{try_to_wake_up+528} {migration_thread+202} > {migration_thread+0} {kthread+146} > {child_rip+8} {migration_thread+0} > {kthread+0} {child_rip+0} > > ksoftirqd/1 S 0000000000000000 0 5 1 6 4 (L-TLB) > 000001023ffa7f08 0000000000000046 0000000000000212 0000000000000000 > 0000000000000000 ffffffff8024cc49 0000000000000000 0000010143863bc0 > 0000000000000006 000000018025c186 > Call Trace:{dst_destroy+169} {__do_softirq+113} > {ksoftirqd+0} {ksoftirqd+63} > {kthread+146} {child_rip+8} > {ksoftirqd+0} {kthread+0} > {child_rip+0} > events/0 S ffffffff801607e0 0 6 1 8 7 5 (L-TLB) > 000001000c08be58 0000000000000046 000001000c08bd88 0000000000000256 > 0000007380396e68 0000000000000256 0000000000000212 00000100fbf622c0 > 00000100fbf622c0 0000000000000212 > Call Trace:{__wake_up+67} {cache_reap+0} > {worker_thread+270} {default_wake_function+0} > {default_wake_function+0} {worker_thread+0} > {kthread+146} {child_rip+8} > {worker_thread+0} {kthread+0} > {child_rip+0} > events/1 S ffffffff801607e0 0 7 1 1154 41 6 (L-TLB) > 000001023fe4de58 0000000000000046 000001013fffee80 ffffffff8015eda7 > 0000007700000000 0000000000000012 0000000000000012 000001013fffee80 > 0000000000000003 0000000100000212 > Call Trace:{slab_destroy+151} {__wake_up+67} > {cache_reap+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {worker_thread+0} {kthread+146} > {child_rip+8} {worker_thread+0} > {kthread+0} {child_rip+0} > > khelper S 000001023fe26828 0 8 6 9 (L-TLB) > 000001000c08fe58 0000000000000046 00000100ad17dd88 0000000000000212 > 0000006a80149390 ffffffff80112446 0000010088e3e810 000000000000006a > 000001000c004740 000000003fe26800 > Call Trace:{kernel_thread+130} {__wake_up+67} > {child_rip+0} {__call_usermodehelper+0} > {worker_thread+270} {default_wake_function+0} > {default_wake_function+0} {keventd_create_kthread+0} > {worker_thread+0} {keventd_create_kthread+0} > {kthread+146} {child_rip+8} > {keventd_create_kthread+0} {worker_thread+0} > {kthread+0} {child_rip+0} > > kacpid S ffffffff8014de80 0 9 6 39 8 (L-TLB) > 000001014515fe58 0000000000000046 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000100000000 > Call Trace:{keventd_create_kthread+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {keventd_create_kthread+0} {worker_thread+0} > {keventd_create_kthread+0} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {worker_thread+0} {kthread+0} > {child_rip+0} > kblockd/0 S ffffffff80225400 0 39 6 40 9 (L-TLB) > 000001013fe51e58 0000000000000046 000000000000007a ffffffffa00355b2 > 0000000000000212 0000000000000000 000000004520e800 0000010140009400 > 0000000000000000 0000000045281b40 > Call Trace:{:mptscsih:mptscsih_qcmd+770} {__wake_up+67} > {as_work_handler+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {keventd_create_kthread+0} {worker_thread+0} > {keventd_create_kthread+0} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {worker_thread+0} {kthread+0} > {child_rip+0} > kblockd/1 S ffffffff8021e180 0 40 6 52 39 (L-TLB) > 0000010145193e58 0000000000000046 0000000000000216 0000000000000000 > 000000004520e800 0000010140009400 0000000000000000 0000010145281b40 > 0000010145216000 0000000180224858 > Call Trace:{__wake_up+67} {blk_unplug_work+0} > {worker_thread+270} {default_wake_function+0} > {default_wake_function+0} {keventd_create_kthread+0} > {worker_thread+0} {keventd_create_kthread+0} > {kthread+146} {child_rip+8} > {keventd_create_kthread+0} {worker_thread+0} > {kthread+0} {child_rip+0} > > khubd S 0000000000000000 0 41 1 54 7 (L-TLB) > 00000101451a5ec8 0000000000000046 00000300801c1ad0 0000000000000001 > 000001013ff78400 0000000000000300 000001023fe30600 0000000000000000 > 000001013ff78400 000000018022bad6 > Call Trace:{hub_events+56} {hub_thread+206} > {autoremove_wake_function+0} {finish_task_switch+64} > {autoremove_wake_function+0} {schedule_tail+17} > {child_rip+8} {hub_thread+0} > {child_rip+0} > pdflush S ffffffff8014de80 0 52 6 53 40 (L-TLB) > 000001013fe83eb8 0000000000000046 0000000000000000 000000002faef350 > 0000006c00000000 0000000000ff347f 000001023fe25810 000000000000006c > 000001000c004740 0000000000ff347f > Call Trace:{try_to_wake_up+528} {task_rq_lock+74} > {pdflush+0} {keventd_create_kthread+0} > {__pdflush+170} {pdflush+28} > {pdflush+0} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {pdflush+0} {kthread+0} > {child_rip+0} > pdflush S ffffffff8014de80 0 53 6 56 52 (L-TLB) > 000001013fe85eb8 0000000000000046 0000000000000216 ffffffff801417ef > 00000000fffffffc 00000000000011bc 000001013fe85e48 000000010794cb83 > 00000000fffffffc 0000000100000212 > Call Trace:{__mod_timer+303} {pdflush+0} > {keventd_create_kthread+0} {__pdflush+170} > {pdflush+28} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {pdflush+0} {kthread+0} > {child_rip+0} > aio/0 S 000001013fe4a028 0 56 6 57 53 (L-TLB) > 000001013fe89e58 0000000000000046 0000000000000000 0000000000000000 > 0000006c00000000 0000000000000000 000001014518d810 000000000000006c > 000001000c004740 0000000000000000 > Call Trace:{keventd_create_kthread+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {keventd_create_kthread+0} {worker_thread+0} > {keventd_create_kthread+0} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {worker_thread+0} {kthread+0} > {child_rip+0} > kswapd1 S 0000010140001798 0 54 1 55 41 (L-TLB) > 000001013fe87eb8 0000000000000046 ffffffff802f5f48 ffffffff80162255 > 000000790000000c 0000000000000001 00000101cb7ef7d0 0000000000000079 > 00000101438619c0 00000001000f5995 > Call Trace:{shrink_slab+341} {balance_pgdat+2} > {kswapd+261} {autoremove_wake_function+0} > {finish_task_switch+64} {autoremove_wake_function+0} > {schedule_tail+17} {child_rip+8} > {kswapd+0} {child_rip+0} > > kswapd0 S 0000000000000000 0 55 1 639 54 (L-TLB) > 00000101451b9eb8 0000000000000046 ffffffff802f5f48 ffffffff80162255 > 0000008600000004 0000000000000000 000001014000d070 0000000000000086 > 000001000c004740 0000000000000b40 > Call Trace:{shrink_slab+341} {kswapd+261} > {autoremove_wake_function+0} {finish_task_switch+64} > {autoremove_wake_function+0} {schedule_tail+17} > {child_rip+8} {kswapd+0} > {child_rip+0} > aio/1 S ffffffff8014de80 0 57 6 56 (L-TLB) > 000001013fe8be58 0000000000000046 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000100000000 > Call Trace:{keventd_create_kthread+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {worker_thread+0} {keventd_create_kthread+0} > {kthread+146} {child_rip+8} > {keventd_create_kthread+0} {worker_thread+0} > {kthread+0} {child_rip+0} > > kseriod S 00000100fbc3df08 0 639 1 726 55 (L-TLB) > 00000100fbc3dec8 0000000000000046 0000003000000008 00000100fbc3ded8 > 00000077fbc3de18 ffffffff8013311c 00000100fbd177d0 0000000000000077 > 000001000c004740 00000000000927c0 > Call Trace:{deactivate_task+44} {serio_thread+246} > {autoremove_wake_function+0} {finish_task_switch+64} > {autoremove_wake_function+0} {schedule_tail+17} > {child_rip+8} {serio_thread+0} > {child_rip+0} > scsi_eh_0 S 000001014527ff18 0 726 1 733 639 (L-TLB) > 000001014527fe48 0000000000000046 0000000000000000 0000000000000000 > 0000007500000000 00000100fbc44baa 000001000565c810 0000000000000075 > 00000101438619c0 0000000100000000 > Call Trace:{__down_interruptible+209} {default_wake_function+0} > {__down_failed_interruptible+53} > {:scsi_mod:.text.lock.scsi_error+45} > {child_rip+8} {:scsi_mod:scsi_error_handler+0} > {child_rip+0} > scsi_eh_1 S 000001013fd47f18 0 733 1 1158 726 (L-TLB) > 000001013fd47e48 0000000000000046 0000000000000000 0000000000000000 > 0000007700000000 000001000565c44a 000001000565c810 0000000000000077 > 000001000c004740 0000000000000000 > Call Trace:{__down_interruptible+209} {default_wake_function+0} > {__down_failed_interruptible+53} > {:scsi_mod:.text.lock.scsi_error+45} > {child_rip+8} {:scsi_mod:scsi_error_handler+0} > {child_rip+0} > xfslogd/0 S ffffffffa013c820 0 1154 7 1155 (L-TLB) > 00000100fbc41e58 0000000000000046 000001023e911740 ffffffffa01232e0 > 000001013dcc1090 000001023ee5a458 000001023e90c800 000001023e547040 > 0000000000000212 00000000801c574e > Call Trace:{:xfs:xfs_log_move_tail+80} {__wake_up+67} > {:xfs:pagebuf_iodone_work+0} {:xfs:pagebuf_iodone_work+0} > {worker_thread+270} {default_wake_function+0} > {default_wake_function+0} {keventd_create_kthread+0} > {worker_thread+0} {keventd_create_kthread+0} > {kthread+146} {child_rip+8} > {keventd_create_kthread+0} {worker_thread+0} > {kthread+0} {child_rip+0} > > xfslogd/1 S ffffffffa013c820 0 1155 7 1156 1154 (L-TLB) > 000001023e915e58 0000000000000046 000001023e911b00 ffffffffa01232e0 > 000001023de55050 000001023ee5a668 000001023e90c880 000001023e547580 > 0000000000000212 00000001801c574e > Call Trace:{:xfs:xfs_log_move_tail+80} {__wake_up+67} > {:xfs:pagebuf_iodone_work+0} {:xfs:pagebuf_iodone_work+0} > {worker_thread+270} {default_wake_function+0} > {default_wake_function+0} {keventd_create_kthread+0} > {worker_thread+0} {keventd_create_kthread+0} > {kthread+146} {child_rip+8} > {keventd_create_kthread+0} {worker_thread+0} > {kthread+0} {child_rip+0} > > xfsdatad/0 S ffffffff8014de80 0 1156 7 1157 1155 (L-TLB) > 000001013ff61e58 0000000000000046 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > Call Trace:{keventd_create_kthread+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {keventd_create_kthread+0} {worker_thread+0} > {keventd_create_kthread+0} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {worker_thread+0} {kthread+0} > {child_rip+0} > xfsdatad/1 S 000001023e90c0a8 0 1157 7 1156 (L-TLB) > 000001000c3f1e58 0000000000000046 0000000000000010 0000000000000000 > 0000006d00000008 0000000000000000 000001023f64a7d0 000000000000006d > 00000101438619c0 0000000100002620 > Call Trace:{keventd_create_kthread+0} {worker_thread+270} > {default_wake_function+0} {default_wake_function+0} > {keventd_create_kthread+0} {worker_thread+0} > {keventd_create_kthread+0} {kthread+146} > {child_rip+8} {keventd_create_kthread+0} > {worker_thread+0} {kthread+0} > {child_rip+0} > xfsbufd S 0000000000508ae0 0 1158 1 1159 733 (L-TLB) > 0000010005645e98 0000000000000046 000001013ff8ca80 0000010145216330 > 000001014000ae00 0000000000000000 00000000000000dd ffffffffa00355b2 > 0000000000000212 0000000100000000 > Call Trace:{:mptscsih:mptscsih_qcmd+770} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {:xfs:pagebuf_daemon+148} {child_rip+8} > {:xfs:pagebuf_daemon+0} {child_rip+0} > > xfssyncd S 0000000000007530 0 1159 1 1160 1158 (L-TLB) > 000001023e797e88 0000000000000046 0000010140009e50 ffffffffffffff00 > 000000748029cc23 0000000000000010 000001023d921070 0000000000000074 > 00000101438619c0 00000001a0125b3e > Call Trace:{__mod_timer+303} {schedule_timeout+173} > {process_timeout+0} {:xfs:xfssyncd+142} > {:xfs:linvfs_fill_super+0} {child_rip+8} > {:xfs:linvfs_fill_super+0} {:xfs:xfssyncd+0} > {child_rip+0} > xfssyncd S 0000000000007530 0 1160 1 1161 1159 (L-TLB) > 000001023e79be88 0000000000000046 000000003ffe3400 000001023ffe3400 > 000001013fcb1200 000001023ffe3450 0000000000000002 00000ad900000af6 > 0000000000000002 00000001a0125b3e > Call Trace:{__mod_timer+303} {schedule_timeout+173} > {process_timeout+0} {:xfs:xfssyncd+142} > {:xfs:linvfs_fill_super+0} {child_rip+8} > {:xfs:linvfs_fill_super+0} {:xfs:xfssyncd+0} > {child_rip+0} > kjournald S 000001023ffe4e98 0 1161 1 1162 1160 (L-TLB) > 000001023f6e9e58 0000000000000046 00000101bab75a28 00000101bab75a80 > 00000101bab75be0 00000101bab75f50 00000101bab758c8 00000101bab75b88 > 00000101bab75d98 00000001bab756b8 > Call Trace:{__wake_up+67} {:jbd:kjournald+532} > {autoremove_wake_function+0} {autoremove_wake_function+0} > {:jbd:commit_timeout+0} {child_rip+8} > {:jbd:kjournald+0} {child_rip+0} > > xfssyncd S 0000000000007530 0 1162 1 1163 1161 (L-TLB) > 000001023eea1e88 0000000000000046 000001023ffe4c50 ffffffffffffff00 > ffffffffa01259ca 0000000000000010 0000000000000246 000001023eea1df8 > 0000000000000018 00000001a0125b3e > Call Trace:{:xfs:xlog_state_sync_all+90} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {:xfs:xfssyncd+142} {:xfs:linvfs_fill_super+0} > {child_rip+8} {:xfs:linvfs_fill_super+0} > {:xfs:xfssyncd+0} {child_rip+0} > > xfssyncd S 0000000000007530 0 1163 1 1164 1162 (L-TLB) > 000001023eeb3e88 0000000000000046 000000003ffe4a00 000001023ffe4a00 > 000001023e911d40 000001023ffe4a50 0000000000000002 0000000e000041da > 0000000000000002 00000000a0125b3e > Call Trace:{__mod_timer+303} {schedule_timeout+173} > {process_timeout+0} {:xfs:xfssyncd+142} > {:xfs:linvfs_fill_super+0} {child_rip+8} > {:xfs:linvfs_fill_super+0} {:xfs:xfssyncd+0} > {child_rip+0} > xfssyncd S 0000000000007530 0 1164 1 2931 1163 (L-TLB) > 000001023fdb3e88 0000000000000046 000000003ffe4800 000001023ffe4800 > 000001023e911b00 000001023ffe4850 0000000000000002 00000008000068f5 > 0000000000000002 00000001a0125b3e > Call Trace:{__mod_timer+303} {schedule_timeout+173} > {process_timeout+0} {:xfs:xfssyncd+142} > {:xfs:linvfs_fill_super+0} {child_rip+8} > {:xfs:linvfs_fill_super+0} {:xfs:xfssyncd+0} > {child_rip+0} > syslogd S 000001013ff51c80 0 2931 1 2934 1164 (NOTLB) > 000001013fa61d88 0000000000000006 0000000100000000 000001023fd5bb50 > 000000743e7d0468 0000000000000067 000001000c3a7030 0000000000000074 > 000001000c004740 0000000000000246 > Call Trace:{__alloc_pages+831} {schedule_timeout+37} > {add_wait_queue+28} {datagram_poll+21} > {do_select+819} {__pollwait+0} > {sys_select+960} {system_call+126} > > klogd R running task 0 2934 1 2942 2931 (NOTLB) > inetd S 000001023ef77180 0 2942 1 3035 2934 (NOTLB) > 000001023eb73d88 0000000000000002 0000000000000000 00000000000004f4 > 0000007600000007 ffffffff8015ae20 000001023f38c7d0 0000000000000076 > 00000101438619c0 0000000140000700 > Call Trace:{buffered_rmqueue+432} {schedule_timeout+37} > {add_wait_queue+28} {tcp_poll+33} > {do_select+819} {__pollwait+0} > {sys_select+960} {system_call+126} > > master S 0000000000036db6 0 3035 1 3040 3045 2942 (NOTLB) > 000001013fa75d88 0000000000000006 0000000000000256 000001013ff84a18 > 000001013ff84bf0 0000000100000000 0000000000000007 ffffffff8015ae20 > 0000000800000001 0000000000000246 > Call Trace:{buffered_rmqueue+432} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {do_select+819} {__pollwait+0} > {__wake_up+67} {sys_select+960} > {system_call+126} > qmgr S 0000000000000060 0 3040 3035 8534 (NOTLB) > 000001023e6bbd88 0000000000000002 0000000000000216 ffffffff802425b4 > 000000743e8a95ec 000001023e8a95ec 00000100fbc0f030 0000000000000074 > 000001000c004740 0000000000000246 > Call Trace:{sock_def_readable+68} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {do_select+819} {__pollwait+0} > {sys_select+960} {system_call+126} > > snmpd S 00000000000001a8 0 3045 1 3051 3035 (NOTLB) > 000001023e3c1d88 0000000000000006 00000000419a0429 000001023e3c1d78 > 000000743e64aa40 000001023e3c1ec8 000001013ff837d0 0000000000000074 > 00000101438619c0 0000000100000246 > Call Trace:{schedule_timeout+37} {add_wait_queue+28} > {fget+91} {do_select+819} > {__pollwait+0} {sys_select+960} > {system_call+126} > sshd S 000001013f805ac0 0 3051 1 6952 3057 3045 (NOTLB) > 000001023df63d88 0000000000000002 0000000000000256 0000010140000700 > 0000007400000007 ffffffff8015ae20 0000010060518070 0000000000000074 > 00000101438619c0 0000000140000700 > Call Trace:{buffered_rmqueue+432} {schedule_timeout+37} > {add_wait_queue+28} {tcp_poll+33} > {do_select+819} {__pollwait+0} > {clear_inode+9} {sys_select+960} > {system_call+126} > atd S 0000000000000000 0 3057 1 3060 3051 (NOTLB) > 000001023e6d7ee8 0000000000000006 0000000000000216 000001023de5d138 > 000001023e6d7ef8 ffffffff8018fb11 000001023e6d7e68 000001023e6d7ef8 > 0000000000000000 0000000180181ce9 > Call Trace:{dput+33} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {sys_nanosleep+193} {system_call+126} > > cron S 0000000000000000 0 3060 1 3074 3057 (NOTLB) > 000001023deb7ee8 0000000000000002 0000000000000216 00000101e0d4b3f0 > 000001023deb7ef8 ffffffff8018fb97 000001023deb7e68 000001023deb7ef8 > 0000007fbffffc70 0000000180181ce9 > Call Trace:{dput+167} {__mod_timer+303} > {schedule_timeout+173} {process_timeout+0} > {sys_nanosleep+193} {system_call+126} > > getty S 000001013fa2cd28 0 3074 1 3076 3060 (NOTLB) > 000001023e59dd78 0000000000000006 000000000000001c ffffffff801575ea > 0000000000000019 000000000000001b 0000000000000000 0000000000000000 > 000001023da29348 0000000180157610 > Call Trace:{do_generic_mapping_read+1034} {do_con_write+1657} > {schedule_timeout+37} {vgacon_cursor+242} > {add_wait_queue+28} {read_chan+926} > {default_wake_function+0} {default_wake_function+0} > {tty_ldisc_deref+117} {tty_read+256} > {vfs_read+199} {sys_read+83} > {system_call+126} > getty S 000001023e0fcd28 0 3076 1 3077 3074 (NOTLB) > 000001023d835d78 0000000000000002 000000000000001c ffffffff801575ea > 000001023e0fc000 000000000000001b 0000000000000000 0000000000000000 > 000001023da29348 0000000180157610 > Call Trace:{do_generic_mapping_read+1034} {do_con_write+1657} > {schedule_timeout+37} {add_wait_queue+28} > {read_chan+926} {default_wake_function+0} > {default_wake_function+0} {tty_ldisc_deref+117} > {tty_read+256} {vfs_read+199} > {sys_read+83} {system_call+126} > > getty S 00000100fbca2d28 0 3077 1 3078 3076 (NOTLB) > 000001023d859d78 0000000000000002 000000000000001c ffffffff801575ea > 00000074fbca2000 000000000000001b 000001023d8047d0 0000000000000074 > 00000101438619c0 0000000180157610 > Call Trace:{do_generic_mapping_read+1034} {release_console_sem+138} > {do_con_write+1657} {schedule_timeout+37} > {add_wait_queue+28} {read_chan+926} > {default_wake_function+0} {default_wake_function+0} > {tty_ldisc_deref+117} {tty_read+256} > {vfs_read+199} {sys_read+83} > {system_call+126} > getty S 000001023e7c2d28 0 3078 1 3079 3077 (NOTLB) > 000001023d86bd78 0000000000000002 000000000000001c ffffffff801575ea > 000000743e7c2000 000000000000001b 000001023d804030 0000000000000074 > 00000101438619c0 0000000180157610 > Call Trace:{do_generic_mapping_read+1034} {release_console_sem+138} > {do_con_write+1657} {schedule_timeout+37} > {add_wait_queue+28} {read_chan+926} > {default_wake_function+0} {default_wake_function+0} > {tty_ldisc_deref+117} {tty_read+256} > {vfs_read+199} {sys_read+83} > {system_call+126} > getty S 000001023df85d28 0 3079 1 3080 3078 (NOTLB) > 000001023d88dd78 0000000000000006 000000000000001c ffffffff801575ea > 000000743df85000 000000000000001b 000001023d877810 0000000000000074 > 00000101438619c0 0000000180157610 > Call Trace:{do_generic_mapping_read+1034} {release_console_sem+138} > {do_con_write+1657} {schedule_timeout+37} > {add_wait_queue+28} {read_chan+926} > {default_wake_function+0} {default_wake_function+0} > {tty_ldisc_deref+117} {tty_read+256} > {vfs_read+199} {sys_read+83} > {system_call+126} > getty S 000001023e770d28 0 3080 1 5390 3079 (NOTLB) > 000001023d8a1d78 0000000000000002 000000000000001c ffffffff801575ea > 000001023e770000 000000000000001b 0000000000000000 0000000000000000 > 000001023da29348 0000000180157610 > Call Trace:{do_generic_mapping_read+1034} {do_con_write+1657} > {schedule_timeout+37} {add_wait_queue+28} > {read_chan+926} {default_wake_function+0} > {default_wake_function+0} {tty_ldisc_deref+117} > {tty_read+256} {vfs_read+199} > {sys_read+83} {system_call+126} > > mysqld D 000001023e034d80 0 5390 1 7663 3080 (NOTLB) > 0000010211817c88 0000000000000002 0000010211817c98 000000003b98c02b > 0000007400000000 0000000000000000 000001023f7d87d0 0000000000000074 > 00000101438619c0 0000000100000001 > Call Trace:{try_to_wake_up+528} {__down_write+132} > {:xfs:xfs_ilock+44} {:xfs:xfs_write+594} > {__alloc_pages+831} {:xfs:linvfs_write+101} > {do_sync_write+173} {handle_mm_fault+260} > {__up_read+29} {thread_return+41} > {autoremove_wake_function+0} {vfs_write+199} > {sys_pwrite64+104} {system_call+126} > > sshd S 0000000000000198 0 6952 3051 6954 8569 (NOTLB) > 00000101ed1e5d88 0000000000000002 00000101ed1e5ce8 0000010191826278 > 000000000055ba80 00000000659d37d0 0000000000000007 ffffffff8015ae20 > 00000101ed1e5de8 0000000000000246 > Call Trace:{buffered_rmqueue+432} {schedule_timeout+37} > {add_wait_queue+28} {:unix:unix_poll+21} > {do_select+819} {__pollwait+0} > {sys_select+960} {system_call+126} > > bash S 00000100dbcffd28 0 6954 6952 (NOTLB) > 000001007b57dd78 0000000000000002 000000003ff0da00 00000100000129d0 > 000000740000000f 0000010000012700 00000101fd69a070 0000000000000074 > 000001000c004740 000000008015a913 > Call Trace:{free_pages_bulk+663} {schedule_timeout+37} > {add_wait_queue+28} {read_chan+926} > {default_wake_function+0} {default_wake_function+0} > {remove_wait_queue+25} {tty_read+256} > {vfs_read+199} {sys_read+83} > {system_call+126} > mysqld D 000001022b239db8 0 7663 1 8657 5390 (NOTLB) > 000001009c017ae8 0000000000000002 00000102297ae070 0000000000000001 > 000000743ff0d180 00000102297ae070 00000101a76b4030 0000000000000074 > 00000101438619c0 00000001297ae070 > Call Trace:{__down+152} {default_wake_function+0} > {__down_read+130} {__down_failed+53} > {.text.lock.direct_io+15} {__mod_timer+303} > {:xfs:linvfs_direct_IO+186} {:xfs:linvfs_get_blocks_direct+0} > {:xfs:linvfs_unwritten_convert_direct+0} > {cleanup_rbuf+242} {generic_file_direct_IO+100} > {__generic_file_aio_read+214} {__up_read+29} > {:xfs:xfs_read+447} {:xfs:linvfs_read+99} > {setup_rt_frame+143} {do_sync_read+173} > {finish_task_switch+64} {thread_return+41} > {autoremove_wake_function+0} {vfs_read+199} > {sys_pread64+104} {system_call+126} > > pickup S 0000000000000060 0 8534 3035 3040 (NOTLB) > 00000100b49e5d88 0000000000000002 0000000000000216 ffffffff802425b4 > 000001013ff84c2c 000001013ff84c2c 0000000000000007 ffffffff8015ae20 > 0000000800000007 0000000100000246 > Call Trace:{sock_def_readable+68} {buffered_rmqueue+432} > {__mod_timer+303} {schedule_timeout+173} > {process_timeout+0} {do_select+819} > {__pollwait+0} {sys_select+960} > {system_call+126} > sshd S 0000000000000198 0 8569 3051 8571 6952 (NOTLB) > 000001015cffbd88 0000000000000006 000001015cffbce8 0000010005848778 > 0000007300573390 0000000011120810 0000010211120810 0000000000000073 > 00000101438619c0 0000000100000246 > Call Trace:{schedule_timeout+37} {add_wait_queue+28} > {:unix:unix_poll+21} {do_select+819} > {__pollwait+0} {sys_select+960} > {system_call+126} > bash R running task 0 8571 8569 (NOTLB) > mysqld D 0000000000000000 0 8657 1 8658 7663 (NOTLB) > 0000010146e07e80 0000000000000006 0000010146e07e18 0000010146e07e08 > 000001023f6c33c0 0000000000000101 000001022b22ac18 0000010146e07e18 > ffffffff8018fb11 0000000100000000 > Call Trace:{dput+33} {permission+37} > {may_open+95} {__down+152} > {default_wake_function+0} {__down_failed+53} > {.text.lock.read_write+5} {vfs_llseek+50} > {sys_lseek+82} {system_call+126} > > mysqld Z 000001005b0edf58 0 8658 8657 (L-TLB) > 000001005b0edde8 0000000000000046 0000000000000001 0000000000000010 > 0000007a0c3d7070 0000000000000010 000001007f5c4030 000000000000007a > 00000101438619c0 0000000100000001 > Call Trace:{exit_notify+2039} {do_exit+1003} > {do_group_exit+252} {get_signal_to_deliver+1082} > {sysret_signal+28} {do_signal+142} > {poll_freewait+76} {sys_poll+526} > {__pollwait+0} {ptregscall_common+103} > > > > > > > On Sun, 14 Nov 2004, Brad Fitzpatrick wrote: > > > We have two database servers which freeze up during heavy IO load. The > > machines themselves are responsible, but the mysqld processes are forever > > locked, unkillable with even kill -9. I can't restart with MySQL without > > rebooting the machines. > > > > I can reasonable rule out hardware, since this is happening in the > > same way on two identical machines. > > > > I'd like to know how I can debug this, to file a proper bug report. > > > > The hardware/software stack is: > > > > - Dual Opteron 246, SMP kernel, w/ NUMA > > - 9 GB of memory (4GB in one zone, 5GB in the other) > > - MySQL, running mostly InnoDB, but some MyISAM > > - MegaRAID raid-10 > > - device mapper > > - XFS (used as both O_DIRECT from InnoDB and regularly from MyISAM) > > > > At this point I'm going to try changing different variables on > > different machines in order to try and isolate it, but it's a slow > > process. > > > > - on raw partions, instead of device mapper > > - ext3 instead of XFS > > - not using O_DIRECT > > > > "Screenshot": > > > > roast:~# killall -9 mysqld > > roast:~# killall -9 mysqld > > roast:~# ps afx | grep mysqld > > 31495 ? D 0:08 /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mydb --user=root > > --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock > > 32391 ? D 0:01 /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mydb --user=root > > --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock > > 515 ? D 0:00 /usr/local/mysql/bin/mysqld --defaults-extra-file=/usr/local/mysql/data/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mydb --user=root > > --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock > > 517 ? Z 0:00 \_ [mysqld] > > > > Next time it hangs like this, how can I get a kernel backtrace or other useful information > > for a certain process? > > > > Thanks! > > > > - Brad > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/