Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756875Ab2JWIdK (ORCPT ); Tue, 23 Oct 2012 04:33:10 -0400
Received: from smtptls1-cha.cpub.univ-nantes.fr ([193.52.103.113]:53402 "EHLO smtp-tls.univ-nantes.fr" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756818Ab2JWIdF (ORCPT ); Tue, 23 Oct 2012 04:33:05 -0400
X-Greylist: delayed 492 seconds by postgrey-1.27 at vger.kernel.org; Tue, 23 Oct 2012 04:33:05 EDT
Message-ID: <50865453.5080708@univ-nantes.fr>
Date: Tue, 23 Oct 2012 10:24:51 +0200
From: Yann Dupont
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121017 Thunderbird/16.0.1
MIME-Version: 1.0
To: xfs@oss.sgi.com
CC: linux-kernel@vger.kernel.org
Subject: Problems with kernel 3.6.x (vm ?) (was : Is kernel 3.6.1 or filestreams option toxic ?)
References: <508554AF.5050005@univ-nantes.fr>
In-Reply-To: <508554AF.5050005@univ-nantes.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 24749
Lines: 411

On 22/10/2012 16:14, Yann Dupont wrote:

Hello.

This mail is a follow-up to a message on the XFS mailing list. I had a hang with 3.6.1, followed by damage to the XFS filesystem.

3.6.1 is not alone. I tried 3.6.2 and had another hang, with quite a different trace this time, so I am not really sure the two problems are related.

Anyway, the problem is maybe not in XFS itself; the filesystem damage may just be a consequence of what looks more like kernel problems.

cc: linux-kernel

Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.991908] INFO: task ceph-osd:4409 blocked for more than 120 seconds. Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.991954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.991999] ceph-osd D ffff88084c049030 0 4409 1 0x00000000 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992003] ffff88084c048d60 0000000000000086 ffff880a1421de78 ffff880a17caa820 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992054] ffff880a1421dfd8 ffff880a1421dfd8 ffff880a1421dfd8 ffff88084c048d60 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992105] 0000000003373001 ffff88084c048d60 ffff88051775cb20 ffffffffffffffff Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992156] Call Trace: Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992184] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992215] [] ? call_rwsem_down_write_failed+0x13/0x20 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992248] [] ? cap_mmap_addr+0x50/0x50 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992275] [] ? down_write+0x1c/0x1d Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992303] [] ? vm_mmap_pgoff+0x64/0xb0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992331] [] ? sys_mmap_pgoff+0x5c/0x190 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992360] [] ? do_sys_open+0x161/0x1e0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992387] [] ? system_call_fastpath+0x1a/0x1f Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992423] INFO: task ceph-osd:25297 blocked for more than 120 seconds. Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992495] ceph-osd D ffff8801bce7b1a0 0 25297 1 0x00000000 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992497] ffff8801bce7aed0 0000000000000086 ffff88025d903fd8 ffff880a17cab580 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992548] ffff88025d903fd8 ffff88025d903fd8 ffff88025d903fd8 ffff8801bce7aed0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992599] ffff8801bce7aed0 ffff8801bce7aed0 ffff88051775cb20 ffffffffffffffff Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992650] Call Trace: Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992673] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992702] [] ? call_rwsem_down_read_failed+0x14/0x30 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992732] [] ? down_read+0xe/0x10 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992759] [] ? do_page_fault+0x16c/0x460 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992787] [] ? release_sock+0xd2/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992815] [] ? inet_stream_connect+0x4b/0x70 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992844] [] ? sys_connect+0xa5/0xe0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992871] [] ? fd_install+0x33/0x70 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992898] [] ? page_fault+0x25/0x30 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992925] INFO: task ceph-osd:32469 blocked for more than 120 seconds. Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992996] ceph-osd D ffff880556237b30 0 32469 1 0x00000000 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.992999] ffff880556237860 0000000000000086 ffff88059fe5dfd8 ffff880a17c742e0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993050] ffff88059fe5dfd8 ffff88059fe5dfd8 ffff88059fe5dfd8 ffff880556237860 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993101] ffff880556237860 ffff880556237860 ffff88051775cb20 ffffffffffffffff Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993153] Call Trace: Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993175] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993204] [] ? call_rwsem_down_read_failed+0x14/0x30 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993233] [] ? down_read+0xe/0x10 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993259] [] ? do_page_fault+0x16c/0x460 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993286] [] ? release_sock+0xd2/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993314] [] ? inet_stream_connect+0x4b/0x70 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.993342] [] ? sys_connect+0xa5/0xe0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994484] [] ? fd_install+0x33/0x70 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994510] [] ? page_fault+0x25/0x30 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994538] INFO: task ceph-osd:9660 blocked for more than 120 seconds. Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994609] ceph-osd D ffff8801659f82d0 0 9660 1 0x00000000 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994612] ffff8801659f8000 0000000000000086 ffff88010f6bdfd8 ffff88084f0c9ac0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994662] ffff88010f6bdfd8 ffff88010f6bdfd8 ffff88010f6bdfd8 ffff8801659f8000 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994713] ffff8801659f8000 ffff8801659f8000 ffff88051775cb20 ffffffffffffffff Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994764] Call Trace: Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994786] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994815] [] ? call_rwsem_down_read_failed+0x14/0x30 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994844] [] ? down_read+0xe/0x10 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994870] [] ? do_page_fault+0x16c/0x460 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994898] [] ? release_sock+0xd2/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994925] [] ? inet_stream_connect+0x4b/0x70 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994953] [] ? sys_connect+0xa5/0xe0 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.994980] [] ? fd_install+0x33/0x70 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995006] [] ? page_fault+0x25/0x30 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995037] INFO: task grep:7014 blocked for more than 120 seconds. Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995108] grep D ffff8800c3f69030 0 7014 7011 0x00000000 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995110] ffff8800c3f68d60 0000000000000082 0000000000000000 ffff880a17ca9410 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995161] ffff88002dd2ffd8 ffff88002dd2ffd8 ffff88002dd2ffd8 ffff8800c3f68d60 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995212] 0000000000000000 ffff8800c3f68d60 ffff88051775cb20 ffffffffffffffff Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995264] Call Trace: Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995286] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995428] [] ? proc_pid_cmdline+0xa5/0x130 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995456] [] ? proc_info_read+0xb0/0x110 Oct 22 20:54:29 braeval.u14.univ-nantes.prive kernel: [629576.995484] [] ? vfs_read+0xa4/0x180 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.943923] INFO: task ceph-osd:4409 blocked for more than 120 seconds. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.943954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.943999] ceph-osd D ffff88084c049030 0 4409 1 0x00000000 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944003] ffff88084c048d60 0000000000000086 ffff880a1421de78 ffff880a17caa820 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944055] ffff880a1421dfd8 ffff880a1421dfd8 ffff880a1421dfd8 ffff88084c048d60 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944106] 0000000003373001 ffff88084c048d60 ffff88051775cb20 ffffffffffffffff Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944157] Call Trace: Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944185] [] ? 
rwsem_down_failed_common+0xbd/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944216] [] ? call_rwsem_down_write_failed+0x13/0x20 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944248] [] ? cap_mmap_addr+0x50/0x50 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944275] [] ? down_write+0x1c/0x1d Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944303] [] ? vm_mmap_pgoff+0x64/0xb0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944330] [] ? sys_mmap_pgoff+0x5c/0x190 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944358] [] ? do_sys_open+0x161/0x1e0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944386] [] ? system_call_fastpath+0x1a/0x1f Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944423] INFO: task ceph-osd:25297 blocked for more than 120 seconds. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944494] ceph-osd D ffff8801bce7b1a0 0 25297 1 0x00000000 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944496] ffff8801bce7aed0 0000000000000086 ffff88025d903fd8 ffff880a17cab580 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944548] ffff88025d903fd8 ffff88025d903fd8 ffff88025d903fd8 ffff8801bce7aed0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944599] ffff8801bce7aed0 ffff8801bce7aed0 ffff88051775cb20 ffffffffffffffff Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944650] Call Trace: Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944673] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944702] [] ? call_rwsem_down_read_failed+0x14/0x30 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944731] [] ? 
down_read+0xe/0x10 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944758] [] ? do_page_fault+0x16c/0x460 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944786] [] ? release_sock+0xd2/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944814] [] ? inet_stream_connect+0x4b/0x70 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944843] [] ? sys_connect+0xa5/0xe0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944870] [] ? fd_install+0x33/0x70 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944897] [] ? page_fault+0x25/0x30 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944923] INFO: task ceph-osd:12506 blocked for more than 120 seconds. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944994] ceph-osd D ffff8800227f7480 0 12506 1 0x00000000 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.944996] ffff8800227f71b0 0000000000000086 0000000000000000 ffff880a17cab580 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945048] ffff880468df1fd8 ffff880468df1fd8 ffff880468df1fd8 ffff8800227f71b0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945099] 0000000000000000 ffff8800227f71b0 ffff88051775cb20 ffffffffffffffff Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945150] Call Trace: Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945172] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945201] [] ? call_rwsem_down_read_failed+0x14/0x30 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945231] [] ? down_read+0xe/0x10 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945257] [] ? do_page_fault+0x16c/0x460 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945284] [] ? 
sys_recvfrom+0x107/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945311] [] ? sys_connect+0xa5/0xe0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945339] [] ? read_tsc+0x5/0x20 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945366] [] ? ktime_get_ts+0x3f/0xe0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945394] [] ? poll_select_set_timeout+0x64/0x80 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945422] [] ? page_fault+0x25/0x30 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945449] INFO: task ceph-osd:25459 blocked for more than 120 seconds. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945520] ceph-osd D ffff8803fc809d90 0 25459 1 0x00000000 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945522] ffff8803fc809ac0 0000000000000086 0000000000000000 ffff880a17c74990 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945573] ffff880468e25fd8 ffff880468e25fd8 ffff880468e25fd8 ffff8803fc809ac0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945624] 0000000000000000 ffff8803fc809ac0 ffff88051775cb20 ffffffffffffffff Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945675] Call Trace: Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945697] [] ? rwsem_down_failed_common+0xbd/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945726] [] ? call_rwsem_down_read_failed+0x14/0x30 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945755] [] ? down_read+0xe/0x10 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945781] [] ? do_page_fault+0x16c/0x460 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945808] [] ? 
sys_recvfrom+0x107/0x150 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945835] [] ? ktime_get_ts+0x2/0xe0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945862] [] ? read_tsc+0x5/0x20 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945888] [] ? ktime_get_ts+0x3f/0xe0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945914] [] ? poll_select_set_timeout+0x64/0x80 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945942] [] ? page_fault+0x25/0x30 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945969] INFO: task ceph-osd:32469 blocked for more than 120 seconds. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.945997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.946041] ceph-osd D ffff880556237b30 0 32469 1 0x00000000 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.946043] ffff880556237860 0000000000000086 ffff88059fe5dfd8 ffff880a17c742e0 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.946096] ffff88059fe5dfd8 ffff88059fe5dfd8 ffff88059fe5dfd8 ffff880556237860 Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.946146] ffff880556237860 ffff880556237860 ffff88051775cb20 ffffffffffffffff Oct 22 20:56:29 braeval.u14.univ-nantes.prive kernel: [629696.946198] Call Trace:

Well, at least after the hard reset the XFS volume was still good this time.

Old mail (sent to the XFS mailing list) for reference:

> Hello,
>
> Last week, I encountered problems with XFS volumes on several
> machines. The kernel hung under heavy load and I had to do a hard
> reset. After reboot, the XFS volume could not be mounted, and
> xfs_repair did not manage to recover the volume cleanly on 2
> different machines.
>
> Just to put minds at ease, it wasn't production data, so it doesn't
> matter whether I recover the data or not. More important to me is to
> understand why things went wrong...
>
> I have been using XFS for a long time, on lots of data, and this is
> the first time I have encountered such a problem. However, I was
> using an unusual option, filestreams, and was running kernel 3.6.1,
> so I wonder if either has something to do with the crash.
>
> I have nothing very conclusive in the kernel logs, apart from this:
>
> Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569890] > INFO: task ceph-osd:17856 blocked for more than 120 seconds. > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569941] > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569987] > ceph-osd D ffff88056416b1a0 0 17856 1 0x00000000 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569993] > ffff88056416aed0 0000000000000086 ffff880590751fd8 ffff88000c67eb00 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570047] > ffff880590751fd8 ffff880590751fd8 ffff880590751fd8 ffff88056416aed0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570101] > 0000000000000001 ffff88056416aed0 ffff880a15240d00 ffff880a15240d60 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570156] > Call Trace: > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570187] > [] ? exit_mm+0x85/0x120 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570216] > [] ? do_exit+0x154/0x8e0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570248] > [] ? file_update_time+0xa9/0x100 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570278] > [] ? do_group_exit+0x38/0xa0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570309] > [] ? get_signal_to_deliver+0x1a6/0x5e0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570341] > [] ? do_signal+0x4e/0x970 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570371] > [] ? fsnotify+0x24e/0x340 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570402] > [] ? 
fpu_finit+0x15/0x30 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570431] > [] ? restore_i387_xstate+0x64/0x1c0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570464] > [] ? sys_futex+0x92/0x1b0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570493] > [] ? do_notify_resume+0x75/0xc0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570525] > [] ? int_signal+0x12/0x17 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570553] > INFO: task ceph-osd:17857 blocked for more than 120 seconds. > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570583] > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570628] > ceph-osd D ffff8801161fe720 0 17857 1 0x00000000 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570632] > ffff8801161fe450 0000000000000086 ffffffffffffffe0 ffff880a17c73c30 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570687] > ffff88011347ffd8 ffff88011347ffd8 ffff88011347ffd8 ffff8801161fe450 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570740] > ffff8801161fe450 ffff8801161fe450 ffff880a15240d00 ffff880a15240d60 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570794] > Call Trace: > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570818] > [] ? exit_mm+0x85/0x120 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570846] > [] ? do_exit+0x154/0x8e0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570875] > [] ? do_group_exit+0x38/0xa0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570905] > [] ? get_signal_to_deliver+0x1a6/0x5e0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570935] > [] ? do_signal+0x4e/0x970 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570967] > [] ? 
sys_sendto+0x114/0x150 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570996] > [] ? sys_futex+0x92/0x1b0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571024] > [] ? do_notify_resume+0x75/0xc0 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571054] > [] ? int_signal+0x12/0x17 > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571082] > INFO: task ceph-osd:17858 blocked for more than 120 seconds. > Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571111] > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr