Date: Thu, 29 Aug 2013 17:29:54 -0700
Subject: Bcache sleeps forever on random writes
From: kernel neophyte
To: Kent Overstreet
Cc: "linux-bcache@vger.kernel.org", "linux-kernel@vger.kernel.org", Stefan Priebe, Jens Axboe
X-Mailing-List: linux-kernel@vger.kernel.org

We are evaluating bcache for use on our production systems, where the caching devices are insanely fast. In this scenario, under a moderate load of random 4k writes, bcache fails miserably :-(

[ 3588.513638] bcache: bch_cached_dev_attach() Caching sda4 as bcache0 on set b082ce66-04c6-43d5-8207-ebf39840191d
[ 4442.163661] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 4442.163671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4442.163678] kworker/0:0 D ffffffff81813d40 0 4 2 0x00000000
[ 4442.163695] Workqueue: bcache bch_data_insert_keys
[ 4442.163699] ffff882fa6ac93c8 0000000000000046 ffff882fa6ac93e8 0000000000000151
[ 4442.163705] ffff882fa6a84cb0 ffff882fa6ac9fd8 ffff882fa6ac9fd8 ffff882fa6ac9fd8
[ 4442.163711] ffff882fa6ad6640 ffff882fa6a84cb0 ffff882fa6a84cb0 ffff8822ca2c0d98
[ 4442.163716] Call Trace:
[ 4442.163729] [] schedule+0x29/0x70
[ 4442.163735] [] schedule_preempt_disabled+0xe/0x10
[ 4442.163741] [] __mutex_lock_slowpath+0x112/0x1b0
[ 4442.163746] [] mutex_lock+0x2a/0x50
[ 4442.163752] [] bch_mca_shrink+0x1b5/0x2f0
[ 4442.163759] [] ? prune_super+0x162/0x1b0
[ 4442.163769] [] shrink_slab+0x154/0x300
[ 4442.163776] [] ? resched_task+0x68/0x70
[ 4442.163782] [] ? check_preempt_curr+0x75/0xa0
[ 4442.163788] [] ? fragmentation_index+0x19/0x70
[ 4442.163794] [] do_try_to_free_pages+0x20f/0x4b0
[ 4442.163800] [] try_to_free_pages+0xe4/0x1a0
[ 4442.163810] [] __alloc_pages_nodemask+0x60c/0x9b0
[ 4442.163818] [] alloc_pages_current+0xba/0x170
[ 4442.163824] [] __get_free_pages+0xe/0x40
[ 4442.163829] [] mca_data_alloc+0x73/0x1d0
[ 4442.163834] [] mca_bucket_alloc+0x14a/0x1f0
[ 4442.163838] [] mca_alloc+0x360/0x470
[ 4442.163843] [] bch_btree_node_alloc+0x8c/0x1c0
[ 4442.163849] [] btree_split+0x110/0x5c0
[ 4442.163854] [] ? bch_keylist_pop_front+0x47/0x50
[ 4442.163859] [] ? bch_btree_insert_keys+0x56/0x250
[ 4442.163867] [] ? cpumask_next_and+0x3c/0x50
[ 4442.163872] [] bch_btree_insert_node+0xb2/0x2f0
[ 4442.163877] [] btree_insert_fn+0x28/0x50
[ 4442.163881] [] bch_btree_map_nodes_recurse+0x6c/0x170
[ 4442.163886] [] ? bch_btree_insert_node+0x2f0/0x2f0
[ 4442.163891] [] ? down_write+0x16/0x40
[ 4442.163896] [] ? bch_btree_node_get+0x71/0x280
[ 4442.163901] [] bch_btree_map_nodes_recurse+0x110/0x170
[ 4442.163905] [] ? bch_btree_insert_node+0x2f0/0x2f0
[ 4442.163915] [] ? dio_bio_end_io+0x5a/0x90
[ 4442.163921] [] ? update_curr+0x141/0x1f0
[ 4442.163926] [] __bch_btree_map_nodes+0x13e/0x1c0
[ 4442.163931] [] ? bch_btree_insert_node+0x2f0/0x2f0
[ 4442.163936] [] bch_btree_insert+0xb4/0x120
[ 4442.163942] [] bch_data_insert_keys+0x3e/0x160
[ 4442.163949] [] process_one_work+0x174/0x490
[ 4442.163954] [] worker_thread+0x11b/0x370
[ 4442.163959] [] ? manage_workers.isra.23+0x2d0/0x2d0
[ 4442.163965] [] kthread+0xc0/0xd0
[ 4442.163970] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.163978] [] ret_from_fork+0x7c/0xb0
[ 4442.163982] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.163994] INFO: task kswapd0:80 blocked for more than 120 seconds.
[ 4442.163998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4442.164003] kswapd0 D 0000000000000001 0 80 2 0x00000000
[ 4442.164007] ffff882fa4b17ba8 0000000000000046 ffff882fa4b17bc8 ffff882fa60ff000
[ 4442.164013] ffff882fa593e640 ffff882fa4b17fd8 ffff882fa4b17fd8 ffff882fa4b17fd8
[ 4442.164018] ffff882f8a278000 ffff882fa593e640 ffff882fa6a84cb0 ffff8822ca2c0d98
[ 4442.164023] Call Trace:
[ 4442.164029] [] schedule+0x29/0x70
[ 4442.164034] [] schedule_preempt_disabled+0xe/0x10
[ 4442.164039] [] __mutex_lock_slowpath+0x112/0x1b0
[ 4442.164044] [] mutex_lock+0x2a/0x50
[ 4442.164049] [] bch_mca_shrink+0x1b5/0x2f0
[ 4442.164054] [] ? prune_super+0x162/0x1b0
[ 4442.164059] [] shrink_slab+0x154/0x300
[ 4442.164065] [] kswapd+0x634/0x9b0
[ 4442.164071] [] ? add_wait_queue+0x60/0x60
[ 4442.164076] [] ? try_to_free_pages+0x1a0/0x1a0
[ 4442.164080] [] kthread+0xc0/0xd0
[ 4442.164085] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164090] [] ret_from_fork+0x7c/0xb0
[ 4442.164094] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164101] INFO: task kworker/1:1:201 blocked for more than 120 seconds.
[ 4442.164105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4442.164110] kworker/1:1 D ffffffff81813a60 0 201 2 0x00000000
[ 4442.164117] Workqueue: bcache bch_data_insert_keys
[ 4442.164119] ffff882f894c9be0 0000000000000046 0000000000000002 0000000000000002
[ 4442.164124] ffff882f89974cb0 ffff882f894c9fd8 ffff882f894c9fd8 ffff882f894c9fd8
[ 4442.164129] ffff882fa6ae8000 ffff882f89974cb0 0000000000000000 ffff882f89974cb0
[ 4442.164134] Call Trace:
[ 4442.164140] [] schedule+0x29/0x70
[ 4442.164145] [] rwsem_down_read_failed+0x9d/0xe5
[ 4442.164152] [] call_rwsem_down_read_failed+0x14/0x30
[ 4442.164157] [] ? down_read+0x24/0x2b
[ 4442.164162] [] __bch_btree_map_nodes+0xe5/0x1c0
[ 4442.164166] [] ? bch_btree_insert_node+0x2f0/0x2f0
[ 4442.164171] [] bch_btree_insert+0xb4/0x120
[ 4442.164177] [] bch_data_insert_keys+0x3e/0x160
[ 4442.164182] [] process_one_work+0x174/0x490
[ 4442.164187] [] worker_thread+0x11b/0x370
[ 4442.164192] [] ? manage_workers.isra.23+0x2d0/0x2d0
[ 4442.164196] [] kthread+0xc0/0xd0
[ 4442.164200] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164206] [] ret_from_fork+0x7c/0xb0
[ 4442.164210] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164215] INFO: task kworker/u64:2:377 blocked for more than 120 seconds.
[ 4442.164219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4442.164224] kworker/u64:2 D ffffffff81813a60 0 377 2 0x00000000
[ 4442.164231] Workqueue: bch_btree_io btree_node_write_work
[ 4442.164233] ffff882f87cbbcc8 0000000000000046 000003e146257be1 0029002f87cbbc98
[ 4442.164238] ffff882f88053320 ffff882f87cbbfd8 ffff882f87cbbfd8 ffff882f87cbbfd8
[ 4442.164243] ffff882fa6ae9990 ffff882f88053320 ffff882f87cbbd18 ffff882f88053320
[ 4442.164249] Call Trace:
[ 4442.164254] [] schedule+0x29/0x70
[ 4442.164259] [] rwsem_down_write_failed+0xf5/0x1a0
[ 4442.164264] [] ? __btree_node_write_done+0x100/0x100
[ 4442.164269] [] call_rwsem_down_write_failed+0x13/0x20
[ 4442.164274] [] ? down_write+0x31/0x40
[ 4442.164279] [] btree_node_write_work+0x2f/0x80
[ 4442.164283] [] process_one_work+0x174/0x490
[ 4442.164288] [] worker_thread+0x11b/0x370
[ 4442.164293] [] ? manage_workers.isra.23+0x2d0/0x2d0
[ 4442.164297] [] kthread+0xc0/0xd0
[ 4442.164302] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164307] [] ret_from_fork+0x7c/0xb0
[ 4442.164311] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164325] INFO: task bcache_allocato:2256 blocked for more than 120 seconds.
[ 4442.164329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4442.164334] bcache_allocato D 0000000000000001 0 2256 2 0x00000000
[ 4442.164337] ffff881004e3dd88 0000000000000046 ffff881004e3dda8 ffffffff810808ad
[ 4442.164343] ffff882fa3d64cb0 ffff881004e3dfd8 ffff881004e3dfd8 ffff881004e3dfd8
[ 4442.164348] ffff882f89ea0000 ffff882fa3d64cb0 ffff882fa6a84cb0 ffff8822ca2c0d98
[ 4442.164353] Call Trace:
[ 4442.164358] [] ? dequeue_task_fair+0x2cd/0x530
[ 4442.164363] [] schedule+0x29/0x70
[ 4442.164368] [] schedule_preempt_disabled+0xe/0x10
[ 4442.164373] [] __mutex_lock_slowpath+0x112/0x1b0
[ 4442.164378] [] mutex_lock+0x2a/0x50
[ 4442.164383] [] bch_allocator_thread+0x10f/0xe20
[ 4442.164388] [] ? bch_bucket_add_unused+0xe0/0xe0
[ 4442.164392] [] kthread+0xc0/0xd0
[ 4442.164398] [] ? end_buffer_async_read+0x130/0x130
[ 4442.164402] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164407] [] ret_from_fork+0x7c/0xb0
[ 4442.164411] [] ? flush_kthread_worker+0xb0/0xb0
[ 4442.164417] INFO: task iozone:2565 blocked for more than 120 seconds.
[ 4442.164421] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4442.164426] iozone D 0000000000000001 0 2565 1660 0x00000000
[ 4442.164429] ffff882fa3ae1978 0000000000000086 ffff882fa3ae1938 ffffffff81301d7a
[ 4442.164435] ffff882f8a420000 ffff882fa3ae1fd8 ffff882fa3ae1fd8 ffff882fa3ae1fd8
[ 4442.164440] ffff882fa6a84cb0 ffff882f8a420000 ffff882fa3ae1978 ffff882fbf2139f8
[ 4442.164445] Call Trace:
[ 4442.164451] [] ? generic_make_request+0xca/0x100
[ 4442.164456] [] schedule+0x29/0x70
[ 4442.164461] [] io_schedule+0x8f/0xd0
[ 4442.164467] [] do_blockdev_direct_IO+0x1a7c/0x1fb0
[ 4442.164477] [] ? ext2_get_blocks+0xa60/0xa60 [ext2]
[ 4442.164484] [] __blockdev_direct_IO+0x55/0x60
[ 4442.164490] [] ? ext2_get_blocks+0xa60/0xa60 [ext2]
[ 4442.164497] [] ext2_direct_IO+0x79/0xe0 [ext2]
[ 4442.164502] [] ? ext2_get_blocks+0xa60/0xa60 [ext2]
[ 4442.164509] [] ? current_fs_time+0x16/0x60
[ 4442.164516] [] generic_file_direct_write+0xc6/0x180
[ 4442.164521] [] __generic_file_aio_write+0x2dd/0x3b0
[ 4442.164526] [] generic_file_aio_write+0x69/0xd0
[ 4442.164532] [] do_sync_write+0x7a/0xb0
[ 4442.164537] [] vfs_write+0xce/0x1e0
[ 4442.164542] [] SyS_write+0x52/0xa0
[ 4442.164548] [] system_call_fastpath+0x16/0x1b

-Neo