Date: Wed, 4 Sep 2013 16:35:50 -0700
Subject: Re: [PATCH] bcache: Fix a shrinker deadlock
From: kernel neophyte
To: Kent Overstreet
Cc: linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, Stefan Priebe, Jens Axboe

On Fri, Aug 30, 2013 at 2:15 PM, Kent Overstreet wrote:
> GFP_NOIO means we could be getting called recursively - mca_alloc() ->
> mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then.
> Whoops.
>
> Signed-off-by: Kent Overstreet

Awesome! I tested the fix - no crashes or deadlocks. But I'm seeing lower
benchmark numbers for random writes; is that expected with this change?

Thanks, Kent.

-Suhas

> ---
>
> On Thu, Aug 29, 2013 at 05:29:54PM -0700, kernel neophyte wrote:
>> We are evaluating bcache for use on our production systems, where the
>> caching devices are insanely fast. In this scenario, under a moderate
>> load of random 4k writes, bcache fails miserably :-(
>>
>> [ 3588.513638] bcache: bch_cached_dev_attach() Caching sda4 as bcache0
>> on set b082ce66-04c6-43d5-8207-ebf39840191d
>> [ 4442.163661] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
>> [ 4442.163671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [ 4442.163678] kworker/0:0     D ffffffff81813d40     0     4     2 0x00000000
>> [ 4442.163695] Workqueue: bcache bch_data_insert_keys
>> [ 4442.163699]  ffff882fa6ac93c8 0000000000000046 ffff882fa6ac93e8 0000000000000151
>> [ 4442.163705]  ffff882fa6a84cb0 ffff882fa6ac9fd8 ffff882fa6ac9fd8 ffff882fa6ac9fd8
>> [ 4442.163711]  ffff882fa6ad6640 ffff882fa6a84cb0 ffff882fa6a84cb0 ffff8822ca2c0d98
>> [ 4442.163716] Call Trace:
>> [ 4442.163729]  [] schedule+0x29/0x70
>> [ 4442.163735]  [] schedule_preempt_disabled+0xe/0x10
>> [ 4442.163741]  [] __mutex_lock_slowpath+0x112/0x1b0
>> [ 4442.163746]  [] mutex_lock+0x2a/0x50
>> [ 4442.163752]  [] bch_mca_shrink+0x1b5/0x2f0
>> [ 4442.163759]  [] ? prune_super+0x162/0x1b0
>> [ 4442.163769]  [] shrink_slab+0x154/0x300
>> [ 4442.163776]  [] ? resched_task+0x68/0x70
>> [ 4442.163782]  [] ? check_preempt_curr+0x75/0xa0
>> [ 4442.163788]  [] ? fragmentation_index+0x19/0x70
>> [ 4442.163794]  [] do_try_to_free_pages+0x20f/0x4b0
>> [ 4442.163800]  [] try_to_free_pages+0xe4/0x1a0
>> [ 4442.163810]  [] __alloc_pages_nodemask+0x60c/0x9b0
>> [ 4442.163818]  [] alloc_pages_current+0xba/0x170
>> [ 4442.163824]  [] __get_free_pages+0xe/0x40
>> [ 4442.163829]  [] mca_data_alloc+0x73/0x1d0
>> [ 4442.163834]  [] mca_bucket_alloc+0x14a/0x1f0
>> [ 4442.163838]  [] mca_alloc+0x360/0x470
>> [ 4442.163843]  [] bch_btree_node_alloc+0x8c/0x1c0
>> [ 4442.163849]  [] btree_split+0x110/0x5c0
>
> Ohhh, that definitely isn't supposed to happen.
>
> Wonder why I hadn't seen this before; looking at the backtrace it's
> pretty obvious what's broken though - try this patch:
>
>  drivers/md/bcache/btree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 60908de..55e8666 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -617,7 +617,7 @@ static int bch_mca_shrink(struct shrinker *shrink, struct shrink_control *sc)
>                 return mca_can_free(c) * c->btree_pages;
>
>         /* Return -1 if we can't do anything right now */
> -       if (sc->gfp_mask & __GFP_WAIT)
> +       if (sc->gfp_mask & __GFP_IO)
>                 mutex_lock(&c->bucket_lock);
>         else if (!mutex_trylock(&c->bucket_lock))
>                 return -1;
> --
> 1.8.4.rc3