Date: Tue, 6 Aug 2019 19:36:48 +0200
From: Andrea Righi
To: Coly Li
Cc: Kent Overstreet, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] bcache: fix deadlock in bcache_allocator
Message-ID: <20190806173648.GB27866@xps-13>
References: <20190806091801.GC11184@xps-13>
In-Reply-To: <20190806091801.GC11184@xps-13>

On Tue, Aug 06, 2019 at 11:18:01AM +0200, Andrea Righi wrote:
> bcache_allocator() can call the following:
>
> bch_allocator_thread()
>  -> bch_prio_write()
>     -> bch_bucket_alloc()
>        -> wait on &ca->set->bucket_wait
>
> But the wake-up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself => deadlock:
>
> [ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
> [ 1158.495929]       Not tainted 5.3.0-050300rc3-generic #201908042232
> [ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1158.504413] bcache_allocato D    0 15861      2 0x80004000
> [ 1158.504419] Call Trace:
> [ 1158.504429]  __schedule+0x2a8/0x670
> [ 1158.504432]  schedule+0x2d/0x90
> [ 1158.504448]  bch_bucket_alloc+0xe5/0x370 [bcache]
> [ 1158.504453]  ? wait_woken+0x80/0x80
> [ 1158.504466]  bch_prio_write+0x1dc/0x390 [bcache]
> [ 1158.504476]  bch_allocator_thread+0x233/0x490 [bcache]
> [ 1158.504491]  kthread+0x121/0x140
> [ 1158.504503]  ? invalidate_buckets+0x890/0x890 [bcache]
> [ 1158.504506]  ? kthread_park+0xb0/0xb0
> [ 1158.504510]  ret_from_fork+0x35/0x40
>
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself.
>
> Moreover, make sure to wake up the garbage collector thread when
> bch_prio_write() fails to allocate buckets.
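
(To see the shape of the bug at a glance: the allocator thread is the
sole producer of free buckets, so it must never block consuming them.
Below is a minimal userspace sketch of the same pattern -- plain
pthreads with made-up names (pool_take(), producer_thread()), not
kernel or bcache code:

/*
 * Illustrative sketch only: a thread that is the only producer for a
 * pool deadlocks if it ever sleeps waiting for the pool to refill.
 */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_wait = PTHREAD_COND_INITIALIZER;
static int pool_size;

/* Ordinary consumers may sleep here; the sole producer must not. */
static void pool_take(void)
{
	pthread_mutex_lock(&lock);
	while (pool_size == 0)
		pthread_cond_wait(&pool_wait, &lock);
	pool_size--;
	pthread_mutex_unlock(&lock);
}

static void *producer_thread(void *arg)
{
	for (;;) {
		/* ...refill pool_size and pthread_cond_broadcast()... */

		/*
		 * BUG: with the pool empty, this sleeps waiting for a
		 * wakeup that only this very thread can deliver.
		 */
		pool_take();
	}
	return NULL;
}

bch_bucket_alloc() sleeping on &ca->set->bucket_wait plays the role of
pool_take() here; the wakeup can only come from bch_allocator_thread()
itself, which is exactly why the patch makes that path non-blocking.)
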
> BugLink: https://bugs.launchpad.net/bugs/1784665
> BugLink: https://bugs.launchpad.net/bugs/1796292
> Signed-off-by: Andrea Righi
> ---
> Changes in v2:
>  - prevent retry_invalidate busy loop in bch_allocator_thread()
>
>  drivers/md/bcache/alloc.c  |  5 ++++-
>  drivers/md/bcache/bcache.h |  2 +-
>  drivers/md/bcache/super.c  | 13 +++++++++----
>  3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
> index 6f776823b9ba..a1df0d95151c 100644
> --- a/drivers/md/bcache/alloc.c
> +++ b/drivers/md/bcache/alloc.c
> @@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
>  			if (!fifo_full(&ca->free_inc))
>  				goto retry_invalidate;
>
> -			bch_prio_write(ca);
> +			if (bch_prio_write(ca, false) < 0) {
> +				ca->invalidate_needs_gc = 1;
> +				wake_up_gc(ca->set);
> +			}
>  		}
>  	}
>  out:
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 013e35a9e317..deb924e1d790 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
>  __printf(2, 3)
>  bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
>
> -void bch_prio_write(struct cache *ca);
> +int bch_prio_write(struct cache *ca, bool wait);
>  void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>
>  extern struct workqueue_struct *bcache_wq;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 20ed838e9413..716ea272fb55 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -529,7 +529,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
>  	closure_sync(cl);
>  }
>
> -void bch_prio_write(struct cache *ca)
> +int bch_prio_write(struct cache *ca, bool wait)
>  {
>  	int i;
>  	struct bucket *b;
> @@ -564,8 +564,12 @@ void bch_prio_write(struct cache *ca)
>  		p->magic = pset_magic(&ca->sb);
>  		p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);
>
> -		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
> -		BUG_ON(bucket == -1);
> +		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
> +		if (bucket == -1) {
> +			if (!wait)
> +				return -ENOMEM;
> +			BUG_ON(1);
> +		}

Coly, looking more at this change, I think we should handle the failure
path properly or we may leak buckets, am I right? (sorry for not
realizing this before)

Maybe we need something like the following on top of my previous patch.

I'm going to run more stress tests with this patch applied and will try
to figure out if we're actually leaking buckets without it.
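
To make the leak concrete: each iteration of the main loop in
bch_prio_write() allocates a fresh bucket into ca->prio_buckets[i], and
only the final loop retires the old buckets and records the new ones in
prio_last_buckets[]. Returning -ENOMEM in the middle abandons the
buckets already allocated on earlier iterations. Reduced to a
standalone sketch -- plain C with hypothetical alloc_bucket() /
free_bucket() helpers, not the real bcache API; the second function
mirrors what the patch below does:

#define NR_SLOTS 8

extern long alloc_bucket(void);         /* returns < 0 on failure */
extern void free_bucket(long bucket);

static long cur[NR_SLOTS];  /* buckets allocated by the current run  */
static long last[NR_SLOTS]; /* buckets committed by the previous run */

/* Buggy shape: an early return abandons what already succeeded. */
static int write_all_leaky(void)
{
	int i;

	for (i = NR_SLOTS - 1; i >= 0; i--) {
		long b = alloc_bucket();

		if (b < 0)
			return -1; /* leak: cur[i+1..NR_SLOTS-1] hold
				      fresh buckets nothing will free */
		cur[i] = b;
	}

	for (i = 0; i < NR_SLOTS; i++) {
		if (last[i])
			free_bucket(last[i]);
		last[i] = cur[i];
	}
	return 0;
}

/*
 * Fixed shape: break instead of returning, then commit (and retire the
 * predecessors of) only the slots actually replaced in this run, i.e.
 * those with index > i.
 */
static int write_all_fixed(void)
{
	int i, j, ret = 0;

	for (i = NR_SLOTS - 1; i >= 0; i--) {
		long b = alloc_bucket();

		if (b < 0) {
			ret = -1;
			break;	/* i marks the first slot NOT replaced */
		}
		cur[i] = b;
	}

	for (j = NR_SLOTS - 1; j > i; --j) {
		if (last[j])
			free_bucket(last[j]);
		last[j] = cur[j];
	}
	return ret;
}

With the break plus the "j > i" loop, a partial run still records the
slots it managed to replace, so their buckets get freed on the next
successful pass instead of leaking.
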
---
Subject: bcache: prevent leaking buckets in bch_prio_write()

Handle the allocation failure path properly in bch_prio_write() to
avoid leaking buckets from the previous successful iterations.

Signed-off-by: Andrea Righi
---
 drivers/md/bcache/super.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 716ea27..727266f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -531,7 +531,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
 
 int bch_prio_write(struct cache *ca, bool wait)
 {
-	int i;
+	int i, j, ret = 0;
 	struct bucket *b;
 	struct closure cl;
 
@@ -566,9 +566,11 @@ int bch_prio_write(struct cache *ca, bool wait)
 
 		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
 		if (bucket == -1) {
-			if (!wait)
-				return -ENOMEM;
-			BUG_ON(1);
+			if (!wait) {
+				ret = -ENOMEM;
+				break;
+			}
+			BUG();
 		}
 
 		mutex_unlock(&ca->set->bucket_lock);
@@ -590,14 +592,14 @@ int bch_prio_write(struct cache *ca, bool wait)
 	 * Don't want the old priorities to get garbage collected until after we
 	 * finish writing the new ones, and they're journalled
 	 */
-	for (i = 0; i < prio_buckets(ca); i++) {
-		if (ca->prio_last_buckets[i])
+	for (j = prio_buckets(ca) - 1; j > i; --j) {
+		if (ca->prio_last_buckets[j])
 			__bch_bucket_free(ca,
-					  &ca->buckets[ca->prio_last_buckets[i]]);
+					  &ca->buckets[ca->prio_last_buckets[j]]);
 
-		ca->prio_last_buckets[i] = ca->prio_buckets[i];
+		ca->prio_last_buckets[j] = ca->prio_buckets[j];
 	}
-	return 0;
+	return ret;
 }
 
 static void prio_read(struct cache *ca, uint64_t bucket)
--
2.7.4