2019-08-06 09:20:32

by Andrea Righi

[permalink] [raw]
Subject: [PATCH v2] bcache: fix deadlock in bcache_allocator

bcache_allocator() can call the following:

bch_allocator_thread()
-> bch_prio_write()
-> bch_bucket_alloc()
-> wait on &ca->set->bucket_wait

But the wake up event on bucket_wait is supposed to come from
bch_allocator_thread() itself => deadlock:

[ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
[ 1158.495929] Not tainted 5.3.0-050300rc3-generic #201908042232
[ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1158.504413] bcache_allocato D 0 15861 2 0x80004000
[ 1158.504419] Call Trace:
[ 1158.504429] __schedule+0x2a8/0x670
[ 1158.504432] schedule+0x2d/0x90
[ 1158.504448] bch_bucket_alloc+0xe5/0x370 [bcache]
[ 1158.504453] ? wait_woken+0x80/0x80
[ 1158.504466] bch_prio_write+0x1dc/0x390 [bcache]
[ 1158.504476] bch_allocator_thread+0x233/0x490 [bcache]
[ 1158.504491] kthread+0x121/0x140
[ 1158.504503] ? invalidate_buckets+0x890/0x890 [bcache]
[ 1158.504506] ? kthread_park+0xb0/0xb0
[ 1158.504510] ret_from_fork+0x35/0x40

Fix by making the call to bch_prio_write() non-blocking, so that
bch_allocator_thread() never waits on itself.

Moreover, make sure to wake up the garbage collector thread when
bch_prio_write() is failing to allocate buckets.

BugLink: https://bugs.launchpad.net/bugs/1784665
BugLink: https://bugs.launchpad.net/bugs/1796292
Signed-off-by: Andrea Righi <[email protected]>
---
Changes in v2:
- prevent retry_invalidate busy loop in bch_allocator_thread()

drivers/md/bcache/alloc.c | 5 ++++-
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/super.c | 13 +++++++++----
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 6f776823b9ba..a1df0d95151c 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
if (!fifo_full(&ca->free_inc))
goto retry_invalidate;

- bch_prio_write(ca);
+ if (bch_prio_write(ca, false) < 0) {
+ ca->invalidate_needs_gc = 1;
+ wake_up_gc(ca->set);
+ }
}
}
out:
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 013e35a9e317..deb924e1d790 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
__printf(2, 3)
bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);

-void bch_prio_write(struct cache *ca);
+int bch_prio_write(struct cache *ca, bool wait);
void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);

extern struct workqueue_struct *bcache_wq;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 20ed838e9413..716ea272fb55 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -529,7 +529,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
closure_sync(cl);
}

-void bch_prio_write(struct cache *ca)
+int bch_prio_write(struct cache *ca, bool wait)
{
int i;
struct bucket *b;
@@ -564,8 +564,12 @@ void bch_prio_write(struct cache *ca)
p->magic = pset_magic(&ca->sb);
p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);

- bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
- BUG_ON(bucket == -1);
+ bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
+ if (bucket == -1) {
+ if (!wait)
+ return -ENOMEM;
+ BUG_ON(1);
+ }

mutex_unlock(&ca->set->bucket_lock);
prio_io(ca, bucket, REQ_OP_WRITE, 0);
@@ -593,6 +597,7 @@ void bch_prio_write(struct cache *ca)

ca->prio_last_buckets[i] = ca->prio_buckets[i];
}
+ return 0;
}

static void prio_read(struct cache *ca, uint64_t bucket)
@@ -1954,7 +1959,7 @@ static int run_cache_set(struct cache_set *c)

mutex_lock(&c->bucket_lock);
for_each_cache(ca, c, i)
- bch_prio_write(ca);
+ bch_prio_write(ca, true);
mutex_unlock(&c->bucket_lock);

err = "cannot allocate new UUID bucket";
--
2.20.1


2019-08-06 13:01:30

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH v2] bcache: fix deadlock in bcache_allocator

On 2019/8/6 5:18 下午, Andrea Righi wrote:
> bcache_allocator() can call the following:
>
> bch_allocator_thread()
> -> bch_prio_write()
> -> bch_bucket_alloc()
> -> wait on &ca->set->bucket_wait
>
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself => deadlock:
>
> [ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
> [ 1158.495929] Not tainted 5.3.0-050300rc3-generic #201908042232
> [ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1158.504413] bcache_allocato D 0 15861 2 0x80004000
> [ 1158.504419] Call Trace:
> [ 1158.504429] __schedule+0x2a8/0x670
> [ 1158.504432] schedule+0x2d/0x90
> [ 1158.504448] bch_bucket_alloc+0xe5/0x370 [bcache]
> [ 1158.504453] ? wait_woken+0x80/0x80
> [ 1158.504466] bch_prio_write+0x1dc/0x390 [bcache]
> [ 1158.504476] bch_allocator_thread+0x233/0x490 [bcache]
> [ 1158.504491] kthread+0x121/0x140
> [ 1158.504503] ? invalidate_buckets+0x890/0x890 [bcache]
> [ 1158.504506] ? kthread_park+0xb0/0xb0
> [ 1158.504510] ret_from_fork+0x35/0x40
>
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself.
>
> Moreover, make sure to wake up the garbage collector thread when
> bch_prio_write() is failing to allocate buckets.
>
> BugLink: https://bugs.launchpad.net/bugs/1784665
> BugLink: https://bugs.launchpad.net/bugs/1796292
> Signed-off-by: Andrea Righi <[email protected]>

At this moment I am not able to find a better solution, so I take this
patch in my for-test.

Thank you. And I hope you may continue to maintain this change if people
report problem (I mean if) in future.


Coly Li


> ---
> Changes in v2:
> - prevent retry_invalidate busy loop in bch_allocator_thread()
>
> drivers/md/bcache/alloc.c | 5 ++++-
> drivers/md/bcache/bcache.h | 2 +-
> drivers/md/bcache/super.c | 13 +++++++++----
> 3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
> index 6f776823b9ba..a1df0d95151c 100644
> --- a/drivers/md/bcache/alloc.c
> +++ b/drivers/md/bcache/alloc.c
> @@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
> if (!fifo_full(&ca->free_inc))
> goto retry_invalidate;
>
> - bch_prio_write(ca);
> + if (bch_prio_write(ca, false) < 0) {
> + ca->invalidate_needs_gc = 1;
> + wake_up_gc(ca->set);
> + }
> }
> }
> out:
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 013e35a9e317..deb924e1d790 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
> __printf(2, 3)
> bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
>
> -void bch_prio_write(struct cache *ca);
> +int bch_prio_write(struct cache *ca, bool wait);
> void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>
> extern struct workqueue_struct *bcache_wq;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 20ed838e9413..716ea272fb55 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -529,7 +529,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
> closure_sync(cl);
> }
>
> -void bch_prio_write(struct cache *ca)
> +int bch_prio_write(struct cache *ca, bool wait)
> {
> int i;
> struct bucket *b;
> @@ -564,8 +564,12 @@ void bch_prio_write(struct cache *ca)
> p->magic = pset_magic(&ca->sb);
> p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);
>
> - bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
> - BUG_ON(bucket == -1);
> + bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
> + if (bucket == -1) {
> + if (!wait)
> + return -ENOMEM;
> + BUG_ON(1);
> + }
>
> mutex_unlock(&ca->set->bucket_lock);
> prio_io(ca, bucket, REQ_OP_WRITE, 0);
> @@ -593,6 +597,7 @@ void bch_prio_write(struct cache *ca)
>
> ca->prio_last_buckets[i] = ca->prio_buckets[i];
> }
> + return 0;
> }
>
> static void prio_read(struct cache *ca, uint64_t bucket)
> @@ -1954,7 +1959,7 @@ static int run_cache_set(struct cache_set *c)
>
> mutex_lock(&c->bucket_lock);
> for_each_cache(ca, c, i)
> - bch_prio_write(ca);
> + bch_prio_write(ca, true);
> mutex_unlock(&c->bucket_lock);
>
> err = "cannot allocate new UUID bucket";
>


--

Coly Li

2019-08-06 17:38:05

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v2] bcache: fix deadlock in bcache_allocator

On Tue, Aug 06, 2019 at 11:18:01AM +0200, Andrea Righi wrote:
> bcache_allocator() can call the following:
>
> bch_allocator_thread()
> -> bch_prio_write()
> -> bch_bucket_alloc()
> -> wait on &ca->set->bucket_wait
>
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself => deadlock:
>
> [ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
> [ 1158.495929] Not tainted 5.3.0-050300rc3-generic #201908042232
> [ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1158.504413] bcache_allocato D 0 15861 2 0x80004000
> [ 1158.504419] Call Trace:
> [ 1158.504429] __schedule+0x2a8/0x670
> [ 1158.504432] schedule+0x2d/0x90
> [ 1158.504448] bch_bucket_alloc+0xe5/0x370 [bcache]
> [ 1158.504453] ? wait_woken+0x80/0x80
> [ 1158.504466] bch_prio_write+0x1dc/0x390 [bcache]
> [ 1158.504476] bch_allocator_thread+0x233/0x490 [bcache]
> [ 1158.504491] kthread+0x121/0x140
> [ 1158.504503] ? invalidate_buckets+0x890/0x890 [bcache]
> [ 1158.504506] ? kthread_park+0xb0/0xb0
> [ 1158.504510] ret_from_fork+0x35/0x40
>
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself.
>
> Moreover, make sure to wake up the garbage collector thread when
> bch_prio_write() is failing to allocate buckets.
>
> BugLink: https://bugs.launchpad.net/bugs/1784665
> BugLink: https://bugs.launchpad.net/bugs/1796292
> Signed-off-by: Andrea Righi <[email protected]>
> ---
> Changes in v2:
> - prevent retry_invalidate busy loop in bch_allocator_thread()
>
> drivers/md/bcache/alloc.c | 5 ++++-
> drivers/md/bcache/bcache.h | 2 +-
> drivers/md/bcache/super.c | 13 +++++++++----
> 3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
> index 6f776823b9ba..a1df0d95151c 100644
> --- a/drivers/md/bcache/alloc.c
> +++ b/drivers/md/bcache/alloc.c
> @@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
> if (!fifo_full(&ca->free_inc))
> goto retry_invalidate;
>
> - bch_prio_write(ca);
> + if (bch_prio_write(ca, false) < 0) {
> + ca->invalidate_needs_gc = 1;
> + wake_up_gc(ca->set);
> + }
> }
> }
> out:
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 013e35a9e317..deb924e1d790 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
> __printf(2, 3)
> bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
>
> -void bch_prio_write(struct cache *ca);
> +int bch_prio_write(struct cache *ca, bool wait);
> void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>
> extern struct workqueue_struct *bcache_wq;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 20ed838e9413..716ea272fb55 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -529,7 +529,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
> closure_sync(cl);
> }
>
> -void bch_prio_write(struct cache *ca)
> +int bch_prio_write(struct cache *ca, bool wait)
> {
> int i;
> struct bucket *b;
> @@ -564,8 +564,12 @@ void bch_prio_write(struct cache *ca)
> p->magic = pset_magic(&ca->sb);
> p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);
>
> - bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
> - BUG_ON(bucket == -1);
> + bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
> + if (bucket == -1) {
> + if (!wait)
> + return -ENOMEM;
> + BUG_ON(1);
> + }

Coly,

looking more at this change, I think we should handle the failure path
properly or we may leak buckets, am I right? (sorry for not realizing
this before). Maybe we need something like the following on top of my
previous patch.

I'm going to run more stress tests with this patch applied and will try
to figure out if we're actually leaking buckets without it.

---
Subject: bcache: prevent leaking buckets in bch_prio_write()

Handle the allocation failure path properly in bch_prio_write() to avoid
leaking buckets from the previous successful iterations.

Signed-off-by: Andrea Righi <[email protected]>
---
drivers/md/bcache/super.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 716ea27..727266f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -531,7 +531,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,

int bch_prio_write(struct cache *ca, bool wait)
{
- int i;
+ int i, j, ret = 0;
struct bucket *b;
struct closure cl;

@@ -566,9 +566,11 @@ int bch_prio_write(struct cache *ca, bool wait)

bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
if (bucket == -1) {
- if (!wait)
- return -ENOMEM;
- BUG_ON(1);
+ if (!wait) {
+ ret = -ENOMEM;
+ break;
+ }
+ BUG();
}

mutex_unlock(&ca->set->bucket_lock);
@@ -590,14 +592,14 @@ int bch_prio_write(struct cache *ca, bool wait)
* Don't want the old priorities to get garbage collected until after we
* finish writing the new ones, and they're journalled
*/
- for (i = 0; i < prio_buckets(ca); i++) {
- if (ca->prio_last_buckets[i])
+ for (j = prio_buckets(ca) - 1; j > i; --j) {
+ if (ca->prio_last_buckets[j])
__bch_bucket_free(ca,
- &ca->buckets[ca->prio_last_buckets[i]]);
+ &ca->buckets[ca->prio_last_buckets[j]]);

- ca->prio_last_buckets[i] = ca->prio_buckets[i];
+ ca->prio_last_buckets[j] = ca->prio_buckets[j];
}
- return 0;
+ return ret;
}

static void prio_read(struct cache *ca, uint64_t bucket)
--
2.7.4

2019-08-07 09:42:09

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v2] bcache: fix deadlock in bcache_allocator

On Tue, Aug 06, 2019 at 07:36:48PM +0200, Andrea Righi wrote:
> On Tue, Aug 06, 2019 at 11:18:01AM +0200, Andrea Righi wrote:
> > bcache_allocator() can call the following:
> >
> > bch_allocator_thread()
> > -> bch_prio_write()
> > -> bch_bucket_alloc()
> > -> wait on &ca->set->bucket_wait
> >
> > But the wake up event on bucket_wait is supposed to come from
> > bch_allocator_thread() itself => deadlock:
> >
> > [ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
> > [ 1158.495929] Not tainted 5.3.0-050300rc3-generic #201908042232
> > [ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 1158.504413] bcache_allocato D 0 15861 2 0x80004000
> > [ 1158.504419] Call Trace:
> > [ 1158.504429] __schedule+0x2a8/0x670
> > [ 1158.504432] schedule+0x2d/0x90
> > [ 1158.504448] bch_bucket_alloc+0xe5/0x370 [bcache]
> > [ 1158.504453] ? wait_woken+0x80/0x80
> > [ 1158.504466] bch_prio_write+0x1dc/0x390 [bcache]
> > [ 1158.504476] bch_allocator_thread+0x233/0x490 [bcache]
> > [ 1158.504491] kthread+0x121/0x140
> > [ 1158.504503] ? invalidate_buckets+0x890/0x890 [bcache]
> > [ 1158.504506] ? kthread_park+0xb0/0xb0
> > [ 1158.504510] ret_from_fork+0x35/0x40
> >
> > Fix by making the call to bch_prio_write() non-blocking, so that
> > bch_allocator_thread() never waits on itself.
> >
> > Moreover, make sure to wake up the garbage collector thread when
> > bch_prio_write() is failing to allocate buckets.
> >
> > BugLink: https://bugs.launchpad.net/bugs/1784665
> > BugLink: https://bugs.launchpad.net/bugs/1796292
> > Signed-off-by: Andrea Righi <[email protected]>
> > ---
> > Changes in v2:
> > - prevent retry_invalidate busy loop in bch_allocator_thread()
> >
> > drivers/md/bcache/alloc.c | 5 ++++-
> > drivers/md/bcache/bcache.h | 2 +-
> > drivers/md/bcache/super.c | 13 +++++++++----
> > 3 files changed, 14 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
> > index 6f776823b9ba..a1df0d95151c 100644
> > --- a/drivers/md/bcache/alloc.c
> > +++ b/drivers/md/bcache/alloc.c
> > @@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
> > if (!fifo_full(&ca->free_inc))
> > goto retry_invalidate;
> >
> > - bch_prio_write(ca);
> > + if (bch_prio_write(ca, false) < 0) {
> > + ca->invalidate_needs_gc = 1;
> > + wake_up_gc(ca->set);
> > + }
> > }
> > }
> > out:
> > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> > index 013e35a9e317..deb924e1d790 100644
> > --- a/drivers/md/bcache/bcache.h
> > +++ b/drivers/md/bcache/bcache.h
> > @@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
> > __printf(2, 3)
> > bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
> >
> > -void bch_prio_write(struct cache *ca);
> > +int bch_prio_write(struct cache *ca, bool wait);
> > void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
> >
> > extern struct workqueue_struct *bcache_wq;
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index 20ed838e9413..716ea272fb55 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -529,7 +529,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
> > closure_sync(cl);
> > }
> >
> > -void bch_prio_write(struct cache *ca)
> > +int bch_prio_write(struct cache *ca, bool wait)
> > {
> > int i;
> > struct bucket *b;
> > @@ -564,8 +564,12 @@ void bch_prio_write(struct cache *ca)
> > p->magic = pset_magic(&ca->sb);
> > p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);
> >
> > - bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
> > - BUG_ON(bucket == -1);
> > + bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
> > + if (bucket == -1) {
> > + if (!wait)
> > + return -ENOMEM;
> > + BUG_ON(1);
> > + }
>
> Coly,
>
> looking more at this change, I think we should handle the failure path
> properly or we may leak buckets, am I right? (sorry for not realizing
> this before). Maybe we need something like the following on top of my
> previous patch.
>
> I'm going to run more stress tests with this patch applied and will try
> to figure out if we're actually leaking buckets without it.
>
> ---
> Subject: bcache: prevent leaking buckets in bch_prio_write()
>
> Handle the allocation failure path properly in bch_prio_write() to avoid
> leaking buckets from the previous successful iterations.
>
> Signed-off-by: Andrea Righi <[email protected]>

Coly, ignore this one please. A v3 of the previous patch with a better
fix for this potential buckets leak is on the way.

Thanks,
-Andrea

2019-08-07 10:20:40

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH v2] bcache: fix deadlock in bcache_allocator

On 2019/8/7 5:25 下午, Andrea Righi wrote:
> On Tue, Aug 06, 2019 at 07:36:48PM +0200, Andrea Righi wrote:
>> On Tue, Aug 06, 2019 at 11:18:01AM +0200, Andrea Righi wrote:
>>> bcache_allocator() can call the following:
>>>
>>> bch_allocator_thread()
>>> -> bch_prio_write()
>>> -> bch_bucket_alloc()
>>> -> wait on &ca->set->bucket_wait
>>>
>>> But the wake up event on bucket_wait is supposed to come from
>>> bch_allocator_thread() itself => deadlock:
>>>
>>> [ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
>>> [ 1158.495929] Not tainted 5.3.0-050300rc3-generic #201908042232
>>> [ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [ 1158.504413] bcache_allocato D 0 15861 2 0x80004000
>>> [ 1158.504419] Call Trace:
>>> [ 1158.504429] __schedule+0x2a8/0x670
>>> [ 1158.504432] schedule+0x2d/0x90
>>> [ 1158.504448] bch_bucket_alloc+0xe5/0x370 [bcache]
>>> [ 1158.504453] ? wait_woken+0x80/0x80
>>> [ 1158.504466] bch_prio_write+0x1dc/0x390 [bcache]
>>> [ 1158.504476] bch_allocator_thread+0x233/0x490 [bcache]
>>> [ 1158.504491] kthread+0x121/0x140
>>> [ 1158.504503] ? invalidate_buckets+0x890/0x890 [bcache]
>>> [ 1158.504506] ? kthread_park+0xb0/0xb0
>>> [ 1158.504510] ret_from_fork+0x35/0x40
>>>
>>> Fix by making the call to bch_prio_write() non-blocking, so that
>>> bch_allocator_thread() never waits on itself.
>>>
>>> Moreover, make sure to wake up the garbage collector thread when
>>> bch_prio_write() is failing to allocate buckets.
>>>
>>> BugLink: https://bugs.launchpad.net/bugs/1784665
>>> BugLink: https://bugs.launchpad.net/bugs/1796292
>>> Signed-off-by: Andrea Righi <[email protected]>
>>> ---
>>> Changes in v2:
>>> - prevent retry_invalidate busy loop in bch_allocator_thread()
>>>
>>> drivers/md/bcache/alloc.c | 5 ++++-
>>> drivers/md/bcache/bcache.h | 2 +-
>>> drivers/md/bcache/super.c | 13 +++++++++----
>>> 3 files changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
>>> index 6f776823b9ba..a1df0d95151c 100644
>>> --- a/drivers/md/bcache/alloc.c
>>> +++ b/drivers/md/bcache/alloc.c
>>> @@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
>>> if (!fifo_full(&ca->free_inc))
>>> goto retry_invalidate;
>>>
>>> - bch_prio_write(ca);
>>> + if (bch_prio_write(ca, false) < 0) {
>>> + ca->invalidate_needs_gc = 1;
>>> + wake_up_gc(ca->set);
>>> + }
>>> }
>>> }
>>> out:
>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>> index 013e35a9e317..deb924e1d790 100644
>>> --- a/drivers/md/bcache/bcache.h
>>> +++ b/drivers/md/bcache/bcache.h
>>> @@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
>>> __printf(2, 3)
>>> bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
>>>
>>> -void bch_prio_write(struct cache *ca);
>>> +int bch_prio_write(struct cache *ca, bool wait);
>>> void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
>>>
>>> extern struct workqueue_struct *bcache_wq;
>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>> index 20ed838e9413..716ea272fb55 100644
>>> --- a/drivers/md/bcache/super.c
>>> +++ b/drivers/md/bcache/super.c
>>> @@ -529,7 +529,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
>>> closure_sync(cl);
>>> }
>>>
>>> -void bch_prio_write(struct cache *ca)
>>> +int bch_prio_write(struct cache *ca, bool wait)
>>> {
>>> int i;
>>> struct bucket *b;
>>> @@ -564,8 +564,12 @@ void bch_prio_write(struct cache *ca)
>>> p->magic = pset_magic(&ca->sb);
>>> p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);
>>>
>>> - bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
>>> - BUG_ON(bucket == -1);
>>> + bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
>>> + if (bucket == -1) {
>>> + if (!wait)
>>> + return -ENOMEM;
>>> + BUG_ON(1);
>>> + }
>>
>> Coly,
>>
>> looking more at this change, I think we should handle the failure path
>> properly or we may leak buckets, am I right? (sorry for not realizing
>> this before). Maybe we need something like the following on top of my
>> previous patch.
>>
>> I'm going to run more stress tests with this patch applied and will try
>> to figure out if we're actually leaking buckets without it.
>>
>> ---
>> Subject: bcache: prevent leaking buckets in bch_prio_write()
>>
>> Handle the allocation failure path properly in bch_prio_write() to avoid
>> leaking buckets from the previous successful iterations.
>>
>> Signed-off-by: Andrea Righi <[email protected]>
>
> Coly, ignore this one please. A v3 of the previous patch with a better
> fix for this potential buckets leak is on the way.

Sure, waiting for next version :-)


--

Coly Li