2012-02-20 12:11:58

by Andrea Righi

Subject: [PATCH] zcache: avoid AB-BA deadlock condition

Commit 9256a47 fixed a deadlock condition, being sure that the buddy
list spinlock is always taken before the page spinlock.

However in zbud_free_and_delist() locking order is the opposite
(page lock -> list lock).

Possible unsafe locking scenario (reported by lockdep):

       CPU0                    CPU1
       ----                    ----
  lock(&(&zbpg->lock)->rlock);
                               lock(zbud_budlists_spinlock);
                               lock(&(&zbpg->lock)->rlock);
  lock(zbud_budlists_spinlock);

Fix by grabbing the locks in opposite order in zbud_free_and_delist().

Signed-off-by: Andrea Righi <[email protected]>
---
drivers/staging/zcache/zcache-main.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index ef7c52b..dce04be 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -299,10 +299,12 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
 	struct zbud_page *zbpg =
 		container_of(zh, struct zbud_page, buddy[budnum]);

+	spin_lock(&zbud_budlists_spinlock);
 	spin_lock(&zbpg->lock);
 	if (list_empty(&zbpg->bud_list)) {
 		/* ignore zombie page... see zbud_evict_pages() */
 		spin_unlock(&zbpg->lock);
+		spin_unlock(&zbud_budlists_spinlock);
 		return;
 	}
 	size = zbud_free(zh);
@@ -310,7 +312,6 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
 	zh_other = &zbpg->buddy[(budnum == 0) ? 1 : 0];
 	if (zh_other->size == 0) { /* was unbuddied: unlist and free */
 		chunks = zbud_size_to_chunks(size) ;
-		spin_lock(&zbud_budlists_spinlock);
 		BUG_ON(list_empty(&zbud_unbuddied[chunks].list));
 		list_del_init(&zbpg->bud_list);
 		zbud_unbuddied[chunks].count--;
@@ -318,7 +319,6 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
 		zbud_free_raw_page(zbpg);
 	} else { /* was buddied: move remaining buddy to unbuddied list */
 		chunks = zbud_size_to_chunks(zh_other->size) ;
-		spin_lock(&zbud_budlists_spinlock);
 		list_del_init(&zbpg->bud_list);
 		zcache_zbud_buddied_count--;
 		list_add_tail(&zbpg->bud_list, &zbud_unbuddied[chunks].list);
--
1.7.5.4


2012-02-27 17:24:05

by Dan Magenheimer

Subject: RE: [PATCH] zcache: avoid AB-BA deadlock condition

> From: Andrea Righi [mailto:[email protected]]
> Sent: Monday, February 20, 2012 5:12 AM
> To: Greg Kroah-Hartman
> Cc: Dan Magenheimer; Seth Jennings; [email protected]; [email protected];
> [email protected]
> Subject: [PATCH] zcache: avoid AB-BA deadlock condition
>
> Commit 9256a47 fixed a deadlock condition, being sure that the buddy
> list spinlock is always taken before the page spinlock.
>
> However in zbud_free_and_delist() locking order is the opposite
> (page lock -> list lock).
>
> Possible unsafe locking scenario (reported by lockdep):
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&zbpg->lock)->rlock);
>                                lock(zbud_budlists_spinlock);
>                                lock(&(&zbpg->lock)->rlock);
>   lock(zbud_budlists_spinlock);
>
> Fix by grabbing the locks in opposite order in zbud_free_and_delist().
>
> Signed-off-by: Andrea Righi <[email protected]>

Acked-by: Dan Magenheimer <[email protected]>

Thanks for catching this Andrea! (And thanks also to
Alex Vallacis-Lasso for independently reporting and testing:
http://permalink.gmane.org/gmane.linux.kernel/1257214 )

Greg, this patch could be targeted for 3.3-rc6 and 3.2-stable.
AFAIK, nobody has actually experienced a deadlock from this so
if Linus has the screws down tight for -rc6, it could wait
until the 3.4 window.

> ---
> drivers/staging/zcache/zcache-main.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
> index ef7c52b..dce04be 100644
> --- a/drivers/staging/zcache/zcache-main.c
> +++ b/drivers/staging/zcache/zcache-main.c
> @@ -299,10 +299,12 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
>  	struct zbud_page *zbpg =
>  		container_of(zh, struct zbud_page, buddy[budnum]);
>
> +	spin_lock(&zbud_budlists_spinlock);
>  	spin_lock(&zbpg->lock);
>  	if (list_empty(&zbpg->bud_list)) {
>  		/* ignore zombie page... see zbud_evict_pages() */
>  		spin_unlock(&zbpg->lock);
> +		spin_unlock(&zbud_budlists_spinlock);
>  		return;
>  	}
>  	size = zbud_free(zh);
> @@ -310,7 +312,6 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
>  	zh_other = &zbpg->buddy[(budnum == 0) ? 1 : 0];
>  	if (zh_other->size == 0) { /* was unbuddied: unlist and free */
>  		chunks = zbud_size_to_chunks(size) ;
> -		spin_lock(&zbud_budlists_spinlock);
>  		BUG_ON(list_empty(&zbud_unbuddied[chunks].list));
>  		list_del_init(&zbpg->bud_list);
>  		zbud_unbuddied[chunks].count--;
> @@ -318,7 +319,6 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
>  		zbud_free_raw_page(zbpg);
>  	} else { /* was buddied: move remaining buddy to unbuddied list */
>  		chunks = zbud_size_to_chunks(zh_other->size) ;
> -		spin_lock(&zbud_budlists_spinlock);
>  		list_del_init(&zbpg->bud_list);
>  		zcache_zbud_buddied_count--;
>  		list_add_tail(&zbpg->bud_list, &zbud_unbuddied[chunks].list);
> --
> 1.7.5.4

2012-02-27 17:29:35

by Andrea Righi

Subject: Re: [PATCH] zcache: avoid AB-BA deadlock condition

On Mon, Feb 27, 2012 at 09:23:24AM -0800, Dan Magenheimer wrote:
> > From: Andrea Righi [mailto:[email protected]]
> > Sent: Monday, February 20, 2012 5:12 AM
> > To: Greg Kroah-Hartman
> > Cc: Dan Magenheimer; Seth Jennings; [email protected]; [email protected];
> > [email protected]
> > Subject: [PATCH] zcache: avoid AB-BA deadlock condition
> >
> > Commit 9256a47 fixed a deadlock condition, being sure that the buddy
> > list spinlock is always taken before the page spinlock.
> >
> > However in zbud_free_and_delist() locking order is the opposite
> > (page lock -> list lock).
> >
> > Possible unsafe locking scenario (reported by lockdep):
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&(&zbpg->lock)->rlock);
> >                                lock(zbud_budlists_spinlock);
> >                                lock(&(&zbpg->lock)->rlock);
> >   lock(zbud_budlists_spinlock);
> >
> > Fix by grabbing the locks in opposite order in zbud_free_and_delist().
> >
> > Signed-off-by: Andrea Righi <[email protected]>
>
> Acked-by: Dan Magenheimer <[email protected]>
>
> Thanks for catching this Andrea! (And thanks also to
> Alex Vallacis-Lasso for independently reporting and testing:
> http://permalink.gmane.org/gmane.linux.kernel/1257214 )
>
> Greg, this patch could be targeted for 3.3-rc6 and 3.2-stable.
> AFAIK, nobody has actually experienced a deadlock from this so
> if Linus has the screws down tight for -rc6, it could wait
> until the 3.4 window.

If it helps, without the fix I can easily trigger the lockdep splat
by running echo 3 > /proc/sys/vm/drop_caches.

-Andrea