2015-08-07 08:03:53

by Joonsoo Kim

Subject: [PATCH] zram: fix possible race when checking idle_strm

Currently, when we enter the wait state due to a lack of idle streams,
we check the idle_strm list without holding the lock in the expansion
of the wait_event macro. In this case, someone can see a stale value
and a process could fall into the wait state with no upcoming wakeup.
Although I didn't see any error related to this race, it should be fixed.

To fix it, we should check idle_strm while holding the lock, and this
patch implements that.

Signed-off-by: Joonsoo Kim <[email protected]>
---
drivers/block/zram/zcomp.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 80a62e7..837e9c3 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -76,6 +76,17 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
return zstrm;
}

+static bool avail_idle_strm(struct zcomp_strm_multi *zs)
+{
+ int avail;
+
+ spin_lock(&zs->strm_lock);
+ avail = !list_empty(&zs->idle_strm);
+ spin_unlock(&zs->strm_lock);
+
+ return avail;
+}
+
/*
* get idle zcomp_strm or wait until other process release
* (zcomp_strm_release()) one for us
@@ -97,7 +108,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
/* zstrm streams limit reached, wait for idle stream */
if (zs->avail_strm >= zs->max_strm) {
spin_unlock(&zs->strm_lock);
- wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
+ wait_event(zs->strm_wait, avail_idle_strm(zs));
continue;
}
/* allocate new zstrm stream */
@@ -109,7 +120,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
spin_lock(&zs->strm_lock);
zs->avail_strm--;
spin_unlock(&zs->strm_lock);
- wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
+ wait_event(zs->strm_wait, avail_idle_strm(zs));
continue;
}
break;
--
1.9.1


2015-08-07 09:14:30

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/07/15 17:03), Joonsoo Kim wrote:
> Currently, when we enter the wait state due to a lack of idle streams,
> we check the idle_strm list without holding the lock in the expansion
> of the wait_event macro. In this case, someone can see a stale value
> and a process could fall into the wait state with no upcoming wakeup.

hm... I need to think about it more.

we do wake_up every time we put stream back to the list

zcomp_strm_multi_release():

spin_lock(&zs->strm_lock);
if (zs->avail_strm <= zs->max_strm) {
list_add(&zstrm->list, &zs->idle_strm);
spin_unlock(&zs->strm_lock);
wake_up(&zs->strm_wait);
return;
}


but I can probably see what you mean... in some very extreme case,
though. I can't even formulate it... eh... we use a multi stream
backend with ->max_strm == 1 and there are two processes, one
just falsely passed the wait_event() `if (condition)' check, the
other one just put stream back to ->idle_strm and called wake_up(),
but the first process hasn't yet executed prepare_to_wait_event()
so it might miss a wakeup. and there should be no other process
doing read or write operations; otherwise, there will be a wakeup
eventually.

is this the case you were thinking of?... then yes, this spinlock
may help.
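
roughly this interleaving, I guess (a sketch with ->max_strm == 1, using
the zcomp.c function names; not a verbatim trace):

    CPU0: zcomp_strm_multi_find()        CPU1: zcomp_strm_multi_release()
    -----------------------------        --------------------------------
    wait_event() outer check:
      list_empty(&zs->idle_strm)
      -> empty, don't break
                                         list_add(&zstrm->list, &zs->idle_strm);
                                         wake_up(&zs->strm_wait);
                                         (nobody on the wait queue yet)
    prepare_to_wait_event(...);
    inner check:
      list_empty(&zs->idle_strm)
      -> must observe CPU1's list_add
         here, or CPU0 schedules out
         with no further wakeup coming
    schedule();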

hm... we also can simply forbid downgrading a multi stream backend
to a single stream (in terms of performance it is much slower
than a real single steam anyway). will this do the trick? if we
have more than 1 stream and idle list then there will be more that
one wakeup. and every woken up task will call wakeup once it put
stream.

I don't think I'll be able to check the email until this Sunday,
so here is my

Acked-by: Sergey Senozhatsky <[email protected]>

to this patch if Minchan decides that `forbid downgrading` is
hacky or doesn't work.

very nice finding, Joonsoo!


p.s. hm... we can take a look at 'forbid downgrading a multi
stream backend to a single-streamed'.

-ss

2015-08-07 09:18:57

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/07/15 18:14), Sergey Senozhatsky wrote:
[..]
> hm... we also can simply forbid downgrading a multi stream backend
> to a single stream (in terms of performance it is much slower
> than a real single steam anyway). will this do the trick? if we
> have more than 1 stream and idle list then there will be more that
> one wakeup. and every woken up task will call wakeup once it put
> stream.

this is unreadable, sorry.

should be:
"if we have more than 1 stream and _empty_ idle list then there
will be more than one wakeup. and every woken up task will call
wakeup once it puts stream back to the idle list."

-ss

2015-08-07 09:36:46

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/07/15 17:03), Joonsoo Kim wrote:
> Currently, when we enter the wait state due to a lack of idle streams,
> we check the idle_strm list without holding the lock in the expansion
> of the wait_event macro. In this case, someone can see a stale value
> and a process could fall into the wait state with no upcoming wakeup.
> Although I didn't see any error related to this race, it should be fixed.
>
> To fix it, we should check idle_strm while holding the lock, and this
> patch implements that.
>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Sergey Senozhatsky <[email protected]>

> ---
> drivers/block/zram/zcomp.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 80a62e7..837e9c3 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -76,6 +76,17 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> return zstrm;
> }
>
> +static bool avail_idle_strm(struct zcomp_strm_multi *zs)
> +{
> + int avail;
> +
> + spin_lock(&zs->strm_lock);
> + avail = !list_empty(&zs->idle_strm);
> + spin_unlock(&zs->strm_lock);
> +
> + return avail;
> +}
> +
> /*
> * get idle zcomp_strm or wait until other process release
> * (zcomp_strm_release()) one for us
> @@ -97,7 +108,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
> /* zstrm streams limit reached, wait for idle stream */
> if (zs->avail_strm >= zs->max_strm) {
> spin_unlock(&zs->strm_lock);
> - wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> + wait_event(zs->strm_wait, avail_idle_strm(zs));
> continue;
> }
> /* allocate new zstrm stream */
> @@ -109,7 +120,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
> spin_lock(&zs->strm_lock);
> zs->avail_strm--;
> spin_unlock(&zs->strm_lock);
> - wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> + wait_event(zs->strm_wait, avail_idle_strm(zs));
> continue;
> }
> break;
> --

2015-08-07 09:37:35

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/07/15 18:19), Sergey Senozhatsky wrote:
> [..]
> > hm... we also can simply forbid downgrading a multi stream backend
> > to a single stream (in terms of performance it is much slower
> > than a real single steam anyway). will this do the trick? if we
> > have more than 1 stream and idle list then there will be more that
> > one wakeup. and every woken up task will call wakeup once it put
> > stream.
>
> this is unreadable, sorry.
>
> should be:
> "if we have more than 1 stream and _empty_ idle list then there
> will be more than one wakeup. and every woken up task will call
> wakeup once it puts stream back to the idle list."

no, nevermind. I think it sucks.

-ss

2015-08-07 09:57:43

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/07/15 18:14), Sergey Senozhatsky wrote:
> hm... I need to think about it more.
>
> we do wake_up every time we put stream back to the list
>
> zcomp_strm_multi_release():
>
> spin_lock(&zs->strm_lock);
> if (zs->avail_strm <= zs->max_strm) {
> list_add(&zstrm->list, &zs->idle_strm);
> spin_unlock(&zs->strm_lock);
> wake_up(&zs->strm_wait);
> return;
> }
>
>
> but I can probably see what you mean... in some very extreme case,
> though. I can't even formulate it... eh... we use a multi stream
> backend with ->max_strm == 1 and there are two processes, one
> just falsely passed the wait_event() `if (condition)' check, the
> other one just put stream back to ->idle_strm and called wake_up(),
> but the first process hasn't yet executed prepare_to_wait_event()
> so it might miss a wakeup. and there should be no other process
> > doing read or write operations; otherwise, there will be a wakeup
> eventually.
>
> is this the case you were thinking of?... then yes, this spinlock
> may help.
>

on the other hand... wait_event() is actually

if (condition)
break;
prepare_to_wait_event(&wq, &__wait, state)
if (condition)
break;
schedule();

if first condition check was false and we missed a wakeup call between
first condition and prepare_to_wait_event(), then second condition
check should do the trick I think (or you expect that second condition
check may be wrongly pre-fetched or something).
if the wakeup arrives after prepare_to_wait_event(), then we are fine by
definition.
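
(for reference: prepare_to_wait_event() takes the waitqueue lock and calls
set_current_state() before wait_event() re-checks the condition. roughly,
from kernel/sched/wait.c of that time, an abridged sketch:)

long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
	unsigned long flags;

	if (signal_pending_state(state, current))
		return -ERESTARTSYS;

	wait->private = current;
	wait->func = autoremove_wake_function;

	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);	/* exclusive case elided */
	set_current_state(state);
	spin_unlock_irqrestore(&q->lock, flags);

	return 0;
}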

so, I'm puzzled a bit. do we have a problem, or are we ok?

-ss

2015-08-07 10:15:35

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/07/15 18:37), Sergey Senozhatsky wrote:
[..]
> > Signed-off-by: Joonsoo Kim <[email protected]>
>
> Acked-by: Sergey Senozhatsky <[email protected]>

Andrew, my apologies, can we hold on a bit with my Acked-by? I
need more time to think about the issue -- do we actually have
one or not?

It seems that, to be on the safe side, this check is better done
under the lock, but at the same time I sort of fail to convince
myself that doing this check outside the lock introduces any danger.
Need some rest. Maybe Joonsoo and Minchan can add some bits
(that I currently miss) or thoughts.

-ss

> > ---
> > drivers/block/zram/zcomp.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> > index 80a62e7..837e9c3 100644
> > --- a/drivers/block/zram/zcomp.c
> > +++ b/drivers/block/zram/zcomp.c
> > @@ -76,6 +76,17 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> > return zstrm;
> > }
> >
> > +static bool avail_idle_strm(struct zcomp_strm_multi *zs)
> > +{
> > + int avail;
> > +
> > + spin_lock(&zs->strm_lock);
> > + avail = !list_empty(&zs->idle_strm);
> > + spin_unlock(&zs->strm_lock);
> > +
> > + return avail;
> > +}
> > +
> > /*
> > * get idle zcomp_strm or wait until other process release
> > * (zcomp_strm_release()) one for us
> > @@ -97,7 +108,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
> > /* zstrm streams limit reached, wait for idle stream */
> > if (zs->avail_strm >= zs->max_strm) {
> > spin_unlock(&zs->strm_lock);
> > - wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> > + wait_event(zs->strm_wait, avail_idle_strm(zs));
> > continue;
> > }
> > /* allocate new zstrm stream */
> > @@ -109,7 +120,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
> > spin_lock(&zs->strm_lock);
> > zs->avail_strm--;
> > spin_unlock(&zs->strm_lock);
> > - wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> > + wait_event(zs->strm_wait, avail_idle_strm(zs));
> > continue;
> > }
> > break;
> > --
>

2015-08-07 14:49:18

by Minchan Kim

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

Hi Joonsoo,

On Fri, Aug 07, 2015 at 05:03:29PM +0900, Joonsoo Kim wrote:
> Currently, when we enter the wait state due to a lack of idle streams,
> we check the idle_strm list without holding the lock in the expansion
> of the wait_event macro. In this case, someone can see a stale value
> and a process could fall into the wait state with no upcoming wakeup.
> Although I didn't see any error related to this race, it should be fixed.

A long time ago, I wondered about the lost wake-up problem and found an
article.

http://www.linuxjournal.com/article/8144

Since then, I have thought such an issue shouldn't happen, and I believe
this is the same issue.

Could you point out the exact code sequence for the problem you mentioned?

Thanks.

>
> To fix it, we should check idle_strm while holding the lock, and this
> patch implements that.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> drivers/block/zram/zcomp.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 80a62e7..837e9c3 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -76,6 +76,17 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
> return zstrm;
> }
>
> +static bool avail_idle_strm(struct zcomp_strm_multi *zs)
> +{
> + int avail;
> +
> + spin_lock(&zs->strm_lock);
> + avail = !list_empty(&zs->idle_strm);
> + spin_unlock(&zs->strm_lock);
> +
> + return avail;
> +}
> +
> /*
> * get idle zcomp_strm or wait until other process release
> * (zcomp_strm_release()) one for us
> @@ -97,7 +108,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
> /* zstrm streams limit reached, wait for idle stream */
> if (zs->avail_strm >= zs->max_strm) {
> spin_unlock(&zs->strm_lock);
> - wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> + wait_event(zs->strm_wait, avail_idle_strm(zs));
> continue;
> }
> /* allocate new zstrm stream */
> @@ -109,7 +120,7 @@ static struct zcomp_strm *zcomp_strm_multi_find(struct zcomp *comp)
> spin_lock(&zs->strm_lock);
> zs->avail_strm--;
> spin_unlock(&zs->strm_lock);
> - wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
> + wait_event(zs->strm_wait, avail_idle_strm(zs));
> continue;
> }
> break;
> --
> 1.9.1
>

--
Kind regards,
Minchan Kim

2015-08-10 00:27:05

by Joonsoo Kim

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On Fri, Aug 07, 2015 at 06:58:16PM +0900, Sergey Senozhatsky wrote:
> On (08/07/15 18:14), Sergey Senozhatsky wrote:
> > hm... I need to think about it more.
> >
> > we do wake_up every time we put stream back to the list
> >
> > zcomp_strm_multi_release():
> >
> > spin_lock(&zs->strm_lock);
> > if (zs->avail_strm <= zs->max_strm) {
> > list_add(&zstrm->list, &zs->idle_strm);
> > spin_unlock(&zs->strm_lock);
> > wake_up(&zs->strm_wait);
> > return;
> > }
> >
> >
> > but I can probably see what you mean... in some very extreme case,
> > though. I can't even formulate it... eh... we use a multi stream
> > backend with ->max_strm == 1 and there are two processes, one
> > just falsely passed the wait_event() `if (condition)' check, the
> > other one just put stream back to ->idle_strm and called wake_up(),
> > but the first process hasn't yet executed prepare_to_wait_event()
> > so it might miss a wakeup. and there should be no other process
> > > doing read or write operations; otherwise, there will be a wakeup
> > eventually.
> >
> > is this the case you were thinking of?... then yes, this spinlock
> > may help.
> >
>
> on the other hand... wait_event() is actually
>
> if (condition)
> break;
> prepare_to_wait_event(&wq, &__wait, state)
> if (condition)
> break;
> schedule();
>
> if first condition check was false and we missed a wakeup call between
> first condition and prepare_to_wait_event(), then second condition
> check should do the trick I think (or you expect that second condition
> check may be wrongly pre-fetched or something).

Hello, Sergey.

This is what I thought.
I expected that the second condition check could be false if the compiler
reuses the result of the first check as an optimization. I guess that
there is no prevention against this kind of optimization.

So, the following is the problem sequence I had in mind.
T1 means thread 1, and T2 means another thread, thread 2.

<T1-1> check if idle_strm is empty or not while holding the lock
<T1-2> It is empty, so do spin_unlock and run the wait_event macro
<T1-3> check if idle_strm is empty or not
<T1-4> It is still empty

<T2-1> do strm release
<T2-2> call wake_up

<T1-5> add T1 to the wait queue
<T1-6> check if idle_strm is empty or not
<T1-7> the compiler reuses <T1-4>'s result, or the CPU just fetches the
cached result, so T1 starts waiting

In this case, T1 can sleep permanently. To prevent compiler
optimization or fetching of a cached value, we need a lock here.
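
For illustration only -- with a hypothetical plain flag instead of the real
idle_strm list, the transformation I'm worried about would look like this
(sleep_and_wait() is a made-up placeholder, not real zcomp code):

static void sleep_and_wait(void);	/* placeholder for the sleep path */
static int idle;			/* shared; set by the releasing thread */

/* what the source says: re-check idle on every iteration */
static void wait_for_idle(void)
{
	while (!idle)
		sleep_and_wait();
}

/*
 * what a compiler could emit if nothing forces a re-load:
 * read idle once, then loop on the cached result forever
 */
static void wait_for_idle_as_compiled(void)
{
	if (!idle) {
		for (;;)
			sleep_and_wait();
	}
}

/*
 * taking a spinlock around the check (or using READ_ONCE(idle)) acts
 * as a compiler barrier and forces a fresh load on every iteration
 */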

Thanks.

2015-08-10 00:29:45

by Joonsoo Kim

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On Fri, Aug 07, 2015 at 11:49:04PM +0900, Minchan Kim wrote:
> Hi Joonsoo,
>
> On Fri, Aug 07, 2015 at 05:03:29PM +0900, Joonsoo Kim wrote:
> > Currently, when we enter the wait state due to a lack of idle streams,
> > we check the idle_strm list without holding the lock in the expansion
> > of the wait_event macro. In this case, someone can see a stale value
> > and a process could fall into the wait state with no upcoming wakeup.
> > Although I didn't see any error related to this race, it should be fixed.
>
> Long time ago, I wondered about lost wake-up problem and found a article.
>
> http://www.linuxjournal.com/article/8144
>
> >From then, I have thought such issue shouldn't happen if something is
> wrong since then and I believe it's same issue.
>
> Could you point out exact code sequence about the problem you mentioned?

Hello,

I describe exact code sequence in reply of Sergey's mail.
It would fully describe the problem.

Thanks.

2015-08-10 02:16:28

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

Hello Joonsoo,

On (08/10/15 09:32), Joonsoo Kim wrote:
> > on the other hand... wait_event() is actually
> >
> > if (condition)
> > break;
> > prepare_to_wait_event(&wq, &__wait, state)
> > if (condition)
> > break;
> > schedule();
> >
> > if first condition check was false and we missed a wakeup call between
> > first condition and prepare_to_wait_event(), then second condition
> > check should do the trick I think (or you expect that second condition
> > check may be wrongly pre-fetched or something).
>
...
> I expected that the second condition check could be false if the compiler
> reuses the result of the first check as an optimization. I guess that
> there is no prevention against this kind of optimization.

hm... so we have outer and inner checks (out of the loop and inside the loop).
can the compiler decide that the outer and inner checks are equivalent here?

#define wait_event(wq, condition) \
do { \
might_sleep(); \
if (condition) \
break; \
....
for (;;) { \
long __int = prepare_to_wait_event(&wq, &__wait, state);\
\
if (condition) \
break; \
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
if (exclusive) { \
abort_exclusive_wait(&wq, &__wait, \
state, NULL); \
goto __out; \
} \
break; \
} \
\
cmd; \
} \
....
} while (0)


I probably don't have enough knowledge about compilers, but I think it
must keep both checks. But I may be wrong.

just out of curiosity, a quick grep

wait_event(zatm_vcc->tx_wait, !skb_peek(&zatm_vcc->tx_queue))
wait_event(pmu->recv.wait, (pmu->recv.process == 0))
wait_event(ep->com.waitq, ep->com.rpl_done)
wait_event(cs->waitqueue, !cs->waiting)
wait_event(resync_wait, (mddev->sync_thread == NULL &&...
wait_event(mddev->sb_wait, mddev->flags == 0 ||...

and so on.

-ss

> So, the following is the problem sequence I had in mind.
> T1 means thread 1, and T2 means another thread, thread 2.
>
> <T1-1> check if idle_strm is empty or not while holding the lock
> <T1-2> It is empty, so do spin_unlock and run the wait_event macro
> <T1-3> check if idle_strm is empty or not
> <T1-4> It is still empty
>
> <T2-1> do strm release
> <T2-2> call wake_up
>
> <T1-5> add T1 to the wait queue
> <T1-6> check if idle_strm is empty or not
> <T1-7> the compiler reuses <T1-4>'s result, or the CPU just fetches the
> cached result, so T1 starts waiting
>
> In this case, T1 can sleep permanently. To prevent compiler
> optimization or fetching of a cached value, we need a lock here.

-ss

2015-08-10 23:26:44

by Minchan Kim

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

Hi Joonsoo,

On Mon, Aug 10, 2015 at 09:32:30AM +0900, Joonsoo Kim wrote:
> On Fri, Aug 07, 2015 at 06:58:16PM +0900, Sergey Senozhatsky wrote:
> > On (08/07/15 18:14), Sergey Senozhatsky wrote:
> > > hm... I need to think about it more.
> > >
> > > we do wake_up every time we put stream back to the list
> > >
> > > zcomp_strm_multi_release():
> > >
> > > spin_lock(&zs->strm_lock);
> > > if (zs->avail_strm <= zs->max_strm) {
> > > list_add(&zstrm->list, &zs->idle_strm);
> > > spin_unlock(&zs->strm_lock);
> > > wake_up(&zs->strm_wait);
> > > return;
> > > }
> > >
> > >
> > > but I can probably see what you mean... in some very extreme case,
> > > though. I can't even formulate it... eh... we use a multi stream
> > > backend with ->max_strm == 1 and there are two processes, one
> > > just falsely passed the wait_event() `if (condition)' check, the
> > > other one just put stream back to ->idle_strm and called wake_up(),
> > > but the first process hasn't yet executed prepare_to_wait_event()
> > > so it might miss a wakeup. and there should be no other process
> > > > doing read or write operations; otherwise, there will be a wakeup
> > > eventually.
> > >
> > > is this the case you were thinking of?... then yes, this spinlock
> > > may help.
> > >
> >
> > on the other hand... wait_event() is actually
> >
> > if (condition)
> > break;
> > prepare_to_wait_event(&wq, &__wait, state)
> > if (condition)
> > break;
> > schedule();
> >
> > if first condition check was false and we missed a wakeup call between
> > first condition and prepare_to_wait_event(), then second condition
> > check should do the trick I think (or you expect that second condition
> > check may be wrongly pre-fetched or something).
>
> Hello, Sergey.
>
> This is what I thought.
> I expected that the second condition check could be false if the compiler
> reuses the result of the first check as an optimization. I guess that
> there is no prevention against this kind of optimization.
>
> So, the following is the problem sequence I had in mind.
> T1 means thread 1, and T2 means another thread, thread 2.
>
> <T1-1> check if idle_strm is empty or not while holding the lock
> <T1-2> It is empty, so do spin_unlock and run the wait_event macro
> <T1-3> check if idle_strm is empty or not
> <T1-4> It is still empty
>
> <T2-1> do strm release
> <T2-2> call wake_up
>
> <T1-5> add T1 to the wait queue
> <T1-6> check if idle_strm is empty or not
> <T1-7> the compiler reuses <T1-4>'s result, or the CPU just fetches the
> cached result, so T1 starts waiting
>
> In this case, T1 can sleep permanently. To prevent compiler
> optimization or fetching of a cached value, we need a lock here.

From my reading of Documentation/memory-barriers.txt, it shouldn't happen.

"All memory barriers except the data dependency barriers imply a compiler
barrier. Data dependencies do not impose any additional compiler ordering."

"SLEEP AND WAKE-UP FUNCTIONS
---------------------------

Sleeping and waking on an event flagged in global data ...
...
...
...

A general memory barrier is interpolated automatically by set_current_state()
after it has altered the task state:"

So I think your T1-7 assumption is not true.

As well, there are many examples under drivers/ that use global data
as an event flag without locking or atomics.
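
For reference, the sleep/wake pattern that section of memory-barriers.txt
describes looks roughly like this (a sketch; event_indicated and
event_wait_queue are the document's example names):

/* sleeper */
for (;;) {
	set_current_state(TASK_UNINTERRUPTIBLE);	/* general barrier here */
	if (event_indicated)	/* cannot be satisfied by a value */
		break;		/* cached before the barrier */
	schedule();
}

/* waker */
event_indicated = 1;
wake_up(&event_wait_queue);

prepare_to_wait_event() calls set_current_state() internally, so the second
condition check inside wait_event() always runs after such a barrier.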

Just in case, Ccing Paul and Peter.

2015-08-11 08:24:53

by Sergey Senozhatsky

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On (08/11/15 17:25), Joonsoo Kim wrote:
[..]
> > "SLEEP AND WAKE-UP FUNCTIONS
> > ---------------------------
> >
> > Sleeping and waking on an event flagged in global data ...
> > ...
> > ...
> > ...
> >
> > A general memory barrier is interpolated automatically by set_current_state()
> > after it has altered the task state:"
> >
> > So I think your T1-7 assumption is not true.
> >
> > As well, there are many examples under drivers/ that use global data
> > as an event flag without locking or atomics.
> >
>
> Okay. Now, I'm convinced that the race is not possible. I will drop this
> patch.

yep, Minchan found it first. thanks guys.

-ss

2015-08-11 08:19:56

by Joonsoo Kim

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On Tue, Aug 11, 2015 at 08:26:33AM +0900, Minchan Kim wrote:
> Hi Joonsoo,
>
> On Mon, Aug 10, 2015 at 09:32:30AM +0900, Joonsoo Kim wrote:
> > On Fri, Aug 07, 2015 at 06:58:16PM +0900, Sergey Senozhatsky wrote:
> > > On (08/07/15 18:14), Sergey Senozhatsky wrote:
> > > > hm... I need to think about it more.
> > > >
> > > > we do wake_up every time we put stream back to the list
> > > >
> > > > zcomp_strm_multi_release():
> > > >
> > > > spin_lock(&zs->strm_lock);
> > > > if (zs->avail_strm <= zs->max_strm) {
> > > > list_add(&zstrm->list, &zs->idle_strm);
> > > > spin_unlock(&zs->strm_lock);
> > > > wake_up(&zs->strm_wait);
> > > > return;
> > > > }
> > > >
> > > >
> > > > but I can probably see what you mean... in some very extreme case,
> > > > though. I can't even formulate it... eh... we use a multi stream
> > > > backend with ->max_strm == 1 and there are two processes, one
> > > > just falsely passed the wait_event() `if (condition)' check, the
> > > > other one just put stream back to ->idle_strm and called wake_up(),
> > > > but the first process hasn't yet executed prepare_to_wait_event()
> > > > so it might miss a wakeup. and there should be no other process
> > > > > doing read or write operations; otherwise, there will be a wakeup
> > > > eventually.
> > > >
> > > > is this the case you were thinking of?... then yes, this spinlock
> > > > may help.
> > > >
> > >
> > > on the other hand... wait_event() is actually
> > >
> > > if (condition)
> > > break;
> > > prepare_to_wait_event(&wq, &__wait, state)
> > > if (condition)
> > > break;
> > > schedule();
> > >
> > > if first condition check was false and we missed a wakeup call between
> > > first condition and prepare_to_wait_event(), then second condition
> > > check should do the trick I think (or you expect that second condition
> > > check may be wrongly pre-fetched or something).
> >
> > Hello, Sergey.
> >
> > This is what I thought.
> > I expected that the second condition check could be false if the compiler
> > reuses the result of the first check as an optimization. I guess that
> > there is no prevention against this kind of optimization.
> >
> > So, the following is the problem sequence I had in mind.
> > T1 means thread 1, and T2 means another thread, thread 2.
> >
> > <T1-1> check if idle_strm is empty or not while holding the lock
> > <T1-2> It is empty, so do spin_unlock and run the wait_event macro
> > <T1-3> check if idle_strm is empty or not
> > <T1-4> It is still empty
> >
> > <T2-1> do strm release
> > <T2-2> call wake_up
> >
> > <T1-5> add T1 to the wait queue
> > <T1-6> check if idle_strm is empty or not
> > <T1-7> the compiler reuses <T1-4>'s result, or the CPU just fetches the
> > cached result, so T1 starts waiting
> >
> > In this case, T1 can sleep permanently. To prevent compiler
> > optimization or fetching of a cached value, we need a lock here.
>
> From my reading of Documentation/memory-barriers.txt, it shouldn't happen.
>
> "All memory barriers except the data dependency barriers imply a compiler
> barrier. Data dependencies do not impose any additional compiler ordering."
>
> "SLEEP AND WAKE-UP FUNCTIONS
> ---------------------------
>
> Sleeping and waking on an event flagged in global data ...
> ...
> ...
> ...
>
> A general memory barrier is interpolated automatically by set_current_state()
> after it has altered the task state:"
>
> So I think your T1-7 assumption is not true.
>
> As well, there are many examples under drivers/ that use global data
> as an event flag without locking or atomics.
>

Okay. Now, I'm convinced that the race is not possible. I will drop this
patch.

Thanks.

2015-08-11 08:20:51

by Joonsoo Kim

Subject: Re: [PATCH] zram: fix possible race when checking idle_strm

On Mon, Aug 10, 2015 at 11:16:59AM +0900, Sergey Senozhatsky wrote:
> Hello Joonsoo,
>
> On (08/10/15 09:32), Joonsoo Kim wrote:
> > > on the other hand... wait_event() is actually
> > >
> > > if (condition)
> > > break;
> > > prepare_to_wait_event(&wq, &__wait, state)
> > > if (condition)
> > > break;
> > > schedule();
> > >
> > > if first condition check was false and we missed a wakeup call between
> > > first condition and prepare_to_wait_event(), then second condition
> > > check should do the trick I think (or you expect that second condition
> > > check may be wrongly pre-fetched or something).
> >
> ...
> > I expected that the second condition check could be false if the compiler
> > reuses the result of the first check as an optimization. I guess that
> > there is no prevention against this kind of optimization.
>
> hm... so we have outer and inner checks (out of the loop and inside the loop).
> can the compiler decide that the outer and inner checks are equivalent here?
>
> #define wait_event(wq, condition) \
> do { \
> might_sleep(); \
> if (condition) \
> break; \
> ....
> for (;;) { \
> long __int = prepare_to_wait_event(&wq, &__wait, state);\
> \
> if (condition) \
> break; \
> \
> if (___wait_is_interruptible(state) && __int) { \
> __ret = __int; \
> if (exclusive) { \
> abort_exclusive_wait(&wq, &__wait, \
> state, NULL); \
> goto __out; \
> } \
> break; \
> } \
> \
> cmd; \
> } \
> ....
> } while (0)
>
>
> I probably don't have enough knowledge about compilers, but I think it
> must keep both checks. But I may be wrong.
>
> just out of curiosity, a quick grep
>
> wait_event(zatm_vcc->tx_wait, !skb_peek(&zatm_vcc->tx_queue))
> wait_event(pmu->recv.wait, (pmu->recv.process == 0))
> wait_event(ep->com.waitq, ep->com.rpl_done)
> wait_event(cs->waitqueue, !cs->waiting)
> wait_event(resync_wait, (mddev->sync_thread == NULL &&...
> wait_event(mddev->sb_wait, mddev->flags == 0 ||...
>
> and so on.

From Minchan's comment, I think that the race is not possible.
I will drop the patch.

Thanks.