0) Looking for clues to solve a problem I ran into, I noticed something
odd in block/blk-ioc.c:put_io_context(). It seems it accesses the atomic
variable ioc->refcount twice in a way which suggests things might race.
1) Code is more exact than words, so this (entirely untested) patch to
solve this possible race might describe better what this is all about:
@@ -33,12 +33,16 @@ static void cfq_dtor(struct io_context *ioc)
*/
int put_io_context(struct io_context *ioc)
{
+ int new;
+
if (ioc == NULL)
return 1;
- BUG_ON(atomic_long_read(&ioc->refcount) == 0);
+ new = atomic_long_dec_return(&ioc->refcount);
+
+ BUG_ON(new < 0);
- if (atomic_long_dec_and_test(&ioc->refcount)) {
+ if (new == 0) {
rcu_read_lock();
cfq_dtor(ioc);
rcu_read_unlock();
Paul Bolle
2011/4/10 Paul Bolle <[email protected]>:
> 0) Looking for clues to solve a problem I ran into, I noticed something
> odd in block/blk-ioc.c:put_io_context(). It seems it accesses the atomic
> variable ioc->refcount twice in a way which suggests things might race.
>
> 1) Code is more exact than words, so this (entirely untested) patch to
> solve this possible race might describe better what this is all about:
>
> @@ -33,12 +33,16 @@ static void cfq_dtor(struct io_context *ioc)
> ?*/
> ?int put_io_context(struct io_context *ioc)
> ?{
> + ? ? ? int new;
> +
> ? ? ? ?if (ioc == NULL)
> ? ? ? ? ? ? ? ?return 1;
>
> - ? ? ? BUG_ON(atomic_long_read(&ioc->refcount) == 0);
> + ? ? ? new = atomic_long_dec_return(&ioc->refcount);
> +
> + ? ? ? BUG_ON(new < 0);
>
> - ? ? ? if (atomic_long_dec_and_test(&ioc->refcount)) {
> + ? ? ? if (new == 0) {
> ? ? ? ? ? ? ? ?rcu_read_lock();
> ? ? ? ? ? ? ? ?cfq_dtor(ioc);
> ? ? ? ? ? ? ? ?rcu_read_unlock();
>
so you hit this line?
BUG_ON(atomic_long_read(&ioc->refcount) == 0);
this suggests something else is already wrong, you should fix that.
On 2011-04-11 03:54, Shaohua Li wrote:
> 2011/4/10 Paul Bolle <[email protected]>:
>> 0) Looking for clues to solve a problem I ran into, I noticed something
>> odd in block/blk-ioc.c:put_io_context(). It seems it accesses the atomic
>> variable ioc->refcount twice in a way which suggests things might race.
>>
>> 1) Code is more exact than words, so this (entirely untested) patch to
>> solve this possible race might describe better what this is all about:
>>
>> @@ -33,12 +33,16 @@ static void cfq_dtor(struct io_context *ioc)
>> */
>> int put_io_context(struct io_context *ioc)
>> {
>> + int new;
>> +
>> if (ioc == NULL)
>> return 1;
>>
>> - BUG_ON(atomic_long_read(&ioc->refcount) == 0);
>> + new = atomic_long_dec_return(&ioc->refcount);
>> +
>> + BUG_ON(new < 0);
>>
>> - if (atomic_long_dec_and_test(&ioc->refcount)) {
>> + if (new == 0) {
>> rcu_read_lock();
>> cfq_dtor(ioc);
>> rcu_read_unlock();
>>
> so you hit this line?
> BUG_ON(atomic_long_read(&ioc->refcount) == 0);
> this suggests something else is already wrong, you should fix that.
Indeed, there is nothing wrong with having the BUG_ON() there first and
doing the decrement later. If the BUG_ON() is hit, then it's not a race
conditon - it's a plain bug in the code.
--
Jens Axboe
On Mon, 2011-04-11 at 09:42 +0200, Jens Axboe wrote:
> Indeed, there is nothing wrong with having the BUG_ON() there first and
> doing the decrement later.
But what makes sure then that refcount doesn't get decremented by
something else just before the atomic_long_dec_and_test() call. Eg:
Thread 1 Thread 2
======== ========
BUG_ON()
BUG_ON()
atomic_long_dec_and_test()
atomic_long_dec_and_test()
/* refcount drops to -1 here */
Or is this not possible?
> If the BUG_ON() is hit, then it's not a race
> conditon - it's a plain bug in the code.
I haven't hit that BUG_ON(), I'm just wondering why an atomic variable
is accessed twice in the same function.
Paul Bolle
On 2011-04-11 10:45, Paul Bolle wrote:
> On Mon, 2011-04-11 at 09:42 +0200, Jens Axboe wrote:
>> Indeed, there is nothing wrong with having the BUG_ON() there first and
>> doing the decrement later.
>
> But what makes sure then that refcount doesn't get decremented by
> something else just before the atomic_long_dec_and_test() call. Eg:
>
> Thread 1 Thread 2
> ======== ========
> BUG_ON()
> BUG_ON()
> atomic_long_dec_and_test()
> atomic_long_dec_and_test()
> /* refcount drops to -1 here */
>
> Or is this not possible?
It's not possible, if it was then that would be the bug - someone
releasing a reference to the ioc that they do not hold. And that is what
the BUG_ON() is there to catch, not a race between two threads.
--
Jens Axboe