2007-08-23 15:30:08

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] sigqueue_free: fix the race with collect_signal()

Spotted by taoyue <[email protected]> and Jeremy Katz <[email protected]>.

collect_signal: sigqueue_free:

list_del_init(&first->list);
if (!list_empty(&q->list)) {
// not taken
}
q->flags &= ~SIGQUEUE_PREALLOC;

__sigqueue_free(first); __sigqueue_free(q);

Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the
obviously bad implications.

In particular, this double free breaks the array_cache->avail logic, so the
same sigqueue could be "allocated" twice, and the bug can manifest itself via
the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue.

Hopefully this can explain these mysterious bug-reports, see

http://marc.info/?t=118766926500003
http://marc.info/?t=118466273000005

Alexey Dobriyan reports this patch makes the difference for the testcase, but
nobody has an access to the application which opened the problems originally.

Also, this patch removes tasklist lock/unlock, ->siglock is enough.

Signed-off-by: Oleg Nesterov <[email protected]>

--- t/kernel/signal.c~SQFREE 2007-08-22 20:06:31.000000000 +0400
+++ t/kernel/signal.c 2007-08-23 16:02:57.000000000 +0400
@@ -1297,20 +1297,19 @@ struct sigqueue *sigqueue_alloc(void)
void sigqueue_free(struct sigqueue *q)
{
unsigned long flags;
+ spinlock_t *lock = &current->sighand->siglock;
+
BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
/*
* If the signal is still pending remove it from the
- * pending queue.
+ * pending queue. We must hold ->siglock while testing
+ * q->list to serialize with collect_signal().
*/
- if (unlikely(!list_empty(&q->list))) {
- spinlock_t *lock = &current->sighand->siglock;
- read_lock(&tasklist_lock);
- spin_lock_irqsave(lock, flags);
- if (!list_empty(&q->list))
- list_del_init(&q->list);
- spin_unlock_irqrestore(lock, flags);
- read_unlock(&tasklist_lock);
- }
+ spin_lock_irqsave(lock, flags);
+ if (!list_empty(&q->list))
+ list_del_init(&q->list);
+ spin_unlock_irqrestore(lock, flags);
+
q->flags &= ~SIGQUEUE_PREALLOC;
__sigqueue_free(q);
}


2007-08-23 21:37:52

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

Oleg Nesterov wrote:
> Spotted by taoyue <[email protected]> and Jeremy Katz <[email protected]>.
>
> collect_signal: sigqueue_free:
>
> list_del_init(&first->list);
> if (!list_empty(&q->list)) {
> // not taken
> }
> q->flags &= ~SIGQUEUE_PREALLOC;
>
> __sigqueue_free(first); __sigqueue_free(q);
>
> Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the
> obviously bad implications.
>
> In particular, this double free breaks the array_cache->avail logic, so the
> same sigqueue could be "allocated" twice, and the bug can manifest itself via
> the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue.
>
> Hopefully this can explain these mysterious bug-reports, see
>
> http://marc.info/?t=118766926500003
> http://marc.info/?t=118466273000005
>
> Alexey Dobriyan reports this patch makes the difference for the testcase, but
> nobody has an access to the application which opened the problems originally.
>
> Also, this patch removes tasklist lock/unlock, ->siglock is enough.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- t/kernel/signal.c~SQFREE 2007-08-22 20:06:31.000000000 +0400
> +++ t/kernel/signal.c 2007-08-23 16:02:57.000000000 +0400
> @@ -1297,20 +1297,19 @@ struct sigqueue *sigqueue_alloc(void)
> void sigqueue_free(struct sigqueue *q)
> {
> unsigned long flags;
> + spinlock_t *lock = &current->sighand->siglock;
> +
> BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> /*
> * If the signal is still pending remove it from the
> - * pending queue.
> + * pending queue. We must hold ->siglock while testing
> + * q->list to serialize with collect_signal().
> */
> - if (unlikely(!list_empty(&q->list))) {
> - spinlock_t *lock = &current->sighand->siglock;
> - read_lock(&tasklist_lock);
> - spin_lock_irqsave(lock, flags);
>
Hmm, but the existing code _does_ take the siglock here. Is that not
sufficient ?
Isn't the first list_empty() check without lock only an optimization for
the common
case ?

> - if (!list_empty(&q->list))
> - list_del_init(&q->list);
> - spin_unlock_irqrestore(lock, flags);
> - read_unlock(&tasklist_lock);
> - }
> + spin_lock_irqsave(lock, flags);
> + if (!list_empty(&q->list))
> + list_del_init(&q->list);
> + spin_unlock_irqrestore(lock, flags);
> +
> q->flags &= ~SIGQUEUE_PREALLOC;
> __sigqueue_free(q);
> }
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>


2007-08-23 22:07:01

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

On 08/23, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov wrote:
> >Spotted by taoyue <[email protected]> and Jeremy Katz
> ><[email protected]>.
> >
> >collect_signal: sigqueue_free:
> >
> > list_del_init(&first->list);
> > if (!list_empty(&q->list)) {
> > // not taken
> > }
> > q->flags &=
> > ~SIGQUEUE_PREALLOC;
> >
> > __sigqueue_free(first); __sigqueue_free(q);
> >
> >Now, __sigqueue_free() is called twice on the same "struct sigqueue" with
> >the
> >obviously bad implications.
> >
> >--- t/kernel/signal.c~SQFREE 2007-08-22 20:06:31.000000000 +0400
> >+++ t/kernel/signal.c 2007-08-23 16:02:57.000000000 +0400
> >@@ -1297,20 +1297,19 @@ struct sigqueue *sigqueue_alloc(void)
> > void sigqueue_free(struct sigqueue *q)
> > {
> > unsigned long flags;
> >+ spinlock_t *lock = &current->sighand->siglock;
> >+
> > BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> > /*
> > * If the signal is still pending remove it from the
> >- * pending queue.
> >+ * pending queue. We must hold ->siglock while testing
> >+ * q->list to serialize with collect_signal().
> > */
> >- if (unlikely(!list_empty(&q->list))) {
> >- spinlock_t *lock = &current->sighand->siglock;
> >- read_lock(&tasklist_lock);
> >- spin_lock_irqsave(lock, flags);
> >
> Hmm, but the existing code _does_ take the siglock here. Is that not
> sufficient ?

Yes, it does, and this is sufficient, so the patch removes tasklist_lock.

> Isn't the first list_empty() check without lock only an optimization for
> the common
> case ?

Yes, this is optimization, but I strongly believe it is wrong. Please look
at the race description above.

!list_empty(&q->list) does _not_ necessary mean that q is not used and we can
free it. It is possible that collect_signal() just removed this sigqueue from
list (list_empty(&q->list) becomes true) and going to free it.

The whole point is: we can't check list_empty() without ->siglock, this is
racy, and leads to double-free.

Oleg.

2007-08-24 02:31:36

by taoyue

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

Oleg Nesterov wrote:
> Spotted by taoyue <[email protected]> and Jeremy Katz <[email protected]>.
>
> collect_signal: sigqueue_free:
>
> list_del_init(&first->list);
> if (!list_empty(&q->list)) {
> // not taken
> }
> q->flags &= ~SIGQUEUE_PREALLOC;
>
> __sigqueue_free(first); __sigqueue_free(q);
>
> Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the
> obviously bad implications.
>
> In particular, this double free breaks the array_cache->avail logic, so the
> same sigqueue could be "allocated" twice, and the bug can manifest itself via
> the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue.
>
> Hopefully this can explain these mysterious bug-reports, see
>
> http://marc.info/?t=118766926500003
> http://marc.info/?t=118466273000005
>
> Alexey Dobriyan reports this patch makes the difference for the testcase, but
> nobody has an access to the application which opened the problems originally.
>
> Also, this patch removes tasklist lock/unlock, ->siglock is enough.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- t/kernel/signal.c~SQFREE 2007-08-22 20:06:31.000000000 +0400
> +++ t/kernel/signal.c 2007-08-23 16:02:57.000000000 +0400
> @@ -1297,20 +1297,19 @@ struct sigqueue *sigqueue_alloc(void)
> void sigqueue_free(struct sigqueue *q)
> {
> unsigned long flags;
> + spinlock_t *lock = &current->sighand->siglock;
> +
> BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> /*
> * If the signal is still pending remove it from the
> - * pending queue.
> + * pending queue. We must hold ->siglock while testing
> + * q->list to serialize with collect_signal().
> */
> - if (unlikely(!list_empty(&q->list))) {
> - spinlock_t *lock = &current->sighand->siglock;
> - read_lock(&tasklist_lock);
> - spin_lock_irqsave(lock, flags);
> - if (!list_empty(&q->list))
> - list_del_init(&q->list);
> - spin_unlock_irqrestore(lock, flags);
> - read_unlock(&tasklist_lock);
> - }
> + spin_lock_irqsave(lock, flags);
> + if (!list_empty(&q->list))
> + list_del_init(&q->list);
> + spin_unlock_irqrestore(lock, flags);
> +
> q->flags &= ~SIGQUEUE_PREALLOC;
> __sigqueue_free(q);
> }
>
>
>
Applying previous patch,it seems likely that the __sigqueue_free() is also called twice.

collect_signal: sigqueue_free:

list_del_init(&first->list);
spin_lock_irqsave(lock, flags);
if (!list_empty(&q->list))
list_del_init(&q->list);
spin_unlock_irqrestore(lock, flags);
q->flags &= ~SIGQUEUE_PREALLOC;

__sigqueue_free(first); __sigqueue_free(q);



yue.tao

2007-08-24 07:46:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

On 08/24, taoyue wrote:
>
> Oleg Nesterov wrote:
> >
> >--- t/kernel/signal.c~SQFREE 2007-08-22 20:06:31.000000000 +0400
> >+++ t/kernel/signal.c 2007-08-23 16:02:57.000000000 +0400
> >@@ -1297,20 +1297,19 @@ struct sigqueue *sigqueue_alloc(void)
> > void sigqueue_free(struct sigqueue *q)
> > {
> > unsigned long flags;
> >+ spinlock_t *lock = &current->sighand->siglock;
> >+
> > BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> > /*
> > * If the signal is still pending remove it from the
> >- * pending queue.
> >+ * pending queue. We must hold ->siglock while testing
> >+ * q->list to serialize with collect_signal().
> > */
> >- if (unlikely(!list_empty(&q->list))) {
> >- spinlock_t *lock = &current->sighand->siglock;
> >- read_lock(&tasklist_lock);
> >- spin_lock_irqsave(lock, flags);
> >- if (!list_empty(&q->list))
> >- list_del_init(&q->list);
> >- spin_unlock_irqrestore(lock, flags);
> >- read_unlock(&tasklist_lock);
> >- }
> >+ spin_lock_irqsave(lock, flags);
> >+ if (!list_empty(&q->list))
> >+ list_del_init(&q->list);
> >+ spin_unlock_irqrestore(lock, flags);
> >+
> > q->flags &= ~SIGQUEUE_PREALLOC;
> > __sigqueue_free(q);
> > }
> >
> >
> >
> Applying previous patch???it seems likely that the __sigqueue_free() is
> also called twice.
>
> collect_signal: sigqueue_free:
>
> list_del_init(&first->list);
> spin_lock_irqsave(lock, flags);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> if (!list_empty(&q->list))
> list_del_init(&q->list);
> spin_unlock_irqrestore(lock, flags);
> q->flags &= ~SIGQUEUE_PREALLOC;
>
> __sigqueue_free(first); __sigqueue_free(q);

collect_signal() is always called under ->siglock which is also taken by
sigqueue_free(), so this is not possible.

Basically, this patch is the same one-liner I sent you before

http://marc.info/?l=linux-kernel&m=118772206603453&w=2

(Thanks for the additional testing and report, btw).

P.S. It would be nice to know if this patch solves the problems reported
by Jeremy, but his email is disabled.

Oleg.

2007-08-24 09:35:40

by taoyue

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

Oleg Nesterov wrote:
> On 08/24, taoyue wrote:
>
>> Oleg Nesterov wrote:
>>
>>> --- t/kernel/signal.c~SQFREE 2007-08-22 20:06:31.000000000 +0400
>>> +++ t/kernel/signal.c 2007-08-23 16:02:57.000000000 +0400
>>> @@ -1297,20 +1297,19 @@ struct sigqueue *sigqueue_alloc(void)
>>> void sigqueue_free(struct sigqueue *q)
>>> {
>>> unsigned long flags;
>>> + spinlock_t *lock = &current->sighand->siglock;
>>> +
>>> BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
>>> /*
>>> * If the signal is still pending remove it from the
>>> - * pending queue.
>>> + * pending queue. We must hold ->siglock while testing
>>> + * q->list to serialize with collect_signal().
>>> */
>>> - if (unlikely(!list_empty(&q->list))) {
>>> - spinlock_t *lock = &current->sighand->siglock;
>>> - read_lock(&tasklist_lock);
>>> - spin_lock_irqsave(lock, flags);
>>> - if (!list_empty(&q->list))
>>> - list_del_init(&q->list);
>>> - spin_unlock_irqrestore(lock, flags);
>>> - read_unlock(&tasklist_lock);
>>> - }
>>> + spin_lock_irqsave(lock, flags);
>>> + if (!list_empty(&q->list))
>>> + list_del_init(&q->list);
>>> + spin_unlock_irqrestore(lock, flags);
>>> +
>>> q->flags &= ~SIGQUEUE_PREALLOC;
>>> __sigqueue_free(q);
>>> }
>>>
>>>
>>>
>>>
>> Applying previous patch???it seems likely that the __sigqueue_free() is
>> also called twice.
>>
>> collect_signal: sigqueue_free:
>>
>> list_del_init(&first->list);
>> spin_lock_irqsave(lock, flags);
>>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>> if (!list_empty(&q->list))
>> list_del_init(&q->list);
>> spin_unlock_irqrestore(lock, flags);
>> q->flags &= ~SIGQUEUE_PREALLOC;
>>
>> __sigqueue_free(first); __sigqueue_free(q);
>>
>
> collect_signal() is always called under ->siglock which is also taken by
> sigqueue_free(), so this is not possible.
>
> Basically, this patch is the same one-liner I sent you before
>
> http://marc.info/?l=linux-kernel&m=118772206603453&w=2
>
> (Thanks for the additional testing and report, btw).
>
> P.S. It would be nice to know if this patch solves the problems reported
> by Jeremy, but his email is disabled.
>
> Oleg.
>
>
I know, using current->sighand->siglock to prevent one sigqueue
is free twice. I want to know whether it is possible that the two
function is called in different thread. If that, the spin_lock is useless.

yue.tao

2007-08-24 11:06:35

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

On 08/24, taoyue wrote:
>
> Oleg Nesterov wrote:
> >>
> >>collect_signal: sigqueue_free:
> >>
> >> list_del_init(&first->list);
> >> spin_lock_irqsave(lock, flags);
> >>
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> >> if (!list_empty(&q->list))
> >> list_del_init(&q->list);
> >> spin_unlock_irqrestore(lock,
> >> flags);
> >> q->flags &= ~SIGQUEUE_PREALLOC;
> >>
> >> __sigqueue_free(first); __sigqueue_free(q);
> >>
> >
> >collect_signal() is always called under ->siglock which is also taken by
> >sigqueue_free(), so this is not possible.
> >
> >Basically, this patch is the same one-liner I sent you before
> >
> > http://marc.info/?l=linux-kernel&m=118772206603453&w=2
> >
> >(Thanks for the additional testing and report, btw).
> >
> >P.S. It would be nice to know if this patch solves the problems reported
> >by Jeremy, but his email is disabled.
> >
> >Oleg.
> >
> >
> I know, using current->sighand->siglock to prevent one sigqueue
> is free twice. I want to know whether it is possible that the two
> function is called in different thread. If that, the spin_lock is useless.

Not sure I understand. Yes, it is possible they are called by 2 different
threads, that is why we had a race. But all threads in the same thread
group have the same ->sighand, and thus the same ->sighand->siglock.

Oleg.

2007-08-24 20:04:56

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

Oleg Nesterov wrote:
> On 08/24, taoyue wrote:
>
>> Oleg Nesterov wrote:
>>
>>>> collect_signal: sigqueue_free:
>>>>
>>>> list_del_init(&first->list);
>>>> spin_lock_irqsave(lock, flags);
>>>>
>>>>
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>>
>>>> if (!list_empty(&q->list))
>>>> list_del_init(&q->list);
>>>> spin_unlock_irqrestore(lock,
>>>> flags);
>>>> q->flags &= ~SIGQUEUE_PREALLOC;
>>>>
>>>> __sigqueue_free(first); __sigqueue_free(q);
>>>>
>>>>
>>> collect_signal() is always called under ->siglock which is also taken by
>>> sigqueue_free(), so this is not possible.
>>>
>>> Basically, this patch is the same one-liner I sent you before
>>>
>>> http://marc.info/?l=linux-kernel&m=118772206603453&w=2
>>>
>>> (Thanks for the additional testing and report, btw).
>>>
>>> P.S. It would be nice to know if this patch solves the problems reported
>>> by Jeremy, but his email is disabled.
>>>
>>> Oleg.
>>>
>>>
>>>
>> I know, using current->sighand->siglock to prevent one sigqueue
>> is free twice. I want to know whether it is possible that the two
>> function is called in different thread. If that, the spin_lock is useless.
>>
>
> Not sure I understand. Yes, it is possible they are called by 2 different
> threads, that is why we had a race. But all threads in the same thread
> group have the same ->sighand, and thus the same ->sighand->siglock.
>

Oleg, if one thread can be in collect_signal() and another in
sigqueue_free() and both operate on the exact same sigqueue object, its
not clear how we prevent two calls to __sigqueue_free() to
the same object. In that case the lock (or some lock) should be around
__sigqueue_free() - no ?

i.e if we enter sigqueue_free(), we will call __sigqueue_free()
regardless of the state.

> Oleg.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>


2007-08-24 20:23:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

On 08/24, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov wrote:
> >On 08/24, taoyue wrote:
> >
> >>Oleg Nesterov wrote:
> >>
> >>>>collect_signal: sigqueue_free:
> >>>>
> >>>> list_del_init(&first->list);
> >>>> spin_lock_irqsave(lock, flags);
> >>>>
> >>>>
> >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>
> >>>
> >>>> if (!list_empty(&q->list))
> >>>> list_del_init(&q->list);
> >>>> spin_unlock_irqrestore(lock,
> >>>> flags);
> >>>> q->flags &= ~SIGQUEUE_PREALLOC;
> >>>>
> >>>> __sigqueue_free(first); __sigqueue_free(q);
> >>>>
> >>>>
> >>>collect_signal() is always called under ->siglock which is also taken by
> >>>sigqueue_free(), so this is not possible.
> >>>
> >>>
> >>>
> >>I know, using current->sighand->siglock to prevent one sigqueue
> >>is free twice. I want to know whether it is possible that the two
> >>function is called in different thread. If that, the spin_lock is useless.
> >>
> >
> >Not sure I understand. Yes, it is possible they are called by 2 different
> >threads, that is why we had a race. But all threads in the same thread
> >group have the same ->sighand, and thus the same ->sighand->siglock.
> >
>
> Oleg, if one thread can be in collect_signal() and another in
> sigqueue_free() and both operate on the exact same sigqueue object, its
> not clear how we prevent two calls to __sigqueue_free() to
> the same object. In that case the lock (or some lock) should be around
> __sigqueue_free() - no ?
>
> i.e if we enter sigqueue_free(), we will call __sigqueue_free()
> regardless of the state.

Yes. They both will call __sigqueue_free(). But please note that __sigqueue_free()
checks SIGQUEUE_PREALLOC, which is cleared by sigqueue_free().

IOW, when sigqueue_free() unlocks ->siglock, we know that it can't be used
by collect_signal() from another thread. So we can clear SIGQUEUE_PREALLOC
and free sigqueue. We don't need this lock around sigqueue_free() to prevent
the race. collect_signal() can "see" only those sigqueues which are on list.

IOW, when sigqueue_free() takes ->siglock, colect_signal() can't run, because
it needs the same lock. Now we delete this sigqueue from list, nobody can
see it, it can't have other references. So we can unlock ->siglock, mark
sigqueue as freeable (clear SIGQUEUE_PREALLOC), and free it.

Do you agree?

Oleg.

2007-08-25 17:26:20

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

Oleg Nesterov wrote:
> On 08/24, Sukadev Bhattiprolu wrote:
>
>> Oleg Nesterov wrote:
>>
>>> On 08/24, taoyue wrote:
>>>
>>>
>>>> Oleg Nesterov wrote:
>>>>
>>>>
>>>>>> collect_signal: sigqueue_free:
>>>>>>
>>>>>> list_del_init(&first->list);
>>>>>> spin_lock_irqsave(lock, flags);
>>>>>>
>>>>>>
>>>>>>
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>
>>>>>
>>>>>
>>>>>> if (!list_empty(&q->list))
>>>>>> list_del_init(&q->list);
>>>>>> spin_unlock_irqrestore(lock,
>>>>>> flags);
>>>>>> q->flags &= ~SIGQUEUE_PREALLOC;
>>>>>>
>>>>>> __sigqueue_free(first); __sigqueue_free(q);
>>>>>>
>>>>>>
>>>>>>
>>>>> collect_signal() is always called under ->siglock which is also taken by
>>>>> sigqueue_free(), so this is not possible.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> I know, using current->sighand->siglock to prevent one sigqueue
>>>> is free twice. I want to know whether it is possible that the two
>>>> function is called in different thread. If that, the spin_lock is useless.
>>>>
>>>>
>>> Not sure I understand. Yes, it is possible they are called by 2 different
>>> threads, that is why we had a race. But all threads in the same thread
>>> group have the same ->sighand, and thus the same ->sighand->siglock.
>>>
>>>
>> Oleg, if one thread can be in collect_signal() and another in
>> sigqueue_free() and both operate on the exact same sigqueue object, its
>> not clear how we prevent two calls to __sigqueue_free() to
>> the same object. In that case the lock (or some lock) should be around
>> __sigqueue_free() - no ?
>>
>> i.e if we enter sigqueue_free(), we will call __sigqueue_free()
>> regardless of the state.
>>
>
> Yes. They both will call __sigqueue_free(). But please note that __sigqueue_free()
> checks SIGQUEUE_PREALLOC, which is cleared by sigqueue_free().
>
> IOW, when sigqueue_free() unlocks ->siglock, we know that it can't be used
> by collect_signal() from another thread. So we can clear SIGQUEUE_PREALLOC
> and free sigqueue. We don't need this lock around sigqueue_free() to prevent
> the race. collect_signal() can "see" only those sigqueues which are on list.
>
> IOW, when sigqueue_free() takes ->siglock, colect_signal() can't run, because
> it needs the same lock. Now we delete this sigqueue from list, nobody can
> see it, it can't have other references. So we can unlock ->siglock, mark
> sigqueue as freeable (clear SIGQUEUE_PREALLOC), and free it.
>
> Do you agree?
>

Yes. I see it now. I had missed the SIGQUEUE_PREALLOC in __sigqueue_free().

Thanks for clarifying

Suka


2007-08-25 17:33:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

On 08/25, Sukadev Bhattiprolu wrote:
>
> Yes. I see it now. I had missed the SIGQUEUE_PREALLOC in __sigqueue_free().

Thanks for looking at this. This code looks simple, but it is not obvious, at
least for me.

Oleg.

2007-08-27 01:59:23

by taoyue

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

Oleg Nesterov wrote:
> On 08/24, Sukadev Bhattiprolu wrote:
>
>> Oleg Nesterov wrote:
>>
>>> On 08/24, taoyue wrote:
>>>
>>>
>>>> Oleg Nesterov wrote:
>>>>
>>>>
>>>>>> collect_signal: sigqueue_free:
>>>>>>
>>>>>> list_del_init(&first->list);
>>>>>> spin_lock_irqsave(lock, flags);
>>>>>>
>>>>>>
>>>>>>
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>
>>>>>
>>>>>
>>>>>> if (!list_empty(&q->list))
>>>>>> list_del_init(&q->list);
>>>>>> spin_unlock_irqrestore(lock,
>>>>>> flags);
>>>>>> q->flags &= ~SIGQUEUE_PREALLOC;
>>>>>>
>>>>>> __sigqueue_free(first); __sigqueue_free(q);
>>>>>>
>>>>>>
>>>>>>
>>>>> collect_signal() is always called under ->siglock which is also taken by
>>>>> sigqueue_free(), so this is not possible.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> I know, using current->sighand->siglock to prevent one sigqueue
>>>> is free twice. I want to know whether it is possible that the two
>>>> function is called in different thread. If that, the spin_lock is useless.
>>>>
>>>>
>>> Not sure I understand. Yes, it is possible they are called by 2 different
>>> threads, that is why we had a race. But all threads in the same thread
>>> group have the same ->sighand, and thus the same ->sighand->siglock.
>>>
I has applied the new patch, and start test program last Friday.
So far, the test program has run for three days.
In my test program, all threads is created in one process, so they
are in the same thread group. If two thread is not in the same thread
group, they have the different ->sighand->siglock. In this case, the
lock can't prevent the sigqueue object from race condition.
>>>
>>>
>> Oleg, if one thread can be in collect_signal() and another in
>> sigqueue_free() and both operate on the exact same sigqueue object, its
>> not clear how we prevent two calls to __sigqueue_free() to
>> the same object. In that case the lock (or some lock) should be around
>> __sigqueue_free() - no ?
>>
>> i.e if we enter sigqueue_free(), we will call __sigqueue_free()
>> regardless of the state.
>>
>
> Yes. They both will call __sigqueue_free(). But please note that __sigqueue_free()
> checks SIGQUEUE_PREALLOC, which is cleared by sigqueue_free().
>
> IOW, when sigqueue_free() unlocks ->siglock, we know that it can't be used
> by collect_signal() from another thread. So we can clear SIGQUEUE_PREALLOC
> and free sigqueue. We don't need this lock around sigqueue_free() to prevent
> the race. collect_signal() can "see" only those sigqueues which are on list.
>
> IOW, when sigqueue_free() takes ->siglock, colect_signal() can't run, because
> it needs the same lock. Now we delete this sigqueue from list, nobody can
> see it, it can't have other references. So we can unlock ->siglock, mark
> sigqueue as freeable (clear SIGQUEUE_PREALLOC), and free it.
>
> Do you agree?
>
> Oleg.
>
>
>

2007-08-27 05:59:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sigqueue_free: fix the race with collect_signal()

On 08/27, taoyue wrote:
>
> Oleg Nesterov wrote:
> >On 08/24, Sukadev Bhattiprolu wrote:
> >>>>>
> >>>>I know, using current->sighand->siglock to prevent one sigqueue
> >>>>is free twice. I want to know whether it is possible that the two
> >>>>function is called in different thread. If that, the spin_lock is
> >>>>useless.
> >>>>
> >>>>
> >>>Not sure I understand. Yes, it is possible they are called by 2 different
> >>>threads, that is why we had a race. But all threads in the same thread
> >>>group have the same ->sighand, and thus the same ->sighand->siglock.
> >>>
> I has applied the new patch, and start test program last Friday.
> So far, the test program has run for three days.

Great, thanks.

> In my test program, all threads is created in one process, so they
> are in the same thread group. If two thread is not in the same thread
> group, they have the different ->sighand->siglock. In this case, the
> lock can't prevent the sigqueue object from race condition.

If two thread are not in the same thread, they can't access the same
SIGQUEUE_PREALLOC sigqueue. That is why sigqueue_free() uses current->sighand.
Otherwise, this lock doesn't make any sense, and can't protect list_del(q->list).

Oleg.