From: Manfred Spraul <manfred@colorfullife.com>
Subject: Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task
 using same semaphore set exits
To: "Herton R. Krzesinski" <herton@redhat.com>, linux-kernel@vger.kernel.org
References: <1438967375-14877-1-git-send-email-herton@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Davidlohr Bueso <dave@stgolabs.net>, Rafael Aquini <aquini@redhat.com>,
        Joe Perches <joe@perches.com>, Aristeu Rozanski <aris@redhat.com>,
        djeffery@redhat.com
Message-ID: <55C79294.2010006@colorfullife.com>
Date: Sun, 9 Aug 2015 19:49:08 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <1438967375-14877-1-git-send-email-herton@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4351
Lines: 120

Hi Herton,

On 08/07/2015 07:09 PM, Herton R. Krzesinski wrote:
> The current semaphore code allows a potential use after free: in exit_sem we may
> free the task's sem_undo_list while there is still another task looping through
> the same semaphore set and cleaning the sem_undo list at freeary function (the
> task called IPC_RMID for the same semaphore set).
Correct, good catch!

semid==-1 can happen due to two reasons:
    a) end of sem_undo_list (i.e.: last undo structure in CLONE_SYSVSEM 
group)
    b) parallel IPC_RMID

If semid==-1 happens due to a parallel IPC_RMID, then exit_sem does not 
free all sem_undo structures that belong to the current CLONE_SYSVSEM group.
But it does free the sem_undo_list structure.

Since:
- struct sem_undo contains a link to struct sem_undo_list.
- struct sem_undo_list is kfreed immediately at the end of exit_sem()
- the parallel IPC_RMID will find the sem_undo structure, then follow
   the link to sem_undo_list to unlink it
         -> use after free, spinlock debug errors because spinlock
                 was already overwritten by slab debug.

(what makes it worse: un->semid is read twice, without synchronization. 
It should be read once, with synchronization)

> Signed-off-by: Herton R. Krzesinski<herton@redhat.com>
I would add: Cc: <stable@vger.kernel.org>
> ---
>   ipc/sem.c | 24 ++++++++++++++++--------
>   1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/ipc/sem.c b/ipc/sem.c
> index bc3d530..35ccddd 100644
> --- a/ipc/sem.c
> +++ b/ipc/sem.c
> @@ -2074,17 +2074,24 @@ void exit_sem(struct task_struct *tsk)
>   		rcu_read_lock();
>   		un = list_entry_rcu(ulp->list_proc.next,
>   				    struct sem_undo, list_proc);
> -		if (&un->list_proc == &ulp->list_proc)
> -			semid = -1;
> -		 else
> -			semid = un->semid;
> +		if (&un->list_proc == &ulp->list_proc) {
> +			rcu_read_unlock();
> +			/* Make sure we wait for any place still referencing
> +			 * the current ulp to finish */
> +			synchronize_rcu();
Sorry, no. synchronize_rcu() is a high-latency operation.
We can't call it within exit_sem(). We could use kfree_rcu(), but I 
don't see that we need it:

Which race do you imagine?
ulp is accessed by:
- freeary(). Race impossible due to explicit locking.
- exit_sem(). Race impossible due to ulp->refcount
- find_alloc_undo(). Race impossible, because it operates on 
current->sysvsem.undo_list.
"current" is in do_exit, thus can't be inside semtimedop().

> +			break;
> +		}
> +		spin_lock(&ulp->lock);
> +		semid = un->semid;
> +		spin_unlock(&ulp->lock);
Ok/good.
Note (I've tried it first):
Just "READ_ONCE(un->semid)" would be insufficient, because this can happen:
  A: thread 1, within freeary:
  A: spin_lock(&ulp->lock);
  A: un->semid = -1;
  B:    thread 2, within exit_sem():
  B:    if (un->semid == -1) exit;
  B:    kfree(ulp);
  A: spin_unlock(&ulp->lock); <<<< use-after-free, bug

>   
> +		/* exit_sem raced with IPC_RMID, nothing to do */
>   		if (semid == -1) {
>   			rcu_read_unlock();
> -			break;
> +			continue;
>   		}
>   
> -		sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
> +		sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid);
>   		/* exit_sem raced with IPC_RMID, nothing to do */
>   		if (IS_ERR(sma)) {
>   			rcu_read_unlock();
Ok.
> @@ -2112,9 +2119,10 @@ void exit_sem(struct task_struct *tsk)
>   		ipc_assert_locked_object(&sma->sem_perm);
>   		list_del(&un->list_id);
>   
> -		spin_lock(&ulp->lock);
> +		/* we should be the last process using this ulp, so no need
> +		 * to acquire ulp->lock here; we are also protected against
> +		 * IPC_RMID as we hold sma->sem_perm.lock */
>   		list_del_rcu(&un->list_proc);
> -		spin_unlock(&ulp->lock);
>   
>   		/* perform adjustments registered in un */
>   		for (i = 0; i < sma->sem_nsems; i++) {
a) "we should be the last" or "we are the last"?
b) The bug that you have found is probably old, thus it must go into the 
stable kernels as well.
I would not do this change together with the bugfix.

Perhaps make two patches? One cc stable, the other one without cc stable.
--
     Manfred
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/