Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754364AbbHJPbz (ORCPT ); Mon, 10 Aug 2015 11:31:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37126 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754269AbbHJPbu (ORCPT ); Mon, 10 Aug 2015 11:31:50 -0400 Date: Mon, 10 Aug 2015 12:31:47 -0300 From: "Herton R. Krzesinski" To: Manfred Spraul Cc: linux-kernel@vger.kernel.org, Andrew Morton , Davidlohr Bueso , Rafael Aquini , Joe Perches , Aristeu Rozanski , djeffery@redhat.com Subject: Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits Message-ID: <20150810153147.GA3540@dhcppc4.redhat.com> References: <1438967375-14877-1-git-send-email-herton@redhat.com> <55C79294.2010006@colorfullife.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55C79294.2010006@colorfullife.com> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6360 Lines: 170 On Sun, Aug 09, 2015 at 07:49:08PM +0200, Manfred Spraul wrote: > Hi Herton, > (snip) > >+++ b/ipc/sem.c > >@@ -2074,17 +2074,24 @@ void exit_sem(struct task_struct *tsk) > > rcu_read_lock(); > > un = list_entry_rcu(ulp->list_proc.next, > > struct sem_undo, list_proc); > >- if (&un->list_proc == &ulp->list_proc) > >- semid = -1; > >- else > >- semid = un->semid; > >+ if (&un->list_proc == &ulp->list_proc) { > >+ rcu_read_unlock(); > >+ /* Make sure we wait for any place still referencing > >+ * the current ulp to finish */ > >+ synchronize_rcu(); > Sorry, no. synchronize_rcu() is a high-latency operation. > We can't call it within exit_sem(). We could use kfree_rcu(), but I don't > see that we need it: > > Which race do you imagine? > ulp is accessed by: > - freeary(). Race impossible due to explicit locking. > - exit_sem(). Race impossible due to ulp->refcount > - find_alloc_undo(). Race impossible, because it operates on > current->sysvsem.undo_list. > "current" is in do_exit, thus can't be inside semtimedop(). Well without the synchronize_rcu() and with the semid list loop fix I was still able to get issues, and I thought the problem is related to racing with IPC_RMID on freeary again. This is one scenario I would imagine: A B freeary() list_del(&un->list_id) spin_lock(&un->ulp->lock) un->semid = -1 list_del_rcu(&un->list_proc) __list_del_entry(&un->list_proc) __list_del(entry->prev, entry->next) exit_sem() next->prev = prev; ... prev->next = next; ... ... un = list_entry_rcu(ulp->list_proc.next...) (&un->list_proc)->prev = LIST_POISON2 if (&un->list_proc == &ulp->list_proc) ... kfree(ulp) spin_unlock(&un->ulp->lock) <---- bug Now that is a very tight window, but I had problems even when I tried this patch first: (...) - if (&un->list_proc == &ulp->list_proc) - semid = -1; - else - semid = un->semid; + if (&un->list_proc == &ulp->list_proc) { + rcu_read_unlock(); + break; + } + spin_lock(&ulp->lock); + semid = un->semid; + spin_unlock(&ulp->lock); + /* exit_sem raced with IPC_RMID, nothing to do */ if (semid == -1) { rcu_read_unlock(); - break; + synchronize_rcu(); + continue; } (...) So even with the bad/uneeded synchronize_rcu() which I had placed there, I could still get issues (however the testing on patch above was on an older kernel than latest upstream, from RHEL 6, I can test without synchronize_rcu() on latest upstream, however the affected code is the same). That's when I thought of scenario above. I was able to get this oops: <0>BUG: spinlock bad magic on CPU#3, test/5427 (Not tainted) <4>general protection fault: 0000 [#1] SMP ... <4>Pid: 5427, comm: test Not tainted 2.6.32-573.el6.1233300.2.x86_64.debug #1 QEMU Standard PC (i440FX + PIIX, 1996) <4>RIP: 0010:[] [] spin_bug+0x81/0x100 ... <4>Call Trace: <4> [] _raw_spin_unlock+0x65/0xa0 <4> [] _spin_unlock+0x2b/0x40 <4> [] freeary+0x96/0x250 <4> [] ? _spin_lock+0x62/0x70 <4> [] semctl_down.clone.0+0x8d/0x130 <4> [] ? sched_clock_cpu+0xb8/0x110 <4> [] ? trace_hardirqs_off+0xd/0x10 <4> [] ? cpu_clock+0x6f/0x80 <4> [] ? lock_release_holdtime+0x3d/0x190 <4> [] sys_semctl+0x2a2/0x300 <4> [] ? trace_hardirqs_on_thunk+0x3a/0x3f <4> [] system_call_fastpath+0x16/0x1b May be below spin_lock(ulp->lock) could be acquired earlier instead before checking list_proc being empty? Then this would avoid the need for synchronize_rcu(), which was my initial idea to avoid the problem. > > >+ break; > >+ } > >+ spin_lock(&ulp->lock); > >+ semid = un->semid; > >+ spin_unlock(&ulp->lock); > Ok/good. > Note (I've tried it first): > Just "READ_ONCE(un->semid)" would be insufficient, because this can happen: > A: thread 1, within freeary: > A: spin_lock(&ulp->lock); > A: un->semid = -1; > B: thread 2, within exit_sem(): > B: if (un->semid == -1) exit; > B: kfree(ulp); > A: spin_unlock(&ulp->lock); <<<< use-after-free, bug > > >+ /* exit_sem raced with IPC_RMID, nothing to do */ > > if (semid == -1) { > > rcu_read_unlock(); > >- break; > >+ continue; > > } > >- sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid); > >+ sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid); > > /* exit_sem raced with IPC_RMID, nothing to do */ > > if (IS_ERR(sma)) { > > rcu_read_unlock(); > Ok. > >@@ -2112,9 +2119,10 @@ void exit_sem(struct task_struct *tsk) > > ipc_assert_locked_object(&sma->sem_perm); > > list_del(&un->list_id); > >- spin_lock(&ulp->lock); > >+ /* we should be the last process using this ulp, so no need > >+ * to acquire ulp->lock here; we are also protected against > >+ * IPC_RMID as we hold sma->sem_perm.lock */ > > list_del_rcu(&un->list_proc); > >- spin_unlock(&ulp->lock); > > /* perform adjustments registered in un */ > > for (i = 0; i < sma->sem_nsems; i++) { > a) "we should be the last" or "we are the last"? I'll change the comment, as "we are the last". > b) The bug that you have found is probably old, thus it must go into the > stable kernels as well. > I would not do this change together with the bugfix. > > Perhaps make two patches? One cc stable, the other one without cc stable. Ok, I'll separate the changes. Thanks for reviewing. > -- > Manfred -- []'s Herton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/