Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751915Ab0F1Qj4 (ORCPT ); Mon, 28 Jun 2010 12:39:56 -0400 Received: from cantor.suse.de ([195.135.220.2]:46113 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751348Ab0F1Qjz (ORCPT ); Mon, 28 Jun 2010 12:39:55 -0400 Date: Mon, 28 Jun 2010 18:39:52 +0200 From: Michal Hocko To: Darren Hart Cc: Thomas Gleixner , Peter Zijlstra , LKML , Nick Piggin , Alexey Kuznetsov , Linus Torvalds Subject: Re: futex: race in lock and unlock&exit for robust futex with PI? Message-ID: <20100628163952.GD24127@tiehlicka.suse.cz> References: <20100623091307.GA11072@tiehlicka.suse.cz> <4C2417AA.4030306@us.ibm.com> <20100625082711.GA32765@tiehlicka.suse.cz> <4C24ED34.9040808@us.ibm.com> <4C253D32.6040304@us.ibm.com> <20100628144246.GA14201@tiehlicka.suse.cz> <20100628153214.GA24127@tiehlicka.suse.cz> <20100628155845.GC24127@tiehlicka.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100628155845.GC24127@tiehlicka.suse.cz> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 14220 Lines: 406 [I am sorry for bombarding you with emails] On Mon 28-06-10 17:58:45, Michal Hocko wrote: > On Mon 28-06-10 17:32:14, Michal Hocko wrote: > > On Mon 28-06-10 16:42:46, Michal Hocko wrote: > > > Hi Darren, > > > > Hmm, I think that I've found the reason. I have used one additional > > tracing patch (bellow) and we are getting ESRCH because cread and pcred > > don't match: > > > > > > version = 6 > > CPU 0 is empty > > cpus=2 > > field->offset = 16 size=4 > > <...>-22260 [001] 139.669672: bprint: do_futex : futex_lock_pi start > > <...>-22260 [001] 139.669678: bprint: do_futex : futex_lock_pi done ret=0 > > <...>-22281 [001] 139.693690: bprint: do_futex : futex_lock_pi start > > <...>-22281 [001] 139.693696: bprint: lookup_pi_state : cred(1004,1004) != pcred(1003,1003) > > <...>-22281 [001] 139.693697: bprint: lookup_pi_state : futex_find_get_task failed with -3 > > <...>-22281 [001] 139.693697: bprint: futex_lock_pi_atomic : lookup_pi_state: -ESRCH for pid=22280 > > <...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : ownerdied not detected, returning -ESRCH > > <...>-22281 [001] 139.693698: bprint: futex_lock_pi_atomic : lookup_pi_state: -3 > > <...>-22281 [001] 139.693699: bprint: futex_lock_pi : returning -ESRCH to userspace > > <...>-22281 [001] 139.693700: bprint: do_futex : futex_lock_pi done ret=-3 > > <...>-22280 [001] 139.694033: bprint: do_futex : futex_unlock_pi start > > <...>-22280 [001] 139.694035: bprint: do_futex : futex_unlock_pi: TID->0 transition 2147505928 > > <...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi: no waiters, unlock the futex ret=0 uval=-2147461368 > > <...>-22280 [001] 139.694036: bprint: do_futex : futex_unlock_pi done ret=0 > > <...>-22488 [001] 139.874967: bprint: do_futex : futex_lock_pi start > > <...>-22488 [001] 139.874972: bprint: do_futex : futex_lock_pi done ret=0 > > > > This would answer why we cannot reproduce with a single user. > > I am trying to understand why futex_find_get_task has to check the > credentials at all. I guess that there might be some concerns from > futex_requeue callers but does it make sense from futex_lock_pi_atomic > call path? This would mean that the futex is unusable in multi-user process > synchronization, right? (Is this documented somewhere?) Would something like the following be acceptable (just a compile tested without comments). It simply makes caller of lookup_pi_state to decide whether credentials should be checked. diff --git a/kernel/futex.c b/kernel/futex.c index e7a35f1..dfe4b11 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -426,7 +426,7 @@ static void free_pi_state(struct futex_pi_state *pi_state) * Look up the task based on what TID userspace gave us. * We dont trust it. */ -static struct task_struct * futex_find_get_task(pid_t pid) +static struct task_struct * futex_find_get_task(pid_t pid, bool check_cred) { struct task_struct *p; const struct cred *cred = current_cred(), *pcred; @@ -436,10 +436,12 @@ static struct task_struct * futex_find_get_task(pid_t pid) if (!p) { p = ERR_PTR(-ESRCH); } else { - pcred = __task_cred(p); - if (cred->euid != pcred->euid && - cred->euid != pcred->uid) - p = ERR_PTR(-ESRCH); + if (check_cred) { + pcred = __task_cred(p); + if (cred->euid != pcred->euid && + cred->euid != pcred->uid) + p = ERR_PTR(-ESRCH); + } else get_task_struct(p); } @@ -506,7 +508,7 @@ void exit_pi_state_list(struct task_struct *curr) static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, - union futex_key *key, struct futex_pi_state **ps) + union futex_key *key, struct futex_pi_state **ps, bool check_cred) { struct futex_pi_state *pi_state = NULL; struct futex_q *this, *next; @@ -563,7 +565,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, */ if (!pid) return -ESRCH; - p = futex_find_get_task(pid); + p = futex_find_get_task(pid, check_cred); if (IS_ERR(p)) return PTR_ERR(p); @@ -705,7 +707,7 @@ retry: * We dont have the lock. Look up the PI state (or create it if * we are the first waiter): */ - ret = lookup_pi_state(uval, hb, key, ps); + ret = lookup_pi_state(uval, hb, key, ps, false); if (unlikely(ret)) { switch (ret) { @@ -1258,7 +1260,7 @@ retry_private: ret = get_futex_value_locked(&curval2, uaddr2); if (!ret) ret = lookup_pi_state(curval2, hb2, &key2, - &pi_state); + &pi_state, true); } switch (ret) { > > [...] > > > > -- > > From f0743d2b2d2f81c40700c4d6816ad3abfd113c1b Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Mon, 28 Jun 2010 17:08:56 +0200 > > Subject: [PATCH] more traces > > > > --- > > kernel/futex.c | 13 +++++++++++-- > > 1 files changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/futex.c b/kernel/futex.c > > index d114fee..d1c2489 100644 > > --- a/kernel/futex.c > > +++ b/kernel/futex.c > > @@ -435,11 +435,16 @@ static struct task_struct * futex_find_get_task(pid_t pid) > > p = find_task_by_vpid(pid); > > if (!p) { > > p = ERR_PTR(-ESRCH); > > + trace_printk("%u pid not found\n", pid); > > } else { > > pcred = __task_cred(p); > > if (cred->euid != pcred->euid && > > - cred->euid != pcred->uid) > > + cred->euid != pcred->uid) { > > p = ERR_PTR(-ESRCH); > > + trace_printk("cred(%u,%u) != pcred(%u,%u)\n", > > + cred->euid, cred->uid, > > + pcred->euid, pcred->uid); > > + } > > else > > get_task_struct(p); > > } > > @@ -564,8 +569,11 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, > > if (!pid) > > return -ESRCH; > > p = futex_find_get_task(pid); > > - if (IS_ERR(p)) > > + if (IS_ERR(p)) { > > + trace_printk("futex_find_get_task failed with %d\n", > > + PTR_ERR(p)); > > return PTR_ERR(p); > > + } > > > > /* > > * We need to look at the task state flags to figure out, > > @@ -581,6 +589,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, > > * cleanup: > > */ > > int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN; > > + trace_printk("pid=%u is exiting ret=%d\n", pid, ret); > > > > raw_spin_unlock_irq(&p->pi_lock); > > put_task_struct(p); > > -- > > 1.7.1 > > > > > > > > > > -- > > > From 733816347db91670f27d206382b8c2e57e5ef125 Mon Sep 17 00:00:00 2001 > > > From: Michal Hocko > > > Date: Mon, 28 Jun 2010 13:42:29 +0200 > > > Subject: [PATCH] futex pi unlock tracing added > > > > > > --- > > > kernel/futex.c | 24 +++++++++++++++++++----- > > > 1 files changed, 19 insertions(+), 5 deletions(-) > > > > > > diff --git a/kernel/futex.c b/kernel/futex.c > > > index 24ac437..d114fee 100644 > > > --- a/kernel/futex.c > > > +++ b/kernel/futex.c > > > @@ -716,7 +716,8 @@ retry: > > > if (unlikely(ret)) { > > > switch (ret) { > > > case -ESRCH: > > > - trace_printk("lookup_pi_state: -ESRCH\n"); > > > + trace_printk("lookup_pi_state: -ESRCH for pid=%u\n", > > > + uval & FUTEX_TID_MASK); > > > /* > > > * No owner found for this futex. Check if the > > > * OWNER_DIED bit is set to figure out whether > > > @@ -2070,8 +2071,10 @@ retry: > > > * again. If it succeeds then we can return without waking > > > * anyone else up: > > > */ > > > - if (!(uval & FUTEX_OWNER_DIED)) > > > + if (!(uval & FUTEX_OWNER_DIED)) { > > > uval = cmpxchg_futex_value_locked(uaddr, task_pid_vnr(current), 0); > > > + trace_printk("futex_unlock_pi: TID->0 transition %u\n", uval); > > > + } > > > > > > > > > if (unlikely(uval == -EFAULT)) > > > @@ -2080,8 +2083,10 @@ retry: > > > * Rare case: we managed to release the lock atomically, > > > * no need to wake anyone else up: > > > */ > > > - if (unlikely(uval == task_pid_vnr(current))) > > > + if (unlikely(uval == task_pid_vnr(current))) { > > > + trace_printk("futex_unlock_pi: release without wakeup\n"); > > > goto out_unlock; > > > + } > > > > > > /* > > > * Ok, other tasks may need to be woken up - check waiters > > > @@ -2093,6 +2098,7 @@ retry: > > > if (!match_futex (&this->key, &key)) > > > continue; > > > ret = wake_futex_pi(uaddr, uval, this); > > > + trace_printk("futex_unlock_pi: wake ret=%d uval=%u this=%p\n", ret, uval, this); > > > /* > > > * The atomic access to the futex value > > > * generated a pagefault, so retry the > > > @@ -2107,6 +2113,8 @@ retry: > > > */ > > > if (!(uval & FUTEX_OWNER_DIED)) { > > > ret = unlock_futex_pi(uaddr, uval); > > > + trace_printk("futex_unlock_pi: no waiters, unlock the futex ret=%d uval=%d\n", > > > + ret, uval); > > > if (ret == -EFAULT) > > > goto pi_faulted; > > > } > > > @@ -2600,12 +2608,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout, > > > ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3); > > > break; > > > case FUTEX_LOCK_PI: > > > - if (futex_cmpxchg_enabled) > > > + if (futex_cmpxchg_enabled) { > > > + trace_printk("futex_lock_pi start\n"); > > > ret = futex_lock_pi(uaddr, fshared, val, timeout, 0); > > > + trace_printk("futex_lock_pi done ret=%d\n", ret); > > > + } > > > break; > > > case FUTEX_UNLOCK_PI: > > > - if (futex_cmpxchg_enabled) > > > + if (futex_cmpxchg_enabled) { > > > + trace_printk("futex_unlock_pi start\n"); > > > ret = futex_unlock_pi(uaddr, fshared); > > > + trace_printk("futex_unlock_pi done ret=%d\n", ret); > > > + } > > > break; > > > case FUTEX_TRYLOCK_PI: > > > if (futex_cmpxchg_enabled) > > > -- > > > 1.7.1 > > > > > > > > > > > -- > > > > Darren Hart > > > > > > > > From 92014a07df73489460ff788274506255ff0f775d Mon Sep 17 00:00:00 2001 > > > > From: Darren Hart > > > > Date: Fri, 25 Jun 2010 13:54:25 -0700 > > > > Subject: [PATCH] robust pi futex tracing > > > > > > > > --- > > > > kernel/futex.c | 24 ++++++++++++++++++++---- > > > > 1 files changed, 20 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/kernel/futex.c b/kernel/futex.c > > > > index e7a35f1..24ac437 100644 > > > > --- a/kernel/futex.c > > > > +++ b/kernel/futex.c > > > > @@ -683,6 +683,8 @@ retry: > > > > */ > > > > if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > > > > /* Keep the OWNER_DIED bit */ > > > > + if (ownerdied) > > > > + trace_printk("ownerdied, taking over lock\n"); > > > > newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task); > > > > ownerdied = 0; > > > > lock_taken = 1; > > > > @@ -692,14 +694,18 @@ retry: > > > > > > > > if (unlikely(curval == -EFAULT)) > > > > return -EFAULT; > > > > - if (unlikely(curval != uval)) > > > > + if (unlikely(curval != uval)) { > > > > + trace_printk("cmpxchg failed, retrying\n"); > > > > goto retry; > > > > + } > > > > > > > > /* > > > > * We took the lock due to owner died take over. > > > > */ > > > > - if (unlikely(lock_taken)) > > > > + if (unlikely(lock_taken)) { > > > > + trace_printk("ownerdied, lock acquired, return 1\n"); > > > > return 1; > > > > + } > > > > > > > > /* > > > > * We dont have the lock. Look up the PI state (or create it if > > > > @@ -710,13 +716,16 @@ retry: > > > > if (unlikely(ret)) { > > > > switch (ret) { > > > > case -ESRCH: > > > > + trace_printk("lookup_pi_state: -ESRCH\n"); > > > > /* > > > > * No owner found for this futex. Check if the > > > > * OWNER_DIED bit is set to figure out whether > > > > * this is a robust futex or not. > > > > */ > > > > - if (get_futex_value_locked(&curval, uaddr)) > > > > + if (get_futex_value_locked(&curval, uaddr)) { > > > > + trace_printk("get_futex_value_locked: -EFAULT\n"); > > > > return -EFAULT; > > > > + } > > > > > > > > /* > > > > * We simply start over in case of a robust > > > > @@ -724,10 +733,13 @@ retry: > > > > * and return happy. > > > > */ > > > > if (curval & FUTEX_OWNER_DIED) { > > > > + trace_printk("ownerdied, goto retry\n"); > > > > ownerdied = 1; > > > > goto retry; > > > > } > > > > + trace_printk("ownerdied not detected, returning -ESRCH\n"); > > > > default: > > > > + trace_printk("lookup_pi_state: %d\n", ret); > > > > break; > > > > } > > > > } > > > > @@ -1950,6 +1962,8 @@ retry_private: > > > > put_futex_key(fshared, &q.key); > > > > cond_resched(); > > > > goto retry; > > > > + case -ESRCH: > > > > + trace_printk("returning -ESRCH to userspace\n"); > > > > default: > > > > goto out_unlock_put_key; > > > > } > > > > @@ -2537,8 +2551,10 @@ void exit_robust_list(struct task_struct *curr) > > > > /* > > > > * Avoid excessively long or circular lists: > > > > */ > > > > - if (!--limit) > > > > + if (!--limit) { > > > > + trace_printk("excessively long list, aborting\n"); > > > > break; > > > > + } > > > > > > > > cond_resched(); > > > > } > > > > -- > > > > 1.7.0.4 > > > > > > > > -- > > > > Darren Hart > > > > IBM Linux Technology Center > > > > Real-Time Linux Team > > > > > > -- > > > Michal Hocko > > > L3 team > > > SUSE LINUX s.r.o. > > > Lihovarska 1060/12 > > > 190 00 Praha 9 > > > Czech Republic > > > > -- > > Michal Hocko > > L3 team > > SUSE LINUX s.r.o. > > Lihovarska 1060/12 > > 190 00 Praha 9 > > Czech Republic > > -- > Michal Hocko > L3 team > SUSE LINUX s.r.o. > Lihovarska 1060/12 > 190 00 Praha 9 > Czech Republic -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/