Date: Mon, 2 Feb 2015 17:20:18 +0100
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>, Thomas Gleixner <tglx@linutronix.de>,
        Jerome Marchand <jmarchan@redhat.com>,
        Larry Woodman <lwoodman@redhat.com>, Mateusz Guzik <mguzik@redhat.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to
	filter out kthreads
Message-ID: <20150202162018.GA8795@redhat.com>
References: <20150202140515.GA26398@redhat.com> <20150202151159.GE26304@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150202151159.GE26304@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2198
Lines: 52

On 02/02, Peter Zijlstra wrote:
>
> On Mon, Feb 02, 2015 at 03:05:15PM +0100, Oleg Nesterov wrote:
>
> > IIUC, the fact that we can abuse ->pi_state_list is not that bad, no matter
> > if this (k)thread will exit or not. AFAICS, the only problem is that we can
> > boost the prio of this thread. Or did I miss another problem?
>
> No that's it.

OK, thanks Peter. I was afraid I missed another reason for this check.
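
For readers following along, the difference between the two checks named in
the subject can be illustrated with a small user-space sketch. This is not the
kernel code: struct task_model, old_check() and new_check() are hypothetical
names, and the flag values are copied from include/linux/sched.h of this era
for illustration only. It assumes two cases where the checks disagree: a
kthread that temporarily borrowed an mm (use_mm()), and a user task that has
already dropped its mm while exiting.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; treat them as assumptions, not as an API. */
#define PF_EXITING	0x00000004
#define PF_KTHREAD	0x00200000

/* Hypothetical stand-in for the two task_struct fields that matter here. */
struct task_model {
	unsigned int flags;
	void *mm;
};

/* Old filter: anything without an mm is treated as a kthread. */
static int old_check(const struct task_model *p)
{
	return p->mm ? 0 : -EPERM;
}

/* New filter: only tasks explicitly marked PF_KTHREAD are rejected. */
static int new_check(const struct task_model *p)
{
	return (p->flags & PF_KTHREAD) ? -EPERM : 0;
}
```

With these definitions a kthread holding a borrowed mm passes old_check() but
is rejected by new_check(), while an exiting user task with mm == NULL is
rejected by old_check() and passes new_check(), so the later PF_EXITING
handling gets a chance to run.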

> > I must have missed something, but this looks buggy: I do not see any
> > preemption point in this "retry" loop. Suppose that max_cpus=1 and rt_task()
> > preempts the non-rt PF_EXITING owner. Looks like futex_lock_pi() can spin
> > forever in this case? (OK, ignoring RT throttling).
>
> This is not something I've ever looked at before; 778e9a9c3e71
> ("pi-futex: fix exit races and locking problems") seems to suggest it's
> possible to get onto tsk->pi_state_list after exit_pi_state_list().
>
> So while the below shows preemption points, those don't actually help
> against RT tasks; a FIFO-99 task will always be more eligible to run
> than most others.

Yes, yes, sorry, my "do not see any preemption point" wording was confusing.
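
To make the concern concrete, here is a rough user-space model of the "retry"
loop. It is only a sketch under stated assumptions, not the futex code:
owner_exit_state() and retry_loop() are made-up names, the flag values are
copied from include/linux/sched.h of this era for illustration, and the
owner_can_run parameter stands in for "the scheduler lets the exiting owner
make progress". The loop is bounded by max_iters only so the model terminates.

```c
#include <errno.h>

/* Illustrative flag values; treat them as assumptions, not as an API. */
#define PF_EXITING	0x00000004
#define PF_EXITPIDONE	0x00000008

/* What the waiter concludes from the owner's exit state. */
static int owner_exit_state(unsigned int flags)
{
	if (flags & PF_EXITING)
		return (flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
	return 0;
}

/*
 * Returns the number of iterations needed to leave the loop, or -1 if
 * max_iters was reached while the result was still -EAGAIN.
 */
static int retry_loop(unsigned int *owner_flags, int owner_can_run,
		      int max_iters)
{
	for (int i = 1; i <= max_iters; i++) {
		if (owner_exit_state(*owner_flags) != -EAGAIN)
			return i;
		/*
		 * Models the preemption point: a non-rt owner only runs
		 * (and sets PF_EXITPIDONE) if the waiter really lets it.
		 */
		if (owner_can_run)
			*owner_flags |= PF_EXITPIDONE;
	}
	return -1;
}
```

If the owner can run, the loop exits on the second pass. If it cannot, as in
the max_cpus=1 case with an rt waiter, the model never sees anything other
than -EAGAIN and only stops because of the artificial bound.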

> So yes, I do like your proposal of putting PF_EXITPIDONE under the
> ->pi_lock section that handles exit_pi_state_list().
>
> I further think we can remove the smp_mb(); raw_spin_unlock_wait() from
> do_exit() -- this would offset the new unconditional ->pi_lock
> acquisition in exit_pi_state_list(). The comment there suggests robust
> futexes are involved but I cannot find any except the PI state muck
> testing ->flags.

Yes, probably nothing else needs to sync with PF_EXITING...

> As for the recursive fault; I think the safer option is to set
> EXITPIDONE and not register more PI states, as opposed to allowing more
> and more states to be added. Yes we'll leak whatever currently is there,
> but no point in allowing it to get worse.

Thanks! I'll try to think about the patch tomorrow.

Oleg.
