Message-ID: <566B0E76.9020704@arm.com>
Date: Fri, 11 Dec 2015 17:57:10 +0000
From: Vladimir Murzin
To: Peter Zijlstra, Paul Turner
CC: NeilBrown, Linus Torvalds, Thomas Gleixner, LKML, Mike Galbraith,
 Ingo Molnar, Peter Anvin, linux-tip-commits@vger.kernel.org,
 jstancek@redhat.com, Oleg Nesterov
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers
References: <20151201130404.GL3816@twins.programming.kicks-ass.net>
 <20151208104712.GJ6356@twins.programming.kicks-ass.net>
 <87zixkph0m.fsf@notabene.neil.brown.name>
 <20151209074033.GF6357@twins.programming.kicks-ass.net>
 <87si3bpaxy.fsf@notabene.neil.brown.name>
 <20151210130948.GW6356@twins.programming.kicks-ass.net>
 <20151211113959.GI6356@twins.programming.kicks-ass.net>
In-Reply-To: <20151211113959.GI6356@twins.programming.kicks-ass.net>

On 11/12/15 11:39, Peter Zijlstra wrote:
> On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:
>
>>> Blergh, all I've managed so far is to confuse myself further. Even
>>> something like the original (+- the EINTR) should work when we consider
>>> the looping, even when mixed with an occasional spurious wakeup.
>>>
>>>
>>> int bit_wait()
>>> {
>>>         if (signal_pending_state(current->state, current))
>>>                 return -EINTR;
>>>         schedule();
>>> }
>
> So I asked Vladimir to test that (simply changing the return from 1 to
> -EINTR) and it made his failure much less likely but it still failed in
> the same way.
>
> So I'm fairly sure I'm still missing something :/
>
>> Hugh asked me about this after seeing a crash, here's another exciting
>> way in which the current code breaks -- this one actually quite
>> serious:
>
> Yep, this got reported by Jan and I did kick myself for that.
>
>> Peter's proposed follow-up above looks strictly more correct. We need
>> to evaluate the potential existence of a signal, *after* we return
>> from schedule, but in the context of the state which we previously
>> _entered_ schedule() on.
>>
>> Reviewed-by: Paul Turner
>
> Right, it's maybe a bit overkill, but at this point I'm a tad
> conservative/paranoid.
>
> Vladimir, Jan could you both please test that patch?
>
> lkml.kernel.org/r/20151208104712.GJ6356@twins.programming.kicks-ass.net
>

By now my test has been run ~500 times without any stalls. I'll keep it
running overnight (just in case), but I think that patch can be marked
as tested.

Cheers
Vladimir

>
> Thanks!
>
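
For readers following the thread, Paul's point (check for a signal after schedule() returns, but against the state the task went to sleep in) can be illustrated with a small userspace mock. This is only a sketch under stated assumptions, not the patch under test: `struct mock_task`, `mock_schedule()`, `mock_signal_pending_state()`, the `MOCK_TASK_*` values and the `mode` parameter are all hypothetical stand-ins, and the signal check is much simpler than the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mock task states; the values are illustrative, not taken from the kernel. */
#define MOCK_TASK_RUNNING       0L
#define MOCK_TASK_INTERRUPTIBLE 1L

/* Hypothetical stand-in for the kernel's task structure. */
struct mock_task {
	long state;          /* scheduler state of the task */
	bool signal_pending; /* true once a signal has arrived */
};

/* Simplified check: only an interruptible sleep reports a pending signal. */
bool mock_signal_pending_state(long state, const struct mock_task *t)
{
	return (state & MOCK_TASK_INTERRUPTIBLE) && t->signal_pending;
}

/* Mock schedule(): a signal arrives during the sleep, the task wakes up running. */
void mock_schedule(struct mock_task *t)
{
	t->signal_pending = true;
	t->state = MOCK_TASK_RUNNING;
}

/* Checks the state read after wakeup, which is already "running": signal missed. */
int wait_check_current_state(struct mock_task *t)
{
	mock_schedule(t);
	if (mock_signal_pending_state(t->state, t))
		return -EINTR;
	return 0;
}

/* Checks the state the task entered the sleep with (passed in as mode). */
int wait_check_entry_state(struct mock_task *t, long mode)
{
	mock_schedule(t);
	if (mock_signal_pending_state(mode, t))
		return -EINTR;
	return 0;
}

/* Small self-check showing the difference between the two variants. */
void mock_demo(void)
{
	struct mock_task a = { MOCK_TASK_INTERRUPTIBLE, false };
	struct mock_task b = { MOCK_TASK_INTERRUPTIBLE, false };

	assert(wait_check_current_state(&a) == 0);
	assert(wait_check_entry_state(&b, MOCK_TASK_INTERRUPTIBLE) == -EINTR);
}
```

In the mock, the first variant returns 0 even though a signal arrived, because the state it inspects was reset on wakeup; the second returns -EINTR because it carries the entry state across the sleep. Passing the state as an explicit argument is just one assumed way to do that.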