Date: Tue, 15 Dec 2015 19:16:29 +0100
From: Oleg Nesterov
To: Peter Zijlstra
Cc: Paul Turner, NeilBrown, Linus Torvalds, Thomas Gleixner, LKML,
	Mike Galbraith, Ingo Molnar, Peter Anvin, vladimir.murzin@arm.com,
	linux-tip-commits@vger.kernel.org, jstancek@redhat.com
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers
Message-ID: <20151215181629.GA19007@redhat.com>
References: <20151201130404.GL3816@twins.programming.kicks-ass.net>
	<20151208104712.GJ6356@twins.programming.kicks-ass.net>
	<87zixkph0m.fsf@notabene.neil.brown.name>
	<20151209074033.GF6357@twins.programming.kicks-ass.net>
	<87si3bpaxy.fsf@notabene.neil.brown.name>
	<20151210130948.GW6356@twins.programming.kicks-ass.net>
	<20151211113959.GI6356@twins.programming.kicks-ass.net>
In-Reply-To: <20151211113959.GI6356@twins.programming.kicks-ass.net>

On 12/11, Peter Zijlstra wrote:
>
> On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:
> > > Blergh, all I've managed so far is to confuse myself further. Even
> > > something like the original (+- the EINTR) should work when we consider
> > > the looping, even when mixed with an occasional spurious wakeup.
> > >
> > >
> > > int bit_wait()
> > > {
> > > 	if (signal_pending_state(current->state, current))
> > > 		return -EINTR;
> > > 	schedule();
> > > }
>
> So I asked Vladimir to test that (simply changing the return from 1 to
> -EINTR) and it made his fail much less likely but it still failed in the
> same way.
>
> So I'm fairly sure I'm still missing something :/

Same here...

Yes, "return 1" in bit_wait_io() doesn't look right. For example
do_generic_file_read() can wrongly return if lock_page_killable()
returns this error code.

But I fail to understand how this can lead to rcu-stall.

Oleg.