Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751942AbaJEU0f (ORCPT ); Sun, 5 Oct 2014 16:26:35 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41898 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751775AbaJEU0e (ORCPT ); Sun, 5 Oct 2014 16:26:34 -0400 Date: Sun, 5 Oct 2014 22:23:00 +0200 From: Oleg Nesterov To: Linus Torvalds , Frederic Weisbecker , Steven Rostedt Cc: Sasha Levin , Ingo Molnar , Peter Anvin , Linux Kernel Mailing List , Peter Zijlstra , Andy Lutomirski , Denys Vlasenko , Thomas Gleixner , Chuck Ebbert Subject: [PATCH 0/1] stop the unbound recursion in preempt_schedule_context() Message-ID: <20141005202300.GA27962@redhat.com> References: <20140921184153.GA23727@redhat.com> <542E2B05.5080607@oracle.com> <20141003232631.GA3439@redhat.com> <20141004003300.GA6297@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141004003300.GA6297@redhat.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/04, Oleg Nesterov wrote: > > On 10/03, Linus Torvalds wrote: > > > > On Fri, Oct 3, 2014 at 5:01 PM, Linus Torvalds > > wrote: > > > > > > The real fix would appear to be to use > > > "preempt_enable_no_resched_notrace()", which your patch did, but > > > without the loop. > > > > Actually, the real fix would be to not be stupid, and just make the > > code do something like > > > > > if (likely(!preemptible())) > > > return; > > > > > > __preempt_count_add(PREEMPT_ACTIVE); > > > prev_ctx = exception_enter(); > > > > > > __schedule(); > > > > > > exception_exit(prev_ctx); > > > __preempt_count_sub(PREEMPT_ACTIVE); > > > > and *not* enable preemption around the scheduling at all. Yes, I think you are right. And I hate to admit that I didn't think about this simplification. The only complication is that we should move this function from context_tracking.c to sched/core.c. However, I still think we need the need_resched() loop, we can't do this only once. And in fact the simplest fix could just turn it into local_irq_disable(); preempt_schedule_irq(); local_irq_enable(); > Again, it is too late for me... Most probably I am wrong, but somehow > it seems to me that the real fix should try to kill preempt_schedule_context() > altogether and teach preempt_schedule() to play well with CONTEXT_TRACKING. Yes, the very fact that preempt_enable() != trace_preempt_on() + preempt_enable_notrace() looks simply wrong imo. And preempt_enable_notrace() has users outside of the tracing code which can run in IN_USER state. For example, why should __perf_sw_event() worry about context_tracking.state? OTOH, if the caller of preempt_enable_notrace() actually needs to take care of potential IN_USER state, then it probably has other problems. And indeed, ftrace_ops_control_func() has to check rcu_is_watching() anyway. IOW, imho 29bb9e5a75 "tracing/context-tracking: Add preempt_schedule_context() for tracing" was not a right solution. Perhaps the tracing functions can be changed to switch to IN_KERNEL mode, or at least perhaps we can add the special preempt_disable_ftrace() helper. But this needs another discussion. Either way, I do think that preempt_schedule_context() in its current form must die. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/