Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753012AbbBSKQR (ORCPT ); Thu, 19 Feb 2015 05:16:17 -0500 Received: from bombadil.infradead.org ([198.137.202.9]:56602 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752453AbbBSKQP (ORCPT ); Thu, 19 Feb 2015 05:16:15 -0500 Date: Thu, 19 Feb 2015 11:16:07 +0100 From: Peter Zijlstra To: Josh Poimboeuf Cc: Andrew Morton , Ingo Molnar , Jiri Kosina , Seth Jennings , linux-kernel@vger.kernel.org, Vojtech Pavlik Subject: Re: [PATCH 1/3] sched: add sched_task_call() Message-ID: <20150219101607.GG5029@twins.programming.kicks-ass.net> References: <20150216204436.GH5029@twins.programming.kicks-ass.net> <20150216220505.GB11861@treble.redhat.com> <20150217092450.GI5029@twins.programming.kicks-ass.net> <20150217141211.GC11861@treble.redhat.com> <20150217181541.GP5029@twins.programming.kicks-ass.net> <20150217212532.GJ11861@treble.redhat.com> <20150218152100.GZ5029@twins.programming.kicks-ass.net> <20150218171256.GA28553@treble.hsd1.ky.comcast.net> <20150219002058.GD5029@twins.programming.kicks-ass.net> <20150219041753.GA13423@treble.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150219041753.GA13423@treble.redhat.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4370 Lines: 107 On Wed, Feb 18, 2015 at 10:17:53PM -0600, Josh Poimboeuf wrote: > On Thu, Feb 19, 2015 at 01:20:58AM +0100, Peter Zijlstra wrote: > > On Wed, Feb 18, 2015 at 11:12:56AM -0600, Josh Poimboeuf wrote: > > > The next line of attack is patching tasks when exiting the kernel to > > > user space (system calls, interrupts, signals), to catch all CPU-bound > > > and some I/O-bound tasks. That's done in patch 9 [1] of the consistency > > > model patch set. > > > > So the HPC people are really into userspace that does for (;;) ; and > > isolate that on CPUs and have the tick interrupt stopped and all that. > > > > You'll not catch those threads on the sysexit path. > > > > And I'm fairly sure they'll not want to SIGSTOP/CONT their stuff either. > > > > Now its fairly easy to also handle this; just mark those tasks with a > > _TIF_WORK_SYSCALL_ENTRY flag, have that slowpath wait for the flag to > > go-away, then flip their state and clear the flag. > > I guess you mean patch the task when it makes a syscall? I'm doing that > already on syscall exit with a bit in _TIF_ALLWORK_MASK and > _TIF_DO_NOTIFY_MASK. No, these tasks will _never_ make syscalls. So you need to guarantee they don't accidentally enter the kernel while you flip them. Something like so should do. You set TIF_ENTER_WAIT on them, check they're still in userspace, flip them then clear TIF_ENTER_WAIT. --- arch/x86/include/asm/thread_info.h | 4 +++- arch/x86/kernel/entry_64.S | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index e82e95abc92b..baa836f13536 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -90,6 +90,7 @@ struct thread_info { #define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */ #define TIF_ADDR32 29 /* 32-bit address space on 64 bits */ #define TIF_X32 30 /* 32-bit native x86-64 binary */ +#define TIF_ENTER_WAIT 31 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) @@ -113,12 +114,13 @@ struct thread_info { #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) #define _TIF_ADDR32 (1 << TIF_ADDR32) #define _TIF_X32 (1 << TIF_X32) +#define _TIF_ENTER_WAIT (1 << TIF_ENTER_WAIT) /* work to do in syscall_trace_enter() */ #define _TIF_WORK_SYSCALL_ENTRY \ (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \ _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \ - _TIF_NOHZ) + _TIF_NOHZ | _TIF_ENTER_WAIT) /* work to do in syscall_trace_leave() */ #define _TIF_WORK_SYSCALL_EXIT \ diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index db13655c3a2a..735566b35903 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -387,6 +387,8 @@ GLOBAL(system_call_after_swapgs) /* Do syscall tracing */ tracesys: + andl $_TIF_ENTER_WAIT,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) + jnz tracesys; leaq -REST_SKIP(%rsp), %rdi movq $AUDIT_ARCH_X86_64, %rsi call syscall_trace_enter_phase1 > > > As a last resort, if there are still any tasks which are sleeping on a > > > to-be-patched function, the user can send them SIGSTOP and SIGCONT to > > > force them to be patched. > > > > You typically cannot SIGSTOP/SIGCONT kernel threads. Also > > TASK_UNINTERRUPTIBLE sleeps are unaffected by signals. > > > > Bit pesky that.. needs pondering. I still absolutely hate you need to disturb userspace like that. Signals are quite visible and perturb userspace state. Also, you cannot SIGCONT a task that was SIGSTOP'ed by userspace for what they thought was a good reason. You'd wreck their state. > But now I'm thinking that kthreads will almost never be a problem. Most > kthreads are basically this: You guys are way too optimistic; maybe its because I've worked on realtime stuff too much, but I'm always looking at worst cases. If you can't handle those, I feel you might as well not bother :-) > Patching thread_fn wouldn't be possible unless we killed the thread. It is, see kthread_park(). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/