Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756273AbbEUTID (ORCPT ); Thu, 21 May 2015 15:08:03 -0400 Received: from mail.efficios.com ([78.47.125.74]:52785 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756198AbbEUTH7 (ORCPT ); Thu, 21 May 2015 15:07:59 -0400 Date: Thu, 21 May 2015 19:08:02 +0000 (UTC) From: Mathieu Desnoyers To: Peter Zijlstra Cc: Paul Turner , Andrew Hunter , Ben Maurer , linux-kernel@vger.kernel.org, Ingo Molnar , Steven Rostedt , "Paul E. McKenney" , Josh Triplett , Lai Jiangshan , Linus Torvalds , Andrew Morton Message-ID: <511681386.5786.1432235282365.JavaMail.zimbra@efficios.com> In-Reply-To: <20150521183240.GW3644@twins.programming.kicks-ass.net> References: <1432219487-13364-1-git-send-email-mathieu.desnoyers@efficios.com> <20150521183240.GW3644@twins.programming.kicks-ass.net> Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [192.222.194.238] X-Mailer: Zimbra 8.0.7_GA_6021 (ZimbraWebClient - FF38 (Linux)/8.0.7_GA_6021) Thread-Topic: percpu system call: fast userspace percpu critical sections Thread-Index: HRU5lK2eCiLDDjur0CO9bSpiHLWCUw== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3725 Lines: 128 ----- Original Message ----- > On Thu, May 21, 2015 at 10:44:47AM -0400, Mathieu Desnoyers wrote: > > > +struct thread_percpu_user { > > + int32_t nesting; > > + int32_t signal_sent; > > + int32_t signo; > > + int32_t current_cpu; > > +}; > > I would require this thing be naturally aligned, such that it does not > cross cacheline boundaries. Good point. Adding a comment into the code to that effect. > > > + > > +static void percpu_user_sched_in(struct preempt_notifier *notifier, int > > cpu) > > +{ > > + struct thread_percpu_user __user *tpu_user; > > + struct thread_percpu_user tpu; > > + struct task_struct *t = current; > > + > > + tpu_user = t->percpu_user; > > + if (tpu_user == NULL) > > + return; > > + if (unlikely(t->flags & PF_EXITING)) > > + return; > > + /* > > + * access_ok() of tpu_user has already been checked by sys_percpu(). > > + */ > > + if (__put_user(smp_processor_id(), &tpu_user->current_cpu)) { > > + WARN_ON_ONCE(1); > > + return; > > + } > > This seems a waste; you already read the number unconditionally, might > as well double check and avoid the store. > > > + if (__copy_from_user(&tpu, tpu_user, sizeof(tpu))) { > > + WARN_ON_ONCE(1); > > + return; > > + } > > if (tpu.current_cpu != smp_processor_id()) > __put_user(); Yep, and I could even use the "cpu" parameter received by the function rather than smp_processor_id(). > > > > > + if (!tpu.nesting || tpu.signal_sent) > > + return; > > + if (do_send_sig_info(tpu.signo, SEND_SIG_PRIV, t, 0)) { > > + WARN_ON_ONCE(1); > > + return; > > + } > > + tpu.signal_sent = 1; > > + if (__copy_to_user(tpu_user, &tpu, sizeof(tpu))) { > > + WARN_ON_ONCE(1); > > + return; > > + } > > +} > > Please do not use preempt notifiers for this. Do you recommend we issue a function call from the scheduler finish_task_switch() ? > > Second, this all is done with preemption disabled, this means that all > that user access can fail. OK, this is one part I was worried about. > > You print useless WARNs and misbehave. If you detect a desire to fault, > you could delay until return to userspace and try again there. But it > all adds complexity. We could keep a flag, and then call the function again if we detect a desire to fault. > > The big advantage pjt's scheme had is that we have the instruction > pointer, we do not need to go read userspace memory that might not be > there. And it being limited to a single range, while inconvenient, > simplifies the entire kernel side to: > > if ((unsigned long)(ip - offset) < size) > do_magic(); > > Which is still simpler than the above. There is one big aspect of pjt's approach that I still don't grasp after all this time that makes me worry. How does it interact with the following scenario ? Userspace thread - within the code region that needs to be restarted - signal handler nested on top - running within the signal handler code - preempted by kernel - checking instruction pointer misses the userspace stack underneath the signal handler. Given this scenario, is the kernel code really as simple as a pointer check on pt_regs, or do we need a stack walk over all signal frames ? Another way would be to check for the pt_regs instruction pointer whenever we receive a signal, but then it would require per-architectures modifications, and suddenly becomes less straightforward. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/