Date: Thu, 2 Jul 2015 11:48:37 +0200
From: Borislav Petkov
To: Andy Lutomirski
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Kees Cook,
	Brian Gerst, paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH v4 09/17] x86/entry: Add new, comprehensible entry and exit hooks
Message-ID: <20150702094837.GD4001@pd.tnic>
In-Reply-To: <5fa9d4c6b13d0d5a5cf77c64e9253af32b391f1c.1435602481.git.luto@kernel.org>

On Mon, Jun 29, 2015 at 12:33:41PM -0700, Andy Lutomirski wrote:
> The current entry and exit code is incomprehensible, appears to work
> primarily by luck, and is very difficult to incrementally improve. Add
> new code in preparation for simply deleting the old code.
> 
> prepare_exit_to_usermode is a new function that will handle all slow
> path exits to user mode. It is called with IRQs disabled and it
> leaves us in a state in which it is safe to immediately return to
> user mode. IRQs must not be re-enabled at any point after
> prepare_exit_to_usermode returns and user mode is actually entered.
> (We can, of course, fail to enter user mode and treat that failure
> as a fresh entry to kernel mode.) All callers of do_notify_resume
> will be migrated to call prepare_exit_to_usermode instead;
> prepare_exit_to_usermode needs to do everything that
> do_notify_resume does, but it also takes care of scheduling and
> context tracking. Unlike do_notify_resume, it does not need to be
> called in a loop.
> 
> syscall_return_slowpath is exactly what it sounds like. It will be
> called on any syscall exit slow path. It will replace
> syscall_trace_leave and it calls prepare_exit_to_usermode on the way
> out.
> 
> Signed-off-by: Andy Lutomirski
> ---
>  arch/x86/entry/common.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 8a7e35af7164..55530d6dd1bd 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -207,6 +207,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>  	return syscall_trace_enter_phase2(regs, arch, phase1_result);
>  }
>  
> +/* Deprecated. */
>  void syscall_trace_leave(struct pt_regs *regs)

Ah yes, this will get replaced later with syscall_return_slowpath below.

>  {
>  	bool step;
> @@ -237,8 +238,117 @@ void syscall_trace_leave(struct pt_regs *regs)
>  	user_enter();
>  }
>  
> +static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
> +{
> +	unsigned long top_of_stack =
> +		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
> +	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
> +}
> +
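
A side note on pt_regs_to_thread_info(): it simply relies on the fixed
kernel-stack layout -- pt_regs sits at the very top of the stack (just
below TOP_OF_KERNEL_STACK_PADDING) and thread_info at the very bottom,
THREAD_SIZE apart. Here is a minimal userspace sketch of that arithmetic;
the values of THREAD_SIZE and TOP_OF_KERNEL_STACK_PADDING and both struct
layouts are stand-ins, not the kernel's real definitions:

#include <stdio.h>

#define THREAD_SIZE                 (4 * 4096)  /* stand-in value */
#define TOP_OF_KERNEL_STACK_PADDING 0           /* stand-in value */

struct thread_info { unsigned int flags; };     /* stand-in layout */
struct pt_regs     { unsigned long gpr[21]; };  /* stand-in layout */

/* Same arithmetic as the patch's helper. */
static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
{
	/* regs + 1 points just past pt_regs, i.e. at the top of the stack... */
	unsigned long top_of_stack =
		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;

	/* ...and thread_info lives THREAD_SIZE below that, at the bottom. */
	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
}

int main(void)
{
	/* Fake "kernel stack": thread_info at the bottom, pt_regs at the top. */
	static unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
	struct thread_info *ti = (struct thread_info *)stack;
	struct pt_regs *regs = (struct pt_regs *)
		((unsigned char *)stack + THREAD_SIZE -
		 TOP_OF_KERNEL_STACK_PADDING - sizeof(struct pt_regs));

	ti->flags = 0x42;
	printf("flags via regs: %#x (%s)\n",
	       pt_regs_to_thread_info(regs)->flags,
	       pt_regs_to_thread_info(regs) == ti ? "match" : "mismatch");
	return 0;
}
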
> +/* Called with IRQs disabled. */
> +__visible void prepare_exit_to_usermode(struct pt_regs *regs)
> +{
> +	if (WARN_ON(!irqs_disabled()))
> +		local_irq_disable();
> +
> +	/*
> +	 * In order to return to user mode, we need to have IRQs off with
> +	 * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
> +	 * _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags
> +	 * can be set at any time on preemptable kernels if we have IRQs on,
> +	 * so we need to loop. Disabling preemption wouldn't help: doing the
> +	 * work to clear some of the flags can sleep.
> +	 */
> +	while (true) {
> +		u32 cached_flags =
> +			READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> +
> +		if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> +				      _TIF_UPROBE | _TIF_NEED_RESCHED)))
> +			break;
> +
> +		/* We have work to do. */
> +		local_irq_enable();
> +
> +		if (cached_flags & _TIF_NEED_RESCHED)
> +			schedule();
> +
> +		if (cached_flags & _TIF_UPROBE)
> +			uprobe_notify_resume(regs);
> +
> +		/* deal with pending signal delivery */
> +		if (cached_flags & _TIF_SIGPENDING)
> +			do_signal(regs);
> +
> +		if (cached_flags & _TIF_NOTIFY_RESUME) {
> +			clear_thread_flag(TIF_NOTIFY_RESUME);
> +			tracehook_notify_resume(regs);
> +		}
> +
> +		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> +			fire_user_return_notifiers();
> +
> +		/* Disable IRQs and retry */
> +		local_irq_disable();
> +	}

Stupid question: what assures us that we'll break out of this loop at
some point? I.e., isn't it possible that something keeps setting bits
in ->flags while we're handling the work in the IRQs-on section?

OTOH, this is what int_ret_from_sys_call() does now anyway, so we
should be fine.

Yeah, it looks that way.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
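
Since the syscall_return_slowpath() half of the hunk got trimmed from the
quote above: going only by the changelog (called on every syscall exit
slow path, takes over syscall_trace_leave's work, calls
prepare_exit_to_usermode() on the way out), I'd expect roughly the shape
below. This is a sketch, not the patch -- the _TIF_ mask and the hooks
named in the comment are guesses:

/* Sketch of the expected shape, inferred from the changelog only. */
__visible void syscall_return_slowpath(struct pt_regs *regs)
{
	u32 cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);

	if (cached_flags & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT |
			    _TIF_SYSCALL_TRACEPOINT | _TIF_SINGLESTEP)) {
		/*
		 * One-time syscall-exit work that syscall_trace_leave used
		 * to do, run once with IRQs on: audit_syscall_exit(),
		 * trace_sys_exit(), tracehook_report_syscall_exit(), ...
		 */
	}

	/* IRQs must be off before we commit to returning to user mode. */
	local_irq_disable();
	prepare_exit_to_usermode(regs);
}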