Date: Thu, 2 Jul 2015 11:48:37 +0200
From: Borislav Petkov
To: Andy Lutomirski
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Kees Cook,
	Brian Gerst, paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH v4 09/17] x86/entry: Add new, comprehensible entry and exit hooks
Message-ID: <20150702094837.GD4001@pd.tnic>
In-Reply-To: <5fa9d4c6b13d0d5a5cf77c64e9253af32b391f1c.1435602481.git.luto@kernel.org>

On Mon, Jun 29, 2015 at 12:33:41PM -0700, Andy Lutomirski wrote:
> The current entry and exit code is incomprehensible, appears to work
> primarily by luck, and is very difficult to incrementally improve. Add
> new code in preparation for simply deleting the old code.
> 
> prepare_exit_to_usermode is a new function that will handle all slow
> path exits to user mode. It is called with IRQs disabled and it
> leaves us in a state in which it is safe to immediately return to
> user mode. IRQs must not be re-enabled at any point after
> prepare_exit_to_usermode returns and user mode is actually entered.
> (We can, of course, fail to enter user mode and treat that failure
> as a fresh entry to kernel mode.) All callers of do_notify_resume
> will be migrated to call prepare_exit_to_usermode instead;
> prepare_exit_to_usermode needs to do everything that
> do_notify_resume does, but it also takes care of scheduling and
> context tracking. Unlike do_notify_resume, it does not need to be
> called in a loop.
> 
> syscall_return_slowpath is exactly what it sounds like. It will be
> called on any syscall exit slow path. It will replace
> syscall_trace_leave and it calls prepare_exit_to_usermode on the way
> out.
> 
> Signed-off-by: Andy Lutomirski
> ---
>  arch/x86/entry/common.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 111 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 8a7e35af7164..55530d6dd1bd 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -207,6 +207,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>  	return syscall_trace_enter_phase2(regs, arch, phase1_result);
>  }
>  
> +/* Deprecated. */
>  void syscall_trace_leave(struct pt_regs *regs)

Ah yes, this will get replaced later with syscall_return_slowpath below.

>  {
>  	bool step;
> @@ -237,8 +238,117 @@ void syscall_trace_leave(struct pt_regs *regs)
>  	user_enter();
>  }
>  
> +static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
> +{
> +	unsigned long top_of_stack =
> +		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;
> +	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
> +}
> +
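
A side note on pt_regs_to_thread_info(): it simply relies on the fixed
kernel-stack layout -- pt_regs sits at the very top of the stack (just
below TOP_OF_KERNEL_STACK_PADDING) and thread_info at the very bottom,
THREAD_SIZE apart. Here is a minimal userspace sketch of that arithmetic;
the values of THREAD_SIZE and TOP_OF_KERNEL_STACK_PADDING and both struct
layouts are stand-ins, not the kernel's real definitions:

#include <stdio.h>

#define THREAD_SIZE                 (4 * 4096)  /* stand-in value */
#define TOP_OF_KERNEL_STACK_PADDING 0           /* stand-in value */

struct thread_info { unsigned int flags; };     /* stand-in layout */
struct pt_regs     { unsigned long gpr[21]; };  /* stand-in layout */

/* Same arithmetic as the patch's helper. */
static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
{
	/* regs + 1 points just past pt_regs, i.e. at the top of the stack... */
	unsigned long top_of_stack =
		(unsigned long)(regs + 1) + TOP_OF_KERNEL_STACK_PADDING;

	/* ...and thread_info lives THREAD_SIZE below that, at the bottom. */
	return (struct thread_info *)(top_of_stack - THREAD_SIZE);
}

int main(void)
{
	/* Fake "kernel stack": thread_info at the bottom, pt_regs at the top. */
	static unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
	struct thread_info *ti = (struct thread_info *)stack;
	struct pt_regs *regs = (struct pt_regs *)
		((unsigned char *)stack + THREAD_SIZE -
		 TOP_OF_KERNEL_STACK_PADDING - sizeof(struct pt_regs));

	ti->flags = 0x42;
	printf("flags via regs: %#x (%s)\n",
	       pt_regs_to_thread_info(regs)->flags,
	       pt_regs_to_thread_info(regs) == ti ? "match" : "mismatch");
	return 0;
}
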
> +/* Called with IRQs disabled. */
> +__visible void prepare_exit_to_usermode(struct pt_regs *regs)
> +{
> +	if (WARN_ON(!irqs_disabled()))
> +		local_irq_disable();
> +
> +	/*
> +	 * In order to return to user mode, we need to have IRQs off with
> +	 * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
> +	 * _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags
> +	 * can be set at any time on preemptable kernels if we have IRQs on,
> +	 * so we need to loop. Disabling preemption wouldn't help: doing the
> +	 * work to clear some of the flags can sleep.
> +	 */
> +	while (true) {
> +		u32 cached_flags =
> +			READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> +
> +		if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> +				      _TIF_UPROBE | _TIF_NEED_RESCHED)))
> +			break;
> +
> +		/* We have work to do. */
> +		local_irq_enable();
> +
> +		if (cached_flags & _TIF_NEED_RESCHED)
> +			schedule();
> +
> +		if (cached_flags & _TIF_UPROBE)
> +			uprobe_notify_resume(regs);
> +
> +		/* deal with pending signal delivery */
> +		if (cached_flags & _TIF_SIGPENDING)
> +			do_signal(regs);
> +
> +		if (cached_flags & _TIF_NOTIFY_RESUME) {
> +			clear_thread_flag(TIF_NOTIFY_RESUME);
> +			tracehook_notify_resume(regs);
> +		}
> +
> +		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> +			fire_user_return_notifiers();
> +
> +		/* Disable IRQs and retry */
> +		local_irq_disable();
> +	}

Stupid question: what assures us that we'll break out of this loop at
some point? I.e., isn't it possible that something keeps setting bits
in ->flags while we're handling the work in the IRQs-on section?

OTOH, this is what int_ret_from_sys_call() does now anyway, so we
should be fine.

Yeah, it looks that way.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
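
Since the syscall_return_slowpath() half of the hunk got trimmed from the
quote above: going only by the changelog (called on every syscall exit
slow path, takes over syscall_trace_leave's work, calls
prepare_exit_to_usermode() on the way out), I'd expect roughly the shape
below. This is a sketch, not the patch -- the _TIF_ mask and the hooks
named in the comment are guesses:

/* Sketch of the expected shape, inferred from the changelog only. */
__visible void syscall_return_slowpath(struct pt_regs *regs)
{
	u32 cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);

	if (cached_flags & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT |
			    _TIF_SYSCALL_TRACEPOINT | _TIF_SINGLESTEP)) {
		/*
		 * One-time syscall-exit work that syscall_trace_leave used
		 * to do, run once with IRQs on: audit_syscall_exit(),
		 * trace_sys_exit(), tracehook_report_syscall_exit(), ...
		 */
	}

	/* IRQs must be off before we commit to returning to user mode. */
	local_irq_disable();
	prepare_exit_to_usermode(regs);
}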