Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp3670505ybk; Tue, 19 May 2020 10:08:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJye262ZYCnlgeTkMcuhfrtHZcqWER3dEak45XEYBBTsHoMw7yVH6hLYrzGlF9NLUEzW1fDf X-Received: by 2002:a17:906:81c6:: with SMTP id e6mr177498ejx.241.1589908100083; Tue, 19 May 2020 10:08:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589908100; cv=none; d=google.com; s=arc-20160816; b=i6jeD0cY+KQe3sgh9i+5bKuJW4yiVHcbabTUu3tYJIcweHNNMbunegL9O0E/Q72KwG IvvDZrJ96YRjPxbtMk54GhyUzmrFYptcsXWZI8lbZT/DSP7AA2Gqyok+0M7CRwiDQxvw Rw22iYEzfR5BCL9fm9fja7wqtUlWxpKXHBq2lIqhcfkG8F3RV/HLqO7atHlPc3qQ1f7W pcSrYbsayGffO7tuoK1GUDa3d66L1FkFWB7ly666p8raojdlpaLP4e22u5/atgdczB1Y m4hneA+Kqr++y98LFi+gv3NnS+nK13WfY/tv8hmXqUyUbxY7+3KL32yNG27ZiS2gfiE3 3cYw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=2L3pnX0xXSOzzpL+vk356Epuu+6O0nTB3u4WOuWD62M=; b=DVvUIwjU8JvuFdIExTYNcVDWOOZd6Cn1WQOjU5p+s4BLWorKqkKjBCk38UyTaYmAas Hsk9LuQoa5NcTjgXDEevWppx1MXdX3zc0dGzcCqt0CCtJdT1APdkYlSbDuvmoKoRBfrT y7P/7rhrd0i3dG5TY+JJRZKwDqOY5l1jHerCV92qTnfBQC3gaSQxYmdnojIowFZ4MMJw +JuJxewZkTPklCFsPXFuMoldy9ZrXV6uRRWrR2mg15YJ7CsifNjCvFhYmvu+4SvFx+LY 9ybplKTpHUOCt0+czT5rufYz1RIifeJoL7uCXIkIEEHc5KdrmyWQYAf/0AGHmGTjRFk/ SlQQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=e3AorEdh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r27si218098ejb.750.2020.05.19.10.07.56; Tue, 19 May 2020 10:08:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=e3AorEdh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729391AbgESRGU (ORCPT + 99 others); Tue, 19 May 2020 13:06:20 -0400 Received: from mail.kernel.org ([198.145.29.99]:38248 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728689AbgESRGU (ORCPT ); Tue, 19 May 2020 13:06:20 -0400 Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 4188420826 for ; Tue, 19 May 2020 17:06:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1589907979; bh=Q+yrETbVImO1E5Xvyhq3XAyizDi41LibvtObQeK03Q0=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=e3AorEdhRH5pQTVmMXn9LIm3tOxiXnkrq/uw/M00hNk4fI7d6UcAYSNNzlXJqfXcE WC+lE4+dV/gW8rOQq8xTdI8s0JsRfy4TdLAmx8UT3PlrVkZiRGkrwYsjDwESrMYuOb 4lMIpqm0TldqYKAkft9oaC7hxQtYLMMs8eLTBVoo= Received: by mail-wr1-f45.google.com with SMTP id e1so225124wrt.5 for ; Tue, 19 May 2020 10:06:19 -0700 (PDT) X-Gm-Message-State: AOAM530iGYI/vyc87Ywjxm4Z9gRHO3sK8Up0bsOetajk1AYNYTd1qO/G gtfaCFS+8kYFbe9hRSneV21WA4+QZfOeVY8oavDJAA== X-Received: by 2002:adf:a389:: with SMTP id l9mr27181359wrb.18.1589907977639; Tue, 19 May 2020 10:06:17 -0700 (PDT) MIME-Version: 1.0 References: <20200515234547.710474468@linutronix.de> <20200515235125.425810667@linutronix.de> In-Reply-To: <20200515235125.425810667@linutronix.de> From: Andy Lutomirski Date: Tue, 19 May 2020 10:06:05 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [patch V6 10/37] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY To: Thomas Gleixner , Andrew Cooper Cc: LKML , X86 ML , "Paul E. McKenney" , Andy Lutomirski , Alexandre Chartre , Frederic Weisbecker , Paolo Bonzini , Sean Christopherson , Masami Hiramatsu , Petr Mladek , Steven Rostedt , Joel Fernandes , Boris Ostrovsky , Juergen Gross , Brian Gerst , Mathieu Desnoyers , Josh Poimboeuf , Will Deacon , Tom Lendacky , Wei Liu , Michael Kelley , Jason Chen CJ , Zhao Yakui , "Peter Zijlstra (Intel)" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner wrote: > > > Convert the XEN/PV hypercall to IDTENTRY: > > - Emit the ASM stub with DECLARE_IDTENTRY > - Remove the ASM idtentry in 64bit > - Remove the open coded ASM entry code in 32bit > - Remove the old prototypes > > The handler stubs need to stay in ASM code as it needs corner case handling > and adjustment of the stack pointer. > > Provide a new C function which invokes the entry/exit handling and calls > into the XEN handler on the interrupt stack. > > The exit code is slightly different from the regular idtentry_exit() on > non-preemptible kernels. If the hypercall is preemptible and need_resched() > is set then XEN provides a preempt hypercall scheduling function. Add it as > conditional path to __idtentry_exit() so the function can be reused. > > __idtentry_exit() is forced inlined so on the regular idtentry_exit() path > the extra condition is optimized out by the compiler. > > Signed-off-by: Thomas Gleixner > Cc: Boris Ostrovsky > Cc: Juergen Gross > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > index 882ada245bd5..34caf3849632 100644 > --- a/arch/x86/entry/common.c > +++ b/arch/x86/entry/common.c > @@ -27,6 +27,9 @@ > #include > #include > > +#include > +#include > + > #include > #include > #include > @@ -35,6 +38,7 @@ > #include > #include > #include > +#include > > #define CREATE_TRACE_POINTS > #include > @@ -539,7 +543,8 @@ void noinstr idtentry_enter(struct pt_regs *regs) > } > } > > -static __always_inline void __idtentry_exit(struct pt_regs *regs) > +static __always_inline void __idtentry_exit(struct pt_regs *regs, > + bool preempt_hcall) > { > lockdep_assert_irqs_disabled(); > > @@ -573,6 +578,16 @@ static __always_inline void __idtentry_exit(struct pt_regs *regs) > instrumentation_end(); > return; > } > + } else if (IS_ENABLED(CONFIG_XEN_PV)) { > + if (preempt_hcall) { > + /* See CONFIG_PREEMPTION above */ > + instrumentation_begin(); > + rcu_irq_exit_preempt(); > + xen_maybe_preempt_hcall(); > + trace_hardirqs_on(); > + instrumentation_end(); > + return; > + } Ewwwww! This shouldn't be taken as a NAK -- it's just an expression of disgust. > } > /* > * If preemption is disabled then this needs to be done > @@ -612,5 +627,43 @@ static __always_inline void __idtentry_exit(struct pt_regs *regs) > */ > void noinstr idtentry_exit(struct pt_regs *regs) > { > - __idtentry_exit(regs); > + __idtentry_exit(regs, false); > +} > + > +#ifdef CONFIG_XEN_PV > +static void __xen_pv_evtchn_do_upcall(void) > +{ > + irq_enter_rcu(); > + inc_irq_stat(irq_hv_callback_count); > + > + xen_hvm_evtchn_do_upcall(); > + > + irq_exit_rcu(); > +} > + > +__visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs) > +{ > + struct pt_regs *old_regs; > + > + idtentry_enter(regs); > + old_regs = set_irq_regs(regs); > + > + if (!irq_needs_irq_stack(regs)) { > + instrumentation_begin(); > + __xen_pv_evtchn_do_upcall(); > + instrumentation_end(); > + } else { > + run_on_irqstack(__xen_pv_evtchn_do_upcall, NULL); > + } Shouldn't this be: instrumentation_begin(); if (!irq_needs_irq_stack(...)) __blah(); else run_on_irqstack(__blah, NULL); instrumentation_end(); or even: instrumentation_begin(); run_on_irqstack_if_needed(__blah, NULL); instrumentation_end(); ****** BUT ******* I think this is all arse-backwards. This is a giant mess designed to pretend we support preemption and to emulate normal preemption in a non-preemptible kernel. I propose one to two massive cleanups: A: Just delete all of this code. Preemptible hypercalls on non-preempt kernels will still process interrupts but won't get preempted. If you want preemption, compile with preemption. B: Turn this thing around. Specifically, in the one and only case we care about, we know pretty much exactly what context we got this entry in: we're running in a schedulable context doing an explicitly preemptible hypercall, and we have RIP pointing at a SYSCALL instruction (presumably, but we shouldn't bet on it) in the hypercall page. Ideally we would change the Xen PV ABI so the hypercall would return something like EAGAIN instead of auto-restarting and we could ditch this mess entirely. But the ABI seems to be set in stone or at least in molasses, so how about just: idt_entry(exit(regs)); if (inhcall && need_resched()) schedule(); Off the top of my head, I don't see any reason this wouldn't work, and it's a heck of a lot cleaner. Possibly it should really be: if (inhcall) { if (!WARN_ON(regs->ip not in hypercall page)) cond_resched(); }