From: Andy Lutomirski
Date: Wed, 20 May 2020 10:22:59 -0700
Subject: Re: [patch V6 10/37] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY
To: Andy Lutomirski
Cc: Thomas Gleixner, Andrew Cooper, LKML, X86 ML, "Paul E. McKenney",
    Alexandre Chartre, Frederic Weisbecker, Paolo Bonzini,
    Sean Christopherson, Masami Hiramatsu, Petr Mladek, Steven Rostedt,
    Joel Fernandes, Boris Ostrovsky, Juergen Gross, Brian Gerst,
    Mathieu Desnoyers, Josh Poimboeuf, Will Deacon, Tom Lendacky, Wei Liu,
    Michael Kelley, Jason Chen CJ, Zhao Yakui, "Peter Zijlstra (Intel)"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 20, 2020 at 8:16 AM Andy Lutomirski wrote:
>
> On Wed, May 20, 2020 at 7:13 AM Thomas Gleixner wrote:
> >
> > Andy Lutomirski writes:
> > > On Tue, May 19, 2020 at 11:58 AM Thomas Gleixner wrote:
> > >> Which brings you into the situation that you call schedule() from the
> > >> point where we just moved it out. If we would go there we'd need to
> > >> ensure that RCU is watching as well. idtentry_exit() might have it
> > >> turned off ....
> > >
> > > I don't think this is possible. Once you untangle all the wrappers,
> > > the call sites are effectively:
> > >
> > >         __this_cpu_write(xen_in_preemptible_hcall, true);
> > >         CALL_NOSPEC to the hypercall page
> > >         __this_cpu_write(xen_in_preemptible_hcall, false);
> > >
> > > I think IF=1 when this happens, but I won't swear to it. RCU had
> > > better be watching.
> > >
> > > As I understand it, the one and only situation Xen wants to handle is
> > > that an interrupt gets delivered during the hypercall.
> > > The hypervisor is too clever for its own good and deals with this by
> > > rewinding RIP to the beginning of whatever instruction did the
> > > hypercall and delivers the interrupt, and we end up in this handler.
> > > So, if this happens, the idea is to not only handle the interrupt but
> > > to schedule if scheduling would be useful.
> > >
> > > So I don't think we need all this RCU magic. This really ought to be
> > > able to be simplified to:
> > >
> > >         idtentry_exit();
> > >
> > >         if (appropriate condition)
> > >                 schedule();
> >
> > This is exactly the kind of tinkering which causes all kinds of trouble.
> >
> >         idtentry_exit()
> >
> >         if (user_mode(regs)) {
> >                 prepare_exit_to_usermode(regs);
> >         } else if (regs->flags & X86_EFLAGS_IF) {
> >                 /* Check kernel preemption, if enabled */
> >                 if (IS_ENABLED(CONFIG_PREEMPTION)) {
> >                         ....
> >                 }
> >                 instrumentation_begin();
> >                 /* Tell the tracer that IRET will enable interrupts */
> >                 trace_hardirqs_on_prepare();
> >                 lockdep_hardirqs_on_prepare(CALLER_ADDR0);
> >                 instrumentation_end();
> >                 rcu_irq_exit();
> >                 lockdep_hardirqs_on(CALLER_ADDR0);
> >         } else {
> >                 /* IRQ flags state is correct already. Just tell RCU */
> >                 rcu_irq_exit();
> >         }
> >
> > So in case IF is set then this already told the tracer and lockdep that
> > interrupts are enabled. And contrary to the ugly version this exit path
> > does not use rcu_irq_exit_preempt() which is there to warn about crappy
> > RCU state when trying to schedule.
> >
> > So we went great length to sanitize _all_ of this and make it consistent
> > just to say: screw it for that xen thingy.
> >
> > The extra checks and extra warnings for scheduling come with the
> > guarantee to bitrot when idtentry_exit() or any logic invoked from there
> > is changed. It's going to look like this:
> >
> >         /*
> >          * If the below causes problems due to inconsistent state
> >          * or out of sync sanity checks, please complain to
> >          * luto@kernel.org directly.
> >          */
> >         idtentry_exit();
> >
> >         if (user_mode(regs) || !(regs->flags & X86_EFLAGS_IF))
> >                 return;
> >
> >         if (!__this_cpu_read(xen_in_preemptible_hcall))
> >                 return;
> >
> >         rcu_sanity_check_for_preemption();
> >
> >         if (need_resched()) {
> >                 instrumentation_begin();
> >                 xen_maybe_preempt_hcall();
> >                 trace_hardirqs_on();
> >                 instrumentation_end();
> >         }
> >
> > Of course you need the extra rcu_sanity_check_for_preemption() function
> > just for this muck.
> >
> > That's a true win on all ends? I don't think so.
>
> Hmm, fair enough. I guess the IRQ tracing messes a bunch of this logic up.
>
> Let's keep your patch as is and consider cleanups later. One approach
> might be to make this work more like extable handling: instead of
> trying to schedule from inside the interrupt handler here, patch up
> RIP and perhaps some other registers and let the actual Xen code just
> do cond_resched(). IOW, try to make this work the way it always
> should have:
>
>         int ret;
>         do {
>                 ret = issue_the_hypercall();
>                 cond_resched();
>         } while (ret == EAGAIN);

Andrew Cooper pointed out that there is too much magic in Xen for this
to work. So never mind.
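
For illustration only, the shape of the retry loop quoted above can be
modelled in plain userspace C. This is a rough sketch under loud
assumptions, not kernel or Xen code: struct fake_hcall, cond_resched_stub()
and run_hypercall() are hypothetical names invented here,
issue_the_hypercall() is a stub that just counts down slices of work, and
POSIX sched_yield() stands in for cond_resched(). The positive EAGAIN is
kept as written in the sketch above; real kernel code would normally return
a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical stand-in for a long-running, restartable hypercall. */
struct fake_hcall {
	int slices_left;	/* units of work still to do */
	int yields;		/* how often the caller offered to reschedule */
};

/* Do one slice of work; report EAGAIN while work remains (assumption). */
static int issue_the_hypercall(struct fake_hcall *h)
{
	assert(h->slices_left >= 0);

	if (h->slices_left > 0)
		h->slices_left--;

	return h->slices_left > 0 ? EAGAIN : 0;
}

/* Userspace analogue of cond_resched(): give up the CPU at a safe point. */
static void cond_resched_stub(struct fake_hcall *h)
{
	h->yields++;
	sched_yield();
}

/* The loop from the mail: retry until the call stops asking for a restart. */
static int run_hypercall(struct fake_hcall *h)
{
	int ret;

	do {
		ret = issue_the_hypercall(h);
		cond_resched_stub(h);
	} while (ret == EAGAIN);

	return ret;
}
```

Calling run_hypercall() on a struct fake_hcall initialised with three
slices makes three passes through the loop, each followed by one yield. The
point of the shape is that rescheduling happens in ordinary task context
between attempts rather than from inside the interrupt exit path. As noted
above, Xen's real hypercall continuation does not fit this model.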