Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp593245ybk; Wed, 20 May 2020 07:18:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzsGdN15oU17b4FeTu7w0WVKeeoNvFeTP2pzUPssqtbS8ywL6ij+KeB6UZmPbgx/MbPG9KK X-Received: by 2002:a05:6402:169a:: with SMTP id a26mr3471572edv.321.1589984286095; Wed, 20 May 2020 07:18:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589984286; cv=none; d=google.com; s=arc-20160816; b=HcRi5CD9sLE0D76oEd+QU0QPUZZBVNjbjLMmpyDSScBqG9kYXQAjkaWN9YiaOKOUCc qugqgTZcg2n3agDSAa6mCyHVZh2VjjzhRuIAYdTW4HwyutuAPdRny58K9E1/kaSBNoyN 1vQ2+SYWPTQRHm7yN4ZNQyrWmSHY0E4XcLbB1/Q8IMYdr//rIdc92nx/5iM5czFs+rXQ JSgF1qzUKVVIqeqssTQ8VNT6nO4xve9gjVlqL/t33oODfCUmrA3LpkzYq6675C0Qvb5j L+9aAW4bvZ8del9Y487Na8Kza+bnjUiDIpwngPmvWhWV7x+w5VxlFbsJmbxlUqRSLNyE pabg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from; bh=dBoAGVbpNqhCfxYb5gzKzt6QU4OXeCBxdE8qzFa0RaA=; b=07mP8SIBK9r3UZ0MSfYgHGkUOc/GQb/vtX5/w6yYrNAQLka/WReji2f8LK/SWh3nZ7 S5xtwGoZxK3L27781ASXMNNJ1wkaC8qus2XCv1+fTAOswrq+e2YFNGcMvyP2tvD0Vfcq B7T6sQ83FuGVnCLF0ko9M7Ernx5GTQ0TXqnhpAknD3iVCyBisAjt6RI94wEeJr+PbU8S Hn3na/KpW6ia4J8BVQvuGmtvI+OkarttwD6mjquepdrhJ/HRKMHDyLqokVweVLlywdfg /42pLE2W6ztb/GTrcvfKGfXZI0xxkKFTA0j3w51SpQg2iEO+Twdq5eNjAU/cTjNhNMz6 QvMg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id v28si1467059edi.192.2020.05.20.07.17.41; Wed, 20 May 2020 07:18:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726596AbgETOOH (ORCPT + 99 others); Wed, 20 May 2020 10:14:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53844 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726436AbgETOOG (ORCPT ); Wed, 20 May 2020 10:14:06 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC4E5C061A0E for ; Wed, 20 May 2020 07:14:06 -0700 (PDT) Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jbPTL-0002Rw-Tb; Wed, 20 May 2020 16:13:28 +0200 Received: by nanos.tec.linutronix.de (Postfix, from userid 1000) id C97CB100C99; Wed, 20 May 2020 16:13:26 +0200 (CEST) From: Thomas Gleixner To: Andy Lutomirski Cc: Andy Lutomirski , Andrew Cooper , LKML , X86 ML , "Paul E. McKenney" , Alexandre Chartre , Frederic Weisbecker , Paolo Bonzini , Sean Christopherson , Masami Hiramatsu , Petr Mladek , Steven Rostedt , Joel Fernandes , Boris Ostrovsky , Juergen Gross , Brian Gerst , Mathieu Desnoyers , Josh Poimboeuf , Will Deacon , Tom Lendacky , Wei Liu , Michael Kelley , Jason Chen CJ , Zhao Yakui , "Peter Zijlstra \(Intel\)" Subject: Re: [patch V6 10/37] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY In-Reply-To: References: <20200515234547.710474468@linutronix.de> <20200515235125.425810667@linutronix.de> <87imgr7nwp.fsf@nanos.tec.linutronix.de> Date: Wed, 20 May 2020 16:13:26 +0200 Message-ID: <87y2pm4ruh.fsf@nanos.tec.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Andy Lutomirski writes: > On Tue, May 19, 2020 at 11:58 AM Thomas Gleixner wrote: >> Which brings you into the situation that you call schedule() from the >> point where we just moved it out. If we would go there we'd need to >> ensure that RCU is watching as well. idtentry_exit() might have it >> turned off .... > > I don't think this is possible. Once you untangle all the wrappers, > the call sites are effectively: > > __this_cpu_write(xen_in_preemptible_hcall, true); > CALL_NOSPEC to the hypercall page > __this_cpu_write(xen_in_preemptible_hcall, false); > > I think IF=1 when this happens, but I won't swear to it. RCU had > better be watching. > > As I understand it, the one and only situation Xen wants to handle is > that an interrupt gets delivered during the hypercall. The hypervisor > is too clever for its own good and deals with this by rewinding RIP to > the beginning of whatever instruction did the hypercall and delivers > the interrupt, and we end up in this handler. So, if this happens, > the idea is to not only handle the interrupt but to schedule if > scheduling would be useful. > > So I don't think we need all this RCU magic. This really ought to be > able to be simplified to: > > idtentry_exit(); > > if (appropriate condition) > schedule(); This is exactly the kind of tinkering which causes all kinds of trouble. idtentry_exit() if (user_mode(regs)) { prepare_exit_to_usermode(regs); } else if (regs->flags & X86_EFLAGS_IF) { /* Check kernel preemption, if enabled */ if (IS_ENABLED(CONFIG_PREEMPTION)) { .... } instrumentation_begin(); /* Tell the tracer that IRET will enable interrupts */ trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(CALLER_ADDR0); instrumentation_end(); rcu_irq_exit(); lockdep_hardirqs_on(CALLER_ADDR0); } else { /* IRQ flags state is correct already. Just tell RCU */ rcu_irq_exit(); } So in case IF is set then this already told the tracer and lockdep that interrupts are enabled. And contrary to the ugly version this exit path does not use rcu_irq_exit_preempt() which is there to warn about crappy RCU state when trying to schedule. So we went great length to sanitize _all_ of this and make it consistent just to say: screw it for that xen thingy. The extra checks and extra warnings for scheduling come with the guarantee to bitrot when idtentry_exit() or any logic invoked from there is changed. It's going to look like this: /* * If the below causes problems due to inconsistent state * or out of sync sanity checks, please complain to * luto@kernel.org directly. */ idtentry_exit(); if (user_mode(regs) || !(regs->flags & X86_FlAGS_IF)) return; if (!__this_cpu_read(xen_in_preemptible_hcall)) return; rcu_sanity_check_for_preemption(); if (need_resched()) { instrumentation_begin(); xen_maybe_preempt_hcall(); trace_hardirqs_on(); instrumentation_end(); } Of course you need the extra rcu_sanity_check_for_preemption() function just for this muck. That's a true win on all ends? I don't think so. Thanks, tglx