Date: Sat, 30 Apr 2022 11:50:40 +0200
From: Borislav Petkov
To: Sean Christopherson
Cc: Jon Kohler, Thomas Gleixner, Ingo Molnar, Dave Hansen, "x86@kernel.org", "H. Peter Anvin", Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf, Peter Zijlstra, Balbir Singh, Kim Phillips, "linux-kernel@vger.kernel.org", "kvm@vger.kernel.org", Andrea Arcangeli, Kees Cook, Waiman Long
Subject: Re: [PATCH v3] x86/speculation, KVM: only IBPB for switch_mm_always_ibpb on vCPU load
References: <20220422162103.32736-1-jon@nutanix.com> <645E4ED5-F6EE-4F8F-A990-81F19ED82BFA@nutanix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 29, 2022 at 11:23:32PM +0000, Sean Christopherson wrote:
> The host kernel is protected via RETPOLINE and by flushing the RSB immediately
> after VM-Exit.

Ah, right.

> I don't know definitively. My guess is that IBPB is far too costly to do on every
> exit, and so the onus was put on userspace to recompile with RETPOLINE. What I
> don't know is why it wasn't implemented as an opt-out feature.

Or, we could add the same logic on the exit path as in cond_mitigation()
and test for LAST_USER_MM_IBPB when the host has selected
switch_mm_cond_ibpb and thus allows for certain guests to be
protected...
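
For illustration only, the kind of check meant here could be modelled in
standalone userspace C roughly as below. This is a sketch under loud
assumptions, not kernel code: model_needs_ibpb(), enum model_ibpb_mode and
MODEL_LAST_USER_MM_IBPB are hypothetical names invented for this example, and
the comparison only assumes that a conditional barrier fires when the context
changes and at least one side has opted in, while the "always" mode fires on
any context change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an "opted in to IBPB" flag kept in bit 0. */
#define MODEL_LAST_USER_MM_IBPB 0x1UL

/* Hypothetical model of the mitigation mode the host has selected. */
enum model_ibpb_mode {
	MODEL_IBPB_OFF,
	MODEL_IBPB_COND,   /* models a switch_mm_cond_ibpb-like setting */
	MODEL_IBPB_ALWAYS, /* models a switch_mm_always_ibpb-like setting */
};

/*
 * prev and next are opaque context identifiers with the opt-in flag folded
 * into bit 0. Returns true if this model would issue a barrier when going
 * from prev to next.
 */
bool model_needs_ibpb(enum model_ibpb_mode mode, uintptr_t prev, uintptr_t next)
{
	switch (mode) {
	case MODEL_IBPB_COND:
		/* Barrier only if the context changed and either side opted in. */
		return prev != next &&
		       ((prev | next) & MODEL_LAST_USER_MM_IBPB);
	case MODEL_IBPB_ALWAYS:
		/* Barrier on any context change; the flag bit is ignored. */
		return (prev & ~MODEL_LAST_USER_MM_IBPB) !=
		       (next & ~MODEL_LAST_USER_MM_IBPB);
	default:
		return false;
	}
}
```

In this model, going from an unflagged context to a flagged one triggers a
barrier in the conditional mode, while two unflagged contexts do not; whether
the real exit path could carry such a check cheaply enough is exactly the open
question.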

Although, that use case sounds kinda meh: AFAIU, the attack vector here
would be protecting the guest from a malicious kernel. I guess this
might matter for encrypted guests tho.

> I'll write up the bits I have my head wrapped around.

That's nice, thanks!

> I don't know of any actual examples. But, it's trivially easy to create multiple
> VMs in a single process, and so proving the negative that no one runs multiple VMs
> in a single address space is essentially impossible.
>
> The container thing is just one scenario I can think of where userspace might
> actually benefit from sharing an address space, e.g. it would allow backing the
> image for large number of VMs with a single set of read-only VMAs.

Why I keep harping on about this: let's say we eventually add something
and then, months or years from now, we cannot find out anymore why that
thing was added. We will likely remove it or start wasting time figuring
out why it was added in the first place.

This very question keeps popping up almost on a daily basis during
refactoring, so I'd like for us to be better at documenting *why* we're
adding a certain solution or function or whatever.

And this is doubly important when it comes to the hw mitigations because
if you look at the problem space and all the possible ifs and whens and
but(t)s, your head will spin in no time.

So I'm really sceptical when there's not even a single actual use case
for a proposed change.

So Jon said something about oversubscription and a lot of vCPU
switching. That should be there in the docs as the use case, explaining
why IBPB during vCPU switches is redundant and can be dropped.

> I truly have no idea, which is part of the reason I brought it up in the first
> place. I'd have happily just whacked KVM's IBPB entirely, but it seemed prudent
> to preserve the existing behavior if someone went out of their way to enable
> switch_mm_always_ibpb.

So let me try to understand this use case: you have a guest and a bunch
of vCPUs which belong to it. And that guest gets switched between those
vCPUs and KVM does IBPB flushes between those vCPUs.

Either I'm missing something, which is possible, or that "protection"
doesn't make any sense: it is all within the same guest! So that
existing behavior was silly to begin with and we might just as well
kill it.

> Yes, or do it iff switch_mm_always_ibpb is enabled to maintain "compatibility".

Yap, and I'm questioning even the smallest shred of reasoning for having
that IBPB flush there *at* *all*.

And here's the thing with documenting all that: we will document and
say, IBPB flushes between vCPU switches are nonsensical. Then, when
someone comes later and says, "but but, it makes sense because of X" and
we hadn't thought about X at the time, we will change it and document it
again. This way you'll have everything explicit there, how we arrived at
the current situation, and be able to always go, "ah, ok, that's why we
did it."

I hope I'm making some sense...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette