From: Jim Mattson
Date: Thu, 12 May 2022 16:57:24 -0700
Subject: Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
To: Jon Kohler
Cc: Sean Christopherson, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov,
    Wanpeng Li, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, X86 ML, "H. Peter Anvin", Kees Cook, Andrea Arcangeli,
    Josh Poimboeuf, Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
    KarimAllah Ahmed, David Woodhouse, "linux-doc@vger.kernel.org", LKML,
    "kvm @ vger . kernel . org", Waiman Long
References: <20220512184514.15742-1-jon@nutanix.com> <07BEC8B1-469C-4E36-AE92-90BFDF93B2C4@nutanix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 12, 2022 at 1:34 PM Jon Kohler wrote:
>
> > On May 12, 2022, at 4:27 PM, Jim Mattson wrote:
> >
> > On Thu, May 12, 2022 at 1:07 PM Sean Christopherson wrote:
> >>
> >> On Thu, May 12, 2022, Jon Kohler wrote:
> >>>
> >>>
> >>>> On May 12, 2022, at 3:35 PM, Sean Christopherson wrote:
> >>>>
> >>>> On Thu, May 12, 2022, Sean Christopherson wrote:
> >>>>> On Thu, May 12, 2022, Jon Kohler wrote:
> >>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> >>>>>> attack surface is already covered by switch_mm_irqs_off() ->
> >>>>>> cond_mitigation().
> >>>>>>
> >>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> >>>>>> wrong in its guest-to-guest design intention. There are three scenarios
> >>>>>> at play here:
> >>>>>
> >>>>> Jim pointed out offline that there's a case we didn't consider. When switching between
> >>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> >>>>> different security domains. E.g. the guest will not get a notification that vCPU0 is
> >>>>> being swapped out for vCPU1 on a single pCPU.
> >>>>>
> >>>>> So, sadly, after all that, I think the IBPB needs to stay. But the documentation
> >>>>> most definitely needs to be updated.
> >>>>>
> >>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> >>>>> use cases where a single VM is running a single workload.
> >>>>
> >>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> >>>> because then the IBPB is fully redundant with respect to any IBPB performed by
> >>>> switch_mm_irqs_off(). Hrm, though it might need a KVM or per-VM knob, e.g. just
> >>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> >>>>
> >>>> That would also sidestep the largely theoretical question of whether vCPUs from
> >>>> different VMs but the same address space are in the same security domain. It doesn't
> >>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
> >>>
> >>> So should we go back to the earlier approach where we have it be only
> >>> IBPB on always_ibpb? Or what?
> >>>
> >>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> >>> IBPB’ing even when the user did not explicitly tell us to.
> >>
> >> I think we need separate controls for the guest. E.g. if the userspace VMM is
> >> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> >> mean that the entire guest it's running is sufficiently hardened.
> >>
> >>> That said, since I just re-read the documentation today, it does specifically
> >>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
> >>> STIBP (or other mitigations galore), so I think we end up having to think
> >>> about what our “contract” is with users who host their workloads on
> >>> KVM - are they expecting us to protect them in any/all cases?
> >>>
> >>> Said another way, the internal guest areas of concern aren’t something
> >>> the kernel would always be able to A) identify far in advance and B)
> >>> always solve on the user's behalf. There is an argument to be made
> >>> that the guest needs to deal with its own house, yea?
> >>
> >> The issue is that the guest won't get a notification if vCPU0 is replaced with
> >> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> >> IBPB. Since the host doesn't know whether or not the guest wants IBPB, unless the
> >> owner of the host is also the owner of the guest workload, the safe approach is to
> >> assume the guest is vulnerable.
> >
> > Exactly. And if the guest has used taskset as its mitigation strategy,
> > how is the host to know?
>
> Yea that's fair enough. I posed a solution on Sean’s response just as this email
> came in, would love to know your thoughts (keying off MSR bitmap).
>

I don't believe this works. The IBPBs in cond_mitigation() (static in
arch/x86/mm/tlb.c) won't be triggered if the guest has given its
sensitive tasks exclusive use of their cores. And, if performance is a
concern, that is exactly what someone would do.
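
The pinned-task case discussed in this thread can be sketched with a toy model. This is not kernel code and only one possible reading of the argument: the names `Slice` and `barrier_sources` are hypothetical, and the three checks are deliberately simplified stand-ins for the host context-switch path, the guest's own context-switch path, and the barrier KVM issues on vCPU load. It assumes both vCPUs are threads of one VMM process (one host address space) and ignores conditional modes such as per-task opt-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """One stretch of execution on a single physical CPU (hypothetical model type)."""
    host_mm: int     # host address space of the VMM thread backing the vCPU
    vcpu: int        # which vCPU is loaded on the physical CPU
    guest_task: str  # guest task running on that vCPU


def barrier_sources(prev: Slice, nxt: Slice, ibpb_on_vcpu_load: bool) -> set:
    """Return which layers would issue a barrier between two slices in this model."""
    sources = set()
    # Host context-switch path: modeled as firing only when the address space changes.
    if prev.host_mm != nxt.host_mm:
        sources.add("host-mm-switch")
    # Guest context-switch path: the guest only sees a switch it performs itself,
    # i.e. a different task scheduled on the same vCPU.
    if prev.vcpu == nxt.vcpu and prev.guest_task != nxt.guest_task:
        sources.add("guest-context-switch")
    # KVM path: barrier when a different vCPU is loaded on the physical CPU.
    if ibpb_on_vcpu_load and prev.vcpu != nxt.vcpu:
        sources.add("kvm-vcpu-load")
    return sources


# Guest pins task "A" to vCPU0 and task "B" to vCPU1; both vCPUs share one host mm.
a = Slice(host_mm=1, vcpu=0, guest_task="A")
b = Slice(host_mm=1, vcpu=1, guest_task="B")
print(barrier_sources(a, b, ibpb_on_vcpu_load=True))   # {'kvm-vcpu-load'}
print(barrier_sources(a, b, ibpb_on_vcpu_load=False))  # set()
```

In this model, once the barrier on vCPU load is removed, nothing separates task "A" from task "B" when the host swaps vCPU0 for vCPU1 on the same physical CPU: the host sees no address-space change, and the guest never performs a context switch of its own.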