Date: Tue, 7 Jun 2022 14:55:01 -0400
From: Peter Xu
To: Sean Christopherson
Cc: Paolo Bonzini, Sasha Levin, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Leonardo Bras, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, chang.seok.bae@intel.com, luto@kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 07, 2022 at 02:17:54PM -0400, Peter Xu wrote:
> On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> > On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > > On 6/6/22 23:27, Peter Xu wrote:
> > > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > > However there seems to be something missing at least to me, on why it'll
> > > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > > patch, but 0x0 if with it.
> > > > >
> > > > > What CPU model are you using for the VM?
> > > >
> > > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > >
> > > Ok, so indeed it lacks AVX and this patch can have an effect.
> > >
> > > > > For example, if the source lacks this patch but the destination has it,
> > > > > the source will transmit YMM registers, but the destination will fail to
> > > > > set them if they are not available for the selected CPU model.
> > > > >
> > > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > > the guest configuration. Such features will never be returned by
> > > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > >
> > > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > > (probably by failing validate_user_xstate_header when checking against the
> > > > user_xfeatures on dest host). But that's probably not my case, because here
> > > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > > the precopy migration completes (or for postcopy when the switchover is
> > > > done).
> > >
> > > Difficult to say what's happening without seeing at least the guest code
> > > around the double fault (above you said "fail a migration" and I thought
> > > that was a different scenario than the double fault), and possibly which was
> > > the first exception that contributed to the double fault.
> >
> > Regardless of why the guest explodes in the way it does, is someone planning on
> > bisecting this (if necessary?) and sending a backport to v5.15? There's another
> > bug report that is more than likely hitting the same bug.
>
> What's the bisection you mentioned? I actually did a bisection and I also
> checked that reverting Leo's change also fixes this issue. Or do you mean
> something else?

Ah, I forgot to comment on the "stable tree decisions": IIUC it also means we should apply Leo's patch to all the stable trees if possible, so that migrations between them no longer trigger the mysterious faults, including when migrating to the latest Linux versions.

However, there's the dilemma that migrations from other kernels (any kernel that does not have Leo's patch) to the stable branches that apply Leo's patch will start to fail too. So that's a slight pity.

Still, IIUC the stable trees matter more, because they have a broader audience (most Linux distros)?

> >
> > https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net
>
> That is kvm64, and I agree it could be the same problem since both qemu64
> and kvm64 models do not have any xsave feature bit declared in cpuid 0xd,
> so potentially we could be migrating some fpu states to it even with
> user_xfeatures==0 on dest host.
>
> So today I continued the investigation, and I think what's really missing
> is that qemu seems to ignore the user_xfeatures check for KVM_SET_XSAVE and
> continues even if it returns -EINVAL. IOW, I'm wondering whether we should
> fail properly and start to check the kvm_arch_put_registers() return code. But
> that'll be a QEMU fix; it will at least not cause random faults
> (e.g. double faults) in the guest, and instead we should fail the migration gracefully.
>
> Sean: a side note is that I can also easily trigger one WARN_ON_ONCE() in
> your commit 98c25ead5eda5 in kvm_arch_vcpu_ioctl_run():
>
>   WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
>
> It'd be great if you could take a look at that.
>
> Thanks,
>
> --
> Peter Xu

--
Peter Xu