From: Yiwei Zhang
Date: Mon, 18 Dec 2023 10:14:09 -0800
Subject: Re: [RFC PATCH] KVM: Introduce KVM VIRTIO device
To: Sean Christopherson
Cc: Kevin Tian, Yan Y Zhao, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    dri-devel@lists.freedesktop.org, pbonzini@redhat.com, olvaffe@gmail.com,
    Zhiyuan Lv, Zhenyu Z Wang, Yongwei Ma, vkuznets@redhat.com,
    wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org,
    gurchetansingh@chromium.org, kraxel@redhat.com
References: <20231214103520.7198-1-yan.y.zhao@intel.com>

> +Yiwei
>
> On Fri, Dec 15, 2023, Kevin Tian wrote:
> > > From: Zhao, Yan Y
> > > Sent: Thursday, December 14, 2023 6:35 PM
> > >
> > > - For host non-MMIO pages,
> > >   * The virtio guest frontend and host backend driver should be in sync
> > >     and use the same memory type to map a buffer. Otherwise, there is a
> > >     potential for incorrect memory data, but this only impacts the buggy
> > >     guest itself.
> > >   * For live migration,
> > >     as QEMU reads all guest memory during live migration, page aliasing
> > >     could happen.
> > >     Current thinking is to disable live migration if a virtio device has
> > >     indicated its noncoherent state.
> > >     As a follow-up, we can discuss other solutions, e.g.
> > >     (a) switching back to the coherent path before starting live migration.
> >
> > Both guest and host switching to coherent, or host-only?
> >
> > Host-only is certainly problematic if the guest is still using the
> > non-coherent path.
> >
> > On the other hand, I'm not sure whether the host/guest gfx stack is
> > capable of switching between the coherent and non-coherent paths on the
> > fly while a buffer is being rendered.
> >
> > > (b) read/write of guest memory with clflush during live migration.
> >
> > Write is irrelevant, as it's only done in the resume path where the
> > guest is not running.
> >
> > >
> > > Implementation Consideration
> > > ===
> > > There is a previous series [1] from Google serving the same purpose:
> > > letting KVM be aware of the virtio GPU's noncoherent DMA status. That
> > > series requires a new memslot flag, and special memslots in user space.
> > >
> > > We chose not to use a memslot flag to request honoring of the guest
> > > memory type.
> >
> > A memslot flag has the potential to restrict the impact, e.g. when using
> > clflush-before-read in migration?
>
> Yep, exactly.  E.g. if KVM needs to ensure coherency when freeing memory back
> to the host kernel, then the memslot flag will allow for a much more targeted
> operation.
>
> > Of course the implication is to honor the guest type only for the selected
> > slot in KVM instead of applying it to the entire guest memory as in the
> > previous series (which went that way because vmx_get_mt_mask() is on a
> > perf-critical path, hence not good to check a memslot flag?).
>
> Checking a memslot flag won't impact performance.  KVM already has the memslot
> when creating SPTEs, e.g. the sole caller of vmx_get_mt_mask(), make_spte(),
> has access to the memslot.
>
> That isn't coincidental: KVM _must_ have the memslot to construct the SPTE,
> e.g. to retrieve the associated PFN, update write-tracking for shadow pages,
> etc.
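To make the memslot-flag opt-in concrete, here is a rough userspace-side
sketch of what tagging just the noncoherent virtio buffer slot could look
like. This is illustrative only: KVM_MEM_NONCOHERENT_DMA and its bit value
are made up and not part of any posted series; only
KVM_SET_USER_MEMORY_REGION and struct kvm_userspace_memory_region are
existing UAPI.

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical flag, for illustration only; not in the current UAPI. */
#define KVM_MEM_NONCOHERENT_DMA (1UL << 3)

/*
 * The VMM tags only the memslot backing the noncoherent virtio buffers,
 * so KVM could honor the guest memory type (and, e.g., clflush before
 * reading during migration) for this slot alone.
 */
static int set_noncoherent_slot(int vm_fd, __u32 slot, __u64 gpa,
                                __u64 size, void *hva)
{
        struct kvm_userspace_memory_region region;

        memset(&region, 0, sizeof(region));
        region.slot = slot;
        region.flags = KVM_MEM_NONCOHERENT_DMA;        /* hypothetical */
        region.guest_phys_addr = gpa;
        region.memory_size = size;
        region.userspace_addr = (__u64)(unsigned long)hva;

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}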
>
> I added Yiwei, who I think is planning on posting another RFC for the memslot
> idea (I actually completely forgot that the memslot idea had been thought of
> and posted a few years back).

We've deferred to Yan (Intel side) to drive the userspace opt-in, so it's up to
Yan to revise the series to be memslot-flag based. I'm okay with whatever
upstream folks think is safer for the opt-in. Thanks!

> > > Instead we hope to make the honoring request explicit (not tied to a
> > > memslot flag). This is because once the guest memory type is honored, not
> > > only memory used by the guest virtio device but all guest memory
> > > potentially faces the page aliasing issue. KVM needs a generic solution
> > > to take care of the page aliasing issue rather than counting on the
> > > memory type of a special memslot being aligned between host and guest.
> > > (We can discuss what a generic solution to the page aliasing issue will
> > > look like in a later follow-up series.)
> > >
> > > On the other hand, we chose to introduce a KVM virtio device rather than
> > > just provide an ioctl to wrap kvm_arch_[un]register_noncoherent_dma()
> > > directly, based on the consideration that
> >
> > I wonder if it's over-engineered for the purpose.
> >
> > Why not just introduce a KVM_CAP and allow the VMM to enable it?
> > KVM doesn't need to know the exact source requiring it...
>
> Agreed.  If we end up needing to grant the whole VM access for some reason,
> just give userspace a direct toggle.
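And if that whole-VM toggle route is taken, the userspace side could be as
small as the sketch below. KVM_CAP_NONCOHERENT_DMA is a placeholder capability
number made up for illustration, not one allocated upstream;
KVM_CHECK_EXTENSION, KVM_ENABLE_CAP and struct kvm_enable_cap are the existing
UAPI pieces.

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Placeholder capability number, for illustration only. */
#define KVM_CAP_NONCOHERENT_DMA 240

/*
 * VM-wide opt-in: tell KVM that some device in this VM may perform
 * noncoherent DMA, so guest memory types should be honored.
 */
static int enable_noncoherent_dma(int vm_fd)
{
        struct kvm_enable_cap cap;

        if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NONCOHERENT_DMA) <= 0)
                return -1;      /* not supported by this kernel */

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_NONCOHERENT_DMA;      /* placeholder */

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}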