Date: Thu, 5 Apr 2018 08:09:26 -0400 (EDT)
From: Pankaj Gupta
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, qemu-devel@nongnu.org,
    linux-nvdimm@ml01.01.org, kwolf@redhat.com, haozhong zhang, jack@suse.cz,
    xiaoguangrong eric, riel@surriel.com, niteshnarayanlal@hotmail.com,
    mst@redhat.com, ross zwisler, hch@infradead.org, stefanha@redhat.com,
    imammedo@redhat.com, marcel@redhat.com, pbonzini@redhat.com,
    dan j williams, nilal@redhat.com
Message-ID: <416823501.16310251.1522930166070.JavaMail.zimbra@redhat.com>
References: <20180405104834.10457-1-pagupta@redhat.com> <20180405104834.10457-4-pagupta@redhat.com>
Subject: Re: [Qemu-devel] [RFC] qemu: Add virtio pmem device

Hi David,

> > This patch adds virtio-pmem Qemu device.
> >
> > This device configures memory address range information with file
> > backend type. It acts like persistent memory device for KVM guest.
> > It presents the memory address range to virtio-pmem driver over
> > virtio channel and does the block flush whenever there is request
> > from guest to flush/sync. (Qemu part for backing file flush
> > is yet to be implemented).
> >
> > Current code is a RFC to support guest with persistent memory
> > range & DAX.
> >
> > Signed-off-by: Pankaj Gupta
> > ---
> >  hw/virtio/Makefile.objs                      |   2 +-
> >  hw/virtio/virtio-pci.c                       |  44 +++++++++
> >  hw/virtio/virtio-pci.h                       |  14 +++
> >  hw/virtio/virtio-pmem.c                      | 133 ++++++++++++++++++++++++++++
> >  include/hw/pci/pci.h                         |   1 +
> >  include/hw/virtio/virtio-pmem.h              |  43 +++++++++
> >  include/standard-headers/linux/virtio_ids.h  |   1 +
> >  7 files changed, 237 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/virtio/virtio-pmem.c
> >  create mode 100644 include/hw/virtio/virtio-pmem.h
> >
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > index 765d363c1f..bb5573d2ef 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -5,7 +5,7 @@ common-obj-y += virtio-bus.o
> >  common-obj-y += virtio-mmio.o
> >
> >  obj-y += virtio.o virtio-balloon.o
> > -obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> > +obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o virtio-pmem.o
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
> >  obj-y += virtio-crypto.o
> >  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index c20537f31d..114ca05497 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -2491,6 +2491,49 @@ static const TypeInfo virtio_rng_pci_info = {
> >      .class_init    = virtio_rng_pci_class_init,
> >  };
> >
> > +/* virtio-pmem-pci */
> > +
> > +static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > +{
> > +    VirtIOPMEMPCI *vpmem = VIRTIO_PMEM_PCI(vpci_dev);
> > +    DeviceState *vdev = DEVICE(&vpmem->vdev);
> > +
> > +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> > +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> > +}
> > +
> > +static void virtio_pmem_pci_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> > +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> > +    k->realize = virtio_pmem_pci_realize;
> > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> > +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PMEM;
> > +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> > +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> > +}
> > +
> > +static void virtio_pmem_pci_instance_init(Object *obj)
> > +{
> > +    VirtIOPMEMPCI *dev = VIRTIO_PMEM_PCI(obj);
> > +
> > +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> > +                                TYPE_VIRTIO_PMEM);
> > +    object_property_add_alias(obj, "memdev", OBJECT(&dev->vdev), "memdev",
> > +                              &error_abort);
> > +}
> > +
> > +static const TypeInfo virtio_pmem_pci_info = {
> > +    .name           = TYPE_VIRTIO_PMEM_PCI,
> > +    .parent         = TYPE_VIRTIO_PCI,
> > +    .instance_size  = sizeof(VirtIOPMEMPCI),
> > +    .instance_init  = virtio_pmem_pci_instance_init,
> > +    .class_init     = virtio_pmem_pci_class_init,
> > +};
> > +
> > +
> >  /* virtio-input-pci */
> >
> >  static Property virtio_input_pci_properties[] = {
> > @@ -2683,6 +2726,7 @@ static void virtio_pci_register_types(void)
> >      type_register_static(&virtio_balloon_pci_info);
> >      type_register_static(&virtio_serial_pci_info);
> >      type_register_static(&virtio_net_pci_info);
> > +    type_register_static(&virtio_pmem_pci_info);
> >  #ifdef CONFIG_VHOST_SCSI
> >      type_register_static(&vhost_scsi_pci_info);
> >  #endif
> > diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> > index 813082b0d7..fe74fcad3f 100644
> > --- a/hw/virtio/virtio-pci.h
> > +++ b/hw/virtio/virtio-pci.h
> > @@ -19,6 +19,7 @@
> >  #include "hw/virtio/virtio-blk.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/virtio-rng.h"
> > +#include "hw/virtio/virtio-pmem.h"
> >  #include "hw/virtio/virtio-serial.h"
> >  #include "hw/virtio/virtio-scsi.h"
> >  #include "hw/virtio/virtio-balloon.h"
> > @@ -57,6 +58,7 @@ typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
> >  typedef struct VirtIOGPUPCI VirtIOGPUPCI;
> >  typedef struct VHostVSockPCI VHostVSockPCI;
> >  typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
> > +typedef struct VirtIOPMEMPCI VirtIOPMEMPCI;
> >
> >  /* virtio-pci-bus */
> >
> > @@ -274,6 +276,18 @@ struct VirtIOBlkPCI {
> >      VirtIOBlock vdev;
> >  };
> >
> > +/*
> > + * virtio-pmem-pci: This extends VirtioPCIProxy.
> > + */
> > +#define TYPE_VIRTIO_PMEM_PCI "virtio-pmem-pci"
> > +#define VIRTIO_PMEM_PCI(obj) \
> > +        OBJECT_CHECK(VirtIOPMEMPCI, (obj), TYPE_VIRTIO_PMEM_PCI)
> > +
> > +struct VirtIOPMEMPCI {
> > +    VirtIOPCIProxy parent_obj;
> > +    VirtIOPMEM vdev;
> > +};
> > +
> >  /*
> >   * virtio-balloon-pci: This extends VirtioPCIProxy.
> >   */
> > diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> > new file mode 100644
> > index 0000000000..28d06fc501
> > --- /dev/null
> > +++ b/hw/virtio/virtio-pmem.c
> > @@ -0,0 +1,133 @@
> > +/*
> > + * Virtio pmem device
> > + *
> > + */
> > +
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu-common.h"
> > +#include "qemu/error-report.h"
> > +#include "hw/virtio/virtio-pmem.h"
> > +
> > +
> > +static void virtio_pmem_system_reset(void *opaque)
> > +{
> > +
> > +}
> > +
> > +static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtQueueElement *elem;
> > +
> > +    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > +    if (!elem) {
> > +        return;
> > +    }
> > +    /* todo flush raw file */
> > +
> > +    virtio_notify(vdev, vq);
> > +    g_free(elem);
> > +
> > +}
> > +
> > +static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
> > +{
> > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > +    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *) config;
> > +
> > +    pmemcfg->start = pmem->start;
> > +    pmemcfg->size  = pmem->size;
> > +    pmemcfg->align = pmem->align;
> > +}
> > +
> > +static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t features,
> > +                                         Error **errp)
> > +{
> > +    virtio_add_feature(&features, VIRTIO_PMEM_PLUG);
> > +    return features;
> > +}
> > +
> > +
> > +static void virtio_pmem_realize(DeviceState *dev, Error **errp)
> > +{
> > +    VirtIODevice   *vdev = VIRTIO_DEVICE(dev);
> > +    VirtIOPMEM     *pmem = VIRTIO_PMEM(dev);
> > +    MachineState   *ms   = MACHINE(qdev_get_machine());
> > +    MemoryRegion   *mr;
> > +    PCMachineState *pcms = PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
> > +    uint64_t addr;
> > +
> > +    if (!pmem->memdev) {
> > +        error_setg(errp, "virtio-pmem not set");
> > +        return;
> > +    }
> > +
> > +    mr   = host_memory_backend_get_memory(pmem->memdev, errp);
> > +    addr = pcms->hotplug_memory.base;
> > +    pmem->start = addr;
> > +    pmem->size  = memory_region_size(mr);
> > +    pmem->align = memory_region_get_alignment(mr);
> > +
> > +    memory_region_init_alias(&pmem->mr, OBJECT(ms),
> > +                             "virtio_pmem-memory", mr, 0, pmem->size);
> > +
> > +    host_memory_backend_set_mapped(pmem->memdev, true);
> > +    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
> > +                sizeof(struct virtio_pmem_config));
> > +
> > +    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
> > +    qemu_register_reset(virtio_pmem_system_reset, pmem);
> >
> So right now you're just using some memdev for testing.

Yes.

> I assume that the memory region we will provide to the guest will be a
> simple memory mapped raw file. Dirty tracking (using the kvm slot) will
> be used to detect which blocks actually changed and have to be flushed
> to disk.

Not really; we will perform an fsync on the raw file. Because this file is
created on regular storage and not on an NVDIMM, the host page cache radix
tree holds the dirty-page information, and that is what fsync uses. (A rough
sketch of this flush path is appended at the end of this mail.)

>
> Will this raw file already have the "disk information header" (no idea
> how that stuff is called) encoded? Are there any plans/possible ways to
>
> a) automatically create the headers? (if that's even possible)

It's raw. Right now we only support the raw format.

Because this is a direct mapping of memory into the guest address space, I
don't think we can have a header abstraction for block-specific features.
Or maybe we can get the opinion of others (the QEMU block people) on whether
that is possible at all?

> b) support anything but raw files?
>
> Please note that under x86, a KVM memory slot still has a (in my
> opinion) fairly big overhead depending on the size of the slot (rmap,
> page_track). We might have to optimize that.

I have not tried/observed this. Right now I just use a single memory slot
and cold-add a few MBs of memory in Qemu. Can you please provide more
details on this?

>
> --
>
> Thanks,
>
> David / dhildenb
>
>
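
For illustration only, a minimal sketch of how the "todo flush raw file" step
in virtio_pmem_flush() could be filled in along those lines. It assumes the
memdev is a file-backed HostMemoryBackend, reaches the backing fd via
memory_region_get_fd(), and does a plain synchronous fsync(); a real
implementation would probably offload the fsync to a worker thread so the
vcpu/main loop is not blocked, and would report a per-request status back to
the guest:

static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
    VirtQueueElement *elem;
    MemoryRegion *mr;
    int fd;

    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
    if (!elem) {
        return;
    }

    /*
     * Hypothetical flush of the backing raw file: the host page cache
     * holds the dirty pages of the mmap'ed file, so fsync() on the
     * memdev's fd is what pushes them to stable storage.
     */
    mr = host_memory_backend_get_memory(pmem->memdev, &error_abort);
    fd = memory_region_get_fd(mr);
    if (fd < 0 || fsync(fd) < 0) {
        error_report("virtio-pmem: flush of backing file failed: %s",
                     strerror(errno));
    }

    virtqueue_push(vq, elem, 0);
    virtio_notify(vdev, vq);
    g_free(elem);
}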