Date: Wed, 9 Jan 2019 09:46:08 -0500 (EST)
From: Pankaj Gupta
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, qemu-devel@nongnu.org,
    linux-nvdimm@ml01.01.org, linux-fsdevel@vger.kernel.org,
    virtualization@lists.linux-foundation.org, linux-acpi@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: jack@suse.cz, david@redhat.com,
    jasowang@redhat.com, lcapitulino@redhat.com, adilger kernel,
    zwisler@kernel.org, dave jiang, darrick wong, vishal l verma,
    mst@redhat.com, willy@infradead.org, hch@infradead.org,
    jmoyer@redhat.com, nilal@redhat.com, riel@surriel.com,
    stefanha@redhat.com, imammedo@redhat.com, dan j williams,
    kwolf@redhat.com, tytso@mit.edu, xiaoguangrong eric,
    rjw@rjwysocki.net, pbonzini@redhat.com
Message-ID: <1814830087.61221572.1547045168645.JavaMail.zimbra@redhat.com>
In-Reply-To: <20190109135024.14093-1-pagupta@redhat.com>
References: <20190109135024.14093-1-pagupta@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

Please ignore this series, as my network went down while I was sending it.
I will send the series again.

Thanks,
Pankaj

>
> This patch series implements "virtio pmem".
> "virtio pmem" is fake persistent memory (nvdimm) in the guest
> which allows bypassing the guest page cache. The series also
> implements a VIRTIO based asynchronous flush mechanism.
>
> Sharing the guest kernel driver in this patchset with the
> changes suggested in v2. Tested with Qemu side device
> emulation for virtio-pmem [6].
>
> Details of the project idea for the 'virtio pmem' flushing
> interface are shared in [3] & [4].
>
> Implementation is divided into two parts:
> a new virtio pmem guest driver, and qemu code changes for the
> new virtio pmem paravirtualized device.
>
> 1. Guest virtio-pmem kernel driver
> ---------------------------------
>    - Reads the persistent memory range from the paravirt device
>      and registers it with 'nvdimm_bus'.
>    - The 'nvdimm/pmem' driver uses this information to allocate
>      a persistent memory region and set up filesystem operations
>      on the allocated memory.
>    - The virtio pmem driver implements an asynchronous flushing
>      interface to flush from guest to host.
>
> 2. Qemu virtio-pmem device
> ---------------------------------
>    - Creates the virtio pmem device and exposes a memory range
>      to the KVM guest.
>    - On the host side this is file backed memory which acts as
>      persistent memory.
>    - The Qemu side flush uses the aio thread pool APIs and virtio
>      for asynchronous handling of multiple guest requests.
>
> David Hildenbrand (CCed) also posted a modified version [6] of
> the qemu virtio-pmem code based on the updated Qemu memory
> device API.
>
> Virtio-pmem error handling:
> ----------------------------------------
> Checked the behaviour of virtio-pmem for the types of errors
> below. Suggestions on the expected behaviour for handling these
> errors are welcome.
>
> - Hardware Errors: Uncorrectable recoverable Errors:
>   a] virtio-pmem:
>     - As per the current logic, if the error page belongs to the
>       Qemu process, the host MCE handler isolates (hwpoison) that
>       page and sends SIGBUS. The Qemu SIGBUS handler injects an
>       exception into the KVM guest.
>     - The KVM guest then isolates the page and sends SIGBUS to
>       the guest userspace process which has mapped the page.
>
>   b] Existing implementation for ACPI pmem driver:
>     - Handles such errors with an MCE notifier and creates a list
>       of bad blocks. Read/direct access DAX operations return EIO
>       if the accessed memory page falls in the bad block list.
>     - It also starts background scrubbing.
>     - Similar functionality can be reused in virtio-pmem with an
>       MCE notifier but without scrubbing (no ACPI/ARS)? Need
>       inputs to confirm whether this behaviour is ok or needs
>       any change.
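
To make the bad block behaviour quoted above concrete, here is a minimal
userspace sketch of "return EIO if the accessed range falls in the bad block
list". It is only a model under stated assumptions: `struct bad_range`,
`range_is_poisoned()` and `pmem_read_checked()` are hypothetical names, not
the kernel's badblocks or libnvdimm API, and the "region" is an ordinary
buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bad-range record covering [start, start + len) in bytes. */
struct bad_range {
	uint64_t start;
	uint64_t len;
};

/* Return 1 if [off, off + len) overlaps any recorded bad range, else 0. */
static int range_is_poisoned(const struct bad_range *bad, size_t nbad,
			     uint64_t off, uint64_t len)
{
	if (len == 0)
		return 0;
	for (size_t i = 0; i < nbad; i++) {
		uint64_t bad_end = bad[i].start + bad[i].len;

		if (off < bad_end && bad[i].start < off + len)
			return 1;
	}
	return 0;
}

/*
 * Copy len bytes at offset off out of the modelled region.
 * Returns 0 on success, -EINVAL for an out-of-bounds request and
 * -EIO if the request touches a range on the bad list.
 */
static int pmem_read_checked(const uint8_t *region, uint64_t region_size,
			     const struct bad_range *bad, size_t nbad,
			     void *dst, uint64_t off, uint64_t len)
{
	if (off > region_size || len > region_size - off)
		return -EINVAL;
	if (range_is_poisoned(bad, nbad, off, len))
		return -EIO;
	memcpy(dst, region + off, len);
	return 0;
}
```

With a 64-byte region and one bad range at bytes 16..23, a read of bytes
12..19 fails with -EIO while reads of 0..7 and 24..31 succeed. How the list
would be populated in the virtio-pmem case (MCE notifier, no scrubbing) is
the open question in the text above and is not modelled here.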
>
> Changes from PATCH v2: [1]
> - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
> - Use name 'virtio pmem' in place of 'fake dax'
>
> Changes from PATCH v1: [2]
> - 0-day build test for build dependency on libnvdimm
>
>  Changes suggested by - [Dan Williams]
> - Split the driver into two parts virtio & pmem
> - Move queuing of async block request to block layer
> - Add "sync" parameter in nvdimm_flush function
> - Use indirect call for nvdimm_flush
> - Don’t move declarations to common global header e.g nd.h
> - nvdimm_flush() returns 0 or -EIO if it fails
> - Teach nsio_rw_bytes() that the flush can fail
> - Rename nvdimm_flush() to generic_nvdimm_flush()
> - Use 'nd_region->provider_data' for long dereferencing
> - Remove virtio_pmem_freeze/restore functions
> - Replace BSD license text with SPDX license text
>
> - Add might_sleep() in virtio_pmem_flush - [Luiz]
> - Make spin_lock_irqsave() narrow
>
> Changes from RFC v3
> - Rebase to latest upstream - Luiz
> - Call ndregion->flush in place of nvdimm_flush - Luiz
> - kmalloc return check - Luiz
> - virtqueue full handling - Stefan
> - Don't map entire virtio_pmem_req to device - Stefan
> - request leak, correct sizeof req - Stefan
> - Move declaration to virtio_pmem.c
>
> Changes from RFC v2:
> - Add flush function in the nd_region in place of switching
>   on a flag - Dan & Stefan
> - Add flush completion function with proper locking and wait
>   for host side flush completion - Stefan & Dan
> - Keep userspace API in uapi header file - Stefan, MST
> - Use LE fields & New device id - MST
> - Indentation & spacing suggestions - MST & Eric
> - Remove extra header files & add licensing - Stefan
>
> Changes from RFC v1:
> - Reuse existing 'pmem' code for registering persistent
>   memory and other operations instead of creating an entirely
>   new block driver.
> - Use VIRTIO driver to register memory information with
>   nvdimm_bus and create region_type accordingly.
> - Call VIRTIO flush from existing pmem driver.
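
Several changelog items above concern one idea: the region carries its own
flush callback, it is reached through an indirect call, and it returns 0 or
-EIO so that callers can see a failed flush. A minimal userspace sketch of
that shape follows. All `model_*` names are hypothetical and this is not the
code from the series; in particular, falling back to a generic flush when no
callback is set is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct model_region;
typedef int (*model_flush_fn)(struct model_region *region);

/* Hypothetical stand-in for a region with a provider-specific flush. */
struct model_region {
	model_flush_fn flush;	/* NULL selects the generic flush (assumption) */
	void *provider_data;	/* opaque per-provider state */
	int generic_flushes;	/* counts calls to the generic flush */
};

/* Hypothetical host-side state for the virtio-backed case. */
struct model_host {
	int flushes;		/* completed host flushes */
	int fail;		/* non-zero makes the next flush fail */
};

/* Models a local flush that cannot fail. */
static int model_generic_flush(struct model_region *region)
{
	region->generic_flushes++;
	return 0;
}

/* Models a flush request to the host: 0 on success, -EIO on failure. */
static int model_virtio_flush(struct model_region *region)
{
	struct model_host *host = region->provider_data;

	if (host->fail)
		return -EIO;
	host->flushes++;
	return 0;
}

/* Indirect call through the region instead of switching on a flag. */
static int model_region_flush(struct model_region *region)
{
	if (region->flush)
		return region->flush(region);
	return model_generic_flush(region);
}

/* A write path that propagates a flush failure to its caller. */
static int model_write_and_flush(struct model_region *region, char *dst,
				 const char *src, size_t len)
{
	memcpy(dst, src, len);
	return model_region_flush(region);
}
```

A region set up with `model_virtio_flush` returns -EIO from
`model_write_and_flush()` once `fail` is set, while a region with a NULL
callback takes the generic path and always returns 0. The asynchronous
completion and locking mentioned under "Changes from RFC v2" are left out on
purpose.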
>
> Pankaj Gupta (5):
>    libnvdimm: nd_region flush callback support
>    virtio-pmem: Add virtio-pmem guest driver
>    libnvdimm: add nd_region buffered dax_dev flag
>    ext4: disable map_sync for virtio pmem
>    xfs: disable map_sync for virtio pmem
>
> [2] https://lkml.org/lkml/2018/8/31/407
> [3] https://www.spinics.net/lists/kvm/msg149761.html
> [4] https://www.spinics.net/lists/kvm/msg153095.html
> [5] https://lkml.org/lkml/2018/8/31/413
> [6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
>
>  drivers/acpi/nfit/core.c         |    4 -
>  drivers/dax/super.c              |   17 +++++
>  drivers/nvdimm/claim.c           |    6 +
>  drivers/nvdimm/nd.h              |    1
>  drivers/nvdimm/pmem.c            |   15 +++-
>  drivers/nvdimm/region_devs.c     |   45 +++++++++++++-
>  drivers/nvdimm/virtio_pmem.c     |   84 ++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |   10 +++
>  drivers/virtio/Makefile          |    1
>  drivers/virtio/pmem.c            |  125 +++++++++++++++++++++++++++++++++++++++
>  fs/ext4/file.c                   |   11 +++
>  fs/xfs/xfs_file.c                |    8 ++
>  include/linux/dax.h              |    9 ++
>  include/linux/libnvdimm.h        |   11 +++
>  include/linux/virtio_pmem.h      |   60 ++++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |    1
>  include/uapi/linux/virtio_pmem.h |   10 +++
>  17 files changed, 406 insertions(+), 12 deletions(-)