Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751494AbbEFUHi (ORCPT ); Wed, 6 May 2015 16:07:38 -0400
Received: from mga11.intel.com ([192.55.52.93]:23695 "EHLO mga11.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750812AbbEFUHf (ORCPT ); Wed, 6 May 2015 16:07:35 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.13,380,1427785200"; d="scan'208";a="706297698"
Subject: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
From: Dan Williams
To: linux-kernel@vger.kernel.org
Cc: Boaz Harrosh, Jan Kara, Mike Snitzer, Neil Brown,
        Benjamin Herrenschmidt, Dave Hansen, Heiko Carstens, Chris Mason,
        Paul Mackerras, "H. Peter Anvin", hch@lst.de, Alasdair Kergon,
        linux-nvdimm@ml01.01.org, mingo@kernel.org, mgorman@suse.de,
        Matthew Wilcox, Ross Zwisler, riel@redhat.com, Martin Schwidefsky,
        axboe@kernel.dk, "Theodore Ts'o", "Martin K. Petersen", Julia Lawall,
        Tejun Heo, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org,
        Linus Torvalds
Date: Wed, 06 May 2015 16:04:53 -0400
Message-ID: <20150506200219.40425.74411.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.17.1-8-g92dd
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 9932
Lines: 192

Changes since v1 [1]:

1/ added include/asm-generic/pfn.h for the __pfn_t definition and helpers.

2/ added kmap_atomic_pfn_t()

3/ rebased on v4.1-rc2

[1]: http://marc.info/?l=linux-kernel&m=142653770511970&w=2

---

A lead-in note: this looks scarier than it is.  Most of the code thrash
is automated via Coccinelle.  Also, the subtle differences between an
'unsigned long pfn' and a '__pfn_t' are mitigated by type-safety and a
Kconfig option (CONFIG_PMEM_IO, disabled by default) that globally
controls whether a pfn and a __pfn_t are equivalent.

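To make the type-safety point concrete, here is a minimal userspace sketch
of the general technique: wrap the frame number in a single-member struct
so the compiler rejects accidental mixing with a bare 'unsigned long', and
let a build-time switch collapse the two types back into one.  This is not
the definition from include/asm-generic/pfn.h.  The names demo_pfn_t,
demo_pfn_wrap(), demo_pfn_unwrap(), demo_pfn_to_phys(), DEMO_PMEM_IO and
DEMO_PAGE_SHIFT are all hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Kconfig switch; not a real kernel symbol. */
#ifndef DEMO_PMEM_IO
#define DEMO_PMEM_IO 1
#endif

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12

#if DEMO_PMEM_IO
/*
 * Distinct type: handing a bare unsigned long to a function that takes
 * a demo_pfn_t (or the reverse) fails to compile.
 */
typedef struct { unsigned long val; } demo_pfn_t;

static inline demo_pfn_t demo_pfn_wrap(unsigned long pfn)
{
	demo_pfn_t p = { pfn };
	return p;
}

static inline unsigned long demo_pfn_unwrap(demo_pfn_t p)
{
	return p.val;
}
#else
/* Switch off: the two types are equivalent and the helpers are no-ops. */
typedef unsigned long demo_pfn_t;

static inline demo_pfn_t demo_pfn_wrap(unsigned long pfn)
{
	return pfn;
}

static inline unsigned long demo_pfn_unwrap(demo_pfn_t p)
{
	return p;
}
#endif

/* Physical address of the start of the frame, computed without a struct page. */
static inline uint64_t demo_pfn_to_phys(demo_pfn_t p)
{
	return (uint64_t)demo_pfn_unwrap(p) << DEMO_PAGE_SHIFT;
}
```

Callers that only go through the wrap/unwrap helpers build unchanged with
the switch on or off, which is the property that makes a mechanical
(e.g. Coccinelle-driven) conversion practical.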
The motivation for this change is persistent memory and the desire to
use it not only via the pmem driver, but also as a memory target for I/O
(DAX, O_DIRECT, DMA, RDMA, etc) in other parts of the kernel.  Aside
from the pmem driver and DAX, persistent memory cannot be used in these
I/O scenarios because it lacks a backing struct page, i.e. persistent
memory is not part of the memmap.  This patchset takes the position that
the solution is to teach I/O paths that want to operate on persistent
memory to do so by referencing a __pfn_t.  The alternatives are
discussed in the changelog for "[PATCH v2 01/10] arch: introduce __pfn_t
for persistent memory i/o", copied here:

    Alternatives:

    1/ Provide struct page coverage for persistent memory in DRAM.  The
       expectation is that persistent memory capacities make this
       untenable in the long term.

    2/ Provide struct page coverage for persistent memory with
       persistent memory.  While persistent memory may have near-DRAM
       performance characteristics, it may not have the same write
       endurance as DRAM.  Given the update frequency of struct page
       objects, it may not be suitable for persistent memory.

    3/ Dynamically allocate struct page.  This appears to be on the
       order of the complexity of converting code paths to use __pfn_t
       references instead of struct page, and the amount of setup
       required to establish a valid struct page reference is mostly
       wasted when the only usage in the block stack is to perform a
       page_to_pfn() conversion for dma-mapping.  Instances of kmap() /
       kmap_atomic() usage appear to be the only occasions in the block
       stack where struct page is non-trivially used.  A new
       kmap_atomic_pfn_t() is proposed to handle those cases.

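As an illustration of why a frame number is enough for the dma-mapping
case described in alternative 3/, the sketch below computes the physical
address of an I/O segment directly from a pfn and a byte offset, with no
struct page involved.  It is a userspace toy, not code from this series:
struct demo_bio_vec and demo_bvec_phys() are hypothetical, only the
.bv_pfn field name is borrowed from the patch titles, and 4 KiB pages are
assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_SEG_PAGE_SHIFT 12

/*
 * Hypothetical segment descriptor that records a frame number instead
 * of a struct page pointer.  The layout is illustrative only.
 */
struct demo_bio_vec {
	unsigned long bv_pfn;    /* page frame number of the segment */
	unsigned int  bv_len;    /* length of the segment in bytes */
	unsigned int  bv_offset; /* byte offset into the frame */
};

/*
 * Physical address of the first byte of the segment.  With a struct
 * page this would need a page-to-pfn conversion first; here the frame
 * number is already at hand.
 */
static inline uint64_t demo_bvec_phys(const struct demo_bio_vec *bv)
{
	return ((uint64_t)bv->bv_pfn << DEMO_SEG_PAGE_SHIFT) + bv->bv_offset;
}
```

For example, a segment at pfn 0x100 with offset 512 starts at physical
address 0x100200.  Paths that need a kernel virtual mapping of the data
(the kmap() / kmap_atomic() cases above) are not covered by this
arithmetic and are what the proposed kmap_atomic_pfn_t() addresses.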
---

Dan Williams (9):
      arch: introduce __pfn_t for persistent memory i/o
      block: add helpers for accessing a bio_vec page
      block: convert .bv_page to .bv_pfn bio_vec
      dma-mapping: allow archs to optionally specify a ->map_pfn() operation
      scatterlist: use sg_phys()
      x86: support dma_map_pfn()
      x86: support kmap_atomic_pfn_t() for persistent memory
      dax: convert to __pfn_t
      block: base support for pfn i/o

Matthew Wilcox (1):
      scatterlist: support "page-less" (__pfn_t only) entries


 arch/Kconfig | 6 ++
 arch/arm/mm/dma-mapping.c | 2 -
 arch/microblaze/kernel/dma.c | 2 -
 arch/powerpc/sysdev/axonram.c | 6 +-
 arch/x86/Kconfig | 7 ++
 arch/x86/kernel/Makefile | 1
 arch/x86/kernel/amd_gart_64.c | 22 +++++-
 arch/x86/kernel/kmap.c | 95 ++++++++++++++++++++++++++
 arch/x86/kernel/pci-nommu.c | 22 +++++-
 arch/x86/kernel/pci-swiotlb.c | 4 +
 arch/x86/pci/sta2x11-fixup.c | 4 +
 arch/x86/xen/pci-swiotlb-xen.c | 4 +
 block/bio-integrity.c | 8 +-
 block/bio.c | 82 ++++++++++++++++------
 block/blk-core.c | 13 +++-
 block/blk-integrity.c | 7 +-
 block/blk-lib.c | 2 -
 block/blk-merge.c | 15 ++--
 block/bounce.c | 26 ++++---
 drivers/block/aoe/aoecmd.c | 8 +-
 drivers/block/brd.c | 6 +-
 drivers/block/drbd/drbd_bitmap.c | 5 +
 drivers/block/drbd/drbd_main.c | 6 +-
 drivers/block/drbd/drbd_receiver.c | 4 +
 drivers/block/drbd/drbd_worker.c | 3 +
 drivers/block/floppy.c | 6 +-
 drivers/block/loop.c | 13 ++--
 drivers/block/nbd.c | 8 +-
 drivers/block/nvme-core.c | 2 -
 drivers/block/pktcdvd.c | 11 ++-
 drivers/block/pmem.c | 16 +++-
 drivers/block/ps3disk.c | 2 -
 drivers/block/ps3vram.c | 2 -
 drivers/block/rbd.c | 2 -
 drivers/block/rsxx/dma.c | 2 -
 drivers/block/umem.c | 2 -
 drivers/block/zram/zram_drv.c | 10 +--
 drivers/dma/ste_dma40.c | 5 -
 drivers/iommu/amd_iommu.c | 21 ++++--
 drivers/iommu/intel-iommu.c | 26 +++++--
 drivers/iommu/iommu.c | 2 -
 drivers/md/bcache/btree.c | 4 +
 drivers/md/bcache/debug.c | 6 +-
 drivers/md/bcache/movinggc.c | 2 -
 drivers/md/bcache/request.c | 6 +-
 drivers/md/bcache/super.c | 10 +--
 drivers/md/bcache/util.c | 5 +
 drivers/md/bcache/writeback.c | 2 -
 drivers/md/dm-crypt.c | 12 ++-
 drivers/md/dm-io.c | 2 -
 drivers/md/dm-log-writes.c | 14 ++--
 drivers/md/dm-verity.c | 2 -
 drivers/md/raid1.c | 50 +++++++-------
 drivers/md/raid10.c | 38 +++++-----
 drivers/md/raid5.c | 6 +-
 drivers/mmc/card/queue.c | 4 +
 drivers/s390/block/dasd_diag.c | 2 -
 drivers/s390/block/dasd_eckd.c | 14 ++--
 drivers/s390/block/dasd_fba.c | 6 +-
 drivers/s390/block/dcssblk.c | 8 +-
 drivers/s390/block/scm_blk.c | 2 -
 drivers/s390/block/scm_blk_cluster.c | 2 -
 drivers/s390/block/xpram.c | 2 -
 drivers/scsi/mpt2sas/mpt2sas_transport.c | 6 +-
 drivers/scsi/mpt3sas/mpt3sas_transport.c | 6 +-
 drivers/scsi/sd_dif.c | 4 +
 drivers/staging/android/ion/ion_chunk_heap.c | 4 +
 drivers/staging/lustre/lustre/llite/lloop.c | 2 -
 drivers/target/target_core_file.c | 4 +
 drivers/xen/biomerge.c | 4 +
 drivers/xen/swiotlb-xen.c | 29 +++++---
 fs/9p/vfs_addr.c | 2 -
 fs/block_dev.c | 2 -
 fs/btrfs/check-integrity.c | 6 +-
 fs/btrfs/compression.c | 12 ++-
 fs/btrfs/disk-io.c | 5 +
 fs/btrfs/extent_io.c | 8 +-
 fs/btrfs/file-item.c | 8 +-
 fs/btrfs/inode.c | 19 +++--
 fs/btrfs/raid56.c | 4 +
 fs/btrfs/volumes.c | 2 -
 fs/buffer.c | 4 +
 fs/dax.c | 9 +-
 fs/direct-io.c | 2 -
 fs/exofs/ore.c | 4 +
 fs/exofs/ore_raid.c | 2 -
 fs/ext4/page-io.c | 2 -
 fs/ext4/readpage.c | 4 +
 fs/f2fs/data.c | 4 +
 fs/f2fs/segment.c | 2 -
 fs/gfs2/lops.c | 4 +
 fs/jfs/jfs_logmgr.c | 4 +
 fs/logfs/dev_bdev.c | 10 +--
 fs/mpage.c | 2 -
 fs/splice.c | 2 -
 include/asm-generic/dma-mapping-common.h | 30 ++++++++
 include/asm-generic/memory_model.h | 1
 include/asm-generic/pfn.h | 67 ++++++++++++++++++
 include/asm-generic/scatterlist.h | 10 +++
 include/crypto/scatterwalk.h | 10 +++
 include/linux/bio.h | 24 ++++---
 include/linux/blk_types.h | 20 +++++
 include/linux/blkdev.h | 6 +-
 include/linux/dma-debug.h | 23 +++++-
 include/linux/dma-mapping.h | 8 ++
 include/linux/highmem.h | 23 ++++++
 include/linux/mm.h | 1
 include/linux/scatterlist.h | 91 ++++++++++++++++++++++---
 include/linux/swiotlb.h | 4 +
 init/Kconfig | 13 ++++
 kernel/power/block_io.c | 2 -
 lib/dma-debug.c | 10 ++-
 lib/iov_iter.c | 22 +++---
 lib/swiotlb.c | 20 ++++-
 mm/page_io.c | 10 +--
 net/ceph/messenger.c | 2 -
 116 files changed, 896 insertions(+), 372 deletions(-)
 create mode 100644 arch/x86/kernel/kmap.c
 create mode 100644 include/asm-generic/pfn.h