Subject: [PATCH v9 09/12] mm: Document ZONE_DEVICE memory-model implications
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Jonathan Corbet, Mike Rapoport, linux-mm@kvack.org,
    linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
    osalvador@suse.de, mhocko@suse.com
Date: Wed, 05 Jun 2019 14:58:47 -0700
Message-ID: <155977192794.2443951.16177998596403034849.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155977186863.2443951.9036044808311959913.stgit@dwillia2-desk3.amr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

Cc: Jonathan Corbet
Reported-by: Mike Rapoport
Signed-off-by: Dan Williams
---
 Documentation/vm/memory-model.rst | 39 +++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
index 382f72ace1fc..e0af47e02e78 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -181,3 +181,42 @@ that is eventually passed to vmemmap_populate() through a long chain
 of function calls. The vmemmap_populate() implementation may use the
 `vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
 allocate memory map on the persistent memory device.
+
+ZONE_DEVICE
+===========
+The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
+`struct page` `mem_map` services for physical address ranges identified
+by a device driver. The "device" aspect of `ZONE_DEVICE` reflects two
+facts: the page objects for these address ranges are never marked
+online, and a reference must be taken against the device, not just the
+page, to keep the memory pinned for active use. `ZONE_DEVICE`, via
+:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
+enable :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
+:c:func:`get_user_pages` for the given range of pfns. Since the page
+reference count never drops below 1, the page is never tracked as free
+memory, and the page's `struct list_head lru` space is repurposed as a
+back-reference to the host device / driver that mapped the memory.
+
+While `SPARSEMEM` presents memory as a collection of sections,
+optionally grouped into memory blocks, `ZONE_DEVICE` users need a
+finer granularity for populating the `mem_map`. Because `ZONE_DEVICE`
+memory is never marked online, its memory ranges are never exposed
+through the sysfs memory hotplug API on memory block boundaries. The
+implementation relies on this absence of a user-API constraint to allow
+sub-section sized memory ranges to be passed to :c:func:`arch_add_memory`,
+the top half of memory hotplug. Sub-section support makes `PMD_SIZE` the
+minimum alignment granularity for :c:func:`devm_memremap_pages`.
+
+The users of `ZONE_DEVICE` are:
+
+* pmem: Map platform persistent memory to be used as a direct-I/O target
+  via DAX mappings.
+
+* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
+  event callbacks to allow a device driver to coordinate memory management
+  events related to device memory, typically GPU memory. See
+  Documentation/vm/hmm.rst.
+
+* p2pdma: Create `struct page` objects to allow peer devices in a
+  PCI/PCIe topology to coordinate direct-DMA operations between themselves,
+  i.e., bypassing host memory.