From: Mike Rapoport
To: Jonathan Corbet
Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
 Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan, Michael Ellerman,
 Alexander Viro, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 kasan-dev@googlegroups.com, linux-alpha@vger.kernel.org,
 linux-ia64@vger.kernel.org, linux-mips@linux-mips.org,
 linuxppc-dev@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, Mike Rapoport
Subject: [PATCH 06/32] docs/vm: hmm.txt: convert to ReST format
Date: Wed, 21 Mar 2018 21:22:22 +0200
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>
References: <1521660168-14372-1-git-send-email-rppt@linux.vnet.ibm.com>
Message-Id: <1521660168-14372-7-git-send-email-rppt@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Signed-off-by: Mike Rapoport
---
 Documentation/vm/hmm.txt | 66 ++++++++++++++++++++----------------------------
 1 file changed, 28 insertions(+), 38 deletions(-)

diff --git a/Documentation/vm/hmm.txt b/Documentation/vm/hmm.txt
index 4d3aac9..3fafa33 100644
--- a/Documentation/vm/hmm.txt
+++ b/Documentation/vm/hmm.txt
@@ -1,4 +1,8 @@
+.. hmm:
+
+=====================================
 Heterogeneous Memory Management (HMM)
+=====================================
 
 Transparently allow any component of a program to use any memory region of said
 program with a device without using device specific memory allocator. This is
@@ -14,19 +18,10 @@ deals with how device memory is represented inside the kernel. Finaly the last
 section present the new migration helper that allow to leverage the device DMA
 engine.
 
+.. contents:: :local:
 
-1) Problems of using device specific memory allocator:
-2) System bus, device memory characteristics
-3) Share address space and migration
-4) Address space mirroring implementation and API
-5) Represent and manage device memory from core kernel point of view
-6) Migrate to and from device memory
-7) Memory cgroup (memcg) and rss accounting
-
-
--------------------------------------------------------------------------------
-
-1) Problems of using device specific memory allocator:
+Problems of using device specific memory allocator
+==================================================
 
 Device with large amount of on board memory (several giga bytes) like GPU have
 historically manage their memory through dedicated driver specific API. This
@@ -68,9 +63,8 @@ only do-able with a share address.
It is as well more reasonable to use a share
 address space for all the other patterns.
 
 
--------------------------------------------------------------------------------
-
-2) System bus, device memory characteristics
+System bus, device memory characteristics
+=========================================
 
 System bus cripple share address due to few limitations. Most system bus only
 allow basic memory access from device to main memory, even cache coherency is
@@ -100,9 +94,8 @@ access any memory memory but we must also permit any memory to be migrated to
 device memory while device is using it (blocking CPU access while it happens).
 
 
--------------------------------------------------------------------------------
-
-3) Share address space and migration
+Share address space and migration
+=================================
 
 HMM intends to provide two main features. First one is to share the address
 space by duplication the CPU page table into the device page table so same
@@ -140,14 +133,13 @@ leverage device memory by migrating part of data-set that is actively use by a
 device.
 
 
--------------------------------------------------------------------------------
-
-4) Address space mirroring implementation and API
+Address space mirroring implementation and API
+==============================================
 
 Address space mirroring main objective is to allow to duplicate range of CPU
 page table into a device page table and HMM helps keeping both synchronize. A
 device driver that want to mirror a process address space must start with the
-registration of an hmm_mirror struct:
+registration of an hmm_mirror struct::
 
  int hmm_mirror_register(struct hmm_mirror *mirror,
                          struct mm_struct *mm);
@@ -156,7 +148,7 @@ registration of an hmm_mirror struct:
 
 The locked variant is to be use when the driver is already holding the mmap_sem
 of the mm in write mode.
The mirror struct has a set of callback that are use
-to propagate CPU page table:
+to propagate CPU page table::
 
  struct hmm_mirror_ops {
      /* sync_cpu_device_pagetables() - synchronize page tables
@@ -187,7 +179,8 @@ be done with the update.
 
 
 When device driver wants to populate a range of virtual address it can use
-either:
+either::
+
  int hmm_vma_get_pfns(struct vm_area_struct *vma,
                       struct hmm_range *range,
                       unsigned long start,
@@ -211,7 +204,7 @@ that array correspond to an address in the virtual range. HMM provide a set of
 flags to help driver identify special CPU page table entries.
 
 Locking with the update() callback is the most important aspect the driver must
-respect in order to keep things properly synchronize. The usage pattern is :
+respect in order to keep things properly synchronize. The usage pattern is::
 
  int driver_populate_range(...)
  {
@@ -251,9 +244,8 @@ concurrently for multiple devices. Waiting for each device to report commands as
 executed is serialize (there is no point in doing this concurrently).
 
 
--------------------------------------------------------------------------------
-
-5) Represent and manage device memory from core kernel point of view
+Represent and manage device memory from core kernel point of view
+=================================================================
 
 Several differents design were try to support device memory. First one use
 device specific data structure to keep information about migrated memory and
@@ -269,14 +261,14 @@ un-aware of the difference. We only need to make sure that no one ever try to
 map those page from the CPU side.
 
 HMM provide a set of helpers to register and hotplug device memory as a new
-region needing struct page. This is offer through a very simple API:
+region needing struct page.
This is offer through a very simple API::
 
  struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
                                    struct device *device,
                                    unsigned long size);
  void hmm_devmem_remove(struct hmm_devmem *devmem);
 
-The hmm_devmem_ops is where most of the important things are:
+The hmm_devmem_ops is where most of the important things are::
 
  struct hmm_devmem_ops {
      void (*free)(struct hmm_devmem *devmem, struct page *page);
@@ -294,13 +286,12 @@ second callback happens whenever CPU try to access a device page which it can
 not do. This second callback must trigger a migration back to system memory.
 
 
--------------------------------------------------------------------------------
-
-6) Migrate to and from device memory
+Migrate to and from device memory
+=================================
 
 Because CPU can not access device memory, migration must use device DMA engine
 to perform copy from and to device memory. For this we need a new migration
-helper:
+helper::
 
  int migrate_vma(const struct migrate_vma_ops *ops,
                  struct vm_area_struct *vma,
@@ -319,7 +310,7 @@ such migration base on range of address the device is actively accessing.
 
 The migrate_vma_ops struct define two callbacks. First one (alloc_and_copy())
 control destination memory allocation and copy operation. Second one is there
-to allow device driver to perform cleanup operation after migration.
+to allow device driver to perform cleanup operation after migration::
 
  struct migrate_vma_ops {
      void (*alloc_and_copy)(struct vm_area_struct *vma,
@@ -353,9 +344,8 @@ bandwidth but this is considered as a rare event and a price that we are
 willing to pay to keep all the code simpler.
 
 
--------------------------------------------------------------------------------
-
-7) Memory cgroup (memcg) and rss accounting
+Memory cgroup (memcg) and rss accounting
+========================================
 
 For now device memory is accounted as any regular page in rss counters (either
 anonymous if device page is use for anonymous, file if device page is use for
-- 
2.7.4