From: Jérôme Glisse
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: Linus Torvalds, Mel Gorman, "H. Peter Anvin", Peter Zijlstra,
    Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
    Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
    Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
    Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
    Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
    Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
    Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
    Alexander Deucher, Linda Wang, Kevin E Martin, Jeff Law,
    Or Gerlitz, Sagi Grimberg
Subject: HMM (Heterogeneous Memory Management) v10
Date: Thu, 13 Aug 2015 15:15:13 -0400
Message-Id: <1439493328-1028-1-git-send-email-jglisse@redhat.com>

Minor fixes since the last post (1). This applies on top of 4.2-rc6; I
based it there because conflicts in infiniband are harder to resolve
than conflicts with the mm tree.

Tree with the patchset:
git://people.freedesktop.org/~glisse/linux hmm-v10 branch

Previous cover letter :

HMM (Heterogeneous Memory Management) is a helper layer for devices
that want to mirror a process address space into their own MMU. The
main target is GPUs, but other hardware, such as network devices, can
also use HMM.

There are two sides to HMM. The first is mirroring a process address
space on behalf of a device.
HMM manages a secondary page table for the device and keeps it
synchronized with the CPU page table. HMM also does DMA mapping on
behalf of the device (which would allow new kinds of optimization
further down the road (2)).

The second side is migrating process memory to device memory, where
device memory is not mappable by the CPU. Any CPU access triggers a
special fault that migrates the memory back. This patchset does not
deal with remote memory migration.

Why do this ?

Mirroring a process address space is mandatory with OpenCL 2.0 and
with other GPU compute APIs. OpenCL 2.0 allows different levels of
implementation, and currently only the lowest two are supported on
Linux. To implement the highest level, where CPU and GPU accesses can
happen concurrently and are cache coherent, HMM is needed, or
something providing the same functionality, for instance through
platform hardware.

Hardware solutions such as PCIE ATS/PASID are limited to mirroring
system memory and do not provide a way to migrate memory to device
memory (which offers significantly more bandwidth, up to 10 times that
of regular system memory with a discrete GPU, and also has lower
latency than a PCIE transaction).

Current CPUs with a GPU on the same die (AMD or Intel) use ATS/PASID
and, for Intel, a special level of cache (backed by a large pool of
fast memory).

For the foreseeable future, discrete GPUs will remain relevant, as
they can have a large quantity of memory that is faster than what
integrated GPUs have.

Thus we believe HMM will allow discrete GPU memory to be leveraged in
a way that is transparent to the application, with minimal disruption
to the Linux kernel mm code. HMM can also work alongside hardware
solutions such as PCIE ATS/PASID (leaving the regular case to
ATS/PASID while HMM handles the migrated memory case).

Design :

Patches 1, 2, 3 and 4 augment the mmu notifier API with new
information to mirror CPU page table updates more efficiently.

The first side of HMM, process address space mirroring, is implemented
in patches 5 through 14.
This uses a secondary page table, in which HMM mirrors the memory
actively used by the device. HMM does not take a reference on any of
the pages; it uses the mmu notifier API to track changes to the CPU
page table and to update the mirror page table, all while providing a
simple API to device drivers. To implement this we use a "generic"
page table and not a radix tree, because we need to store more flags
than the radix tree allows and we need to store DMA addresses
(sizeof(dma_addr_t) > sizeof(long) on some platforms). All this is

(1) Previous patchset posting :
    v1 http://lwn.net/Articles/597289/
    v2 https://lkml.org/lkml/2014/6/12/559
    v3 https://lkml.org/lkml/2014/6/13/633
    v4 https://lkml.org/lkml/2014/8/29/423
    v5 https://lkml.org/lkml/2014/11/3/759
    v6 http://lwn.net/Articles/619737/
    v7 http://lwn.net/Articles/627316/
    v8 https://lwn.net/Articles/645515/
    v9 https://lwn.net/Articles/651553/

(2) Because HMM keeps a secondary page table that tracks DMA
    mappings, there is room for new optimizations. We want to add a
    new DMA API to manage DMA page table mappings at the directory
    level. This would minimize the memory consumption of the mirror
    page table and also the overhead of doing DMA mapping page by
    page. This is a future feature we want to work on, and we hope
    the idea will prove useful not only to HMM users.

Cheers,
Jérôme
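
For readers who want a concrete picture of the mirror page table
entries discussed under Design, here is a minimal user-space sketch in
C. It is not code from the patchset. Every name in it (the sketch_*
types, macros and functions) is hypothetical, the flat eight-entry
table stands in for a real multi-level table, and packing the flags
into the low bits of a page-aligned address is an assumption made only
for illustration. It shows two things: an entry wide enough to hold a
64-bit DMA address plus a few flags, and the kind of range
invalidation a notifier-style callback could perform when the CPU page
table changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's dma_addr_t: 64 bits wide even
 * where long is 32 bits, so a single long-sized slot cannot hold it. */
typedef uint64_t sketch_dma_addr_t;

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_FLAG_MASK  ((sketch_dma_addr_t)((1u << SKETCH_PAGE_SHIFT) - 1))

/* Illustrative flags (assumption): stored in the low bits that a
 * page-aligned address leaves free. */
#define SKETCH_VALID (1u << 0)
#define SKETCH_WRITE (1u << 1)
#define SKETCH_DIRTY (1u << 2)

#define SKETCH_NPAGES 8

/* One flat level of a mirror table; a real table would be multi-level. */
struct sketch_mirror {
    sketch_dma_addr_t entry[SKETCH_NPAGES];
};

/* Combine a page-aligned DMA address and flags into one entry. */
static sketch_dma_addr_t sketch_entry_make(sketch_dma_addr_t dma, unsigned flags)
{
    return (dma & ~SKETCH_FLAG_MASK) |
           ((sketch_dma_addr_t)flags & SKETCH_FLAG_MASK);
}

/* Extract the DMA address part of an entry. */
static sketch_dma_addr_t sketch_entry_dma(sketch_dma_addr_t e)
{
    return e & ~SKETCH_FLAG_MASK;
}

/* Extract the flag part of an entry. */
static unsigned sketch_entry_flags(sketch_dma_addr_t e)
{
    return (unsigned)(e & SKETCH_FLAG_MASK);
}

/* What a notifier-style callback might do when the CPU page table
 * changes for pages [first, last): clear the mirrored entries so the
 * device has to fault them in again. */
static void sketch_invalidate(struct sketch_mirror *m, size_t first, size_t last)
{
    for (size_t i = first; i < last && i < SKETCH_NPAGES; i++)
        m->entry[i] = 0;
}
```

A caller would fill an entry with sketch_entry_make() when the device
faults on a page, read it back with sketch_entry_dma() and
sketch_entry_flags(), and call sketch_invalidate() over the affected
range whenever the CPU side changes.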