Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753563AbbGQSxE (ORCPT ); Fri, 17 Jul 2015 14:53:04 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44372 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751670AbbGQSxB (ORCPT ); Fri, 17 Jul 2015 14:53:01 -0400 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= To: akpm@linux-foundation.org, , linux-mm@kvack.org Cc: Linus Torvalds , , Mel Gorman , "H. Peter Anvin" , Peter Zijlstra , Andrea Arcangeli , Johannes Weiner , Larry Woodman , Rik van Riel , Dave Airlie , Brendan Conoboy , Joe Donohue , Christophe Harle , Duncan Poole , Sherry Cheung , Subhash Gutti , John Hubbard , Mark Hairgrove , Lucien Dunning , Cameron Buschardt , Arvind Gopalakrishnan , Haggai Eran , Shachar Raindel , Liran Liss , Roland Dreier , Ben Sander , Greg Stoner , John Bridgman , Michael Mantor , Paul Blinzer , Leonid Shamis , Laurent Morichetti , Alexander Deucher , "Linda Wang" , "Kevin E Martin" , "Jeff Law" , "Or Gerlitz" , "Sagi Grimberg" Subject: [PATCH 00/15] HMM (Heterogeneous Memory Management) v9 Date: Fri, 17 Jul 2015 14:52:10 -0400 Message-Id: <1437159145-6548-1-git-send-email-jglisse@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6478 Lines: 152 Not much changed since last post (1), i did incorporate comments i got so far, fixed couple bugs here and there and simplified the HMM page table code. I am splitting the patchset into 3 parts. The first part has the core of HMM and is enough for device that do not care about memory migration. The second part is a convertion of Mellanox device driver to use HMM. The last part is about remote memory migration and i will post it at latter date. I think first and second part can be merge and i would really like to know where i stand on this. For the remote memory, i am reworking some bits related to memcg due to recent changes upstream on that front. Below is the previous cover letter as explanation and motivation did not change. Tree with the patchset: git://people.freedesktop.org/~glisse/linux hmm-v9 branch HMM (Heterogeneous Memory Management) is an helper layer for device that want to mirror a process address space into their own mmu. Main target is GPU but other hardware, like network device can take also use HMM. There is two side to HMM, first one is mirroring of process address space on behalf of a device. HMM will manage a secondary page table for the device and keep it synchronize with the CPU page table. HMM also do DMA mapping on behalf of the device (which would allow new kind of optimization further down the road (2)). Second side is allowing to migrate process memory to device memory where device memory is unmappable by the CPU. Any CPU access will trigger special fault that will migrate memory back. This patchset does not deal with remote memory migration. Why doing this ? Mirroring a process address space is mandatory with OpenCL 2.0 and with other GPU compute API. OpenCL 2.0 allow different level of implementation and currently only the lowest 2 are supported on Linux. To implement the highest level, where CPU and GPU access can happen concurently and are cache coherent, HMM is needed, or something providing same functionality, for instance through platform hardware. Hardware solution such as PCIE ATS/PASID is limited to mirroring system memory and does not provide way to migrate memory to device memory (which offer significantly more bandwidth up to 10 times faster than regular system memory with discret GPU, also have lower latency than PCIE transaction). Current CPU with GPU on same die (AMD or Intel) use the ATS/PASID and for Intel a special level of cache (backed by a large pool of fast memory). For foreseeable futur, discrete GPU will remain releveant as they can have a large quantity of faster memory than integrated GPU. Thus we believe HMM will allow to leverage discret GPU memory in a transparent fashion to the application, with minimum disruption to the linux kernel mm code. Also HMM can work along hardware solution such as PCIE ATS/PASID (leaving regular case to ATS/PASID while HMM handles the migrated memory case). Design : The patch 1, 2, 3 and 4 augment the mmu notifier API with new informations to more efficiently mirror CPU page table updates. The first side of HMM, process address space mirroring, is implemented in patch 5 through 14. This use a secondary page table, in which HMM mirror memory actively use by the device. HMM does not take a reference on any of the page, it use the mmu notifier API to track changes to the CPU page table and to update the mirror page table. All this while providing a simple API to device driver. To implement this we use a "generic" page table and not a radix tree because we need to store more flags than radix allows and we need to store dma address (sizeof(dma_addr_t) > sizeof(long) on some platform). All this is (1) Previous patchset posting : v1 http://lwn.net/Articles/597289/ v2 https://lkml.org/lkml/2014/6/12/559 v3 https://lkml.org/lkml/2014/6/13/633 v4 https://lkml.org/lkml/2014/8/29/423 v5 https://lkml.org/lkml/2014/11/3/759 v6 http://lwn.net/Articles/619737/ v7 http://lwn.net/Articles/627316/ v8 https://lwn.net/Articles/645515/ (2) Because HMM keeps a secondary page table which keeps track of DMA mapping, there is room for new optimization. We want to add a new DMA API to allow to manage DMA page table mapping at directory level. This would allow to minimize memory consumption of mirror page table and also over head of doing DMA mapping page per page. This is a future feature we want to work on and hope the idea will proove usefull not only to HMM users. Cheers, Jérôme To: , To: linux-mm , To: "Andrew Morton" , Cc: "Linus Torvalds" , Cc: "Mel Gorman" , Cc: "H. Peter Anvin" , Cc: "Peter Zijlstra" , Cc: "Linda Wang" , Cc: "Kevin E Martin" , Cc: "Andrea Arcangeli" , Cc: "Johannes Weiner" , Cc: "Larry Woodman" , Cc: "Rik van Riel" , Cc: "Dave Airlie" , Cc: "Jeff Law" , Cc: "Brendan Conoboy" , Cc: "Joe Donohue" , Cc: "Christophe Harle" , Cc: "Duncan Poole" , Cc: "Sherry Cheung" , Cc: "Subhash Gutti" , Cc: "John Hubbard" , Cc: "Mark Hairgrove" , Cc: "Lucien Dunning" , Cc: "Cameron Buschardt" , Cc: "Arvind Gopalakrishnan" , Cc: "Haggai Eran" , Cc: "Or Gerlitz" , Cc: "Sagi Grimberg" Cc: "Shachar Raindel" , Cc: "Liran Liss" , Cc: "Roland Dreier" , Cc: "Sander, Ben" , Cc: "Stoner, Greg" , Cc: "Bridgman, John" , Cc: "Mantor, Michael" , Cc: "Blinzer, Paul" , Cc: "Morichetti, Laurent" , Cc: "Deucher, Alexander" , Cc: "Leonid Shamis" -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/