LinuxLists.cc - [PATCH 00/15] HMM (Heterogeneous Memory Management) v9

2015-07-17 18:53:04

Subject: [PATCH 00/15] HMM (Heterogeneous Memory Management) v9

Not much changed since last post (1), i did incorporate comments i got
so far, fixed couple bugs here and there and simplified the HMM page
table code. I am splitting the patchset into 3 parts. The first part
has the core of HMM and is enough for device that do not care about
memory migration. The second part is a convertion of Mellanox device
driver to use HMM. The last part is about remote memory migration
and i will post it at latter date. I think first and second part can
be merge and i would really like to know where i stand on this.
For the remote memory, i am reworking some bits related to memcg due
to recent changes upstream on that front. Below is the previous
cover letter as explanation and motivation did not change.

Tree with the patchset:
git://people.freedesktop.org/~glisse/linux hmm-v9 branch

HMM (Heterogeneous Memory Management) is an helper layer for device
that want to mirror a process address space into their own mmu. Main
target is GPU but other hardware, like network device can take also
use HMM.

There is two side to HMM, first one is mirroring of process address
space on behalf of a device. HMM will manage a secondary page table
for the device and keep it synchronize with the CPU page table. HMM
also do DMA mapping on behalf of the device (which would allow new
kind of optimization further down the road (2)).

Second side is allowing to migrate process memory to device memory
where device memory is unmappable by the CPU. Any CPU access will
trigger special fault that will migrate memory back. This patchset
does not deal with remote memory migration.

Why doing this ?

Mirroring a process address space is mandatory with OpenCL 2.0 and
with other GPU compute API. OpenCL 2.0 allow different level of
implementation and currently only the lowest 2 are supported on
Linux. To implement the highest level, where CPU and GPU access
can happen concurently and are cache coherent, HMM is needed, or
something providing same functionality, for instance through
platform hardware.

Hardware solution such as PCIE ATS/PASID is limited to mirroring
system memory and does not provide way to migrate memory to device
memory (which offer significantly more bandwidth up to 10 times
faster than regular system memory with discret GPU, also have
lower latency than PCIE transaction).

Current CPU with GPU on same die (AMD or Intel) use the ATS/PASID
and for Intel a special level of cache (backed by a large pool of
fast memory).

For foreseeable futur, discrete GPU will remain releveant as they
can have a large quantity of faster memory than integrated GPU.

Thus we believe HMM will allow to leverage discret GPU memory in
a transparent fashion to the application, with minimum disruption
to the linux kernel mm code. Also HMM can work along hardware
solution such as PCIE ATS/PASID (leaving regular case to ATS/PASID
while HMM handles the migrated memory case).

Design :

The patch 1, 2, 3 and 4 augment the mmu notifier API with new
informations to more efficiently mirror CPU page table updates.

The first side of HMM, process address space mirroring, is
implemented in patch 5 through 14. This use a secondary page
table, in which HMM mirror memory actively use by the device.
HMM does not take a reference on any of the page, it use the
mmu notifier API to track changes to the CPU page table and to
update the mirror page table. All this while providing a simple
API to device driver.

To implement this we use a "generic" page table and not a radix
tree because we need to store more flags than radix allows and
we need to store dma address (sizeof(dma_addr_t) > sizeof(long)
on some platform). All this is

(1) Previous patchset posting :
v1 http://lwn.net/Articles/597289/
v2 https://lkml.org/lkml/2014/6/12/559
v3 https://lkml.org/lkml/2014/6/13/633
v4 https://lkml.org/lkml/2014/8/29/423
v5 https://lkml.org/lkml/2014/11/3/759
v6 http://lwn.net/Articles/619737/
v7 http://lwn.net/Articles/627316/
v8 https://lwn.net/Articles/645515/

(2) Because HMM keeps a secondary page table which keeps track of
DMA mapping, there is room for new optimization. We want to
add a new DMA API to allow to manage DMA page table mapping
at directory level. This would allow to minimize memory
consumption of mirror page table and also over head of doing
DMA mapping page per page. This is a future feature we want
to work on and hope the idea will proove usefull not only to
HMM users.

Cheers,
Jérôme

To: <[email protected]>,
To: linux-mm <[email protected]>,
To: "Andrew Morton" <[email protected]>,
Cc: "Linus Torvalds" <[email protected]>,
Cc: "Mel Gorman" <[email protected]>,
Cc: "H. Peter Anvin" <[email protected]>,
Cc: "Peter Zijlstra" <[email protected]>,
Cc: "Linda Wang" <[email protected]>,
Cc: "Kevin E Martin" <[email protected]>,
Cc: "Andrea Arcangeli" <[email protected]>,
Cc: "Johannes Weiner" <[email protected]>,
Cc: "Larry Woodman" <[email protected]>,
Cc: "Rik van Riel" <[email protected]>,
Cc: "Dave Airlie" <[email protected]>,
Cc: "Jeff Law" <[email protected]>,
Cc: "Brendan Conoboy" <[email protected]>,
Cc: "Joe Donohue" <[email protected]>,
Cc: "Christophe Harle" <[email protected]>,
Cc: "Duncan Poole" <[email protected]>,
Cc: "Sherry Cheung" <[email protected]>,
Cc: "Subhash Gutti" <[email protected]>,
Cc: "John Hubbard" <[email protected]>,
Cc: "Mark Hairgrove" <[email protected]>,
Cc: "Lucien Dunning" <[email protected]>,
Cc: "Cameron Buschardt" <[email protected]>,
Cc: "Arvind Gopalakrishnan" <[email protected]>,
Cc: "Haggai Eran" <[email protected]>,
Cc: "Or Gerlitz" <[email protected]>,
Cc: "Sagi Grimberg" <[email protected]>
Cc: "Shachar Raindel" <[email protected]>,
Cc: "Liran Liss" <[email protected]>,
Cc: "Roland Dreier" <[email protected]>,
Cc: "Sander, Ben" <[email protected]>,
Cc: "Stoner, Greg" <[email protected]>,
Cc: "Bridgman, John" <[email protected]>,
Cc: "Mantor, Michael" <[email protected]>,
Cc: "Blinzer, Paul" <[email protected]>,
Cc: "Morichetti, Laurent" <[email protected]>,
Cc: "Deucher, Alexander" <[email protected]>,
Cc: "Leonid Shamis" <[email protected]>

2015-07-17 18:58:12

Subject: [PATCH 00/15] HMM (Heterogeneous Memory Management) v9

Subject: [PATCH 01/15] mmu_notifier: add event information to address invalidation v8

Subject: [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4

Subject: [PATCH 03/15] mmu_notifier: pass page pointer to mmu_notifier_invalidate_page() v2

Subject: [PATCH 04/15] mmu_notifier: allow range invalidation to exclude a specific mmu_notifier

Subject: [PATCH 05/15] HMM: introduce heterogeneous memory management v4.

Subject: [PATCH 06/15] HMM: add HMM page table v3.

Subject: [PATCH 07/15] HMM: add per mirror page table v4.

Subject: [PATCH 08/15] HMM: add device page fault support v4.

Subject: [PATCH 09/15] HMM: add mm page table iterator helpers.

Subject: [PATCH 10/15] HMM: use CPU page table during invalidation.

Subject: [PATCH 11/15] HMM: add discard range helper (to clear and free resources for a range).

Subject: [PATCH 12/15] HMM: add dirty range helper (toggle dirty bit inside mirror page table) v2.

Subject: [PATCH 13/15] HMM: DMA map memory on behalf of device driver v2.

Subject: [PATCH 14/15] HMM: add documentation explaining HMM internals and how to use it.

Subject: [PATCH 15/15] hmm/dummy: dummy driver for testing and showcasing the HMM API

Subject: Re: [PATCH 05/15] HMM: introduce heterogeneous memory management v4.

Subject: Re: Re: [PATCH 05/15] HMM: introduce heterogeneous memory management v4.

Subject: Re: Re: [PATCH 05/15] HMM: introduce heterogeneous memory management v4.

Subject: Re: [PATCH 15/15] hmm/dummy: dummy driver for testing and showcasing the HMM API

Subject: Re: [PATCH 15/15] hmm/dummy: dummy driver for testing and showcasing the HMM API