From: "Wuzongyong (Cordius Wu, Euler Dept)" <wuzongyong1@huawei.com>
To: Jerome Glisse <j.glisse@gmail.com>
CC: "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "oded.gabbay@amd.com" <oded.gabbay@amd.com>,
        "Wanzongshun (Vincent)" <wanzongshun@huawei.com>,
        "Lifei (Louis)" <louis.lifei@huawei.com>
Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?
Thread-Topic: What differences and relations between SVM, HSA, HMM and
 Unified Memory?
Thread-Index: AdLg9z7NmA9JkspsQ66KW6tRAbxebgCcZZgAAC5VgiA=
Date: Tue, 13 Jun 2017 12:36:16 +0000
Message-ID: <9BD73EA91F8E404F851CF3F519B14AA8CE7BDD@SZXEMI503-MBS.china.huawei.com>
References: <9BD73EA91F8E404F851CF3F519B14AA8CE753F@SZXEMI503-MBS.china.huawei.com>
 <20170612184413.GA5924@gmail.com>
In-Reply-To: <20170612184413.GA5924@gmail.com>
Accept-Language: en-US
Content-Language: zh-CN
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Transfer-Encoding: 8bit
Content-Length: 3405
Lines: 76

That's exactly what I wanted to know! Thanks for your explanation.

Thanks,
Zongyong Wu


-----Original Message-----
From: Jerome Glisse [mailto:j.glisse@gmail.com]
Sent: June 13, 2017 2:44
To: Wuzongyong (Cordius Wu, Euler Dept) <wuzongyong1@huawei.com>
Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; oded.gabbay@amd.com; Wanzongshun (Vincent) <wanzongshun@huawei.com>
Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

On Sat, Jun 10, 2017 at 04:06:28AM +0000, Wuzongyong (Cordius Wu, Euler Dept) wrote:
> Hi,
> 
> Could someone explain the differences and relations between SVM
> (Shared Virtual Memory, by Intel), HSA (Heterogeneous System
> Architecture, by AMD), HMM (Heterogeneous Memory Management, by Glisse)
> and UM (Unified Memory, by NVIDIA)? Are they substitutes for one
> another?
>
> As I understand it, these all aim to solve the same problem: sharing
> pointers between CPU and GPU (implemented with ATS/PASID/PRI/IOMMU
> support). So far, SVM and HSA can only be used by integrated GPUs.
> Intel also states that the root ports do not have the required
> TLP prefix support, so SVM can't be used by discrete
> devices. Could someone tell me what the required TLP prefix support
> means specifically?
>
> With HMM, we can use an allocator like malloc to manage host and device
> memory. Does this mean that there is no need to use SVM and HSA with
> HMM, or is HMM the basis on which SVM and HSA implement the fine-grained
> system SVM defined in the OpenCL spec?

So the aim of all these technologies is to share an address space between a device and the CPU. There are 3 ways to do it:

  A) All in hardware, like CAPI or CCIX, where device memory is cache
     coherent from the CPU's point of view and system memory is also
     accessible by the device in a cache coherent way with the CPU. So
     cache coherency goes both ways: from the CPU to device memory and
     from the device to system memory.


  B) Partially in hardware, with ATS/PASID (the technology behind
     both HSA and SVM). This is only a one-way solution: you have
     cache coherent access from the device to system memory, but
     not the other way around. Moreover, you share the CPU page
     table with the device, so you do not need to program the IOMMU.

     Here you cannot use the device memory transparently, at least
     not without software help like HMM.


  C) All in software. Here the device can access system memory with
     cache coherency, but it does not share the CPU page table. Each
     device has its own page table, and thus you need to keep them
     synchronized.
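
To make the synchronization in C) concrete, here is a toy sketch in plain C. It is not HMM's API and not kernel code: the names (struct page_table, cpu_map, dev_access) are hypothetical, and a page table is reduced to an array mapping a virtual page number to a frame number. It only illustrates the idea that a CPU-side change must invalidate the device's copy, and that the device refills its copy on the next access.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES   8      /* toy address space: 8 virtual pages */
#define NO_FRAME (-1)   /* marks an unmapped (or invalidated) page */

/* Hypothetical toy page table: virtual page number -> frame number. */
struct page_table {
    int frame[NPAGES];
};

static void pt_init(struct page_table *pt)
{
    for (size_t i = 0; i < NPAGES; i++)
        pt->frame[i] = NO_FRAME;
}

/*
 * The CPU side changes a mapping. Because the device keeps its own
 * table, the stale device entry has to be invalidated at the same time.
 */
static void cpu_map(struct page_table *cpu, struct page_table *dev,
                    size_t vpn, int frame)
{
    assert(vpn < NPAGES);
    cpu->frame[vpn] = frame;
    dev->frame[vpn] = NO_FRAME;
}

/*
 * Device access. On a miss in the device table, take a "fault" and
 * copy the entry from the CPU table. Returns the frame number, or
 * NO_FRAME if the CPU has no mapping for this page either.
 */
static int dev_access(const struct page_table *cpu, struct page_table *dev,
                      size_t vpn)
{
    assert(vpn < NPAGES);
    if (dev->frame[vpn] == NO_FRAME)
        dev->frame[vpn] = cpu->frame[vpn];
    return dev->frame[vpn];
}
```

In this sketch, calling cpu_map() twice on the same page with different frames and then dev_access() returns the second frame, because the first device copy was dropped by the invalidation.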

HMM provides helpers that address all 3 solutions:
  A) For the all-hardware solution, HMM provides new helpers to help
     with migration of process memory to device memory.
  B) For the partial-hardware solution, you can mix in HMM to again
     provide helpers for migration to device memory. This assumes
     your device can mix and match its local device page table with
     ATS/PASID regions.
  C) For the full-software solution, you use all the features of HMM:
     everything is done in software, and HMM just does the heavy
     lifting on behalf of the device driver.

In all of the above we are talking about fine-grained system SVM as in the OpenCL specification. So you can malloc() memory and use it directly from the GPU.
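
As a rough illustration of what "malloc() and use it directly" means for the caller, here is a small C sketch. The function fake_device_scale() is a hypothetical stand-in for a GPU kernel and simply runs on the CPU; no real device API is used. The only point is the shape of the code: the buffer comes from plain malloc(), and the very same pointer is handed to the "device", with no special allocator and no copy into a separate device buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPU kernel; here it just runs on the CPU. */
static void fake_device_scale(float *data, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= k;
}

/*
 * Allocate with plain malloc(), fill with 1.0, pass the same pointer
 * to the "device", then read the result back through that pointer.
 * Returns the sum of the scaled elements, or -1 if allocation fails.
 */
static float scale_and_sum(size_t n, float k)
{
    float *buf = malloc(n * sizeof(*buf));
    float sum = 0.0f;

    if (!buf)
        return -1.0f;

    for (size_t i = 0; i < n; i++)
        buf[i] = 1.0f;

    fake_device_scale(buf, n, k);   /* same pointer, no staging copy */

    for (size_t i = 0; i < n; i++)
        sum += buf[i];

    free(buf);
    return sum;
}
```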

Hope this clarifies things.

Cheers,
Jérôme