2017-06-12 11:35:21

by Jean-Philippe Brucker

Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

Hello,

On 10/06/17 05:06, Wuzongyong (Cordius Wu, Euler Dept) wrote:
> Hi,
>
> Could someone explain the differences and relations between SVM (Shared
> Virtual Memory, by Intel), HSA (Heterogeneous System Architecture, by
> AMD), HMM (Heterogeneous Memory Management, by Glisse) and UM (Unified
> Memory, by NVIDIA)? Are they substitutes for one another?
>
> As I understand it, these all aim to solve the same problem: sharing
> pointers between CPU and GPU (implemented with ATS/PASID/PRI/IOMMU
> support). So far, SVM and HSA can only be used by integrated GPUs, and
> Intel has declared that their root ports don't have the required TLP
> prefix support, so SVM can't be used by discrete devices. Could someone
> tell me what the required TLP prefix means, specifically?
>
> With HMM, we can use an allocator like malloc to manage host and device
> memory. Does this mean there is no need for SVM and HSA once we have
> HMM, or is HMM the basis on which SVM and HSA implement the fine-grained
> system SVM defined in the OpenCL spec?

I can't provide an exhaustive answer, but I have done some work on SVM.
Take it with a grain of salt though, I am not an expert.

* HSA is an architecture that provides a common programming model for
CPUs and accelerators (GPGPUs etc.). It does have an SVM requirement
(I/O page faults, PASID and compatible address spaces), though SVM is
only a small part of it.

* Similarly, OpenCL provides an API for dealing with accelerators.
OpenCL 2.0 introduced the concept of Fine-Grained System SVM, which
allows passing userspace pointers to devices. It is just one flavor of
SVM; there are also coarse-grained and non-system variants. But they
might have coined the name, and I believe that in the Linux IOMMU
context, "SVM" means OpenCL's fine-grained system SVM.

* Nvidia CUDA has a feature similar to fine-grained system SVM, called
Unified Virtual Addressing. I'm not sure whether it maps exactly to
OpenCL's system SVM. Nvidia's Unified Memory seems to be more in line
with HMM, because in addition to unifying the virtual address space, it
also unifies system and device memory.
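
As an illustration of the difference, Unified Memory usage looks roughly
like this from userspace (a sketch against the public CUDA runtime API,
which I haven't tested):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
        float *p;
        size_t i, n = 1 << 20;

        /* One allocation, one address, usable by both CPU and GPU */
        if (cudaMallocManaged((void **)&p, n * sizeof(*p),
                              cudaMemAttachGlobal) != cudaSuccess)
                return 1;

        for (i = 0; i < n; i++)  /* CPU touch: pages are in host RAM */
                p[i] = 1.0f;

        /* A kernel launched here could dereference p directly; pages
         * would migrate to device RAM on first GPU access. */

        cudaDeviceSynchronize();
        printf("%f\n", p[0]);    /* CPU touch migrates pages back */
        cudaFree(p);
        return 0;
}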


So SVM is about the userspace API: the ability to perform DMA on a
process address space instead of using a separate DMA address space.
One possible implementation, for PCIe endpoints, uses ATS+PRI+PASID.

* The PASID extension adds a prefix to the PCI TLP (characterized by
bits[31:29] = 0b100) that specifies which address space is affected by
the transaction. The IOMMU uses (RequesterID, PASID, Virt Addr) to
derive a Phys Addr, where it previously only needed (RID, IOVA). See
the sketch after this list.

* The PRI extension allows handling page faults from endpoints, which
are bound to happen if they attempt to access process memory.

* PRI requires ATS. PRI adds two new TLPs, but ATS makes use of the AT
field [11:10] in PCIe TLPs, which was previously reserved.

So PCI switches, endpoints, root complexes and IOMMUs all have to be aware
of these three extensions in order to use SVM with discrete endpoints.
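
To make the PASID lookup concrete, here is a rough sketch of the
translation an IOMMU performs (hypothetical types and helpers, not a
real IOMMU API; real hardware keeps the PASID table in memory rather
than as a flat array):

#include <stdint.h>

struct address_space;                       /* one set of page tables */

/* hypothetical page-table walk within one address space */
extern uint64_t walk(struct address_space *as, uint64_t addr);

/* per-RequesterID state */
struct iommu_dev_ctx {
        struct address_space *dma_space;    /* classic per-device IOVA space */
        struct address_space **pasid_table; /* up to 2^20 entries (20-bit PASID) */
};

uint64_t translate(struct iommu_dev_ctx *dev, int has_pasid_prefix,
                   uint32_t pasid, uint64_t addr)
{
        /* Without a prefix, the lookup key is (RID, IOVA) */
        if (!has_pasid_prefix)
                return walk(dev->dma_space, addr);

        /* With a PASID prefix, the key is (RID, PASID, VA): the prefix
         * selects one of the process address spaces bound to the device */
        return walk(dev->pasid_table[pasid], addr);
}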


While SVM is only about the virtual address space, HMM deals with
physical storage. If I understand correctly, HMM allows userspace
applications to transparently use device RAM. So upon an I/O page
fault, the mm subsystem will migrate data from system memory into
device RAM. It would differ from "pure" SVM in that you would use
different page directories on the IOMMU and MMU sides, and synchronize
them using MMU notifiers. But please don't take this at face value, I
haven't had time to look into HMM yet.
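
The synchronization part would look roughly like this with the kernel's
mmu_notifier API (a sketch against the current 4.x signatures;
dev_invalidate is a hypothetical driver hook):

#include <linux/mmu_notifier.h>
#include <linux/mm_types.h>

/* hypothetical driver hook: tear down the device/IOMMU mappings for
 * [start, end) so the next device access faults and gets re-resolved */
static void dev_invalidate(unsigned long start, unsigned long end)
{
}

/* The CPU is about to unmap or modify [start, end) in this mm:
 * mirror the invalidation on the device side */
static void my_invalidate_range_start(struct mmu_notifier *mn,
                                      struct mm_struct *mm,
                                      unsigned long start,
                                      unsigned long end)
{
        dev_invalidate(start, end);
}

static const struct mmu_notifier_ops my_ops = {
        .invalidate_range_start = my_invalidate_range_start,
};

static struct mmu_notifier my_notifier = { .ops = &my_ops };

/* Bind a process address space: from now on we hear about CPU
 * page-table changes and can keep the device copy in sync */
int my_bind(struct mm_struct *mm)
{
        return mmu_notifier_register(&my_notifier, mm);
}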

Thanks,
Jean


2017-07-17 11:58:20

by Yisheng Xie

Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

Hi Jean-Philippe,

On 2017/6/12 19:37, Jean-Philippe Brucker wrote:
> [...]
>
> While SVM is only about the virtual address space,
As you mentioned, SVM is only about the virtual address space. I'd like
to know how the physical memory, especially the device's RAM, was
managed before HMM.

When OpenCL allocates an SVM pointer like:

void *p = clSVMAlloc(
        context, // an OpenCL context where this buffer is available
        CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
        size,    // amount of memory to allocate (in bytes)
        0        // alignment in bytes (0 means default)
);

where does this RAM come from, device RAM or host RAM?

Thanks
Yisheng Xie

> HMM deals with physical storage. If I understand correctly, HMM allows
> userspace applications to transparently use device RAM. So upon an I/O
> page fault, the mm subsystem will migrate data from system memory into
> device RAM. It would differ from "pure" SVM in that you would use
> different page directories on the IOMMU and MMU sides, and synchronize
> them using MMU notifiers. But please don't take this at face value, I
> haven't had time to look into HMM yet.

2017-07-17 12:49:36

by Jean-Philippe Brucker

Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

On 17/07/17 12:57, Yisheng Xie wrote:
> Hi Jean-Philippe,
>
> On 2017/6/12 19:37, Jean-Philippe Brucker wrote:
>> [...]
>>
>> While SVM is only about the virtual address space,
> As you mentioned, SVM is only about the virtual address space. I'd like
> to know how the physical memory, especially the device's RAM, was
> managed before HMM.
>
> When OpenCL allocates an SVM pointer like:
>
> void *p = clSVMAlloc(
>         context, // an OpenCL context where this buffer is available
>         CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
>         size,    // amount of memory to allocate (in bytes)
>         0        // alignment in bytes (0 means default)
> );
>
> where does this RAM come from, device RAM or host RAM?

Sorry, I'm not familiar with OpenCL/GPU drivers. It is up to them to
decide where to allocate memory for clSVMAlloc. My SMMU work would deal
with fine-grained *system* SVM, the kind that can be obtained from malloc
and doesn't require a call to clSVMAlloc. Hopefully others on this list or
linux-mm might be able to help you.
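
For what it's worth, fine-grained system SVM looks roughly like this
from userspace (a sketch with the standard OpenCL 2.0 API, untested;
queue and kernel are assumed to have been created elsewhere, on a
device that reports CL_DEVICE_SVM_FINE_GRAIN_SYSTEM):

#include <stdlib.h>
#include <CL/cl.h>

int run(cl_command_queue queue, cl_kernel kernel, size_t n)
{
        /* plain process memory, no clSVMAlloc needed */
        float *p = malloc(n * sizeof(*p));
        if (!p)
                return -1;

        /* Hand the raw pointer to the kernel; the device dereferences
         * it through the process page tables (ATS/PASID/PRI underneath) */
        clSetKernelArgSVMPointer(kernel, 0, p);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL,
                               0, NULL, NULL);
        clFinish(queue);

        free(p);
        return 0;
}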

Thanks,
Jean


2017-07-17 14:28:21

by Jerome Glisse

Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

On Mon, Jul 17, 2017 at 07:57:23PM +0800, Yisheng Xie wrote:
> Hi Jean-Philippe,
>
> On 2017/6/12 19:37, Jean-Philippe Brucker wrote:
> > [...]
> >
> > While SVM is only about the virtual address space,
> As you mentioned, SVM is only about the virtual address space. I'd like
> to know how the physical memory, especially the device's RAM, was
> managed before HMM.
>
> When OpenCL allocates an SVM pointer like:
>
> void *p = clSVMAlloc(
>         context, // an OpenCL context where this buffer is available
>         CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
>         size,    // amount of memory to allocate (in bytes)
>         0        // alignment in bytes (0 means default)
> );
>
> where does this RAM come from, device RAM or host RAM?
>

For SVM using ATS/PASID with FINE_GRAIN, your allocation can only be
in system memory (host RAM). You need a special system bus like CAPI
or CCIX, both of which go a step further than ATS/PASID, to allow
fine-grained SVM to use device memory.

However, that is where HMM can be useful, as HMM is a software
solution to this problem. So with HMM and a device that can work with
HMM, you can get fine-grained allocations that also use device
memory; any CPU access, however, will happen in host RAM.
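
Conceptually the fault handling looks like this (hypothetical helpers
below, not the actual HMM API):

/* hypothetical helpers standing in for the migration machinery */
void migrate_to_device(unsigned long va);
void migrate_to_host(unsigned long va);

/* Device faults on an address it uses heavily: move the backing page
 * into device RAM and leave a special entry in the CPU page tables so
 * that a later CPU access faults. */
void device_fault(unsigned long va)
{
        migrate_to_device(va);
}

/* CPU later touches that address: the special entry faults and the
 * page is migrated back to host RAM before the access proceeds, which
 * is why CPU access always happens in host RAM. */
void cpu_fault(unsigned long va)
{
        migrate_to_host(va);
}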

Jérôme

2017-07-18 00:16:21

by Yisheng Xie

Subject: Re: What differences and relations between SVM, HSA, HMM and Unified Memory?

Hi Jérôme and Jean-Philippe ,

Got it, thanks for all of your detailed explanations.

Thanks
Yisheng Xie

On 2017/7/17 22:27, Jerome Glisse wrote:
> [...]
>
> For SVM using ATS/PASID with FINE_GRAIN, your allocation can only be
> in system memory (host RAM). You need a special system bus like CAPI
> or CCIX, both of which go a step further than ATS/PASID, to allow
> fine-grained SVM to use device memory.
>
> However, that is where HMM can be useful, as HMM is a software
> solution to this problem. So with HMM and a device that can work with
> HMM, you can get fine-grained allocations that also use device
> memory; any CPU access, however, will happen in host RAM.
>
> Jérôme