2018-04-10 21:04:36

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

The code expects to observe the same number of buffers returned from
dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
doesn't hold true universally, especially for systems with an IOMMU.

The IOMMU driver combines buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
the IOMMU can do.

Signed-off-by: Sinan Kaya <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 8e28270..1b031eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
 	}
-
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 	r = gmc_v6_0_init_microcode(adev);
 	if (r) {
 		dev_err(adev->dev, "Failed to load mc firmware!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 86e9d682..0a4b2cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v7_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 9a813d8..b171529 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v8_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3b7e7af..36e658ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v9_0_mc_init(adev);
 	if (r)
--
2.7.4
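[Editorial aside: the coalescing behaviour the commit message describes can be modelled in a few lines of plain userspace C. The function below is a hypothetical stand-in for dma_map_sg() under an IOMMU, not the kernel API: every entry passed in is one page, the "IOMMU" makes all pages IOVA-contiguous, and a segment-size cap is the only thing that forces a split.]

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical model of dma_map_sg() under an IOMMU: @nents one-page
 * entries come in, adjacent pages are merged into one IOVA-contiguous
 * DMA segment unless adding a page would exceed @max_seg_size.
 * Returns how many DMA segments come out. */
static size_t model_map_sg(size_t nents, unsigned long max_seg_size)
{
	size_t mapped = 0;
	unsigned long seg_len = 0;

	for (size_t i = 0; i < nents; i++) {
		if (seg_len == 0 || seg_len + MODEL_PAGE_SIZE > max_seg_size) {
			mapped++;	/* start a new DMA segment */
			seg_len = 0;
		}
		seg_len += MODEL_PAGE_SIZE;
	}
	return mapped;
}
```

With no cap (max_seg_size = ~0UL) eight pages merge into a single segment, so a caller that assumes eight output entries walks stale table slots; capping the segment size at one page, as the patch does, forces eight in, eight out.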



2018-04-11 06:25:20

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

On Tue, Apr 10, 2018 at 04:59:55PM -0400, Sinan Kaya wrote:
> The code expects to observe the same number of buffers returned from
> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
> doesn't hold true universally, especially for systems with an IOMMU.
>
> The IOMMU driver combines buffers into a single DMA address as much
> as it can. The right thing is to tell the DMA layer how much combining
> the IOMMU can do.
>
> Signed-off-by: Sinan Kaya <[email protected]>

Sinan, I guess Christian's suggestion is to add amdgpu_device_init function
like here:

8<-------
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..9b96771 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2339,6 +2339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	/* init the mode config */
 	drm_mode_config_init(adev->ddev);
 
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
 	r = amdgpu_device_ip_init(adev);
 	if (r) {
 		/* failed in exclusive mode due to timeout */
8<-------

After that, we don't do it in each generation.

Thanks,
Ray

2018-04-11 09:22:53

by Christian König

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

Am 11.04.2018 um 08:26 schrieb Huang Rui:
> On Tue, Apr 10, 2018 at 04:59:55PM -0400, Sinan Kaya wrote:
>> The code expects to observe the same number of buffers returned from
>> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
>> doesn't hold true universally, especially for systems with an IOMMU.
>>
>> The IOMMU driver combines buffers into a single DMA address as much
>> as it can. The right thing is to tell the DMA layer how much combining
>> the IOMMU can do.
>>
>> Signed-off-by: Sinan Kaya <[email protected]>
> Sinan, I guess Christian's suggestion is to add amdgpu_device_init function
> like here:

Yes, exactly. Looks like Sinan just tried to place the call next to
pci_set_dma_mask()/pci_set_consistent_dma_mask().

Not necessarily a bad idea, but in this case not optimal.

Sinan please refine your patch as Rui suggested and resend.

Thanks,
Christian.

>
> 8<-------
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0e798b3..9b96771 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2339,6 +2339,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> /* init the mode config */
> drm_mode_config_init(adev->ddev);
>
> + dma_set_max_seg_size(adev->dev, PAGE_SIZE);
> +
> r = amdgpu_device_ip_init(adev);
> if (r) {
> /* failed in exclusive mode due to timeout */
> 8<-------
>
> After that, we don't do it in each generation.
>
> Thanks,
> Ray
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


2018-04-11 12:07:56

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

On 10/04/18 21:59, Sinan Kaya wrote:
> The code expects to observe the same number of buffers returned from
> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
> doesn't hold true universally, especially for systems with an IOMMU.

So why not fix said code? It's clearly not a real hardware limitation,
and the map_sg() APIs have potentially returned fewer than nents since
forever, so there's really no excuse.

> The IOMMU driver combines buffers into a single DMA address as much
> as it can. The right thing is to tell the DMA layer how much combining
> the IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size to
begin with, and I think a conscientious DMA API implementation would be
within its rights to fail the mapping for that reason (I know arm64
happens not to, but that was a deliberate design decision to make my
life easier at the time).

As a short-term fix, at least do something like what i915 does and
constrain the table allocation to the desired segment size as well, so
things remain self-consistent. But still never claim that faking a
hardware constraint as a workaround for a driver shortcoming is "the
right thing to do" ;)

Robin.

> Signed-off-by: Sinan Kaya <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
> drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
> 4 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index 8e28270..1b031eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
> }
> -
> + dma_set_max_seg_size(adev->dev, PAGE_SIZE);
> r = gmc_v6_0_init_microcode(adev);
> if (r) {
> dev_err(adev->dev, "Failed to load mc firmware!\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 86e9d682..0a4b2cc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> pr_warn("amdgpu: No coherent DMA available\n");
> }
> + dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>
> r = gmc_v7_0_init_microcode(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 9a813d8..b171529 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> pr_warn("amdgpu: No coherent DMA available\n");
> }
> + dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>
> r = gmc_v8_0_init_microcode(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 3b7e7af..36e658ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
> }
> + dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>
> r = gmc_v9_0_mc_init(adev);
> if (r)
>

2018-04-11 14:40:10

by Sinan Kaya

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

On 4/11/2018 8:03 AM, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> The code expects to observe the same number of buffers returned from
>> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
>> doesn't hold true universally, especially for systems with an IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation, and the map_sg() APIs have potentially returned fewer than nents since forever, so there's really no excuse.

Sure, I'll take a better fix if there is one.

>
>> The IOMMU driver combines buffers into a single DMA address as much
>> as it can. The right thing is to tell the DMA layer how much combining
>> the IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing scatterlists into dma_map_sg() which already violate max_seg_size to begin with, and I think a conscientious DMA API implementation would be within its rights to fail the mapping for that reason (I know arm64 happens not to, but that was a deliberate design decision to make my life easier at the time).
>
> As a short-term fix, at least do something like what i915 does and constrain the table allocation to the desired segment size as well, so things remain self-consistent. But still never claim that faking a hardware constraint as a workaround for a driver shortcoming is "the right thing to do" ;)

You are asking for something like this from here, right?

https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58


	ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
	if (ret)
		goto err_free;

	src = obj->mm.pages->sgl;
	dst = st->sgl;
	for (i = 0; i < obj->mm.pages->nents; i++) {
		sg_set_page(dst, sg_page(src), src->length, 0);
		dst = sg_next(dst);
		src = sg_next(src);
	}

This seems to allocate the scatter gather list and fill it in manually before passing it
to dma_map_sg(). I'll give it a try.

Just double checking.
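[Editorial aside: whichever allocation strategy is chosen, the consuming side must still honour the count dma_map_sg() returns. That pattern can be modelled in plain userspace C; every name below is a hypothetical stand-in, not the real kernel API.]

```c
#include <stddef.h>

/* Hypothetical miniature of a mapped scatterlist entry. */
struct model_seg {
	unsigned long dma_addr;
	unsigned long len;
};

/* Stand-in for dma_map_sg(): the caller handed in 4 one-page entries,
 * the "IOMMU" merged them into 2 IOVA-contiguous segments and returns
 * that count.  Only the first 2 entries of @out are meaningful. */
static int model_dma_map_sg(struct model_seg *out)
{
	out[0].dma_addr = 0x100000; out[0].len = 2 * 4096;
	out[1].dma_addr = 0x200000; out[1].len = 2 * 4096;
	return 2;	/* fewer than the 4 entries passed in */
}

/* Correct consumption: iterate over the count dma_map_sg() returned,
 * never over the nents originally allocated for the table. */
static unsigned long model_total_mapped(const struct model_seg *segs,
					int mapped_nents)
{
	unsigned long total = 0;

	for (int i = 0; i < mapped_nents; i++)
		total += segs[i].len;
	return total;
}
```

The bug the patch works around is exactly a consumer looping over the original 4 here instead of the returned 2.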

>
> Robin.
>
>> Signed-off-by: Sinan Kaya <[email protected]>


--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-04-11 15:37:28

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

On 11/04/18 15:33, Sinan Kaya wrote:
> On 4/11/2018 8:03 AM, Robin Murphy wrote:
>> On 10/04/18 21:59, Sinan Kaya wrote:
>>> The code expects to observe the same number of buffers returned
>>> from dma_map_sg() as were passed in from
>>> sg_alloc_table_from_pages(). This doesn't hold true universally,
>>> especially for systems with an IOMMU.
>>
>> So why not fix said code? It's clearly not a real hardware
>> limitation, and the map_sg() APIs have potentially returned fewer
>> than nents since forever, so there's really no excuse.
>
> Sure, I'll take a better fix if there is one.
>
>>
>>> The IOMMU driver combines buffers into a single DMA address
>>> as much as it can. The right thing is to tell the DMA layer how
>>> much combining the IOMMU can do.
>>
>> Disagree; this is a dodgy hack, since you'll now end up passing
>> scatterlists into dma_map_sg() which already violate max_seg_size
>> to begin with, and I think a conscientious DMA API implementation
>> would be within its rights to fail the mapping for that reason (I
>> know arm64 happens not to, but that was a deliberate design
>> decision to make my life easier at the time).
>>
>> As a short-term fix, at least do something like what i915 does and
>> constrain the table allocation to the desired segment size as well,
>> so things remain self-consistent. But still never claim that faking
>> a hardware constraint as a workaround for a driver shortcoming is
>> "the right thing to do" ;)
>
> You are asking for something like this from here, right?
>
> https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58

I just looked for callers of __sg_alloc_table_from_pages() with a limit
other than SCATTERLIST_MAX_SEGMENT, and found
__i915_gem_userptr_alloc_pages(), which at first glance appears to be
doing something vaguely similar to amdgpu_ttm_tt_pin_userptr().

Robin.

2018-04-12 06:29:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> The code expects to observe the same number of buffers returned from
>> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
>> doesn't hold true universally, especially for systems with an IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation, and
> the map_sg() APIs have potentially returned fewer than nents since forever,
> so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as it
was passed is completely bogus.

>> The IOMMU driver combines buffers into a single DMA address as much
>> as it can. The right thing is to tell the DMA layer how much combining
>> the IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing
> scatterlists into dma_map_sg() which already violate max_seg_size to
> begin with, and I think a conscientious DMA API implementation would
> be within its rights to fail the mapping for that reason (I know arm64
> happens not to, but that was a deliberate design decision to make my
> life easier at the time).

Agreed.

2018-04-12 09:46:23

by Christian König

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:
> On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
>> On 10/04/18 21:59, Sinan Kaya wrote:
>>> The code expects to observe the same number of buffers returned from
>>> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
>>> doesn't hold true universally, especially for systems with an IOMMU.
>> So why not fix said code? It's clearly not a real hardware limitation, and
>> the map_sg() APIs have potentially returned fewer than nents since forever,
>> so there's really no excuse.
> Yes, relying on dma_map_sg returning the same number of entries as it
> was passed is completely bogus.

I agree that the common DRM functions should be able to deal with both,
but we should let the driver side decide if it wants multiple addresses
combined or not.

>
>>> The IOMMU driver combines buffers into a single DMA address as much
>>> as it can. The right thing is to tell the DMA layer how much combining
>>> the IOMMU can do.
>> Disagree; this is a dodgy hack, since you'll now end up passing
>> scatterlists into dma_map_sg() which already violate max_seg_size to
>> begin with, and I think a conscientious DMA API implementation would
>> be within its rights to fail the mapping for that reason (I know arm64
>> happens not to, but that was a deliberate design decision to make my
>> life easier at the time).
> Agreed.

Sounds like my understanding of max_seg_size is incorrect; what exactly
should it describe?

Thanks,
Christian.

2018-04-12 13:55:20

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

On 12/04/18 10:42, Christian König wrote:
> Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:
>> On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
>>> On 10/04/18 21:59, Sinan Kaya wrote:
>>>> The code expects to observe the same number of buffers returned from
>>>> dma_map_sg() as were passed in from sg_alloc_table_from_pages(). This
>>>> doesn't hold true universally, especially for systems with an IOMMU.
>>> So why not fix said code? It's clearly not a real hardware
>>> limitation, and the map_sg() APIs have potentially returned fewer
>>> than nents since forever, so there's really no excuse.
>> Yes, relying on dma_map_sg returning the same number of entries as it
>> was passed is completely bogus.
>
> I agree that the common DRM functions should be able to deal with both,
> but we should let the driver side decide if it wants multiple addresses
> combined or not.
>
>>
>>>> The IOMMU driver combines buffers into a single DMA address as much
>>>> as it can. The right thing is to tell the DMA layer how much
>>>> combining the IOMMU can do.
>>> Disagree; this is a dodgy hack, since you'll now end up passing
>>> scatterlists into dma_map_sg() which already violate max_seg_size
>>> to begin with, and I think a conscientious DMA API implementation
>>> would be within its rights to fail the mapping for that reason (I
>>> know arm64 happens not to, but that was a deliberate design
>>> decision to make my life easier at the time).
>> Agreed.
>
> Sounds like my understanding of max_seg_size is incorrect; what
> exactly should it describe?

The segment size and boundary mask are there to represent a device's
hardware scatter-gather capabilities. A lot of things have limits on
the size and alignment of the data pointed to by a single descriptor
(e.g. an xHCI TRB, where the data buffer must not cross a 64KB
boundary), although they can also be relevant to non-scatter-gather
devices if they just have limits on how big/aligned a single DMA
transfer can be.

Robin.
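[Editorial aside: the boundary-mask constraint Robin describes can be modelled in a few lines of userspace C. The helper below is a hypothetical illustration, not kernel API: it counts how many pieces a buffer must be split into so that no piece crosses a boundary-aligned address, e.g. the 64KB rule for an xHCI TRB data buffer.]

```c
/* Split [addr, addr + len) so that no piece crosses a @boundary-aligned
 * address (boundary must be a power of two), mirroring what the DMA
 * layer does with a device's segment boundary mask.
 * Returns the number of pieces required. */
static int model_boundary_segments(unsigned long addr, unsigned long len,
				   unsigned long boundary)
{
	int pieces = 0;

	while (len) {
		/* room left before the next boundary line */
		unsigned long room = boundary - (addr & (boundary - 1));
		unsigned long chunk = len < room ? len : room;

		addr += chunk;
		len -= chunk;
		pieces++;
	}
	return pieces;
}
```

A 64KB transfer starting exactly on a 64KB line fits in one piece; the same transfer starting at a 32KB offset straddles a line and must be split in two.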