2023-10-04 16:18:04

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v3 0/2] drm/amdkfd: Fix unaligned doorbell absolute offset for gfx8

On older chips, the absolute doorbell offset within
the doorbell page is based on the queue ID.
KFD is using queue ID and doorbell size to get an
absolute doorbell offset in userspace.

Here, adding db_size in byte to find the doorbell's
absolute offset for both 32-bit and 64-bit doorbell sizes.
So that doorbell offset will be aligned based on the doorbell
size.

v2:
- Addressed the review comment from Felix.

v3:
- Adding doorbell_size as parameter to get db absolute offset.

Arvind Yadav (2):
drm/amdgpu: Adding db_size to get doorbell absolute offset
drm/amdkfd: get doorbell's absolute offset based on the db size

drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c | 13 +++++++++----
.../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 13 +++++++++++--
.../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
5 files changed, 31 insertions(+), 10 deletions(-)

--
2.34.1


2023-10-04 16:18:13

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v3 2/2] drm/amdkfd: get doorbell's absolute offset based on the db size

This patch is to align the absolute doorbell offset
based on the doorbell's size. So that doorbell offset
will be aligned for both 32 bit and 64 bit.

v2:
- Addressed the review comment from Felix.
v3:
- Adding doorbell_size as parameter to get db absolute offset.

Cc: Christian Koenig <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Signed-off-by: Arvind Yadav <[email protected]>
---
.../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 13 +++++++++++--
.../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 0d3d538b64eb..690ff131fe4b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -346,6 +346,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
uint32_t const *restore_id)
{
struct kfd_node *dev = qpd->dqm->dev;
+ uint32_t doorbell_size;

if (!KFD_IS_SOC15(dev)) {
/* On pre-SOC15 chips we need to use the queue ID to
@@ -405,9 +406,12 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
}
}

+ doorbell_size = dev->kfd->device_info.doorbell_size;
+
q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
qpd->proc_doorbells,
- q->doorbell_id);
+ q->doorbell_id,
+ doorbell_size);
return 0;
}

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 7b38537c7c99..59dd76c4b138 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -161,7 +161,10 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
return NULL;

- *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
+ *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev,
+ kfd->doorbells,
+ inx,
+ kfd->device_info.doorbell_size);
inx *= 2;

pr_debug("Get kernel queue doorbell\n"
@@ -233,6 +236,7 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
{
struct amdgpu_device *adev = pdd->dev->adev;
uint32_t first_db_index;
+ uint32_t doorbell_size;

if (!pdd->qpd.proc_doorbells) {
if (kfd_alloc_process_doorbells(pdd->dev->kfd, pdd))
@@ -240,7 +244,12 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
return 0;
}

- first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
+ doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
+
+ first_db_index = amdgpu_doorbell_index_on_bar(adev,
+ pdd->qpd.proc_doorbells,
+ 0,
+ doorbell_size);
return adev->doorbell.base + first_db_index * sizeof(uint32_t);
}

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index adb5e4bdc0b2..010cd8e8e6a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -375,9 +375,11 @@ int pqm_create_queue(struct process_queue_manager *pqm,
* relative doorbell index = Absolute doorbell index -
* absolute index of first doorbell in the page.
*/
+ uint32_t doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
pdd->qpd.proc_doorbells,
- 0);
+ 0,
+ doorbell_size);

*p_doorbell_offset_in_process = (q->properties.doorbell_off
- first_db_index) * sizeof(uint32_t);
--
2.34.1

2023-10-04 17:01:10

by Felix Kuehling

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] drm/amdkfd: get doorbell's absolute offset based on the db size


On 2023-10-04 12:16, Arvind Yadav wrote:
> This patch is to align the absolute doorbell offset
> based on the doorbell's size. So that doorbell offset
> will be aligned for both 32 bit and 64 bit.
>
> v2:
> - Addressed the review comment from Felix.
> v3:
> - Adding doorbell_size as parameter to get db absolute offset.
>
> Cc: Christian Koenig <[email protected]>
> Cc: Alex Deucher <[email protected]>
> Signed-off-by: Shashank Sharma <[email protected]>
> Signed-off-by: Arvind Yadav <[email protected]>

The final result looks good to me. But please squash the two patches
into one. The first patch on its own breaks the build, and that's
something we don't want to commit to the branch history as it makes
tracking regressions (e.g. with git bisect) very hard or impossible.

More nit-picks inline.


> ---
> .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
> drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 13 +++++++++++--
> .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
> 3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 0d3d538b64eb..690ff131fe4b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -346,6 +346,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
> uint32_t const *restore_id)
> {
> struct kfd_node *dev = qpd->dqm->dev;
> + uint32_t doorbell_size;
>
> if (!KFD_IS_SOC15(dev)) {
> /* On pre-SOC15 chips we need to use the queue ID to
> @@ -405,9 +406,12 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
> }
> }
>
> + doorbell_size = dev->kfd->device_info.doorbell_size;
> +
> q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
> qpd->proc_doorbells,
> - q->doorbell_id);
> + q->doorbell_id,
> + doorbell_size);

You don't need a local variable for doorbell size that's only used once.
Just pass dev->kfd->device_info.doorbell_size directly.


> return 0;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 7b38537c7c99..59dd76c4b138 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -161,7 +161,10 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
> if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
> return NULL;
>
> - *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
> + *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev,
> + kfd->doorbells,
> + inx,
> + kfd->device_info.doorbell_size);
> inx *= 2;
>
> pr_debug("Get kernel queue doorbell\n"
> @@ -233,6 +236,7 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
> {
> struct amdgpu_device *adev = pdd->dev->adev;
> uint32_t first_db_index;
> + uint32_t doorbell_size;
>
> if (!pdd->qpd.proc_doorbells) {
> if (kfd_alloc_process_doorbells(pdd->dev->kfd, pdd))
> @@ -240,7 +244,12 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
> return 0;
> }
>
> - first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
> + doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
> +
> + first_db_index = amdgpu_doorbell_index_on_bar(adev,
> + pdd->qpd.proc_doorbells,
> + 0,
> + doorbell_size);

Same as above, no local variable needed.


> return adev->doorbell.base + first_db_index * sizeof(uint32_t);
> }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index adb5e4bdc0b2..010cd8e8e6a1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -375,9 +375,11 @@ int pqm_create_queue(struct process_queue_manager *pqm,
> * relative doorbell index = Absolute doorbell index -
> * absolute index of first doorbell in the page.
> */
> + uint32_t doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
> uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
> pdd->qpd.proc_doorbells,
> - 0);
> + 0,
> + doorbell_size);

No local variable needed.

Regards,
  Felix


>
> *p_doorbell_offset_in_process = (q->properties.doorbell_off
> - first_db_index) * sizeof(uint32_t);

2023-10-04 17:18:04

by Yadav, Arvind

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] drm/amdkfd: get doorbell's absolute offset based on the db size


On 10/4/2023 10:29 PM, Felix Kuehling wrote:
>
> On 2023-10-04 12:16, Arvind Yadav wrote:
>> This patch is to align the absolute doorbell offset
>> based on the doorbell's size. So that doorbell offset
>> will be aligned for both 32 bit and 64 bit.
>>
>> v2:
>> - Addressed the review comment from Felix.
>> v3:
>> - Adding doorbell_size as parameter to get db absolute offset.
>>
>> Cc: Christian Koenig <[email protected]>
>> Cc: Alex Deucher <[email protected]>
>> Signed-off-by: Shashank Sharma <[email protected]>
>> Signed-off-by: Arvind Yadav <[email protected]>
>
> The final result looks good to me. But please squash the two patches
> into one. The first patch on its own breaks the build, and that's
> something we don't want to commit to the branch history as it makes
> tracking regressions (e.g. with git bisect) very hard or impossible.
>
> More nit-picks inline.
Sure, we can have one patch.
>
>
>> ---
>>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   |  6 +++++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c           | 13 +++++++++++--
>>   .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  4 +++-
>>   3 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> index 0d3d538b64eb..690ff131fe4b 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> @@ -346,6 +346,7 @@ static int allocate_doorbell(struct
>> qcm_process_device *qpd,
>>                    uint32_t const *restore_id)
>>   {
>>       struct kfd_node *dev = qpd->dqm->dev;
>> +    uint32_t doorbell_size;
>>         if (!KFD_IS_SOC15(dev)) {
>>           /* On pre-SOC15 chips we need to use the queue ID to
>> @@ -405,9 +406,12 @@ static int allocate_doorbell(struct
>> qcm_process_device *qpd,
>>           }
>>       }
>>   +    doorbell_size = dev->kfd->device_info.doorbell_size;
>> +
>>       q->properties.doorbell_off =
>> amdgpu_doorbell_index_on_bar(dev->adev,
>>                                     qpd->proc_doorbells,
>> -                                  q->doorbell_id);
>> +                                  q->doorbell_id,
>> +                                  doorbell_size);
>
> You don't need a local variable for doorbell size that's only used
> once. Just pass dev->kfd->device_info.doorbell_size directly.
>
I have used local variable to make the code cleaner but I will remove
local variable.
>
>>       return 0;
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> index 7b38537c7c99..59dd76c4b138 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> @@ -161,7 +161,10 @@ void __iomem *kfd_get_kernel_doorbell(struct
>> kfd_dev *kfd,
>>       if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>>           return NULL;
>>   -    *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev,
>> kfd->doorbells, inx);
>> +    *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev,
>> +                             kfd->doorbells,
>> +                             inx,
>> + kfd->device_info.doorbell_size);
>>       inx *= 2;
>>         pr_debug("Get kernel queue doorbell\n"
>> @@ -233,6 +236,7 @@ phys_addr_t kfd_get_process_doorbells(struct
>> kfd_process_device *pdd)
>>   {
>>       struct amdgpu_device *adev = pdd->dev->adev;
>>       uint32_t first_db_index;
>> +    uint32_t doorbell_size;
>>         if (!pdd->qpd.proc_doorbells) {
>>           if (kfd_alloc_process_doorbells(pdd->dev->kfd, pdd))
>> @@ -240,7 +244,12 @@ phys_addr_t kfd_get_process_doorbells(struct
>> kfd_process_device *pdd)
>>               return 0;
>>       }
>>   -    first_db_index = amdgpu_doorbell_index_on_bar(adev,
>> pdd->qpd.proc_doorbells, 0);
>> +    doorbell_size = pdd->dev->kfd->device_info.doorbell_size;
>> +
>> +    first_db_index = amdgpu_doorbell_index_on_bar(adev,
>> +                              pdd->qpd.proc_doorbells,
>> +                              0,
>> +                              doorbell_size);
>
> Same as above, no local variable needed.
Noted,
>
>
>>       return adev->doorbell.base + first_db_index * sizeof(uint32_t);
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index adb5e4bdc0b2..010cd8e8e6a1 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> @@ -375,9 +375,11 @@ int pqm_create_queue(struct
>> process_queue_manager *pqm,
>>            * relative doorbell index = Absolute doorbell index -
>>            * absolute index of first doorbell in the page.
>>            */
>> +        uint32_t doorbell_size =
>> pdd->dev->kfd->device_info.doorbell_size;
>>           uint32_t first_db_index =
>> amdgpu_doorbell_index_on_bar(pdd->dev->adev,
>> pdd->qpd.proc_doorbells,
>> -                                       0);
>> +                                       0,
>> +                                       doorbell_size);
>
> No local variable needed.
>
Noted,

Thanks
~Arvind

> Regards,
>   Felix
>
>
>>             *p_doorbell_offset_in_process = (q->properties.doorbell_off
>>                           - first_db_index) * sizeof(uint32_t);