2020-09-10 12:51:24

by Mansur Alisha Shaik

Subject: [PATCH v2 0/3] Venus - Handle race conditions in concurrency

This patchset handles race conditions seen in concurrency
use cases, such as running multiple YouTube browser tabs
(approximately 50 or more), graphics_Stress, WiFi ON/OFF,
Bluetooth ON/OFF, and reboot in parallel.

Mansur Alisha Shaik (3):
venus: core: handle race condition for core ops
venus: core: cancel pending work items in workqueue
venus: handle use after free for iommu_map/iommu_unmap

drivers/media/platform/qcom/venus/core.c | 4 ++++
drivers/media/platform/qcom/venus/firmware.c | 17 +++++++++++++----
drivers/media/platform/qcom/venus/hfi.c | 5 ++++-
3 files changed, 21 insertions(+), 5 deletions(-)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2020-09-10 12:51:31

by Mansur Alisha Shaik

Subject: [PATCH v2 2/3] venus: core: cancel pending work items in workqueue

In concurrency use cases and reboot scenarios we observe a
race condition that ends in a NULL pointer dereference
crash. The shutdown path and the system recovery path both
destroy the same mutex, hence the crash.

Handle this by taking the core mutex around hfi_destroy() and
by cancelling pending delayed work items in the workqueue.

Below are the call traces for the crash:
Call trace:
venus_remove+0xdc/0xec [venus_core]
venus_core_shutdown+0x1c/0x34 [venus_core]
platform_drv_shutdown+0x28/0x34
device_shutdown+0x154/0x1fc
kernel_restart_prepare+0x40/0x4c
kernel_restart+0x1c/0x64

Call trace:
mutex_lock+0x34/0x60
venus_hfi_destroy+0x28/0x98 [venus_core]
hfi_destroy+0x1c/0x28 [venus_core]
venus_sys_error_handler+0x60/0x14c [venus_core]
process_one_work+0x210/0x3d0
worker_thread+0x248/0x3f4
kthread+0x11c/0x12c
ret_from_fork+0x10/0x18

Signed-off-by: Mansur Alisha Shaik <[email protected]>
---
drivers/media/platform/qcom/venus/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index c5af428..69aa199 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -323,6 +323,8 @@ static int venus_remove(struct platform_device *pdev)
 	struct device *dev = core->dev;
 	int ret;
 
+	cancel_delayed_work_sync(&core->work);
+
 	ret = pm_runtime_get_sync(dev);
 	WARN_ON(ret < 0);
 
@@ -340,7 +342,9 @@ static int venus_remove(struct platform_device *pdev)
 	if (pm_ops->core_put)
 		pm_ops->core_put(dev);
 
+	mutex_lock(&core->lock);
 	hfi_destroy(core);
+	mutex_unlock(&core->lock);
 
 	icc_put(core->video_path);
 	icc_put(core->cpucfg_path);
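
For readers without the driver at hand, the ordering used in the diff
above (wait for the pending work first, then destroy under the lock)
can be sketched in userspace with pthreads. This is only an
illustration under stated assumptions, not the venus code: struct core,
core_init(), core_trigger_recovery(), core_remove() and the other names
are hypothetical, pthread_join() stands in for
cancel_delayed_work_sync(), and the NULL check that makes the destroy
idempotent is an addition of this sketch rather than part of the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the driver's core state. */
struct core {
	pthread_mutex_t lock;	/* plays the role of core->lock */
	pthread_cond_t wake;	/* wakes the pending work item */
	pthread_t worker;	/* plays the role of the delayed work */
	bool fire;		/* recovery was requested */
	bool cancel;		/* remove path is cancelling the work */
	int *hfi;		/* resource both paths want to destroy */
	int destroy_calls;	/* number of real frees performed */
};

/* Caller must hold c->lock. Idempotent: a second call is a no-op. */
static void hfi_destroy_locked(struct core *c)
{
	if (!c->hfi)
		return;
	free(c->hfi);
	c->hfi = NULL;
	c->destroy_calls++;
}

/* Recovery path: sleeps until fired or cancelled; cancel wins. */
static void *sys_error_worker(void *arg)
{
	struct core *c = arg;

	pthread_mutex_lock(&c->lock);
	while (!c->fire && !c->cancel)
		pthread_cond_wait(&c->wake, &c->lock);
	if (!c->cancel)
		hfi_destroy_locked(c);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

int core_init(struct core *c)
{
	c->fire = false;
	c->cancel = false;
	c->destroy_calls = 0;
	c->hfi = malloc(sizeof(*c->hfi));
	if (!c->hfi)
		return -1;
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wake, NULL);
	if (pthread_create(&c->worker, NULL, sys_error_worker, c) != 0) {
		pthread_cond_destroy(&c->wake);
		pthread_mutex_destroy(&c->lock);
		free(c->hfi);
		c->hfi = NULL;
		return -1;
	}
	return 0;
}

/* Simulates an error event that lets the recovery work run. */
void core_trigger_recovery(struct core *c)
{
	pthread_mutex_lock(&c->lock);
	c->fire = true;
	pthread_cond_signal(&c->wake);
	pthread_mutex_unlock(&c->lock);
}

/* Remove path: cancel and wait first, then destroy under the lock. */
void core_remove(struct core *c)
{
	pthread_mutex_lock(&c->lock);
	c->cancel = true;
	pthread_cond_signal(&c->wake);
	pthread_mutex_unlock(&c->lock);
	pthread_join(c->worker, NULL);	/* worker cannot run past here */

	pthread_mutex_lock(&c->lock);
	hfi_destroy_locked(c);
	pthread_mutex_unlock(&c->lock);
	assert(c->hfi == NULL);

	/* Safe only because the worker has already been joined. */
	pthread_cond_destroy(&c->wake);
	pthread_mutex_destroy(&c->lock);
}
```

Calling core_init(), then optionally core_trigger_recovery(), then
core_remove() should leave the resource freed exactly once whichever
thread gets there first, and the lock is never destroyed while the
worker can still take it.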

2020-09-11 10:23:30

by Stanimir Varbanov

Subject: Re: [PATCH v2 2/3] venus: core: cancel pending work items in workqueue

Hi,

On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote:
> In concurrency usecase and reboot scenario we are
> observing race condition and seeing NULL pointer
> dereference crash. In shutdown path and system
> recovery path we are destroying the same mutex
> hence seeing crash.
>
> This case is handled by mutex protection and
> cancel delayed work items in work queue.
>
> Below is the call trace for the crash
> Call trace:
> venus_remove+0xdc/0xec [venus_core]
> venus_core_shutdown+0x1c/0x34 [venus_core]
> platform_drv_shutdown+0x28/0x34
> device_shutdown+0x154/0x1fc
> kernel_restart_prepare+0x40/0x4c
> kernel_restart+0x1c/0x64
>
> Call trace:
> mutex_lock+0x34/0x60
> venus_hfi_destroy+0x28/0x98 [venus_core]
> hfi_destroy+0x1c/0x28 [venus_core]

I queued up [1] and after it this cannot happen anymore because
hfi_destroy() is not called by venus_sys_error_handler().

So I guess this patch is not needed anymore.

[1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html

> venus_sys_error_handler+0x60/0x14c [venus_core]
> process_one_work+0x210/0x3d0
> worker_thread+0x248/0x3f4
> kthread+0x11c/0x12c
> ret_from_fork+0x10/0x18
>
> Signed-off-by: Mansur Alisha Shaik <[email protected]>
> ---
> drivers/media/platform/qcom/venus/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
> index c5af428..69aa199 100644
> --- a/drivers/media/platform/qcom/venus/core.c
> +++ b/drivers/media/platform/qcom/venus/core.c
> @@ -323,6 +323,8 @@ static int venus_remove(struct platform_device *pdev)
> struct device *dev = core->dev;
> int ret;
>
> + cancel_delayed_work_sync(&core->work);
> +
> ret = pm_runtime_get_sync(dev);
> WARN_ON(ret < 0);
>
> @@ -340,7 +342,9 @@ static int venus_remove(struct platform_device *pdev)
> if (pm_ops->core_put)
> pm_ops->core_put(dev);
>
> + mutex_lock(&core->lock);
> hfi_destroy(core);
> + mutex_unlock(&core->lock);
>
> icc_put(core->video_path);
> icc_put(core->cpucfg_path);
>

--
regards,
Stan

2020-09-17 01:55:06

by Mansur Alisha Shaik

Subject: Re: [PATCH v2 2/3] venus: core: cancel pending work items in workqueue

On 2020-09-11 15:52, Stanimir Varbanov wrote:
> Hi,
>
> On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote:
>> In concurrency usecase and reboot scenario we are
>> observing race condition and seeing NULL pointer
>> dereference crash. In shutdown path and system
>> recovery path we are destroying the same mutex
>> hence seeing crash.
>>
>> This case is handled by mutex protection and
>> cancel delayed work items in work queue.
>>
>> Below is the call trace for the crash
>> Call trace:
>> venus_remove+0xdc/0xec [venus_core]
>> venus_core_shutdown+0x1c/0x34 [venus_core]
>> platform_drv_shutdown+0x28/0x34
>> device_shutdown+0x154/0x1fc
>> kernel_restart_prepare+0x40/0x4c
>> kernel_restart+0x1c/0x64
>>
>> Call trace:
>> mutex_lock+0x34/0x60
>> venus_hfi_destroy+0x28/0x98 [venus_core]
>> hfi_destroy+0x1c/0x28 [venus_core]
>
> I queued up [1] and after it this cannot happen anymore because
> hfi_destroy() is not called by venus_sys_error_handler().
>
> So I guess this patch is not needed anymore.
>
> [1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html
>
Yes, this patch is not needed any more. I have rebased and posted a new version:
https://lore.kernel.org/patchwork/project/lkml/list/?series=463091

>> venus_sys_error_handler+0x60/0x14c [venus_core]
>> process_one_work+0x210/0x3d0
>> worker_thread+0x248/0x3f4
>> kthread+0x11c/0x12c
>> ret_from_fork+0x10/0x18
>>
>> Signed-off-by: Mansur Alisha Shaik <[email protected]>
>> ---
>> drivers/media/platform/qcom/venus/core.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/media/platform/qcom/venus/core.c
>> b/drivers/media/platform/qcom/venus/core.c
>> index c5af428..69aa199 100644
>> --- a/drivers/media/platform/qcom/venus/core.c
>> +++ b/drivers/media/platform/qcom/venus/core.c
>> @@ -323,6 +323,8 @@ static int venus_remove(struct platform_device
>> *pdev)
>> struct device *dev = core->dev;
>> int ret;
>>
>> + cancel_delayed_work_sync(&core->work);
>> +
>> ret = pm_runtime_get_sync(dev);
>> WARN_ON(ret < 0);
>>
>> @@ -340,7 +342,9 @@ static int venus_remove(struct platform_device
>> *pdev)
>> if (pm_ops->core_put)
>> pm_ops->core_put(dev);
>>
>> + mutex_lock(&core->lock);
>> hfi_destroy(core);
>> + mutex_unlock(&core->lock);
>>
>> icc_put(core->video_path);
>> icc_put(core->cpucfg_path);
>>