2021-12-06 12:05:57

by Xiaolei Wang

[permalink] [raw]
Subject: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

We observed the following kmemleak report:
unreferenced object 0xffff000007904500 (size 128):
comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
hex dump (first 32 bytes):
00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
backtrace:
[<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
[<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
[<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
[<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
[<00000000c35884da>] optee_open_session+0x128/0x1ec
[<000000001748f2ff>] tee_client_open_session+0x28/0x40
[<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
[<000000003df18bf1>] optee_probe+0x674/0x6cc
[<000000003a4a534a>] platform_drv_probe+0x54/0xb0
[<000000000c51ce7d>] really_probe+0xe4/0x4d0
[<000000002f04c865>] driver_probe_device+0x58/0xc0
[<00000000b485397d>] device_driver_attach+0xc0/0xd0
[<00000000c835f0df>] __driver_attach+0x84/0x124
[<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
[<000000001735e8a8>] driver_attach+0x24/0x30
[<000000006d94b04f>] bus_add_driver+0x104/0x1ec

This is not a memory leak because we pass the share memory pointer
to secure world and would get it from secure world before releasing it.

Signed-off-by: Xiaolei Wang <[email protected]>
---
drivers/tee/optee/smc_abi.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index 6196d7c3888f..cf2e3293567d 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -23,6 +23,7 @@
#include "optee_private.h"
#include "optee_smc.h"
#include "optee_rpc_cmd.h"
+#include <linux/kmemleak.h>
#define CREATE_TRACE_POINTS
#include "optee_trace.h"

@@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
param->a4 = 0;
param->a5 = 0;
}
+ kmemleak_not_leak(shm);
break;
case OPTEE_SMC_RPC_FUNC_FREE:
shm = reg_pair_to_ptr(param->a1, param->a2);
--
2.25.1



2021-12-09 11:41:15

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
>
> We observed the following kmemleak report:
> unreferenced object 0xffff000007904500 (size 128):
> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> hex dump (first 32 bytes):
> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> backtrace:
> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> [<00000000c835f0df>] __driver_attach+0x84/0x124
> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> [<000000001735e8a8>] driver_attach+0x24/0x30
> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>
> This is not a memory leak because we pass the share memory pointer
> to secure world and would get it from secure world before releasing it.

How about if it's actually a memory leak caused by the secure world?
An example being secure world just allocates kernel memory via
OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via
OPTEE_SMC_RPC_FUNC_FREE.

IMO, we need to cross-check optee-os if it's responsible for leaking
kernel memory.

-Sumit

>
> Signed-off-by: Xiaolei Wang <[email protected]>
> ---
> drivers/tee/optee/smc_abi.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> index 6196d7c3888f..cf2e3293567d 100644
> --- a/drivers/tee/optee/smc_abi.c
> +++ b/drivers/tee/optee/smc_abi.c
> @@ -23,6 +23,7 @@
> #include "optee_private.h"
> #include "optee_smc.h"
> #include "optee_rpc_cmd.h"
> +#include <linux/kmemleak.h>
> #define CREATE_TRACE_POINTS
> #include "optee_trace.h"
>
> @@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
> param->a4 = 0;
> param->a5 = 0;
> }
> + kmemleak_not_leak(shm);
> break;
> case OPTEE_SMC_RPC_FUNC_FREE:
> shm = reg_pair_to_ptr(param->a1, param->a2);
> --
> 2.25.1
>

2021-12-10 04:12:09

by Xiaolei Wang

[permalink] [raw]
Subject: RE: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()



-----Original Message-----
From: Sumit Garg <[email protected]>
Sent: Thursday, December 9, 2021 7:41 PM
To: Wang, Xiaolei <[email protected]>
Cc: [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
>
> We observed the following kmemleak report:
> unreferenced object 0xffff000007904500 (size 128):
> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> hex dump (first 32 bytes):
> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> backtrace:
> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> [<00000000c835f0df>] __driver_attach+0x84/0x124
> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> [<000000001735e8a8>] driver_attach+0x24/0x30
> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>
> This is not a memory leak because we pass the share memory pointer to
> secure world and would get it from secure world before releasing it.

> How about if it's actually a memory leak caused by the secure world?
> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.

> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.

Hi sumit,

You mean we need to check whether there is a real memleak,
If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?

Thanks
Xiaolei


> -Sumit

>
> Signed-off-by: Xiaolei Wang <[email protected]>
> ---
> drivers/tee/optee/smc_abi.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> index 6196d7c3888f..cf2e3293567d 100644
> --- a/drivers/tee/optee/smc_abi.c
> +++ b/drivers/tee/optee/smc_abi.c
> @@ -23,6 +23,7 @@
> #include "optee_private.h"
> #include "optee_smc.h"
> #include "optee_rpc_cmd.h"
> +#include <linux/kmemleak.h>
> #define CREATE_TRACE_POINTS
> #include "optee_trace.h"
>
> @@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
> param->a4 = 0;
> param->a5 = 0;
> }
> + kmemleak_not_leak(shm);
> break;
> case OPTEE_SMC_RPC_FUNC_FREE:
> shm = reg_pair_to_ptr(param->a1, param->a2);
> --
> 2.25.1
>

2021-12-10 05:10:16

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
>
>
>
> -----Original Message-----
> From: Sumit Garg <[email protected]>
> Sent: Thursday, December 9, 2021 7:41 PM
> To: Wang, Xiaolei <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
>
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> >
> > We observed the following kmemleak report:
> > unreferenced object 0xffff000007904500 (size 128):
> > comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > hex dump (first 32 bytes):
> > 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > backtrace:
> > [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > [<00000000c835f0df>] __driver_attach+0x84/0x124
> > [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > [<000000001735e8a8>] driver_attach+0x24/0x30
> > [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> >
> > This is not a memory leak because we pass the share memory pointer to
> > secure world and would get it from secure world before releasing it.
>
> > How about if it's actually a memory leak caused by the secure world?
> > An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
>
> > IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
>
> Hi sumit,
>
> You mean we need to check whether there is a real memleak,
> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?

Yes. AFAICT, optee-os should allocate shared memory to communicate
with tee-supplicant. So once the communication is done, the underlying
shared memory should be freed. I can't think of any scenario where
optee-os should keep hold-off shared memory indefinitely.

-Sumit

>
> Thanks
> Xiaolei
>
>
> > -Sumit
>
> >
> > Signed-off-by: Xiaolei Wang <[email protected]>
> > ---
> > drivers/tee/optee/smc_abi.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> > index 6196d7c3888f..cf2e3293567d 100644
> > --- a/drivers/tee/optee/smc_abi.c
> > +++ b/drivers/tee/optee/smc_abi.c
> > @@ -23,6 +23,7 @@
> > #include "optee_private.h"
> > #include "optee_smc.h"
> > #include "optee_rpc_cmd.h"
> > +#include <linux/kmemleak.h>
> > #define CREATE_TRACE_POINTS
> > #include "optee_trace.h"
> >
> > @@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
> > param->a4 = 0;
> > param->a5 = 0;
> > }
> > + kmemleak_not_leak(shm);
> > break;
> > case OPTEE_SMC_RPC_FUNC_FREE:
> > shm = reg_pair_to_ptr(param->a1, param->a2);
> > --
> > 2.25.1
> >

2021-12-10 08:10:58

by Jerome Forissier

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

+CC Jens, Etienne

On 12/10/21 06:00, Sumit Garg wrote:
> On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
>>
>> -----Original Message-----
>> From: Sumit Garg <[email protected]>
>> Sent: Thursday, December 9, 2021 7:41 PM
>> To: Wang, Xiaolei <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected]
>> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
>>
>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>
>> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
>>>
>>> We observed the following kmemleak report:
>>> unreferenced object 0xffff000007904500 (size 128):
>>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
>>> hex dump (first 32 bytes):
>>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
>>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
>>> backtrace:
>>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
>>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
>>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
>>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
>>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
>>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
>>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
>>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
>>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
>>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
>>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
>>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
>>> [<00000000c835f0df>] __driver_attach+0x84/0x124
>>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
>>> [<000000001735e8a8>] driver_attach+0x24/0x30
>>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>>>
>>> This is not a memory leak because we pass the share memory pointer to
>>> secure world and would get it from secure world before releasing it.
>>
>>> How about if it's actually a memory leak caused by the secure world?
>>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
>>
>>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
>>
>> Hi sumit,
>>
>> You mean we need to check whether there is a real memleak,
>> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
>> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
>
> Yes. AFAICT, optee-os should allocate shared memory to communicate
> with tee-supplicant. So once the communication is done, the underlying
> shared memory should be freed. I can't think of any scenario where
> optee-os should keep hold-off shared memory indefinitely.

I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
the config file [1] and the commit which introduced this config [2].

[1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
[2] https://github.com/OP-TEE/optee_os/commit/8887663248ad

--
Jerome

2021-12-10 09:38:19

by Etienne Carriere

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

Hello all,

On Fri, 10 Dec 2021 at 09:10, Jerome Forissier <[email protected]> wrote:
>
> +CC Jens, Etienne
>
> On 12/10/21 06:00, Sumit Garg wrote:
> > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> >>
> >> -----Original Message-----
> >> From: Sumit Garg <[email protected]>
> >> Sent: Thursday, December 9, 2021 7:41 PM
> >> To: Wang, Xiaolei <[email protected]>
> >> Cc: [email protected]; [email protected]; [email protected]
> >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> >>
> >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> >>
> >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> >>>
> >>> We observed the following kmemleak report:
> >>> unreferenced object 0xffff000007904500 (size 128):
> >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> >>> hex dump (first 32 bytes):
> >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> >>> backtrace:
> >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> >>>
> >>> This is not a memory leak because we pass the share memory pointer to
> >>> secure world and would get it from secure world before releasing it.
> >>
> >>> How about if it's actually a memory leak caused by the secure world?
> >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> >>
> >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> >>
> >> Hi sumit,
> >>
> >> You mean we need to check whether there is a real memleak,
> >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> >
> > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > with tee-supplicant. So once the communication is done, the underlying
> > shared memory should be freed. I can't think of any scenario where
> > optee-os should keep hold-off shared memory indefinitely.
>
> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> the config file [1] and the commit which introduced this config [2].
>
> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>

It's been a while since OP-TEE caches some shm buffers to prevent
re-allocting them on and on.
OP-TEE does so for 1 shm buffer per "tee threads" OP-TEE has provisioned.
Each thread can cache a shm reference.
Note that used RPCs from optee to linux/u-boot/ree do not require such
message buffer (IMO).

The main issue is the shm buffer are allocated per optee thread
(thread context assigned to client invocation request when entreing
optee).
Therefore, if an optee thread caches a shm buffer, it makes the caller
tee session to have a shm reference with a refcount held, until Optee
thread releases its cached shm reference.

There are ugly side effects. Linux must disable the cache to release
all resources.
We recently saw some tee sessions may be left open because of such shm
refcount held.
It can lead to few misbehaviour of the TA service (restarting a
service, releasing a resource)

Config switch CFG_PREALLOC_RPC_CACHE was introduced [pr4896] to
disable the feature at boot time.
There are means to not use it, or to explicitly enable/disable it at
run time (already used optee smc services for that). Would maybe be a
better default config.
Note this discussion thread ending at his comment [issue1918]:

Comments are welcome. I may have missed something in the description
(or understanding :).

[pr4896] https://github.com/OP-TEE/optee_os/pull/4896
[issue1918] https://github.com/OP-TEE/optee_os/issues/1918#issuecomment-968747738

Best regards,
etienne



> --
> Jerome

2021-12-10 09:38:38

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
>
> +CC Jens, Etienne
>
> On 12/10/21 06:00, Sumit Garg wrote:
> > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> >>
> >> -----Original Message-----
> >> From: Sumit Garg <[email protected]>
> >> Sent: Thursday, December 9, 2021 7:41 PM
> >> To: Wang, Xiaolei <[email protected]>
> >> Cc: [email protected]; [email protected]; [email protected]
> >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> >>
> >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> >>
> >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> >>>
> >>> We observed the following kmemleak report:
> >>> unreferenced object 0xffff000007904500 (size 128):
> >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> >>> hex dump (first 32 bytes):
> >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> >>> backtrace:
> >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> >>>
> >>> This is not a memory leak because we pass the share memory pointer to
> >>> secure world and would get it from secure world before releasing it.
> >>
> >>> How about if it's actually a memory leak caused by the secure world?
> >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> >>
> >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> >>
> >> Hi sumit,
> >>
> >> You mean we need to check whether there is a real memleak,
> >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> >
> > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > with tee-supplicant. So once the communication is done, the underlying
> > shared memory should be freed. I can't think of any scenario where
> > optee-os should keep hold-off shared memory indefinitely.
>
> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> the config file [1] and the commit which introduced this config [2].

Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
RPC shared memory remains allocated. I guess that is done primarily
for performance reasons.

But still it doesn't feel appropriate that we term all RPC shm
allocations as not leaking memory as we might miss obvious ones.

Xiaolei,

Can you once test with CFG_PREALLOC_RPC_CACHE=n while compiling
optee-os and see if the observed memory leak disappears or not?

-Sumit

>
> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>
> --
> Jerome

2021-12-10 09:44:06

by Etienne Carriere

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

fixing typos :(


On Fri, 10 Dec 2021 at 10:38, Etienne Carriere
<[email protected]> wrote:
> On Fri, 10 Dec 2021 at 09:10, Jerome Forissier <[email protected]> wrote:
> > +CC Jens, Etienne
> > On 12/10/21 06:00, Sumit Garg wrote:
> > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > >> From: Sumit Garg <[email protected]>
> > >> Sent: Thursday, December 9, 2021 7:41 PM
> > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > with tee-supplicant. So once the communication is done, the underlying
> > > shared memory should be freed. I can't think of any scenario where
> > > optee-os should keep hold-off shared memory indefinitely.
> >
> > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > the config file [1] and the commit which introduced this config [2].
> >
> > [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> > [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>
> (snip)
>
> It's been a while since OP-TEE caches some shm buffers to prevent
> re-allocting them on and on.
> OP-TEE does so for 1 shm buffer per "tee threads" OP-TEE has provisioned.
> Each thread can cache a shm reference.
> Note that used RPCs from optee to linux/u-boot/ree do not require such
> message buffer (IMO).

I meant: "Note that **most of the** used RPCs from ..."

br,
etienne

2021-12-10 10:29:07

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Fri, 10 Dec 2021 at 15:08, Etienne Carriere
<[email protected]> wrote:
>
> Hello all,
>
> On Fri, 10 Dec 2021 at 09:10, Jerome Forissier <[email protected]> wrote:
> >
> > +CC Jens, Etienne
> >
> > On 12/10/21 06:00, Sumit Garg wrote:
> > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > >>
> > >> -----Original Message-----
> > >> From: Sumit Garg <[email protected]>
> > >> Sent: Thursday, December 9, 2021 7:41 PM
> > >> To: Wang, Xiaolei <[email protected]>
> > >> Cc: [email protected]; [email protected]; [email protected]
> > >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> > >>
> > >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> > >>
> > >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> > >>>
> > >>> We observed the following kmemleak report:
> > >>> unreferenced object 0xffff000007904500 (size 128):
> > >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > >>> hex dump (first 32 bytes):
> > >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > >>> backtrace:
> > >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> > >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> > >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> > >>>
> > >>> This is not a memory leak because we pass the share memory pointer to
> > >>> secure world and would get it from secure world before releasing it.
> > >>
> > >>> How about if it's actually a memory leak caused by the secure world?
> > >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> > >>
> > >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> > >>
> > >> Hi sumit,
> > >>
> > >> You mean we need to check whether there is a real memleak,
> > >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> > >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> > >
> > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > with tee-supplicant. So once the communication is done, the underlying
> > > shared memory should be freed. I can't think of any scenario where
> > > optee-os should keep hold-off shared memory indefinitely.
> >
> > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > the config file [1] and the commit which introduced this config [2].
> >
> > [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> > [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
> >
>
> It's been a while since OP-TEE caches some shm buffers to prevent
> re-allocting them on and on.
> OP-TEE does so for 1 shm buffer per "tee threads" OP-TEE has provisioned.
> Each thread can cache a shm reference.
> Note that used RPCs from optee to linux/u-boot/ree do not require such
> message buffer (IMO).
>
> The main issue is the shm buffer are allocated per optee thread
> (thread context assigned to client invocation request when entreing
> optee).
> Therefore, if an optee thread caches a shm buffer, it makes the caller
> tee session to have a shm reference with a refcount held, until Optee
> thread releases its cached shm reference.
>
> There are ugly side effects. Linux must disable the cache to release
> all resources.
> We recently saw some tee sessions may be left open because of such shm
> refcount held.
> It can lead to few misbehaviour of the TA service (restarting a
> service, releasing a resource)
>
> Config switch CFG_PREALLOC_RPC_CACHE was introduced [pr4896] to
> disable the feature at boot time.
> There are means to not use it, or to explicitly enable/disable it at
> run time (already used optee smc services for that). Would maybe be a
> better default config.
> Note this discussion thread ending at his comment [issue1918]:
>

Thanks etienne for the detailed description and references. Although,
we can set CFG_PREALLOC_RPC_CACHE=n by default but it feels like we
would miss a valuable optimization.

How about we just allocate a shared memory page during the OP-TEE
driver probe and share it with optee-os to use for RPC arguments? And
later it can be freed during OP-TEE driver removal. This would avoid
any refconting of this special memory to be associated with TA
sessions.

-Sumit

> Comments are welcome. I may have missed something in the description
> (or understanding :).
>
> [pr4896] https://github.com/OP-TEE/optee_os/pull/4896
> [issue1918] https://github.com/OP-TEE/optee_os/issues/1918#issuecomment-968747738
>
> Best regards,
> etienne
>
>
>
> > --
> > Jerome

2021-12-10 10:39:34

by Etienne Carriere

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Fri, 10 Dec 2021 at 11:29, Sumit Garg <[email protected]> wrote:
>
> On Fri, 10 Dec 2021 at 15:08, Etienne Carriere
> <[email protected]> wrote:
> >
> > Hello all,
> >
> > On Fri, 10 Dec 2021 at 09:10, Jerome Forissier <[email protected]> wrote:
> > >
> > > +CC Jens, Etienne
> > >
> > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > >>
> > > >> -----Original Message-----
> > > >> From: Sumit Garg <[email protected]>
> > > >> Sent: Thursday, December 9, 2021 7:41 PM
> > > >> To: Wang, Xiaolei <[email protected]>
> > > >> Cc: [email protected]; [email protected]; [email protected]
> > > >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> > > >>
> > > >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> > > >>
> > > >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> > > >>>
> > > >>> We observed the following kmemleak report:
> > > >>> unreferenced object 0xffff000007904500 (size 128):
> > > >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > > >>> hex dump (first 32 bytes):
> > > >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > > >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > > >>> backtrace:
> > > >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > > >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > > >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > > >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > > >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > > >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > > >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > > >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > > >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > > >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > > >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > > >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > > >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> > > >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > > >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> > > >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> > > >>>
> > > >>> This is not a memory leak because we pass the share memory pointer to
> > > >>> secure world and would get it from secure world before releasing it.
> > > >>
> > > >>> How about if it's actually a memory leak caused by the secure world?
> > > >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> > > >>
> > > >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> > > >>
> > > >> Hi sumit,
> > > >>
> > > >> You mean we need to check whether there is a real memleak,
> > > >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> > > >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> > > >
> > > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > > with tee-supplicant. So once the communication is done, the underlying
> > > > shared memory should be freed. I can't think of any scenario where
> > > > optee-os should keep hold-off shared memory indefinitely.
> > >
> > > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > > the config file [1] and the commit which introduced this config [2].
> > >
> > > [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> > > [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
> > >
> >
> > It's been a while since OP-TEE caches some shm buffers to prevent
> > re-allocting them on and on.
> > OP-TEE does so for 1 shm buffer per "tee threads" OP-TEE has provisioned.
> > Each thread can cache a shm reference.
> > Note that used RPCs from optee to linux/u-boot/ree do not require such
> > message buffer (IMO).
> >
> > The main issue is the shm buffer are allocated per optee thread
> > (thread context assigned to client invocation request when entreing
> > optee).
> > Therefore, if an optee thread caches a shm buffer, it makes the caller
> > tee session to have a shm reference with a refcount held, until Optee
> > thread releases its cached shm reference.
> >
> > There are ugly side effects. Linux must disable the cache to release
> > all resources.
> > We recently saw some tee sessions may be left open because of such shm
> > refcount held.
> > It can lead to few misbehaviour of the TA service (restarting a
> > service, releasing a resource)
> >
> > Config switch CFG_PREALLOC_RPC_CACHE was introduced [pr4896] to
> > disable the feature at boot time.
> > There are means to not use it, or to explicitly enable/disable it at
> > run time (already used optee smc services for that). Would maybe be a
> > better default config.
> > Note this discussion thread ending at his comment [issue1918]:
> >
>
> Thanks etienne for the detailed description and references. Although,
> we can set CFG_PREALLOC_RPC_CACHE=n by default but it feels like we
> would miss a valuable optimization.
>
> How about we just allocate a shared memory page during the OP-TEE
> driver probe and share it with optee-os to use for RPC arguments? And
> later it can be freed during OP-TEE driver removal. This would avoid
> any refconting of this special memory to be associated with TA
> sessions.

True. The driver currently invokes OPTEE_SMC_ENABLE_SHM_CACHE
to start caching some shm allocations. The optee_os part of that command
could be changed to preallocate the required small rpc message buffer per
provisioned tee thread.

Existing OPTEE_SMC_DISABLE_SHM_CACHE should behave accordingly.

etienne

>
> -Sumit
>
> > Comments are welcome. I may have missed something in the description
> > (or understanding :).
> >
> > [pr4896] https://github.com/OP-TEE/optee_os/pull/4896
> > [issue1918] https://github.com/OP-TEE/optee_os/issues/1918#issuecomment-968747738
> >
> > Best regards,
> > etienne
> >
> >
> >
> > > --
> > > Jerome

2021-12-10 10:42:00

by Jens Wiklander

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

Hi,

On Fri, Dec 10, 2021 at 11:29 AM Sumit Garg <[email protected]> wrote:
>
> On Fri, 10 Dec 2021 at 15:08, Etienne Carriere
> <[email protected]> wrote:
> >
> > Hello all,
> >
> > On Fri, 10 Dec 2021 at 09:10, Jerome Forissier <[email protected]> wrote:
> > >
> > > +CC Jens, Etienne
> > >
> > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > >>
> > > >> -----Original Message-----
> > > >> From: Sumit Garg <[email protected]>
> > > >> Sent: Thursday, December 9, 2021 7:41 PM
> > > >> To: Wang, Xiaolei <[email protected]>
> > > >> Cc: [email protected]; [email protected]; [email protected]
> > > >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> > > >>
> > > >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> > > >>
> > > >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> > > >>>
> > > >>> We observed the following kmemleak report:
> > > >>> unreferenced object 0xffff000007904500 (size 128):
> > > >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > > >>> hex dump (first 32 bytes):
> > > >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > > >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > > >>> backtrace:
> > > >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > > >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > > >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > > >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > > >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > > >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > > >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > > >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > > >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > > >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > > >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > > >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > > >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> > > >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > > >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> > > >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> > > >>>
> > > >>> This is not a memory leak because we pass the share memory pointer to
> > > >>> secure world and would get it from secure world before releasing it.
> > > >>
> > > >>> How about if it's actually a memory leak caused by the secure world?
> > > >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> > > >>
> > > >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> > > >>
> > > >> Hi sumit,
> > > >>
> > > >> You mean we need to check whether there is a real memleak,
> > > >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> > > >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> > > >
> > > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > > with tee-supplicant. So once the communication is done, the underlying
> > > > shared memory should be freed. I can't think of any scenario where
> > > > optee-os should keep hold-off shared memory indefinitely.
> > >
> > > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > > the config file [1] and the commit which introduced this config [2].
> > >
> > > [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> > > [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
> > >
> >
> > It's been a while since OP-TEE caches some shm buffers to prevent
> > re-allocting them on and on.
> > OP-TEE does so for 1 shm buffer per "tee threads" OP-TEE has provisioned.
> > Each thread can cache a shm reference.
> > Note that used RPCs from optee to linux/u-boot/ree do not require such
> > message buffer (IMO).
> >
> > The main issue is the shm buffer are allocated per optee thread
> > (thread context assigned to client invocation request when entreing
> > optee).
> > Therefore, if an optee thread caches a shm buffer, it makes the caller
> > tee session to have a shm reference with a refcount held, until Optee
> > thread releases its cached shm reference.
> >
> > There are ugly side effects. Linux must disable the cache to release
> > all resources.
> > We recently saw some tee sessions may be left open because of such shm
> > refcount held.
> > It can lead to few misbehaviour of the TA service (restarting a
> > service, releasing a resource)
> >
> > Config switch CFG_PREALLOC_RPC_CACHE was introduced [pr4896] to
> > disable the feature at boot time.
> > There are means to not use it, or to explicitly enable/disable it at
> > run time (already used optee smc services for that). Would maybe be a
> > better default config.
> > Note this discussion thread ending at his comment [issue1918]:
> >
>
> Thanks etienne for the detailed description and references. Although,
> we can set CFG_PREALLOC_RPC_CACHE=n by default but it feels like we
> would miss a valuable optimization.
>
> How about we just allocate a shared memory page during the OP-TEE
> driver probe and share it with optee-os to use for RPC arguments? And
> later it can be freed during OP-TEE driver removal. This would avoid
> any refconting of this special memory to be associated with TA
> sessions.

The FF-A ABI part of the driver avoids this problem. I've started on a
similar solution for the SMC based ABI, but it's not ready to post
yet.

Cheers,
Jens

>
> -Sumit
>
> > Comments are welcome. I may have missed something in the description
> > (or understanding :).
> >
> > [pr4896] https://github.com/OP-TEE/optee_os/pull/4896
> > [issue1918] https://github.com/OP-TEE/optee_os/issues/1918#issuecomment-968747738
> >
> > Best regards,
> > etienne
> >
> >
> >
> > > --
> > > Jerome

2021-12-10 15:50:23

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> >
> > +CC Jens, Etienne
> >
> > On 12/10/21 06:00, Sumit Garg wrote:
> > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > >>
> > >> -----Original Message-----
> > >> From: Sumit Garg <[email protected]>
> > >> Sent: Thursday, December 9, 2021 7:41 PM
> > >> To: Wang, Xiaolei <[email protected]>
> > >> Cc: [email protected]; [email protected]; [email protected]
> > >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> > >>
> > >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> > >>
> > >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> > >>>
> > >>> We observed the following kmemleak report:
> > >>> unreferenced object 0xffff000007904500 (size 128):
> > >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > >>> hex dump (first 32 bytes):
> > >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > >>> backtrace:
> > >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> > >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> > >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> > >>>
> > >>> This is not a memory leak because we pass the share memory pointer to
> > >>> secure world and would get it from secure world before releasing it.
> > >>
> > >>> How about if it's actually a memory leak caused by the secure world?
> > >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> > >>
> > >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> > >>
> > >> Hi sumit,
> > >>
> > >> You mean we need to check whether there is a real memleak,
> > >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> > >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> > >
> > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > with tee-supplicant. So once the communication is done, the underlying
> > > shared memory should be freed. I can't think of any scenario where
> > > optee-os should keep hold-off shared memory indefinitely.
> >
> > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > the config file [1] and the commit which introduced this config [2].
>
> Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> RPC shared memory remains allocated. I guess that is done primarily
> for performance reasons.
>
> But still it doesn't feel appropriate that we term all RPC shm
> allocations as not leaking memory as we might miss obvious ones.

IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
possible point before *ownership* of the SHM block is passed from kernel
to OP-TEE.

Sure, after we change ownership it could still be leaked... but it can
no longer be leaked by the kernel because the kernel no longer owns it!
More importantly, it makes no sense to run the kernel memory detector on the
buffer because it simply can't work.

After the RPC completes, doesn't it become impossible for kmemleak to
scan to see if the pointer is lost[1]? kmemleak is essentially a tracing
garbage collector and needs to be able to scan all memory that could
hold a pointer to leakable memory. After the RPC completes the
only copy of the pointer will be stored in a memory region that the
kernel is prohibited from reading. How could kmemleak possibly give you
a useful answer in this circumstance?

In other words if there's nothing kmemleak could do to fix this
situation then marking the memory as kmemleak_not_leak() seems entirely
appropriate (although it should be prefaced with a big comment
explaining the change of ownerhship and why kmemleak cannot work).


Daniel.


[1] Everything I've said here hinges on this being true... so if I've
made a mistake about where OP-TEE stores this pointer then most of
the rest of this post is junk ;-)

2021-12-13 08:56:00

by Xiaolei Wang

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()


On 12/10/21 5:38 PM, Sumit Garg wrote:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
>> +CC Jens, Etienne
>>
>> On 12/10/21 06:00, Sumit Garg wrote:
>>> On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
>>>> -----Original Message-----
>>>> From: Sumit Garg <[email protected]>
>>>> Sent: Thursday, December 9, 2021 7:41 PM
>>>> To: Wang, Xiaolei <[email protected]>
>>>> Cc: [email protected]; [email protected]; [email protected]
>>>> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
>>>>
>>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>>
>>>> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
>>>>> We observed the following kmemleak report:
>>>>> unreferenced object 0xffff000007904500 (size 128):
>>>>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
>>>>> hex dump (first 32 bytes):
>>>>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
>>>>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
>>>>> backtrace:
>>>>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
>>>>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
>>>>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
>>>>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
>>>>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
>>>>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
>>>>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
>>>>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
>>>>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
>>>>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
>>>>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
>>>>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
>>>>> [<00000000c835f0df>] __driver_attach+0x84/0x124
>>>>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
>>>>> [<000000001735e8a8>] driver_attach+0x24/0x30
>>>>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>>>>>
>>>>> This is not a memory leak because we pass the share memory pointer to
>>>>> secure world and would get it from secure world before releasing it.
>>>>> How about if it's actually a memory leak caused by the secure world?
>>>>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
>>>>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
>>>> Hi sumit,
>>>>
>>>> You mean we need to check whether there is a real memleak,
>>>> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
>>>> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
>>> Yes. AFAICT, optee-os should allocate shared memory to communicate
>>> with tee-supplicant. So once the communication is done, the underlying
>>> shared memory should be freed. I can't think of any scenario where
>>> optee-os should keep hold-off shared memory indefinitely.
>> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
>> the config file [1] and the commit which introduced this config [2].
> Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> RPC shared memory remains allocated. I guess that is done primarily
> for performance reasons.
>
> But still it doesn't feel appropriate that we term all RPC shm
> allocations as not leaking memory as we might miss obvious ones.
>
> Xiaolei,
>
> Can you once test with CFG_PREALLOC_RPC_CACHE=n while compiling
> optee-os and see if the observed memory leak disappears or not?
>
> -Sumit

Hi sumit


The version I am using has not increased the CFG_PREALLOC_RPC_CACHE

switch, I checked out to the latest version, but because of the need for

additional patches for the imx8 platform, I still have no way to test the

CFG_PREALLOC_RPC_CACHE=n situation


thanks

xiaolei

>
>> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
>> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>>
>> --
>> Jerome

2021-12-13 08:58:18

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Fri, 10 Dec 2021 at 21:19, Daniel Thompson
<[email protected]> wrote:
>
> On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> > >
> > > +CC Jens, Etienne
> > >
> > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > >>
> > > >> -----Original Message-----
> > > >> From: Sumit Garg <[email protected]>
> > > >> Sent: Thursday, December 9, 2021 7:41 PM
> > > >> To: Wang, Xiaolei <[email protected]>
> > > >> Cc: [email protected]; [email protected]; [email protected]
> > > >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> > > >>
> > > >> [Please note: This e-mail is from an EXTERNAL e-mail address]
> > > >>
> > > >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> > > >>>
> > > >>> We observed the following kmemleak report:
> > > >>> unreferenced object 0xffff000007904500 (size 128):
> > > >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > > >>> hex dump (first 32 bytes):
> > > >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > > >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > > >>> backtrace:
> > > >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > > >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > > >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > > >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > > >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > > >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > > >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > > >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > > >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > > >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > > >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > > >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > > >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> > > >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > > >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> > > >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> > > >>>
> > > >>> This is not a memory leak because we pass the share memory pointer to
> > > >>> secure world and would get it from secure world before releasing it.
> > > >>
> > > >>> How about if it's actually a memory leak caused by the secure world?
> > > >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> > > >>
> > > >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> > > >>
> > > >> Hi sumit,
> > > >>
> > > >> You mean we need to check whether there is a real memleak,
> > > >> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> > > >> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> > > >
> > > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > > with tee-supplicant. So once the communication is done, the underlying
> > > > shared memory should be freed. I can't think of any scenario where
> > > > optee-os should keep hold-off shared memory indefinitely.
> > >
> > > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > > the config file [1] and the commit which introduced this config [2].
> >
> > Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> > RPC shared memory remains allocated. I guess that is done primarily
> > for performance reasons.
> >
> > But still it doesn't feel appropriate that we term all RPC shm
> > allocations as not leaking memory as we might miss obvious ones.
>
> IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
> possible point before *ownership* of the SHM block is passed from kernel
> to OP-TEE.

I wouldn't say it's a transfer of ownership from kernel to OP-TEE but
rather a way for OP-TEE to access kernel's memory in order to pass
info or execute further RPC commands.

>
> Sure, after we change ownership it could still be leaked... but it can
> no longer be leaked by the kernel because the kernel no longer owns it!
> More importantly, it makes no sense to run the kernel memory detector on the
> buffer because it simply can't work.
>
> After the RPC completes, doesn't it become impossible for kmemleak to
> scan to see if the pointer is lost[1]?

Apart from the special OP-TEE prealloc SHM cache stuff, I can't think
of any scenario where an OP-TEE thread should hold off kernel's memory
pointers for more than 5 seconds before being passed onto kernel for
further RPC commands or RPC free action. So the kmemleak should be
able to detect if a pointer is lost.

/*
* Kmemleak configuration and common defines.
*/
<snip>
#define MSECS_MIN_AGE 5000 /* minimum object age for reporting */
<snip>

> kmemleak is essentially a tracing
> garbage collector and needs to be able to scan all memory that could
> hold a pointer to leakable memory. After the RPC completes the
> only copy of the pointer will be stored in a memory region that the
> kernel is prohibited from reading. How could kmemleak possibly give you
> a useful answer in this circumstance?
>

There is another aspect of kmemleak being the minimum age of an object
to be reported as a memory leak as described above. Also, this case
resembles where a pointer is stored on the CPU stack (see struct
optee_rpc_param param = { };).

In most of the scenarios apart from special prealloc shm cache case,
the flow should be as follows:

1) Alloc kernel memory via RPC
2) OP-TEE passes references to kernel memory for RPC action commands
3) Free kernel memory via RPC

kmemleak should be useful in case the 3rd step is skipped due to
incorrect behaviour of a particular OP-TEE thread. And I can't think
of any appropriate way in OP-TEE OS to detect this type of kernel
memory leak caused by one of its threads.

-Sumit

> In other words if there's nothing kmemleak could do to fix this
> situation then marking the memory as kmemleak_not_leak() seems entirely
> appropriate (although it should be prefaced with a big comment
> explaining the change of ownerhship and why kmemleak cannot work).
>
>
> Daniel.
>
>
> [1] Everything I've said here hinges on this being true... so if I've
> made a mistake about where OP-TEE stores this pointer then most of
> the rest of this post is junk ;-)

2021-12-13 09:04:29

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Mon, 13 Dec 2021 at 14:25, wangxiaolei <[email protected]> wrote:
>
>
> On 12/10/21 5:38 PM, Sumit Garg wrote:
> > [Please note: This e-mail is from an EXTERNAL e-mail address]
> >
> > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> >> +CC Jens, Etienne
> >>
> >> On 12/10/21 06:00, Sumit Garg wrote:
> >>> On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> >>>> -----Original Message-----
> >>>> From: Sumit Garg <[email protected]>
> >>>> Sent: Thursday, December 9, 2021 7:41 PM
> >>>> To: Wang, Xiaolei <[email protected]>
> >>>> Cc: [email protected]; [email protected]; [email protected]
> >>>> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> >>>>
> >>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
> >>>>
> >>>> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> >>>>> We observed the following kmemleak report:
> >>>>> unreferenced object 0xffff000007904500 (size 128):
> >>>>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> >>>>> hex dump (first 32 bytes):
> >>>>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> >>>>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> >>>>> backtrace:
> >>>>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> >>>>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> >>>>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> >>>>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> >>>>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> >>>>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> >>>>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> >>>>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> >>>>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> >>>>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> >>>>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> >>>>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> >>>>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> >>>>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> >>>>> [<000000001735e8a8>] driver_attach+0x24/0x30
> >>>>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> >>>>>
> >>>>> This is not a memory leak because we pass the share memory pointer to
> >>>>> secure world and would get it from secure world before releasing it.
> >>>>> How about if it's actually a memory leak caused by the secure world?
> >>>>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> >>>>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> >>>> Hi sumit,
> >>>>
> >>>> You mean we need to check whether there is a real memleak,
> >>>> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> >>>> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> >>> Yes. AFAICT, optee-os should allocate shared memory to communicate
> >>> with tee-supplicant. So once the communication is done, the underlying
> >>> shared memory should be freed. I can't think of any scenario where
> >>> optee-os should keep hold-off shared memory indefinitely.
> >> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> >> the config file [1] and the commit which introduced this config [2].
> > Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> > RPC shared memory remains allocated. I guess that is done primarily
> > for performance reasons.
> >
> > But still it doesn't feel appropriate that we term all RPC shm
> > allocations as not leaking memory as we might miss obvious ones.
> >
> > Xiaolei,
> >
> > Can you once test with CFG_PREALLOC_RPC_CACHE=n while compiling
> > optee-os and see if the observed memory leak disappears or not?
> >
> > -Sumit
>
> Hi sumit
>
>
> The version I am using has not increased the CFG_PREALLOC_RPC_CACHE
>
> switch, I checked out to the latest version, but because of the need for
>
> additional patches for the imx8 platform, I still have no way to test the
>
> CFG_PREALLOC_RPC_CACHE=n situation
>

Can you just try to backport this [1] patch to your imx8 optee-os tree and test?

[1] https://github.com/OP-TEE/optee_os/commit/8887663248ad

-Sumit

>
> thanks
>
> xiaolei
>
> >
> >> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> >> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
> >>
> >> --
> >> Jerome

2021-12-13 13:04:07

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Mon, Dec 13, 2021 at 02:28:01PM +0530, Sumit Garg wrote:
> On Fri, 10 Dec 2021 at 21:19, Daniel Thompson
> <[email protected]> wrote:
> > On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> > > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> > > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > > > with tee-supplicant. So once the communication is done, the underlying
> > > > > shared memory should be freed. I can't think of any scenario where
> > > > > optee-os should keep hold-off shared memory indefinitely.
> > > >
> > > > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > > > the config file [1] and the commit which introduced this config [2].
> > >
> > > Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> > > RPC shared memory remains allocated. I guess that is done primarily
> > > for performance reasons.
> > >
> > > But still it doesn't feel appropriate that we term all RPC shm
> > > allocations as not leaking memory as we might miss obvious ones.
> >
> > IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
> > possible point before *ownership* of the SHM block is passed from kernel
> > to OP-TEE.
>
> I wouldn't say it's a transfer of ownership from kernel to OP-TEE but
> rather a way for OP-TEE to access kernel's memory in order to pass
> info or execute further RPC commands.

The RPC handler allocates a pointer (e.g. now the RPC handler owns the
allocated memory). The RPC handler then passes that pointer to OP-TEE and
forgets what it's value was.

That is a transfer of ownership: the RPC handler does not hold any pointer
to the memory and is incapable of freeing it. Moreover this situation is
what kmemleak_no_leak() is for! Its job it to inform kmemleak that the
pointer is owned/stored somewhere that is does not scan.


> > Sure, after we change ownership it could still be leaked... but it can
> > no longer be leaked by the kernel because the kernel no longer owns it!
> > More importantly, it makes no sense to run the kernel memory detector on the
> > buffer because it simply can't work.
> >
> > After the RPC completes, doesn't it become impossible for kmemleak to
> > scan to see if the pointer is lost[1]?
>
> Apart from the special OP-TEE prealloc SHM cache stuff, I can't think
> of any scenario where an OP-TEE thread should hold off kernel's memory
> pointers for more than 5 seconds before being passed onto kernel for
> further RPC commands or RPC free action. So the kmemleak should be
> able to detect if a pointer is lost.

Or putting this a different way: there is known to be firmware in the
field that allocates pointers for more then five seconds!


> /*
> * Kmemleak configuration and common defines.
> */
> <snip>
> #define MSECS_MIN_AGE 5000 /* minimum object age for reporting */
> <snip>
>
> > kmemleak is essentially a tracing
> > garbage collector and needs to be able to scan all memory that could
> > hold a pointer to leakable memory. After the RPC completes the
> > only copy of the pointer will be stored in a memory region that the
> > kernel is prohibited from reading. How could kmemleak possibly give you
> > a useful answer in this circumstance?
> >
>
> There is another aspect of kmemleak being the minimum age of an object
> to be reported as a memory leak as described above. Also, this case
> resembles where a pointer is stored on the CPU stack (see struct
> optee_rpc_param param = { };).

I can't see how this resembles pointers stored on the stack.

Firstly, stack memory is scanned by kmemleak meaning a thread is
permitted to own memory for more than five seconds without provoking a
warning. OP-TEE memory cannot be scanned like this.

Secondly, stacks don't have any concept of sessions. It is *really*
buggy behaviour for a TA to allocate SHM memory during a session open so
it can avoid critical path RPC round trips when operational?


> In most of the scenarios apart from special prealloc shm cache case,
> the flow should be as follows:
>
> 1) Alloc kernel memory via RPC
> 2) OP-TEE passes references to kernel memory for RPC action commands
> 3) Free kernel memory via RPC
>
> kmemleak should be useful in case the 3rd step is skipped due to
> incorrect behaviour of a particular OP-TEE thread. And I can't think
> of any appropriate way in OP-TEE OS to detect this type of kernel
> memory leak caused by one of its threads.

If OP-TEE is the only place the pointer is held and you can't think of
any way for OP-TEE OS to detect if it has leaked the pointer then how
can you expect the kernel to give a correct verdict when it has even
less visibility than OP-TEE OS.

Note that, if you think OP-TEE routinely leaks memory, then there are
ways that the corresponding kernel driver could track what memory it has
handed to OP-TEE. However this should be described as a list of
*allocations* rather than a list of *leaks* because the driver cannot
distinguish the two.


Daniel.

2021-12-14 07:03:27

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Mon, 13 Dec 2021 at 18:34, Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Dec 13, 2021 at 02:28:01PM +0530, Sumit Garg wrote:
> > On Fri, 10 Dec 2021 at 21:19, Daniel Thompson
> > <[email protected]> wrote:
> > > On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> > > > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> > > > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > > > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > > > > with tee-supplicant. So once the communication is done, the underlying
> > > > > > shared memory should be freed. I can't think of any scenario where
> > > > > > optee-os should keep hold-off shared memory indefinitely.
> > > > >
> > > > > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > > > > the config file [1] and the commit which introduced this config [2].
> > > >
> > > > Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> > > > RPC shared memory remains allocated. I guess that is done primarily
> > > > for performance reasons.
> > > >
> > > > But still it doesn't feel appropriate that we term all RPC shm
> > > > allocations as not leaking memory as we might miss obvious ones.
> > >
> > > IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
> > > possible point before *ownership* of the SHM block is passed from kernel
> > > to OP-TEE.
> >
> > I wouldn't say it's a transfer of ownership from kernel to OP-TEE but
> > rather a way for OP-TEE to access kernel's memory in order to pass
> > info or execute further RPC commands.
>
> The RPC handler allocates a pointer (e.g. now the RPC handler owns the
> allocated memory). The RPC handler then passes that pointer to OP-TEE and
> forgets what it's value was.
>
> That is a transfer of ownership: the RPC handler does not hold any pointer
> to the memory and is incapable of freeing it. Moreover this situation is
> what kmemleak_no_leak() is for! Its job it to inform kmemleak that the
> pointer is owned/stored somewhere that is does not scan.
>

Let me put this another way. If the memory allocator belongs to the
kernel then how does OP-TEE get to know which memory is currently
allocated and it is to be scanned?

I think the complete solution would be to extend kmemleak to support
OP-TEE memory scanning via OP-TEE invocation to check if it's holding
any kernel memory references.

>
> > > Sure, after we change ownership it could still be leaked... but it can
> > > no longer be leaked by the kernel because the kernel no longer owns it!
> > > More importantly, it makes no sense to run the kernel memory detector on the
> > > buffer because it simply can't work.
> > >
> > > After the RPC completes, doesn't it become impossible for kmemleak to
> > > scan to see if the pointer is lost[1]?
> >
> > Apart from the special OP-TEE prealloc SHM cache stuff, I can't think
> > of any scenario where an OP-TEE thread should hold off kernel's memory
> > pointers for more than 5 seconds before being passed onto kernel for
> > further RPC commands or RPC free action. So the kmemleak should be
> > able to detect if a pointer is lost.
>
> Or putting this a different way: there is known to be firmware in the
> field that allocates pointers for more then five seconds!
>

If it's known that upstream OP-TEE doesn't hold any kernel memory
references for more than 5 seconds then IMO we should be fine to not
disable kmemleak until we have a future kmemleak extension. Otherwise
it would be very hard to keep track of kernel memory lost in this way.

>
> > /*
> > * Kmemleak configuration and common defines.
> > */
> > <snip>
> > #define MSECS_MIN_AGE 5000 /* minimum object age for reporting */
> > <snip>
> >
> > > kmemleak is essentially a tracing
> > > garbage collector and needs to be able to scan all memory that could
> > > hold a pointer to leakable memory. After the RPC completes the
> > > only copy of the pointer will be stored in a memory region that the
> > > kernel is prohibited from reading. How could kmemleak possibly give you
> > > a useful answer in this circumstance?
> > >
> >
> > There is another aspect of kmemleak being the minimum age of an object
> > to be reported as a memory leak as described above. Also, this case
> > resembles where a pointer is stored on the CPU stack (see struct
> > optee_rpc_param param = { };).
>
> I can't see how this resembles pointers stored on the stack.
>
> Firstly, stack memory is scanned by kmemleak meaning a thread is
> permitted to own memory for more than five seconds without provoking a
> warning. OP-TEE memory cannot be scanned like this.
>
> Secondly, stacks don't have any concept of sessions. It is *really*
> buggy behaviour for a TA to allocate SHM memory during a session open so
> it can avoid critical path RPC round trips when operational?
>

That's one optimization case for prealloc SHM we have currently. Jens
is already working on an update for SMC ABI to avoid such allocations.

BTW, that could be disabled with CFG_PREALLOC_RPC_CACHE=n in upstream OP-TEE.

>
> > In most of the scenarios apart from special prealloc shm cache case,
> > the flow should be as follows:
> >
> > 1) Alloc kernel memory via RPC
> > 2) OP-TEE passes references to kernel memory for RPC action commands
> > 3) Free kernel memory via RPC
> >
> > kmemleak should be useful in case the 3rd step is skipped due to
> > incorrect behaviour of a particular OP-TEE thread. And I can't think
> > of any appropriate way in OP-TEE OS to detect this type of kernel
> > memory leak caused by one of its threads.
>
> If OP-TEE is the only place the pointer is held and you can't think of
> any way for OP-TEE OS to detect if it has leaked the pointer then how
> can you expect the kernel to give a correct verdict when it has even
> less visibility than OP-TEE OS.

Let me try to elaborate here. Since the nature of shared memory under
consideration here are:

1) Allocated by kernel.
2) OP-TEE holds reference to it.

In order to detect its leakage, the memory leakage detector should be
invoked by the kernel and OP-TEE should support that detector to say
if it holds any references to the memory being scanned.

Also, as you may know OP-TEE is a passive firmware entity without any
threads of its own and is scheduled by the Linux kernel. So it won't
be possible to have an independent memory leak detector thread within
OP-TEE.

>
> Note that, if you think OP-TEE routinely leaks memory,

How can we be sure about any piece of complex software (lots of
platform specific code) that it will leak memory or not? I guess that
is the reason why the kernel has this runtime kmemleak debugging tool.

-Sumit

> then there are
> ways that the corresponding kernel driver could track what memory it has
> handed to OP-TEE. However this should be described as a list of
> *allocations* rather than a list of *leaks* because the driver cannot
> distinguish the two.
>
>
> Daniel.

2021-12-14 07:11:21

by Xiaolei Wang

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()


On 12/13/21 5:04 PM, Sumit Garg wrote:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Mon, 13 Dec 2021 at 14:25, wangxiaolei <[email protected]> wrote:
>>
>> On 12/10/21 5:38 PM, Sumit Garg wrote:
>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>
>>> On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
>>>> +CC Jens, Etienne
>>>>
>>>> On 12/10/21 06:00, Sumit Garg wrote:
>>>>> On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
>>>>>> -----Original Message-----
>>>>>> From: Sumit Garg <[email protected]>
>>>>>> Sent: Thursday, December 9, 2021 7:41 PM
>>>>>> To: Wang, Xiaolei <[email protected]>
>>>>>> Cc: [email protected]; [email protected]; [email protected]
>>>>>> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
>>>>>>
>>>>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>>>>
>>>>>> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
>>>>>>> We observed the following kmemleak report:
>>>>>>> unreferenced object 0xffff000007904500 (size 128):
>>>>>>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
>>>>>>> hex dump (first 32 bytes):
>>>>>>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
>>>>>>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
>>>>>>> backtrace:
>>>>>>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
>>>>>>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
>>>>>>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
>>>>>>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
>>>>>>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
>>>>>>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
>>>>>>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
>>>>>>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
>>>>>>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
>>>>>>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
>>>>>>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
>>>>>>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
>>>>>>> [<00000000c835f0df>] __driver_attach+0x84/0x124
>>>>>>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
>>>>>>> [<000000001735e8a8>] driver_attach+0x24/0x30
>>>>>>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>>>>>>>
>>>>>>> This is not a memory leak because we pass the share memory pointer to
>>>>>>> secure world and would get it from secure world before releasing it.
>>>>>>> How about if it's actually a memory leak caused by the secure world?
>>>>>>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
>>>>>>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
>>>>>> Hi sumit,
>>>>>>
>>>>>> You mean we need to check whether there is a real memleak,
>>>>>> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
>>>>>> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
>>>>> Yes. AFAICT, optee-os should allocate shared memory to communicate
>>>>> with tee-supplicant. So once the communication is done, the underlying
>>>>> shared memory should be freed. I can't think of any scenario where
>>>>> optee-os should keep hold-off shared memory indefinitely.
>>>> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
>>>> the config file [1] and the commit which introduced this config [2].
>>> Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
>>> RPC shared memory remains allocated. I guess that is done primarily
>>> for performance reasons.
>>>
>>> But still it doesn't feel appropriate that we term all RPC shm
>>> allocations as not leaking memory as we might miss obvious ones.
>>>
>>> Xiaolei,
>>>
>>> Can you once test with CFG_PREALLOC_RPC_CACHE=n while compiling
>>> optee-os and see if the observed memory leak disappears or not?
>>>
>>> -Sumit
>> Hi sumit
>>
>>
>> The version I am using has not increased the CFG_PREALLOC_RPC_CACHE
>>
>> switch, I checked out to the latest version, but because of the need for
>>
>> additional patches for the imx8 platform, I still have no way to test the
>>
>> CFG_PREALLOC_RPC_CACHE=n situation
>>
> Can you just try to backport this [1] patch to your imx8 optee-os tree and test?
>
> [1] https://github.com/OP-TEE/optee_os/commit/8887663248ad

Hi sumit

I upgraded optee-os from version 3.2.0 to 3.13.0, and the kernel did not
detect this problem.

I have not set CFG_PREALLOC_RPC_CACHE to n. This should be a problem
that occurs when compatible

with lower versions.


thanks

xiaolei

>
> -Sumit
>
>> thanks
>>
>> xiaolei
>>
>>>> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
>>>> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>>>>
>>>> --
>>>> Jerome

2021-12-14 07:29:44

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Tue, 14 Dec 2021 at 12:41, wangxiaolei <[email protected]> wrote:
>
>
> On 12/13/21 5:04 PM, Sumit Garg wrote:
> > [Please note: This e-mail is from an EXTERNAL e-mail address]
> >
> > On Mon, 13 Dec 2021 at 14:25, wangxiaolei <[email protected]> wrote:
> >>
> >> On 12/10/21 5:38 PM, Sumit Garg wrote:
> >>> [Please note: This e-mail is from an EXTERNAL e-mail address]
> >>>
> >>> On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> >>>> +CC Jens, Etienne
> >>>>
> >>>> On 12/10/21 06:00, Sumit Garg wrote:
> >>>>> On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Sumit Garg <[email protected]>
> >>>>>> Sent: Thursday, December 9, 2021 7:41 PM
> >>>>>> To: Wang, Xiaolei <[email protected]>
> >>>>>> Cc: [email protected]; [email protected]; [email protected]
> >>>>>> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> >>>>>>
> >>>>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
> >>>>>>
> >>>>>> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
> >>>>>>> We observed the following kmemleak report:
> >>>>>>> unreferenced object 0xffff000007904500 (size 128):
> >>>>>>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> >>>>>>> hex dump (first 32 bytes):
> >>>>>>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> >>>>>>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> >>>>>>> backtrace:
> >>>>>>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> >>>>>>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> >>>>>>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> >>>>>>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> >>>>>>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> >>>>>>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> >>>>>>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> >>>>>>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> >>>>>>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> >>>>>>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> >>>>>>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> >>>>>>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> >>>>>>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> >>>>>>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> >>>>>>> [<000000001735e8a8>] driver_attach+0x24/0x30
> >>>>>>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> >>>>>>>
> >>>>>>> This is not a memory leak because we pass the share memory pointer to
> >>>>>>> secure world and would get it from secure world before releasing it.
> >>>>>>> How about if it's actually a memory leak caused by the secure world?
> >>>>>>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> >>>>>>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> >>>>>> Hi sumit,
> >>>>>>
> >>>>>> You mean we need to check whether there is a real memleak,
> >>>>>> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
> >>>>>> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
> >>>>> Yes. AFAICT, optee-os should allocate shared memory to communicate
> >>>>> with tee-supplicant. So once the communication is done, the underlying
> >>>>> shared memory should be freed. I can't think of any scenario where
> >>>>> optee-os should keep hold-off shared memory indefinitely.
> >>>> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> >>>> the config file [1] and the commit which introduced this config [2].
> >>> Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
> >>> RPC shared memory remains allocated. I guess that is done primarily
> >>> for performance reasons.
> >>>
> >>> But still it doesn't feel appropriate that we term all RPC shm
> >>> allocations as not leaking memory as we might miss obvious ones.
> >>>
> >>> Xiaolei,
> >>>
> >>> Can you once test with CFG_PREALLOC_RPC_CACHE=n while compiling
> >>> optee-os and see if the observed memory leak disappears or not?
> >>>
> >>> -Sumit
> >> Hi sumit
> >>
> >>
> >> The version I am using has not increased the CFG_PREALLOC_RPC_CACHE
> >>
> >> switch, I checked out to the latest version, but because of the need for
> >>
> >> additional patches for the imx8 platform, I still have no way to test the
> >>
> >> CFG_PREALLOC_RPC_CACHE=n situation
> >>
> > Can you just try to backport this [1] patch to your imx8 optee-os tree and test?
> >
> > [1] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>
> Hi sumit
>
> I upgraded optee-os from version 3.2.0 to 3.13.0, and the kernel did not
> detect this problem.

Can you check if CFG_TEE_CORE_EMBED_INTERNAL_TESTS is enabled in
optee-os version 3.13.0? As we would require atleast one RPC prealloc
SHM invocation from OP-TEE for kmemleak to detect the problem.

-Sumit

>
> I have not set CFG_PREALLOC_RPC_CACHE to n. This should be a problem
> that occurs when compatible
>
> with lower versions.
>
>
> thanks
>
> xiaolei
>
> >
> > -Sumit
> >
> >> thanks
> >>
> >> xiaolei
> >>
> >>>> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> >>>> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
> >>>>
> >>>> --
> >>>> Jerome

2021-12-14 07:42:09

by Xiaolei Wang

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()


On 12/14/21 3:29 PM, Sumit Garg wrote:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Tue, 14 Dec 2021 at 12:41, wangxiaolei <[email protected]> wrote:
>>
>> On 12/13/21 5:04 PM, Sumit Garg wrote:
>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>
>>> On Mon, 13 Dec 2021 at 14:25, wangxiaolei <[email protected]> wrote:
>>>> On 12/10/21 5:38 PM, Sumit Garg wrote:
>>>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>>>
>>>>> On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
>>>>>> +CC Jens, Etienne
>>>>>>
>>>>>> On 12/10/21 06:00, Sumit Garg wrote:
>>>>>>> On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Sumit Garg <[email protected]>
>>>>>>>> Sent: Thursday, December 9, 2021 7:41 PM
>>>>>>>> To: Wang, Xiaolei <[email protected]>
>>>>>>>> Cc: [email protected]; [email protected]; [email protected]
>>>>>>>> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
>>>>>>>>
>>>>>>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>>>>>>
>>>>>>>> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <[email protected]> wrote:
>>>>>>>>> We observed the following kmemleak report:
>>>>>>>>> unreferenced object 0xffff000007904500 (size 128):
>>>>>>>>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
>>>>>>>>> hex dump (first 32 bytes):
>>>>>>>>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
>>>>>>>>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
>>>>>>>>> backtrace:
>>>>>>>>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
>>>>>>>>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
>>>>>>>>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
>>>>>>>>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
>>>>>>>>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
>>>>>>>>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
>>>>>>>>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
>>>>>>>>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
>>>>>>>>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
>>>>>>>>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
>>>>>>>>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
>>>>>>>>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
>>>>>>>>> [<00000000c835f0df>] __driver_attach+0x84/0x124
>>>>>>>>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
>>>>>>>>> [<000000001735e8a8>] driver_attach+0x24/0x30
>>>>>>>>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>>>>>>>>>
>>>>>>>>> This is not a memory leak because we pass the share memory pointer to
>>>>>>>>> secure world and would get it from secure world before releasing it.
>>>>>>>>> How about if it's actually a memory leak caused by the secure world?
>>>>>>>>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
>>>>>>>>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
>>>>>>>> Hi sumit,
>>>>>>>>
>>>>>>>> You mean we need to check whether there is a real memleak,
>>>>>>>> If being secure world just allocate kernel memory via OPTEE_SMC_PRC_FUNC_ALLOC and until the end, there is no free
>>>>>>>> It via OPTEE_SMC_PRC_FUNC_FREE, then we should judge it as a memory leak, wo need to judge whether it is caused by secure os?
>>>>>>> Yes. AFAICT, optee-os should allocate shared memory to communicate
>>>>>>> with tee-supplicant. So once the communication is done, the underlying
>>>>>>> shared memory should be freed. I can't think of any scenario where
>>>>>>> optee-os should keep hold-off shared memory indefinitely.
>>>>>> I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
>>>>>> the config file [1] and the commit which introduced this config [2].
>>>>> Okay, I see the reasoning. So during the OP-TEE driver's lifetime, the
>>>>> RPC shared memory remains allocated. I guess that is done primarily
>>>>> for performance reasons.
>>>>>
>>>>> But still it doesn't feel appropriate that we term all RPC shm
>>>>> allocations as not leaking memory as we might miss obvious ones.
>>>>>
>>>>> Xiaolei,
>>>>>
>>>>> Can you once test with CFG_PREALLOC_RPC_CACHE=n while compiling
>>>>> optee-os and see if the observed memory leak disappears or not?
>>>>>
>>>>> -Sumit
>>>> Hi sumit
>>>>
>>>>
>>>> The version I am using has not increased the CFG_PREALLOC_RPC_CACHE
>>>>
>>>> switch, I checked out to the latest version, but because of the need for
>>>>
>>>> additional patches for the imx8 platform, I still have no way to test the
>>>>
>>>> CFG_PREALLOC_RPC_CACHE=n situation
>>>>
>>> Can you just try to backport this [1] patch to your imx8 optee-os tree and test?
>>>
>>> [1] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>> Hi sumit
>>
>> I upgraded optee-os from version 3.2.0 to 3.13.0, and the kernel did not
>> detect this problem.
> Can you check if CFG_TEE_CORE_EMBED_INTERNAL_TESTS is enabled in
> optee-os version 3.13.0? As we would require atleast one RPC prealloc
> SHM invocation from OP-TEE for kmemleak to detect the problem.


Hi

CFG_TEE_CORE_EMBED_INTERNAL_TESTS is enabled
,I can see the "*.o" files compiled from the core/pta/tests directory

thanks
xiaolei


>
> -Sumit
>
>> I have not set CFG_PREALLOC_RPC_CACHE to n. This should be a problem
>> that occurs when compatible
>>
>> with lower versions.
>>
>>
>> thanks
>>
>> xiaolei
>>
>>> -Sumit
>>>
>>>> thanks
>>>>
>>>> xiaolei
>>>>
>>>>>> [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
>>>>>> [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
>>>>>>
>>>>>> --
>>>>>> Jerome

2021-12-15 10:20:01

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Tue, Dec 14, 2021 at 12:33:08PM +0530, Sumit Garg wrote:
> On Mon, 13 Dec 2021 at 18:34, Daniel Thompson
> <[email protected]> wrote:
> > On Mon, Dec 13, 2021 at 02:28:01PM +0530, Sumit Garg wrote:
> > > On Fri, 10 Dec 2021 at 21:19, Daniel Thompson
> > > <[email protected]> wrote:
> > > > On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> > > > > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> > > > > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > > IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
> > > > possible point before *ownership* of the SHM block is passed from kernel
> > > > to OP-TEE.
> > >
> > > I wouldn't say it's a transfer of ownership from kernel to OP-TEE but
> > > rather a way for OP-TEE to access kernel's memory in order to pass
> > > info or execute further RPC commands.
> >
> > The RPC handler allocates a pointer (e.g. now the RPC handler owns the
> > allocated memory). The RPC handler then passes that pointer to OP-TEE and
> > forgets what it's value was.
> >
> > That is a transfer of ownership: the RPC handler does not hold any pointer
> > to the memory and is incapable of freeing it. Moreover this situation is
> > what kmemleak_no_leak() is for! Its job it to inform kmemleak that the
> > pointer is owned/stored somewhere that is does not scan.
>
> Let me put this another way. If the memory allocator belongs to the
> kernel then how does OP-TEE get to know which memory is currently
> allocated and it is to be scanned?

OP-TEE explicitly requested that the be allocated and responsible for
figuring out where to store the pointer. How could it *not* know this
information? More specifically OP-TEE is perfectly capable of recording
what memory it has allocated and where to scan to find out if it has
been lost.


> I think the complete solution would be to extend kmemleak to support
> OP-TEE memory scanning via OP-TEE invocation to check if it's holding
> any kernel memory references.

This is the part I get stuck on... and the reason I'm still posting on
the thread.

I struggle to see any value in using kmemleak to identify this type of
leak. That is the fundamental issue. False positives from kmemleak are
damaging to user confidence in the tool and are especially harmful when
it is complex and time consuming to verify that is actually is a false
positive (which would certainly be the case for OP-TEE false positives).
In short it is *not* always the case failure-to-detect is worse than
false-positive.

As discussed already the firmware/kernel contract prevents kmemleak from
working as it is designed to and I am unconvinced that relying on
fragile timeouts is sufficient.

Extending kmemleak to support OP-TEE memory scanning is also, IMHO,
pointless. The reason for this is that OP-TEE cannot perform any scan on
behalf of kmemleak without first validating the information provided to
it by the kernel (to avoid information leaks). However if OP-TEE tracks
enough state to validate the kernel input than it already has enough
state to do a scan for leaks independently anyway (apart from being
donated an execution context). Therefore it follows that any OP-TEE
extension to handle leaks should be independent of kmemleak because it
would still be useful to be able to ask OP-TEE to run a self-consistency
check even if kmemleak is disabled.

Or, in short, even if you implement improved leak detection for OP-TEE
(whether that is based on timers or scanning) then kmemleak_not_leak()
is still the right thing to do with pointers whose ownership we have
transferred to OP-TEE.


> > > > Sure, after we change ownership it could still be leaked... but it can
> > > > no longer be leaked by the kernel because the kernel no longer owns it!
> > > > More importantly, it makes no sense to run the kernel memory detector on the
> > > > buffer because it simply can't work.
> > > >
> > > > After the RPC completes, doesn't it become impossible for kmemleak to
> > > > scan to see if the pointer is lost[1]?
> > >
> > > Apart from the special OP-TEE prealloc SHM cache stuff, I can't think
> > > of any scenario where an OP-TEE thread should hold off kernel's memory
> > > pointers for more than 5 seconds before being passed onto kernel for
> > > further RPC commands or RPC free action. So the kmemleak should be
> > > able to detect if a pointer is lost.
> >
> > Or putting this a different way: there is known to be firmware in the
> > field that allocates pointers for more then five seconds!
>
> If it's known that upstream OP-TEE doesn't hold any kernel memory
> references for more than 5 seconds then IMO we should be fine to not
> disable kmemleak until we have a future kmemleak extension. Otherwise
> it would be very hard to keep track of kernel memory lost in this way.

In essence I am arguing for using the right tool for the right job (and
against turning down a correct patch because the right tool isn't yet
implemented). A memory scanning leak detector is the wrong tool to
search for leaks in memory that cannot be scanned.

For me having to rely on fragile implied contracts and undocumented
assumptions about timing further reinforces my view that kmemleak is not
the wrong tool. Especially so when we know that those assumptions are
not met by existing firmware.


Daniel.

2021-12-15 12:25:31

by Jens Wiklander

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Wed, Dec 15, 2021 at 11:19 AM Daniel Thompson
<[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 12:33:08PM +0530, Sumit Garg wrote:
> > On Mon, 13 Dec 2021 at 18:34, Daniel Thompson
> > <[email protected]> wrote:
> > > On Mon, Dec 13, 2021 at 02:28:01PM +0530, Sumit Garg wrote:
> > > > On Fri, 10 Dec 2021 at 21:19, Daniel Thompson
> > > > <[email protected]> wrote:
> > > > > On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> > > > > > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> > > > > > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > > > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > > > IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
> > > > > possible point before *ownership* of the SHM block is passed from kernel
> > > > > to OP-TEE.
> > > >
> > > > I wouldn't say it's a transfer of ownership from kernel to OP-TEE but
> > > > rather a way for OP-TEE to access kernel's memory in order to pass
> > > > info or execute further RPC commands.
> > >
> > > The RPC handler allocates a pointer (e.g. now the RPC handler owns the
> > > allocated memory). The RPC handler then passes that pointer to OP-TEE and
> > > forgets what it's value was.
> > >
> > > That is a transfer of ownership: the RPC handler does not hold any pointer
> > > to the memory and is incapable of freeing it. Moreover this situation is
> > > what kmemleak_no_leak() is for! Its job it to inform kmemleak that the
> > > pointer is owned/stored somewhere that is does not scan.
> >
> > Let me put this another way. If the memory allocator belongs to the
> > kernel then how does OP-TEE get to know which memory is currently
> > allocated and it is to be scanned?
>
> OP-TEE explicitly requested that the be allocated and responsible for
> figuring out where to store the pointer. How could it *not* know this
> information? More specifically OP-TEE is perfectly capable of recording
> what memory it has allocated and where to scan to find out if it has
> been lost.
>
>
> > I think the complete solution would be to extend kmemleak to support
> > OP-TEE memory scanning via OP-TEE invocation to check if it's holding
> > any kernel memory references.
>
> This is the part I get stuck on... and the reason I'm still posting on
> the thread.
>
> I struggle to see any value in using kmemleak to identify this type of
> leak. That is the fundamental issue. False positives from kmemleak are
> damaging to user confidence in the tool and are especially harmful when
> it is complex and time consuming to verify that is actually is a false
> positive (which would certainly be the case for OP-TEE false positives).
> In short it is *not* always the case failure-to-detect is worse than
> false-positive.
>
> As discussed already the firmware/kernel contract prevents kmemleak from
> working as it is designed to and I am unconvinced that relying on
> fragile timeouts is sufficient.
>
> Extending kmemleak to support OP-TEE memory scanning is also, IMHO,
> pointless. The reason for this is that OP-TEE cannot perform any scan on
> behalf of kmemleak without first validating the information provided to
> it by the kernel (to avoid information leaks). However if OP-TEE tracks
> enough state to validate the kernel input than it already has enough
> state to do a scan for leaks independently anyway (apart from being
> donated an execution context). Therefore it follows that any OP-TEE
> extension to handle leaks should be independent of kmemleak because it
> would still be useful to be able to ask OP-TEE to run a self-consistency
> check even if kmemleak is disabled.
>
> Or, in short, even if you implement improved leak detection for OP-TEE
> (whether that is based on timers or scanning) then kmemleak_not_leak()
> is still the right thing to do with pointers whose ownership we have
> transferred to OP-TEE.
>
>
> > > > > Sure, after we change ownership it could still be leaked... but it can
> > > > > no longer be leaked by the kernel because the kernel no longer owns it!
> > > > > More importantly, it makes no sense to run the kernel memory detector on the
> > > > > buffer because it simply can't work.
> > > > >
> > > > > After the RPC completes, doesn't it become impossible for kmemleak to
> > > > > scan to see if the pointer is lost[1]?
> > > >
> > > > Apart from the special OP-TEE prealloc SHM cache stuff, I can't think
> > > > of any scenario where an OP-TEE thread should hold off kernel's memory
> > > > pointers for more than 5 seconds before being passed onto kernel for
> > > > further RPC commands or RPC free action. So the kmemleak should be
> > > > able to detect if a pointer is lost.
> > >
> > > Or putting this a different way: there is known to be firmware in the
> > > field that allocates pointers for more then five seconds!
> >
> > If it's known that upstream OP-TEE doesn't hold any kernel memory
> > references for more than 5 seconds then IMO we should be fine to not
> > disable kmemleak until we have a future kmemleak extension. Otherwise
> > it would be very hard to keep track of kernel memory lost in this way.
>
> In essence I am arguing for using the right tool for the right job (and
> against turning down a correct patch because the right tool isn't yet
> implemented). A memory scanning leak detector is the wrong tool to
> search for leaks in memory that cannot be scanned.
>
> For me having to rely on fragile implied contracts and undocumented
> assumptions about timing further reinforces my view that kmemleak is not
> the wrong tool. Especially so when we know that those assumptions are
> not met by existing firmware.

I agree, this patch makes sense. It fixes a problem and I can't see a
downside with that. In a not too distant future we may change the way
this memory is passed to OP-TEE by keeping a reference in the driver,
but until then this patch will fix a problem.

Cheers,
Jens

2021-12-15 12:29:20

by Jens Wiklander

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Mon, Dec 06, 2021 at 08:05:33PM +0800, Xiaolei Wang wrote:
> We observed the following kmemleak report:
> unreferenced object 0xffff000007904500 (size 128):
> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> hex dump (first 32 bytes):
> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> backtrace:
> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> [<00000000c835f0df>] __driver_attach+0x84/0x124
> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> [<000000001735e8a8>] driver_attach+0x24/0x30
> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>
> This is not a memory leak because we pass the share memory pointer
> to secure world and would get it from secure world before releasing it.
>
> Signed-off-by: Xiaolei Wang <[email protected]>
> ---
> drivers/tee/optee/smc_abi.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> index 6196d7c3888f..cf2e3293567d 100644
> --- a/drivers/tee/optee/smc_abi.c
> +++ b/drivers/tee/optee/smc_abi.c
> @@ -23,6 +23,7 @@
> #include "optee_private.h"
> #include "optee_smc.h"
> #include "optee_rpc_cmd.h"
> +#include <linux/kmemleak.h>
> #define CREATE_TRACE_POINTS
> #include "optee_trace.h"
>
> @@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
> param->a4 = 0;
> param->a5 = 0;
> }
> + kmemleak_not_leak(shm);

Eventually this pointer will be freed below with the call to tee_shm_free().
I assume than once the memory is freed it's not execused from being a leak
any longer. Is that correct?

Thanks,
Jens

> break;
> case OPTEE_SMC_RPC_FUNC_FREE:
> shm = reg_pair_to_ptr(param->a1, param->a2);
> --
> 2.25.1
>

2021-12-15 13:33:29

by Xiaolei Wang

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()


在 12/15/2021 8:29 PM, Jens Wiklander 写道:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Mon, Dec 06, 2021 at 08:05:33PM +0800, Xiaolei Wang wrote:
>> We observed the following kmemleak report:
>> unreferenced object 0xffff000007904500 (size 128):
>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
>> hex dump (first 32 bytes):
>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
>> backtrace:
>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
>> [<00000000c835f0df>] __driver_attach+0x84/0x124
>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
>> [<000000001735e8a8>] driver_attach+0x24/0x30
>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>>
>> This is not a memory leak because we pass the share memory pointer
>> to secure world and would get it from secure world before releasing it.
>>
>> Signed-off-by: Xiaolei Wang <[email protected]>
>> ---
>> drivers/tee/optee/smc_abi.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
>> index 6196d7c3888f..cf2e3293567d 100644
>> --- a/drivers/tee/optee/smc_abi.c
>> +++ b/drivers/tee/optee/smc_abi.c
>> @@ -23,6 +23,7 @@
>> #include "optee_private.h"
>> #include "optee_smc.h"
>> #include "optee_rpc_cmd.h"
>> +#include <linux/kmemleak.h>
>> #define CREATE_TRACE_POINTS
>> #include "optee_trace.h"
>>
>> @@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
>> param->a4 = 0;
>> param->a5 = 0;
>> }
>> + kmemleak_not_leak(shm);
> Eventually this pointer will be freed below with the call to tee_shm_free().
> I assume than once the memory is freed it's not execused from being a leak
> any longer. Is that correct?

Yes, it is the correct way to release memory through tee_shm_free, but
if a memory leak is detected by the kernel before free memory, it is
obviously a false alarm.

thanks

xiaolei

>
> Thanks,
> Jens
>
>> break;
>> case OPTEE_SMC_RPC_FUNC_FREE:
>> shm = reg_pair_to_ptr(param->a1, param->a2);
>> --
>> 2.25.1
>>

2021-12-15 13:42:51

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Wed, 15 Dec 2021 at 17:55, Jens Wiklander <[email protected]> wrote:
>
> On Wed, Dec 15, 2021 at 11:19 AM Daniel Thompson
> <[email protected]> wrote:
> >
> > On Tue, Dec 14, 2021 at 12:33:08PM +0530, Sumit Garg wrote:
> > > On Mon, 13 Dec 2021 at 18:34, Daniel Thompson
> > > <[email protected]> wrote:
> > > > On Mon, Dec 13, 2021 at 02:28:01PM +0530, Sumit Garg wrote:
> > > > > On Fri, 10 Dec 2021 at 21:19, Daniel Thompson
> > > > > <[email protected]> wrote:
> > > > > > On Fri, Dec 10, 2021 at 03:08:21PM +0530, Sumit Garg wrote:
> > > > > > > On Fri, 10 Dec 2021 at 13:40, Jerome Forissier <[email protected]> wrote:
> > > > > > > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > > > > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <[email protected]> wrote:
> > > > > > IIUC this patch adds kmemleak_not_leak() at (pretty much) the last
> > > > > > possible point before *ownership* of the SHM block is passed from kernel
> > > > > > to OP-TEE.
> > > > >
> > > > > I wouldn't say it's a transfer of ownership from kernel to OP-TEE but
> > > > > rather a way for OP-TEE to access kernel's memory in order to pass
> > > > > info or execute further RPC commands.
> > > >
> > > > The RPC handler allocates a pointer (e.g. now the RPC handler owns the
> > > > allocated memory). The RPC handler then passes that pointer to OP-TEE and
> > > > forgets what it's value was.
> > > >
> > > > That is a transfer of ownership: the RPC handler does not hold any pointer
> > > > to the memory and is incapable of freeing it. Moreover this situation is
> > > > what kmemleak_no_leak() is for! Its job it to inform kmemleak that the
> > > > pointer is owned/stored somewhere that is does not scan.
> > >
> > > Let me put this another way. If the memory allocator belongs to the
> > > kernel then how does OP-TEE get to know which memory is currently
> > > allocated and it is to be scanned?
> >
> > OP-TEE explicitly requested that the be allocated and responsible for
> > figuring out where to store the pointer. How could it *not* know this
> > information? More specifically OP-TEE is perfectly capable of recording
> > what memory it has allocated and where to scan to find out if it has
> > been lost.
> >
> >
> > > I think the complete solution would be to extend kmemleak to support
> > > OP-TEE memory scanning via OP-TEE invocation to check if it's holding
> > > any kernel memory references.
> >
> > This is the part I get stuck on... and the reason I'm still posting on
> > the thread.
> >
> > I struggle to see any value in using kmemleak to identify this type of
> > leak. That is the fundamental issue. False positives from kmemleak are
> > damaging to user confidence in the tool and are especially harmful when
> > it is complex and time consuming to verify that is actually is a false
> > positive (which would certainly be the case for OP-TEE false positives).
> > In short it is *not* always the case failure-to-detect is worse than
> > false-positive.
> >
> > As discussed already the firmware/kernel contract prevents kmemleak from
> > working as it is designed to and I am unconvinced that relying on
> > fragile timeouts is sufficient.
> >
> > Extending kmemleak to support OP-TEE memory scanning is also, IMHO,
> > pointless. The reason for this is that OP-TEE cannot perform any scan on
> > behalf of kmemleak without first validating the information provided to
> > it by the kernel (to avoid information leaks). However if OP-TEE tracks
> > enough state to validate the kernel input than it already has enough
> > state to do a scan for leaks independently anyway (apart from being
> > donated an execution context). Therefore it follows that any OP-TEE
> > extension to handle leaks should be independent of kmemleak because it
> > would still be useful to be able to ask OP-TEE to run a self-consistency
> > check even if kmemleak is disabled.
> >
> > Or, in short, even if you implement improved leak detection for OP-TEE
> > (whether that is based on timers or scanning) then kmemleak_not_leak()
> > is still the right thing to do with pointers whose ownership we have
> > transferred to OP-TEE.
> >
> >
> > > > > > Sure, after we change ownership it could still be leaked... but it can
> > > > > > no longer be leaked by the kernel because the kernel no longer owns it!
> > > > > > More importantly, it makes no sense to run the kernel memory detector on the
> > > > > > buffer because it simply can't work.
> > > > > >
> > > > > > After the RPC completes, doesn't it become impossible for kmemleak to
> > > > > > scan to see if the pointer is lost[1]?
> > > > >
> > > > > Apart from the special OP-TEE prealloc SHM cache stuff, I can't think
> > > > > of any scenario where an OP-TEE thread should hold off kernel's memory
> > > > > pointers for more than 5 seconds before being passed onto kernel for
> > > > > further RPC commands or RPC free action. So the kmemleak should be
> > > > > able to detect if a pointer is lost.
> > > >
> > > > Or putting this a different way: there is known to be firmware in the
> > > > field that allocates pointers for more then five seconds!
> > >
> > > If it's known that upstream OP-TEE doesn't hold any kernel memory
> > > references for more than 5 seconds then IMO we should be fine to not
> > > disable kmemleak until we have a future kmemleak extension. Otherwise
> > > it would be very hard to keep track of kernel memory lost in this way.
> >
> > In essence I am arguing for using the right tool for the right job (and
> > against turning down a correct patch because the right tool isn't yet
> > implemented). A memory scanning leak detector is the wrong tool to
> > search for leaks in memory that cannot be scanned.
> >
> > For me having to rely on fragile implied contracts and undocumented
> > assumptions about timing further reinforces my view that kmemleak is not
> > the wrong tool. Especially so when we know that those assumptions are
> > not met by existing firmware.
>
> I agree, this patch makes sense. It fixes a problem and I can't see a
> downside with that. In a not too distant future we may change the way
> this memory is passed to OP-TEE by keeping a reference in the driver,
> but until then this patch will fix a problem.

Fair enough, I was just trying to be more optimistic about leveraging
existing kmemleak infrastructure as shared memory bugs are catching on
us.

-Sumit

>
> Cheers,
> Jens

2021-12-16 14:55:17

by Jens Wiklander

[permalink] [raw]
Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

On Mon, Dec 6, 2021 at 1:05 PM Xiaolei Wang <[email protected]> wrote:
>
> We observed the following kmemleak report:
> unreferenced object 0xffff000007904500 (size 128):
> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> hex dump (first 32 bytes):
> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> backtrace:
> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> [<00000000c835f0df>] __driver_attach+0x84/0x124
> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> [<000000001735e8a8>] driver_attach+0x24/0x30
> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
>
> This is not a memory leak because we pass the share memory pointer
> to secure world and would get it from secure world before releasing it.
>
> Signed-off-by: Xiaolei Wang <[email protected]>
> ---
> drivers/tee/optee/smc_abi.c | 2 ++
> 1 file changed, 2 insertions(+)

I'm picking up this.

Thanks,
Jens

>
> diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
> index 6196d7c3888f..cf2e3293567d 100644
> --- a/drivers/tee/optee/smc_abi.c
> +++ b/drivers/tee/optee/smc_abi.c
> @@ -23,6 +23,7 @@
> #include "optee_private.h"
> #include "optee_smc.h"
> #include "optee_rpc_cmd.h"
> +#include <linux/kmemleak.h>
> #define CREATE_TRACE_POINTS
> #include "optee_trace.h"
>
> @@ -783,6 +784,7 @@ static void optee_handle_rpc(struct tee_context *ctx,
> param->a4 = 0;
> param->a5 = 0;
> }
> + kmemleak_not_leak(shm);
> break;
> case OPTEE_SMC_RPC_FUNC_FREE:
> shm = reg_pair_to_ptr(param->a1, param->a2);
> --
> 2.25.1
>