2024-06-14 04:09:56

by Lei Liu

Subject: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues

1. binder_alloc frequently needs order-3 memory allocations. On
small-memory mobile devices in particular, this can lead to OOM and
cause foreground applications to be killed, so they exit abruptly.
The kernel call stack after the issue occurred is as follows:
dumpsys invoked oom-killer:
gfp_mask=0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), order=3,
oom_score_adj=-950
CPU: 6 PID: 31329 Comm: dumpsys Tainted: G WC O
5.10.168-android12-9-00003-gc873b6b86254-ab10823632 #1
Call trace:
dump_backtrace.cfi_jt+0x0/0x8
dump_stack_lvl+0xdc/0x138
dump_header+0x5c/0x2ac
oom_kill_process+0x124/0x304
out_of_memory+0x25c/0x5e0
__alloc_pages_slowpath+0x690/0xf6c
__alloc_pages_nodemask+0x1f4/0x3dc
kmalloc_order+0x54/0x338
kmalloc_order_trace+0x34/0x1bc
__kmalloc+0x5e8/0x9c0
binder_alloc_mmap_handler+0x88/0x1f8
binder_mmap+0x90/0x10c
mmap_region+0x44c/0xc14
do_mmap+0x518/0x680
vm_mmap_pgoff+0x15c/0x378
ksys_mmap_pgoff+0x80/0x108
__arm64_sys_mmap+0x38/0x48
el0_svc_common+0xd4/0x270
el0_svc+0x28/0x98
el0_sync_handler+0x8c/0xf0
el0_sync+0x1b4/0x1c0
Mem-Info:
active_anon:47096 inactive_anon:57927 isolated_anon:100
active_file:43790 inactive_file:44434 isolated_file:0
unevictable:14693 dirty:171 writeback:0
slab_reclaimable:21676 slab_unreclaimable:81771
mapped:84485 shmem:4275 pagetables:33367 bounce:0
free:3772 free_pcp:198 free_cma:11
Node 0 active_anon:188384kB inactive_anon:231708kB active_file:175160kB
inactive_file:177736kB unevictable:58772kB isolated(anon):400kB
isolated(file):0kB mapped:337940kB dirty:684kB writeback:0kB
shmem:17100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB kernel_stack:84960kB shadow_call_stack:21340kB
Normal free:15088kB min:8192kB low:42616kB high:46164kB
reserved_highatomic:4096KB active_anon:187644kB inactive_anon:231608kB
active_file:174552kB inactive_file:178012kB unevictable:58772kB
writepending:684kB present:3701440kB managed:3550144kB mlocked:58508kB
pagetables:133468kB bounce:0kB free_pcp:1048kB local_pcp:12kB
free_cma:44kB
Normal: 3313*4kB (UMEH) 165*8kB (UMEH) 35*16kB (H) 15*32kB (H) 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB
108356 total pagecache pages
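
As a rough illustration of how the page array in binder_alloc_mmap_handler() can reach order 3, the arithmetic can be sketched as below. The 4 KiB page size, the 32-byte per-page entry and the 4 MiB mapping size used in the note are assumptions picked for illustration; only the order=3 value comes from the log above. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; not taken from the report. */
#define SKETCH_PAGE_SIZE	4096UL	/* assumed 4 KiB pages */
#define SKETCH_ENTRY_SIZE	32UL	/* assumed size of one alloc->pages[] entry */

/* Bytes needed for the array that tracks a mapping of buffer_size bytes. */
size_t sketch_pages_array_bytes(size_t buffer_size)
{
	return (buffer_size / SKETCH_PAGE_SIZE) * SKETCH_ENTRY_SIZE;
}

/* Smallest order n such that 2^n pages cover the requested byte count. */
unsigned int sketch_order_for(size_t bytes)
{
	size_t pages = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	assert(bytes > 0);
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under these assumptions a 4 MiB mapping needs a 32 KiB array, which is 8 physically contiguous pages, i.e. order 3. In the log above, the only free 16kB and 32kB blocks are flagged (H), and there are none larger.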

2. We use kvcalloc to allocate memory, which can reduce system OOM
occurrences, as well as decrease the time and probability of failure
for order-3 memory allocations. It can also improve the throughput of
binder (as verified with Google's binder_benchmark testing tool).

3. We have conducted multiple tests on a 12GB memory phone, and
kvcalloc performs better. Below is a partial excerpt of the test data.
throughput = (size * Iterations)/Time
kvcalloc->kvmalloc:
Benchmark-kvcalloc           Time        CPU  Iterations  throughput(Gb/s)
--------------------------------------------------------------------------
BM_sendVec_binder-4096   30926 ns   20481 ns       34457  4563.66↑
BM_sendVec_binder-8192   42667 ns   30837 ns       22631  4345.11↑
BM_sendVec_binder-16384  67586 ns   52381 ns       13318  3228.51↑
BM_sendVec_binder-32768 116496 ns   94893 ns        7416  2085.97↑
BM_sendVec_binder-65536 265482 ns  209214 ns        3530   871.40↑

kcalloc->kmalloc:
Benchmark-kcalloc            Time        CPU  Iterations  throughput(Gb/s)
--------------------------------------------------------------------------
BM_sendVec_binder-4096   39070 ns   24207 ns       31063  3256.56
BM_sendVec_binder-8192   49476 ns   35099 ns       18817  3115.62
BM_sendVec_binder-16384  76866 ns   58924 ns       11883  2532.86
BM_sendVec_binder-32768 134022 ns  102788 ns        6535  1597.78
BM_sendVec_binder-65536 281004 ns  220028 ns        3135   731.14
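
The throughput column follows the formula stated above the tables. A minimal sketch of that arithmetic, with hypothetical helper names and the units taken as they appear in the tables (size in bytes, Time in ns):

```c
#include <assert.h>

/* throughput = (size * Iterations) / Time, as stated in the message. */
double sketch_throughput(double size, double iterations, double time_ns)
{
	assert(time_ns > 0.0);
	return (size * iterations) / time_ns;
}

/* Relative gain of a over b, in percent. */
double sketch_gain_percent(double a, double b)
{
	assert(b > 0.0);
	return (a / b - 1.0) * 100.0;
}
```

For the 4096-byte kvcalloc row this gives 4096 * 34457 / 30926, which is about 4563.66 and matches the table. That figure is roughly 40% above the corresponding kcalloc value of 3256.56.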

Signed-off-by: Lei Liu <[email protected]>
---
Changelog:
v2->v3:
1. Revise the commit message, as the description in v2 was unclear.
---
drivers/android/binder_alloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 2e1f261ec5c8..5dcab4a5e341 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -836,7 +836,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,

alloc->buffer = vma->vm_start;

- alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
+ alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
sizeof(alloc->pages[0]),
GFP_KERNEL);
if (alloc->pages == NULL) {
@@ -869,7 +869,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
return 0;

err_alloc_buf_struct_failed:
- kfree(alloc->pages);
+ kvfree(alloc->pages);
alloc->pages = NULL;
err_alloc_pages_failed:
alloc->buffer = 0;
@@ -939,7 +939,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
__free_page(alloc->pages[i].page_ptr);
page_count++;
}
- kfree(alloc->pages);
+ kvfree(alloc->pages);
}
spin_unlock(&alloc->lock);
if (alloc->mm)
--
2.34.1



2024-06-14 18:47:49

by Carlos Llamas

Subject: Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues

On Fri, Jun 14, 2024 at 12:09:29PM +0800, Lei Liu wrote:
> 1. binder_alloc frequently needs order-3 memory allocations. On
> small-memory mobile devices in particular, this can lead to OOM and
> cause foreground applications to be killed, so they exit abruptly.
> The kernel call stack after the issue occurred is as follows:
> dumpsys invoked oom-killer:
> gfp_mask=0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), order=3,
> oom_score_adj=-950
> CPU: 6 PID: 31329 Comm: dumpsys Tainted: G WC O
> 5.10.168-android12-9-00003-gc873b6b86254-ab10823632 #1
> Call trace:
> dump_backtrace.cfi_jt+0x0/0x8
> dump_stack_lvl+0xdc/0x138
> dump_header+0x5c/0x2ac
> oom_kill_process+0x124/0x304
> out_of_memory+0x25c/0x5e0
> __alloc_pages_slowpath+0x690/0xf6c
> __alloc_pages_nodemask+0x1f4/0x3dc
> kmalloc_order+0x54/0x338
> kmalloc_order_trace+0x34/0x1bc
> __kmalloc+0x5e8/0x9c0
> binder_alloc_mmap_handler+0x88/0x1f8
> binder_mmap+0x90/0x10c
> mmap_region+0x44c/0xc14
> do_mmap+0x518/0x680
> vm_mmap_pgoff+0x15c/0x378
> ksys_mmap_pgoff+0x80/0x108
> __arm64_sys_mmap+0x38/0x48
> el0_svc_common+0xd4/0x270
> el0_svc+0x28/0x98
> el0_sync_handler+0x8c/0xf0
> el0_sync+0x1b4/0x1c0
> Mem-Info:
> active_anon:47096 inactive_anon:57927 isolated_anon:100
> active_file:43790 inactive_file:44434 isolated_file:0
> unevictable:14693 dirty:171 writeback:0
> slab_reclaimable:21676 slab_unreclaimable:81771
> mapped:84485 shmem:4275 pagetables:33367 bounce:0
> free:3772 free_pcp:198 free_cma:11
> Node 0 active_anon:188384kB inactive_anon:231708kB active_file:175160kB
> inactive_file:177736kB unevictable:58772kB isolated(anon):400kB
> isolated(file):0kB mapped:337940kB dirty:684kB writeback:0kB
> shmem:17100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
> writeback_tmp:0kB kernel_stack:84960kB shadow_call_stack:21340kB
> Normal free:15088kB min:8192kB low:42616kB high:46164kB
> reserved_highatomic:4096KB active_anon:187644kB inactive_anon:231608kB
> active_file:174552kB inactive_file:178012kB unevictable:58772kB
> writepending:684kB present:3701440kB managed:3550144kB mlocked:58508kB
> pagetables:133468kB bounce:0kB free_pcp:1048kB local_pcp:12kB
> free_cma:44kB
> Normal: 3313*4kB (UMEH) 165*8kB (UMEH) 35*16kB (H) 15*32kB (H) 0*64kB
> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB
> 108356 total pagecache pages

Think about indenting this stacktrace. IMO, the v1 had a commit log that
was much easier to follow.

>
> 2. We use kvcalloc to allocate memory, which can reduce system OOM
> occurrences, as well as decrease the time and probability of failure
> for order-3 memory allocations. It can also improve the throughput of
> binder (as verified with Google's binder_benchmark testing tool).
>
> 3. We have conducted multiple tests on a 12GB memory phone, and
> kvcalloc performs better. Below is a partial excerpt of the test data.
> throughput = (size * Iterations)/Time

Huh? Do you have an explanation for this performance improvement?
Did you test this under memory pressure?

My understanding is that kvcalloc() == kcalloc() if there is enough
contiguous memory, no?

I would expect the performance to be the same at best.
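
To make the point concrete, a userspace sketch of the "contiguous first, fall back otherwise" behaviour could look like the following. This is not the kernel implementation: every name is hypothetical, calloc() stands in for both allocators, and a flag simulates fragmentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum sketch_path { SKETCH_FAILED, SKETCH_CONTIGUOUS, SKETCH_FALLBACK };

/* Set to false to simulate a system with no contiguous block available. */
bool sketch_contiguous_available = true;

/* Hypothetical stand-in for the physically contiguous (kmalloc-like) attempt. */
void *sketch_try_contiguous(size_t bytes)
{
	return sketch_contiguous_available ? calloc(1, bytes) : NULL;
}

/* Hypothetical stand-in for the virtually contiguous (vmalloc-like) fallback. */
void *sketch_try_fallback(size_t bytes)
{
	return calloc(1, bytes);
}

/* Zeroed array allocation: try the contiguous path, then the fallback. */
void *sketch_kvcalloc(size_t n, size_t size, enum sketch_path *path)
{
	void *p;

	assert(path != NULL);
	*path = SKETCH_FAILED;
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */

	p = sketch_try_contiguous(n * size);
	if (p) {
		*path = SKETCH_CONTIGUOUS;
		return p;
	}

	p = sketch_try_fallback(n * size);
	if (p)
		*path = SKETCH_FALLBACK;
	return p;
}
```

When the contiguous attempt succeeds, the caller gets the same kind of memory it would have had before, which is why no speedup would be expected in that case. The fallback only matters once the contiguous attempt fails.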

> kvcalloc->kvmalloc:
> Benchmark-kvcalloc           Time        CPU  Iterations  throughput(Gb/s)
> --------------------------------------------------------------------------
> BM_sendVec_binder-4096   30926 ns   20481 ns       34457  4563.66↑
> BM_sendVec_binder-8192   42667 ns   30837 ns       22631  4345.11↑
> BM_sendVec_binder-16384  67586 ns   52381 ns       13318  3228.51↑
> BM_sendVec_binder-32768 116496 ns   94893 ns        7416  2085.97↑
> BM_sendVec_binder-65536 265482 ns  209214 ns        3530   871.40↑
>
> kcalloc->kmalloc:
> Benchmark-kcalloc            Time        CPU  Iterations  throughput(Gb/s)
> --------------------------------------------------------------------------
> BM_sendVec_binder-4096   39070 ns   24207 ns       31063  3256.56
> BM_sendVec_binder-8192   49476 ns   35099 ns       18817  3115.62
> BM_sendVec_binder-16384  76866 ns   58924 ns       11883  2532.86
> BM_sendVec_binder-32768 134022 ns  102788 ns        6535  1597.78
> BM_sendVec_binder-65536 281004 ns  220028 ns        3135   731.14
>
> Signed-off-by: Lei Liu <[email protected]>
> ---
> Changelog:
> v2->v3:
> 1. Revise the commit message, as the description in v2 was unclear.

The complete history of the changelog would be better.

> ---
> drivers/android/binder_alloc.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 2e1f261ec5c8..5dcab4a5e341 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -836,7 +836,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>
> alloc->buffer = vma->vm_start;
>
> - alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
> + alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
> sizeof(alloc->pages[0]),
> GFP_KERNEL);

I believe Greg had asked for these to be aligned to the parenthesis.
You can double check by running checkpatch with the -strict flag.
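
For illustration, this is roughly what the call looks like with the continuation lines aligned to the open parenthesis. It is a userspace sketch with 8-column tabs assumed; kvcalloc(), kvfree(), GFP_KERNEL, PAGE_SIZE and the structs are simplified stand-ins defined here so the snippet compiles on its own, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE	4096UL	/* assumed page size for the sketch */
#endif
#define GFP_KERNEL	0	/* placeholder; ignored by the stand-in below */

/* Userspace stand-ins so the snippet builds outside the kernel. */
void *kvcalloc(size_t n, size_t size, int flags)
{
	(void)flags;
	return calloc(n, size);
}

void kvfree(void *p)
{
	free(p);
}

struct sketch_page { void *page_ptr; };
struct sketch_alloc { size_t buffer_size; struct sketch_page *pages; };

int sketch_map_pages(struct sketch_alloc *alloc)
{
	assert(alloc != NULL);
	/* Continuation lines start in the column right after "kvcalloc(". */
	alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
				sizeof(alloc->pages[0]),
				GFP_KERNEL);
	if (alloc->pages == NULL)
		return -1;
	return 0;
}

void sketch_release(struct sketch_alloc *alloc)
{
	kvfree(alloc->pages);
	alloc->pages = NULL;
}
```

Because "kvcalloc(" is one character longer than "kcalloc(", the continuation lines have to shift by one column when the function name changes.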

> if (alloc->pages == NULL) {
> @@ -869,7 +869,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
> return 0;
>
> err_alloc_buf_struct_failed:
> - kfree(alloc->pages);
> + kvfree(alloc->pages);
> alloc->pages = NULL;
> err_alloc_pages_failed:
> alloc->buffer = 0;
> @@ -939,7 +939,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
> __free_page(alloc->pages[i].page_ptr);
> page_count++;
> }
> - kfree(alloc->pages);
> + kvfree(alloc->pages);
> }
> spin_unlock(&alloc->lock);
> if (alloc->mm)
> --
> 2.34.1
>

I'm not so sure about the results and performance improvements that are
claimed here. However, the switch to kvcalloc() itself seems reasonable
to me.

I'll run these tests myself as the results might have some noise. I'll
get back with the results.

Thanks,
Carlos Llamas