by zhiguojiang

Subject: Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap

On 2023/10/24 15:07, David Hildenbrand wrote:
> On 24.10.23 04:04, zhiguojiang wrote:
>>
>>
>> On 2023/10/23 21:01, Matthew Wilcox wrote:
>>> On Mon, Oct 23, 2023 at 08:44:55PM +0800, zhiguojiang wrote:
>>>> On 2023/10/23 20:21, Matthew Wilcox wrote:
>>>>> On Mon, Oct 23, 2023 at 04:07:28PM +0800, zhiguojiang wrote:
>>>>>>> Are you seeing measurable changes for any workloads? It certainly seems
>>>>>>> like you should, but it would help if you chose a test from mmtests and
>>>>>>> showed how performance changed on your system.
>>>>>> In one mmtests run, the maximum time spent on a futile reclaim attempt for
>>>>>> a dirty folio in folio_list that does not support pageout and ends up
>>>>>> activated in shrink_folio_list() was: cost=51us, exe=2365us.
>>>>>>
>>>>>> Using the formula dirty_cost / total_cost * 100%, the reclaim efficiency
>>>>>> for dirty folios can be improved by 53.13% and 82.95% in the two samples
>>>>>> below.
>>>>>>
>>>>>> So this patch can improve shrink efficiency and reduce the workload of
>>>>>> kswapd to a certain extent.
>>>>>>
>>>>>> kswapd0-96      (     96) [005] .....   387.218548:
>>>>>> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32
>>>>>> nr_taken 32
>>>>>> nr_reclaimed 31 nr_dirty 1 nr_unqueued_dirty 1 nr_writeback 0
>>>>>> nr_activate[1] 1 nr_ref_keep 0 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>>>>>> total_cost 96 total_exe 2365 dirty_cost 51 total_exe 2365
>>>>>>
>>>>>> kswapd0-96      (     96) [006] .....   412.822532:
>>>>>> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32
>>>>>> nr_taken 32
>>>>>> nr_reclaimed 0 nr_dirty 32 nr_unqueued_dirty 32 nr_writeback 0
>>>>>> nr_activate[1] 19 nr_ref_keep 13 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>>>>>> total_cost 88 total_exe 605 dirty_cost 73 total_exe 605
>>>>> I appreciate that you can put probes in and determine the cost, but do
>>>>> you see improvements for a real workload? Like doing a kernel compile
>>>>> -- does it speed up at all?
>>>> Could you share a method for testing the workload of a thread such as kswapd?
>>> Something dirt simple like 'time make -j8'.
>> Each configuration was compiled twice. Compared with the unmodified
>> kernel, the compilation time with the patches applied was slightly
>> reduced, as follows:
>>
>> Compilation command:
>> make distclean -j8
>> make ARCH=x86_64 x86_64_defconfig
>> time make -j8
>>
>> 1. Unmodified compilation time:
>> real    2m40.276s
>> user    16m2.956s
>> sys     2m14.738s
>>
>> real    2m40.136s
>> user    16m2.617s
>> sys     2m14.722s
>>
>> 2. [Patch v2 1/2] modified compilation time:
>> real    2m40.067s
>> user    16m3.164s
>> sys     2m14.211s
>>
>> real    2m40.123s
>> user    16m2.439s
>> sys     2m14.508s
>>
>> 3. [Patch v2 1/2] + [Patch v2 2/2] modified compilation time:
>> real    2m40.367s
>> user    16m3.738s
>> sys     2m13.662s
>>
>> real    2m40.014s
>> user    16m3.108s
>> sys     2m14.096s
>>
>
> To get expressive numbers, two iterations are usually not sufficient.
> How much memory does your system have? Does vmscan even ever get active?
Test system memory: MemTotal: 8161608 kB. vmscan does become active when
multiple apps are open. I captured a lot of trace log data during
testing, but only posted two sets of it.
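
For reference, the 53.13% and 82.95% figures quoted earlier in the thread
can be reproduced from the two trace samples. The following is a minimal
Python sketch, assuming total_cost and dirty_cost are the microsecond values
printed by the poster's own debug probes (they appear to be locally added
fields; the dictionary layout and the dirty_share() helper are hypothetical
and only serve the illustration):

```python
# Values copied from the two mm_vmscan_lru_shrink_inactive samples quoted
# in the thread. total_cost and dirty_cost are assumed to be microseconds,
# as stated in the mail ("cost=51us").
samples = [
    {"timestamp": "387.218548", "total_cost": 96, "dirty_cost": 51},
    {"timestamp": "412.822532", "total_cost": 88, "dirty_cost": 73},
]


def dirty_share(sample):
    """Return dirty_cost / total_cost * 100 (a percentage, not rounded)."""
    return sample["dirty_cost"] / sample["total_cost"] * 100


shares = [dirty_share(s) for s in samples]

for s, share in zip(samples, shares):
    # Three decimals are printed so that the rounding step stays visible.
    print(f"t={s['timestamp']}: dirty_cost / total_cost = {share:.3f}%")
```

Note that this ratio is the share of the measured per-batch cost that is
attributed to dirty folios. Reading it as the achievable improvement
assumes the patch removes all of that cost, which the trace data alone
does not show.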

2023-10-25 15:38:50

by zhiguojiang

Subject: Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap

On 2023/10/24 15:21, zhiguojiang wrote:
>
>
> On 2023/10/24 15:07, David Hildenbrand wrote:
>> On 24.10.23 04:04, zhiguojiang wrote:
>>>
>>>
>>> On 2023/10/23 21:01, Matthew Wilcox wrote:
>>>> On Mon, Oct 23, 2023 at 08:44:55PM +0800, zhiguojiang wrote:
>>>>> On 2023/10/23 20:21, Matthew Wilcox wrote:
>>>>>> On Mon, Oct 23, 2023 at 04:07:28PM +0800, zhiguojiang wrote:
>>>>>>>> Are you seeing measurable changes for any workloads? It certainly seems
>>>>>>>> like you should, but it would help if you chose a test from mmtests and
>>>>>>>> showed how performance changed on your system.
>>>>>>> In one mmtests run, the maximum time spent on a futile reclaim attempt for
>>>>>>> a dirty folio in folio_list that does not support pageout and ends up
>>>>>>> activated in shrink_folio_list() was: cost=51us, exe=2365us.
>>>>>>>
>>>>>>> Using the formula dirty_cost / total_cost * 100%, the reclaim efficiency
>>>>>>> for dirty folios can be improved by 53.13% and 82.95% in the two samples
>>>>>>> below.
>>>>>>>
>>>>>>> So this patch can improve shrink efficiency and reduce the workload of
>>>>>>> kswapd to a certain extent.
>>>>>>>
>>>>>>> kswapd0-96      (     96) [005] .....   387.218548:
>>>>>>> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32
>>>>>>> nr_taken 32
>>>>>>> nr_reclaimed 31 nr_dirty 1 nr_unqueued_dirty 1 nr_writeback 0
>>>>>>> nr_activate[1] 1 nr_ref_keep 0 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>>>>>>> total_cost 96 total_exe 2365 dirty_cost 51 total_exe 2365
>>>>>>>
>>>>>>> kswapd0-96      (     96) [006] .....   412.822532:
>>>>>>> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32
>>>>>>> nr_taken 32
>>>>>>> nr_reclaimed 0 nr_dirty 32 nr_unqueued_dirty 32 nr_writeback 0
>>>>>>> nr_activate[1] 19 nr_ref_keep 13 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>>>>>>> total_cost 88 total_exe 605 dirty_cost 73 total_exe 605
>>>>>> I appreciate that you can put probes in and determine the cost, but do
>>>>>> you see improvements for a real workload? Like doing a kernel compile
>>>>>> -- does it speed up at all?
>>>>> Could you share a method for testing the workload of a thread such as kswapd?
>>>> Something dirt simple like 'time make -j8'.
>>> Each configuration was compiled twice. Compared with the unmodified
>>> kernel, the compilation time with the patches applied was slightly
>>> reduced, as follows:
>>>
>>> Compilation command:
>>> make distclean -j8
>>> make ARCH=x86_64 x86_64_defconfig
>>> time make -j8
>>>
>>> 1. Unmodified compilation time:
>>> real    2m40.276s
>>> user    16m2.956s
>>> sys     2m14.738s
>>>
>>> real    2m40.136s
>>> user    16m2.617s
>>> sys     2m14.722s
>>>
>>> 2. [Patch v2 1/2] modified compilation time:
>>> real    2m40.067s
>>> user    16m3.164s
>>> sys     2m14.211s
>>>
>>> real    2m40.123s
>>> user    16m2.439s
>>> sys     2m14.508s
>>>
>>> 3. [Patch v2 1/2] + [Patch v2 2/2] modified compilation time:
>>> real    2m40.367s
>>> user    16m3.738s
>>> sys     2m13.662s
>>>
>>> real    2m40.014s
>>> user    16m3.108s
>>> sys     2m14.096s
>>>
>>
>> To get expressive numbers, two iterations are usually not sufficient.
>> How much memory does your system have? Does vmscan even ever get active?
> Test system memory: MemTotal: 8161608 kB. vmscan does become active when
> multiple apps are open. I captured a lot of trace log data during
> testing, but only posted two sets of it.
Hi, please continue reviewing this patch and let me know whether it can
be merged. Thanks.
>
>
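
To make the `time make -j8` numbers quoted in this thread easier to compare,
the sketch below converts them to seconds and averages the two runs of each
configuration. It is only an illustration: the configuration labels, the
to_seconds() helper and the dictionary layout are hypothetical, and the
timings are copied from the mail as posted.

```python
import re
from statistics import mean


def to_seconds(stamp):
    """Convert a bash `time` value such as '2m40.276s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Two runs per configuration, as posted in the thread.
runs = {
    "unmodified": {
        "real": ["2m40.276s", "2m40.136s"],
        "sys": ["2m14.738s", "2m14.722s"],
    },
    "patch 1/2": {
        "real": ["2m40.067s", "2m40.123s"],
        "sys": ["2m14.211s", "2m14.508s"],
    },
    "patch 1/2 + 2/2": {
        "real": ["2m40.367s", "2m40.014s"],
        "sys": ["2m13.662s", "2m14.096s"],
    },
}

# Mean of each timing kind per configuration, in seconds.
means = {
    name: {kind: mean(to_seconds(v) for v in values)
           for kind, values in timings.items()}
    for name, timings in runs.items()
}

baseline = means["unmodified"]
for name, m in means.items():
    print(f"{name:16s} real {m['real']:.3f}s ({m['real'] - baseline['real']:+.3f}s)"
          f"  sys {m['sys']:.3f}s ({m['sys'] - baseline['sys']:+.3f}s)")
```

With only two runs per configuration, the differences in mean real time
(on the order of 0.1 s out of roughly 160 s) are no larger than the spread
between the two runs of a single configuration, which is consistent with
the comment that two iterations are usually not sufficient.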