2020-12-22 12:22:23

by jun qian

Subject: [PATCH 1/1] mm:improve the performance during fork

From: jun qian <[email protected]>

In our project, many business delays come from fork, so
we started looking for the reason why fork is time-consuming.
I used ftrace with the function_graph tracer to trace fork and
found that vm_normal_page is called tens of thousands of times,
while each call takes only a few nanoseconds. vm_normal_page is
not an inline function, so I think that making it inline may
reduce the call overhead.

I did the following experiment:

I wrote the following C test code, please ignore the memory leak :)
Before fork, it mallocs 4G bytes, then calculates the fork
time.

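#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* 1M allocations of 4096 bytes = 4G bytes. The value of LEN is assumed,
 * it was not shown in the original mail. */
#define LEN (1024ULL * 1024)

int main(void)
{
    char *p;
    unsigned long long i = 0;
    float time_use = 0;
    struct timeval start;
    struct timeval end;

    /* Touch every allocation so the pages are really mapped before fork. */
    for (i = 0; i < LEN; i++) {
        p = (char *)malloc(4096);
        if (p == NULL) {
            printf("malloc failed!\n");
            return 1;
        }
        p[0] = 0x55;
    }
    gettimeofday(&start, NULL);
    fork();
    gettimeofday(&end, NULL);

    /* Elapsed time in microseconds; printed by both parent and child. */
    time_use = (end.tv_sec * 1000000 + end.tv_usec) -
               (start.tv_sec * 1000000 + start.tv_usec);
    printf("time_use is %.10f us\n", time_use);

    return 0;
}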

We need to compare the size of vmlinux and the fork time in the
inline and non-inline cases. Since vm_normal_page is called from
many functions, we also need to compare the size of its callers.
For example, do_wp_page calls vm_normal_page, so I also measured
its size.

                 inline          non-inline      diff
vmlinux size     9709248 bytes   9709824 bytes   -576 bytes
fork time        23475 us        24638 us        -4.7%
do_wp_page size  972             743             +229

Based on the test data above, I think inlining vm_normal_page can
reduce fork execution time.

Signed-off-by: jun qian <[email protected]>
---
mm/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7d608765932b..a689bb5d3842 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -591,7 +591,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
--
2.18.2


2020-12-22 15:09:53

by Souptick Joarder

Subject: Re: [PATCH 1/1] mm:improve the performance during fork

On Tue, Dec 22, 2020 at 5:49 PM <[email protected]> wrote:
>                  inline          non-inline      diff
> vmlinux size     9709248 bytes   9709824 bytes   -576 bytes
> fork time        23475 us        24638 us        -4.7%

Do you have the time diff for both the parent and the child process?


2020-12-22 15:34:44

by jun qian

Subject: Re: [PATCH 1/1] mm:improve the performance during fork

Souptick Joarder <[email protected]> wrote on Tue, Dec 22, 2020 at 11:08 PM:
> >                  inline          non-inline      diff
> > vmlinux size     9709248 bytes   9709824 bytes   -576 bytes
> > fork time        23475 us        24638 us        -4.7%
>
> Do you have time diff for both parent and child process ?

Yes, the child time diff and the parent time diff are almost the same,
as shown below (a.out is the test program):

./a.out
time_use is 23342.0000000000 us
time_use is 23404.0000000000 us


2020-12-22 18:45:39

by David Laight

Subject: RE: [PATCH 1/1] mm:improve the performance during fork

From: qianjun
> Sent: 22 December 2020 12:19
>
> In our project, Many business delays come from fork, so
> we started looking for the reason why fork is time-consuming.
> I used the ftrace with function_graph to trace the fork, found
> that the vm_normal_page will be called tens of thousands and
> the execution time of this vm_normal_page function is only a
> few nanoseconds. And the vm_normal_page is not a inline function.
> So I think if the function is inline style, it maybe reduce the
> call time overhead.

Beware of taking timings from ftrace function trace.
The cost of the tracing is significant.

You can get sensible numbers if you only trace very specific
functions.
Slightly annoyingly the output format changes if you enable
the function exit trace - useful for the timestamp.
ISTR it is possible to get the process id traced if you fiddle
with enough options.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-12-23 01:48:20

by jun qian

Subject: Re: [PATCH 1/1] mm:improve the performance during fork

David Laight <[email protected]> wrote on Wed, Dec 23, 2020 at 2:42 AM:
>
> From: qianjun
> > Sent: 22 December 2020 12:19
> >
> > In our project, Many business delays come from fork, so
> > we started looking for the reason why fork is time-consuming.
> > I used the ftrace with function_graph to trace the fork, found
> > that the vm_normal_page will be called tens of thousands and
> > the execution time of this vm_normal_page function is only a
> > few nanoseconds. And the vm_normal_page is not a inline function.
> > So I think if the function is inline style, it maybe reduce the
> > call time overhead.
>
> Beware of taking timings from ftrace function trace.
> The cost of the tracing is significant.
>
> You can get sensible numbers if you only trace very specific
> functions.
> Slightly annoyingly the output format changes if you enable
> the function exit trace - useful for the timestamp.
> ISTR it is possible to get the process id traced if you fiddle
> with enough options.
>
> David
>

Thanks for your time.

I disabled ftrace while the test program was running, so the time
diff is free of ftrace interference. Also, what does ISTR stand
for? :) Thanks.
