2004-09-20 09:41:45

by Denis Vlasenko

Subject: 2.4 -> 2.6: 'for(100000) close(open())' got slower

[mail threading will be broken, sorry]
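
(For reference, the benchmark from the subject line is essentially the
loop below. This is a sketch only: the real test script is in the
attached z.tar.bz2, and the file opened here is just an illustration.)

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        int i;

        /* open and close the same path over and over; nearly all of
           the time goes into kernel-side path lookup plus struct file
           allocation/teardown */
        for (i = 0; i < 100000; i++)
                close(open("/etc/passwd", O_RDONLY));
        return 0;
}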

> On Fri, Sep 17, 2004 at 03:34:59PM +0300, Denis Vlasenko wrote:
> > 2.4:
> > 2994 total 0.0013
> > 620 link_path_walk 0.2892
> > 285 d_lookup 1.2076
> > 156 dput 0.4815
> > 118 kmem_cache_free 0.7564
> > 109 system_call 1.9464
> > 106 kmem_cache_alloc 0.5096
> > 103 strncpy_from_user 1.2875
>
> [...]
>
> > 2.6:
> > 3200 total 0.0013
> > 790 link_path_walk 0.2759
> > 287 __d_lookup 1.2756
> > 277 get_empty_filp 1.4503
> > 146 strncpy_from_user 1.8250
> > 110 dput 0.3254
> > 109 system_call 2.4773
>
> Very odd that get_empty_filp() and strncpy_from_user() should be so
> high on your profiles. They're not tremendously different between 2.4
> and 2.6. It may be the case that 2.6 makes some inappropriate choice of
> algorithms for usercopying on systems of your vintage. get_empty_filp()
> is more mysterious still.

>On Fri, Sep 17, 2004 at 11:36:25PM +0300, Denis Vlasenko wrote:
>> Aha! Maybe it's just a might_sleep() overhead?
>
>How does 2.6 do with those might_sleep() calls removed?


I'm afraid this will have no effect. I reviewed the code and concluded
that with my .configs the generated code will be identical: might_sleep()
should compile away entirely (a sketch follows).
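
(Roughly what include/linux/kernel.h does in kernels of this vintage;
a sketch from memory, not quoted verbatim from my tree:)

#ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
#define might_sleep() __might_sleep(__FILE__, __LINE__)
#else
#define might_sleep() do { } while (0)
#endif

With CONFIG_DEBUG_SPINLOCK_SLEEP unset, might_sleep() expands to an
empty statement, so removing the calls cannot change the generated code.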
I verified this:

2.4:
# gdb vmlinux
(gdb) disass strncpy_from_user
Dump of assembler code for function strncpy_from_user:
0xc03258a0 <strncpy_from_user>: push %ebp
0xc03258a1 <strncpy_from_user+1>: mov %esp,%ebp
0xc03258a3 <strncpy_from_user+3>: push %edi
0xc03258a4 <strncpy_from_user+4>: push %esi
0xc03258a5 <strncpy_from_user+5>: push %ebx
0xc03258a6 <strncpy_from_user+6>: sub $0x8,%esp
0xc03258a9 <strncpy_from_user+9>: mov 0xc(%ebp),%esi
0xc03258ac <strncpy_from_user+12>: movl $0xfffffff2,0xffffffec(%ebp)
0xc03258b3 <strncpy_from_user+19>: mov $0xffffe000,%eax
0xc03258b8 <strncpy_from_user+24>: mov %esi,%ebx
0xc03258ba <strncpy_from_user+26>: and %esp,%eax
0xc03258bc <strncpy_from_user+28>: add $0x1,%ebx
0xc03258bf <strncpy_from_user+31>: sbb %edx,%edx
0xc03258c1 <strncpy_from_user+33>: cmp %ebx,0xc(%eax)
0xc03258c4 <strncpy_from_user+36>: sbb $0x0,%edx
0xc03258c7 <strncpy_from_user+39>: test %edx,%edx
0xc03258c9 <strncpy_from_user+41>: mov 0x8(%ebp),%edi
0xc03258cc <strncpy_from_user+44>: mov 0x10(%ebp),%ecx
0xc03258cf <strncpy_from_user+47>: jne 0xc03258e5 <strncpy_from_user+69>
0xc03258d1 <strncpy_from_user+49>: mov %ecx,%edx
0xc03258d3 <strncpy_from_user+51>: test %ecx,%ecx
0xc03258d5 <strncpy_from_user+53>: je 0xc03258e2 <strncpy_from_user+66>
0xc03258d7 <strncpy_from_user+55>: lods %ds:(%esi),%al
0xc03258d8 <strncpy_from_user+56>: stos %al,%es:(%edi)
0xc03258d9 <strncpy_from_user+57>: test %al,%al
0xc03258db <strncpy_from_user+59>: je 0xc03258e0 <strncpy_from_user+64>
0xc03258dd <strncpy_from_user+61>: dec %ecx
0xc03258de <strncpy_from_user+62>: jne 0xc03258d7 <strncpy_from_user+55>
0xc03258e0 <strncpy_from_user+64>: sub %ecx,%edx
0xc03258e2 <strncpy_from_user+66>: mov %edx,0xffffffec(%ebp)
0xc03258e5 <strncpy_from_user+69>: mov 0xffffffec(%ebp),%eax
0xc03258e8 <strncpy_from_user+72>: add $0x8,%esp
0xc03258eb <strncpy_from_user+75>: pop %ebx
0xc03258ec <strncpy_from_user+76>: pop %esi
0xc03258ed <strncpy_from_user+77>: pop %edi
0xc03258ee <strncpy_from_user+78>: leave
0xc03258ef <strncpy_from_user+79>: ret
End of assembler dump.

2.6:
# gdb vmlinux
(gdb) disass strncpy_from_user
Dump of assembler code for function strncpy_from_user:
0xc01d77b9 <strncpy_from_user>: push %ebp
0xc01d77ba <strncpy_from_user+1>: mov %esp,%ebp
0xc01d77bc <strncpy_from_user+3>: push %edi
0xc01d77bd <strncpy_from_user+4>: push %esi
0xc01d77be <strncpy_from_user+5>: push %ebx
0xc01d77bf <strncpy_from_user+6>: sub $0x8,%esp
0xc01d77c2 <strncpy_from_user+9>: mov 0xc(%ebp),%esi
0xc01d77c5 <strncpy_from_user+12>: movl $0xfffffff2,0xffffffec(%ebp)
0xc01d77cc <strncpy_from_user+19>: mov $0xffffe000,%eax
0xc01d77d1 <strncpy_from_user+24>: mov %esi,%ebx
0xc01d77d3 <strncpy_from_user+26>: and %esp,%eax
0xc01d77d5 <strncpy_from_user+28>: add $0x1,%ebx
0xc01d77d8 <strncpy_from_user+31>: sbb %edx,%edx
0xc01d77da <strncpy_from_user+33>: cmp %ebx,0x18(%eax)
0xc01d77dd <strncpy_from_user+36>: sbb $0x0,%edx
0xc01d77e0 <strncpy_from_user+39>: test %edx,%edx
0xc01d77e2 <strncpy_from_user+41>: mov 0x8(%ebp),%edi
0xc01d77e5 <strncpy_from_user+44>: mov 0x10(%ebp),%ecx
0xc01d77e8 <strncpy_from_user+47>: jne 0xc01d77fe <strncpy_from_user+69>
0xc01d77ea <strncpy_from_user+49>: mov %ecx,%edx
0xc01d77ec <strncpy_from_user+51>: test %ecx,%ecx
0xc01d77ee <strncpy_from_user+53>: je 0xc01d77fb <strncpy_from_user+66>
0xc01d77f0 <strncpy_from_user+55>: lods %ds:(%esi),%al
0xc01d77f1 <strncpy_from_user+56>: stos %al,%es:(%edi)
0xc01d77f2 <strncpy_from_user+57>: test %al,%al
0xc01d77f4 <strncpy_from_user+59>: je 0xc01d77f9 <strncpy_from_user+64>
0xc01d77f6 <strncpy_from_user+61>: dec %ecx
0xc01d77f7 <strncpy_from_user+62>: jne 0xc01d77f0 <strncpy_from_user+55>
0xc01d77f9 <strncpy_from_user+64>: sub %ecx,%edx
0xc01d77fb <strncpy_from_user+66>: mov %edx,0xffffffec(%ebp)
0xc01d77fe <strncpy_from_user+69>: mov 0xffffffec(%ebp),%eax
0xc01d7801 <strncpy_from_user+72>: add $0x8,%esp
0xc01d7804 <strncpy_from_user+75>: pop %ebx
0xc01d7805 <strncpy_from_user+76>: pop %esi
0xc01d7806 <strncpy_from_user+77>: pop %edi
0xc01d7807 <strncpy_from_user+78>: leave
0xc01d7808 <strncpy_from_user+79>: ret
End of assembler dump.

The only difference apart from addresses is:

cmp %ebx,0xc(%eax) --> cmp %ebx,0x18(%eax)

It corresponds to:

57 #define __range_ok(addr,size) ({ \
58 unsigned long flag,sum; \
59 __chk_user_ptr(addr); \
60 asm("addl %3,%1 ; sbbl %0,%0; cmpl %1,%4; sbbl $0,%0" \
                                         ^^
61 :"=&r" (flag), "=r" (sum) \
62 :"1" (addr),"g" ((int)(size)),"g" (current_thread_info()->addr_limit.seg)); \
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
63 flag; })

addr_limit.seg simply lives at a different offset in 2.6: the %esp
masking yields a task_struct in 2.4 but a thread_info in 2.6, and
addr_limit sits at 0xc resp. 0x18 in those structures.
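
(A sketch of the layouts behind those offsets -- i386, field order from
memory of stock 2.4/2.6 headers of this vintage:)

/* 2.4: masking %esp with 0xffffe000 yields the task_struct at the
 * bottom of the 8k kernel stack */
struct task_struct {
        volatile long state;                /* 0x00 */
        unsigned long flags;                /* 0x04 */
        int sigpending;                     /* 0x08 */
        mm_segment_t addr_limit;            /* 0x0c <- cmp %ebx,0xc(%eax) */
        /* ... */
};

/* 2.6: the same masking yields the thread_info instead */
struct thread_info {
        struct task_struct *task;           /* 0x00 */
        struct exec_domain *exec_domain;    /* 0x04 */
        unsigned long flags;                /* 0x08 */
        unsigned long status;               /* 0x0c */
        __u32 cpu;                          /* 0x10 */
        int preempt_count;                  /* 0x14 */
        mm_segment_t addr_limit;            /* 0x18 <- cmp %ebx,0x18(%eax) */
        /* ... */
};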

I did longer profiling runs, with 16x100000 open/close iterations,
twice for each kernel, to see how much run-to-run variability there is.
Each run was collected roughly as sketched below; the results follow:
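
(A sketch; the exact script is in the attached tarball, and 'openclose'
is a placeholder name for the benchmark binary:)

# kernel booted with profile=2
readprofile -r                                # reset profile counters
./openclose                                   # the 16x100000 open/close loop
readprofile -m System.map | sort -rn | head   # top entries, as below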

2.4_____________________________________ 2.6__________________________
11613 11331 total 13046 12819 total
2477 2300 link_path_walk 3465 3418 link_path_walk
1276 1107 d_lookup 1470 1293 __d_lookup
546 561 dput 793 629 get_empty_filp
518 360 fput 581 386 find_trylock_page
468 415 strncpy_from_user 541 483 dput
451 382 system_call 473 409 do_lookup
367 353 page_follow_link 401 528 strncpy_from_user
366 383 vfs_permission 378 451 system_call
337 368 kmem_cache_free 318 256 follow_mount
334 333 open_namei 280 339 may_open
324 338 __constant_c_and_count_memset 264 269 path_lookup
304 341 get_unused_fd 229 207 read_cache_page

(test script and unabridged results are attached)

The strncpy_from_user difference is within the run-to-run variability,
so maybe there is no real problem with it.

However, link_path_walk did take ~45% more CPU (3465+3418 vs. 2477+2300
ticks over the two runs).
--
vda


Attachments:
z.tar.bz2 (2.62 kB)