2005-05-06 01:41:57

by Parag Warudkar

Subject: X86_64 Ctx switch times - 32bit vs 64bit

I was experimenting with the attached program (taken from an IBM
Developerworks article) to find the context switch times on an AMD64 machine.

With a 64bit binary I get an average of 5 to 8 usec/cswitch, whereas the same
program compiled as 32bit consistently gives >= 10 usec/cswitch - sometimes
even 13 usec/cswitch.

Are there more context switching overheads when running 32bit programs on a
64bit kernel?

Kernel version is 2.6.11-gentoo x86_64.
64bit compile - g++ -O2 -pthread csfast5.cpp -ocsfast64
32bit compile - g++ -m32 -O2 -pthread csfast5.cpp -ocsfast32
Run - ./csfast{32/64} -t 40 -c4 10

Parag
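
For readers without the attachment, one common way to estimate context-switch cost is a pipe ping-pong between two processes. The sketch below is not csfast5.cpp (which is built with -pthread and may work quite differently); the function name usec_per_switch and its round-trip parameter are assumptions made for illustration only.

```cpp
#include <chrono>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from csfast5.cpp): bounce one byte between a
// parent and a child process over two pipes and return the elapsed
// microseconds per one-way hand-off. Returns a negative value on failure.
double usec_per_switch(long round_trips) {
    if (round_trips <= 0) return -1.0;

    int to_child[2], to_parent[2];
    if (pipe(to_child) != 0) return -1.0;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1.0;
    }

    char token = 'x';
    if (pid == 0) {
        // Child: echo every byte back until the parent closes its write end.
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &token, 1) == 1) {
            if (write(to_parent[1], &token, 1) != 1) break;
        }
        _exit(0);
    }

    // Parent: keep only the ends it uses.
    close(to_child[0]);
    close(to_parent[1]);

    bool ok = true;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < round_trips && ok; ++i) {
        ok = write(to_child[1], &token, 1) == 1 &&
             read(to_parent[0], &token, 1) == 1;
    }
    auto stop = std::chrono::steady_clock::now();

    close(to_child[1]);  // EOF makes the child leave its loop
    close(to_parent[0]);
    waitpid(pid, nullptr, 0);
    if (!ok) return -1.0;

    // Each round trip hands control over twice (parent -> child -> parent).
    std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / (2.0 * round_trips);
}
```

Calling usec_per_switch(100000) from a small main() and building it once with g++ -O2 and once with g++ -m32 -O2 gives a rough 64bit vs 32bit comparison. The figure includes pipe read/write syscall overhead, and on a multi-core machine the two processes may run on different CPUs unless they are pinned to one (for example with taskset), so it is an upper bound rather than a pure switch cost.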


Attachments:
(No filename) (608.00 B)
csfast5.cpp (11.33 kB)

2005-05-07 14:40:17

by Andi Kleen

Subject: Re: X86_64 Ctx switch times - 32bit vs 64bit

Parag Warudkar <[email protected]> writes:

> I was experimenting with the attached program (taken from an IBM
> Developerworks article) to find the context switch times on an AMD64 machine.
>
> With a 64bit binary I get an average of 5 to 8 usec/cswitch, whereas the same
> program compiled as 32bit consistently gives >= 10 usec/cswitch - sometimes
> even 13 usec/cswitch.
>
> Are there more context switching overheads when running 32bit programs on a
> 64bit kernel?

In theory, no; it should be nearly the same. 32bit programs do use %gs
as the thread register, which is a bit more costly to switch because
the kernel uses it internally too, but the difference should be less
than that.

I suspect your program is mostly testing the locks anyway; perhaps
there is some other difference in glibc. For example, a 32bit glibc
compiled for pre-686 CPUs has slower locks.

oprofile might provide more clues about where the overhead is.

-Andi
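
One way to check the lock hypothesis in isolation is to time uncontended mutex operations in both builds. The following is a minimal sketch under that assumption; the function name ns_per_lock_pair is hypothetical, and it measures only the uncontended fast path, not the contended path a multi-threaded benchmark would also exercise.

```cpp
#include <chrono>
#include <pthread.h>

// Hypothetical helper: average nanoseconds for one uncontended
// pthread_mutex_lock/pthread_mutex_unlock pair.
// Returns a negative value on failure.
double ns_per_lock_pair(long iterations) {
    if (iterations <= 0) return -1.0;

    pthread_mutex_t mutex;
    if (pthread_mutex_init(&mutex, nullptr) != 0) return -1.0;

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        pthread_mutex_lock(&mutex);    // never blocks: single thread
        pthread_mutex_unlock(&mutex);
    }
    auto stop = std::chrono::steady_clock::now();

    pthread_mutex_destroy(&mutex);

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Building this with g++ -O2 -pthread and again with g++ -m32 -O2 -pthread, then comparing ns_per_lock_pair(10000000) between the two binaries, would show whether the 32bit glibc lock path is noticeably slower on the same machine.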

2005-05-10 01:33:58

by Parag Warudkar

Subject: Re: X86_64 Ctx switch times - 32bit vs 64bit

> > Are there more context switching overheads when running 32bit programs on
> > a 64bit kernel?
>
> In theory, no; it should be nearly the same. 32bit programs do use %gs
> as the thread register, which is a bit more costly to switch because
> the kernel uses it internally too, but the difference should be less
> than that.
>
> I suspect your program is mostly testing the locks anyway; perhaps
> there is some other difference in glibc. For example, a 32bit glibc
> compiled for pre-686 CPUs has slower locks.
>
> oprofile might provide more clues about where the overhead is.
>
> -Andi

I ran the 32 bit and 64 bit versions in a loop for a good amount of time and
captured an oprofile report for vmlinux after each run. The complete reports
for the 32 bit and 64 bit runs are attached. For the 32 bit runs the top 5 are:

samples  %        symbol name
20820    11.5424  copy_user_generic_c
12990    7.2015   ia32_syscall
10131    5.6165   gs_change
9053     5.0189   __dequeue_signal
7494     4.1546   find_pid

For the 64 bit runs the top 5 are:

samples  %       symbol name
10604    8.1075  system_call
8497     6.4965  __switch_to
8041     6.1479  do_signal
7712     5.8963  schedule
6538     4.9987  __dequeue_signal

I have yet to interpret/analyze the above...

Parag


Attachments:
(No filename) (1.24 kB)
32 (28.62 kB)
64 (20.73 kB)

2005-05-10 01:48:12

by Andi Kleen

Subject: Re: X86_64 Ctx switch times - 32bit vs 64bit

> samples  %        symbol name
> 20820    11.5424  copy_user_generic_c
> 12990    7.2015   ia32_syscall
> 10131    5.6165   gs_change
> 9053     5.0189   __dequeue_signal
> 7494     4.1546   find_pid

The context switch does not even appear here. Probably the i386 glibc does
something much slower than the 64bit one. I would compare straces.

-Andi
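
The comparison made in this reply can be reproduced mechanically from the two top-5 lists posted in the thread. The sketch below hard-codes those ten rows; the type and function names (ProfileRow, contains, only_in, percent_covered) are hypothetical and only the numbers come from the posted reports.

```cpp
#include <string>
#include <vector>

// One row of the posted oprofile top-5 lists.
struct ProfileRow {
    long samples;
    double percent;
    std::string symbol;
};

// Rows copied from the 32 bit report quoted above.
const std::vector<ProfileRow> kTop32 = {
    {20820, 11.5424, "copy_user_generic_c"},
    {12990, 7.2015, "ia32_syscall"},
    {10131, 5.6165, "gs_change"},
    {9053, 5.0189, "__dequeue_signal"},
    {7494, 4.1546, "find_pid"},
};

// Rows copied from the 64 bit report quoted above.
const std::vector<ProfileRow> kTop64 = {
    {10604, 8.1075, "system_call"},
    {8497, 6.4965, "__switch_to"},
    {8041, 6.1479, "do_signal"},
    {7712, 5.8963, "schedule"},
    {6538, 4.9987, "__dequeue_signal"},
};

// True if the symbol appears anywhere in the list.
bool contains(const std::vector<ProfileRow>& rows, const std::string& symbol) {
    for (const ProfileRow& row : rows) {
        if (row.symbol == symbol) return true;
    }
    return false;
}

// Symbols that appear in `a` but not in `b`, in the order of `a`.
std::vector<std::string> only_in(const std::vector<ProfileRow>& a,
                                 const std::vector<ProfileRow>& b) {
    std::vector<std::string> result;
    for (const ProfileRow& row : a) {
        if (!contains(b, row.symbol)) result.push_back(row.symbol);
    }
    return result;
}

// Sum of the percentage column: the share of samples the list accounts for.
double percent_covered(const std::vector<ProfileRow>& rows) {
    double total = 0.0;
    for (const ProfileRow& row : rows) total += row.percent;
    return total;
}
```

On the posted data, only_in(kTop32, kTop64) yields copy_user_generic_c, ia32_syscall, gs_change and find_pid, while __switch_to and schedule appear only in the 64 bit list. Each top 5 covers roughly a third of the samples in its report (about 33.5% and 31.6%), so the complete attached reports would be needed before drawing firm conclusions.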