2010-12-12 15:20:09

by Ralf Hildebrandt

Subject: Costly Context Switches

I recently made a parallel installation of dovecot-2.0 on my mailbox
server, which is running dovecot-1.2 without any problems whatsoever.

Using dovecot-2.0 on the same hardware, same kernel, with the same users and same
mailboxes and usage behaviour results in an immense increase in the
load numbers.

Switching back to 1.2 results in an immediate decrease of the load back
to "normal" numbers.

This is mainly due to a 10-20 fold increase of the number of context
switches. The same problem has been reported independently by Cor
Bosman of XS4All, on different hardware (64bit instead of 32bit, real
hardware instead of virtual hardware).

So, now the kernel related question: How can I find out WHY the
context switches are happening? Are there any "in kernel" statistics I
could look at?

I'm running an Ubuntu kernel: 2.6.32-27-generic-pae #49-Ubuntu SMP
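
On the "in kernel" statistics part of the question: one thing that could be
looked at without extra tools is the per-task counters in /proc/<pid>/status
(voluntary_ctxt_switches and nonvoluntary_ctxt_switches). A minimal sketch,
assuming a Linux /proc and a POSIX shell; the helper names ctxt_total and
ctxt_rate are made up for illustration:

```shell
# Sum voluntary and involuntary context switches found in a
# /proc/<pid>/status style file (path given as first argument).
ctxt_total() {
    awk '/^(voluntary|nonvoluntary)_ctxt_switches:/ { sum += $2 }
         END { print sum + 0 }' "$1"
}

# Print the average number of context switches per second for one PID,
# sampled over an interval in seconds (default 5, must be > 0).
ctxt_rate() {
    pid=$1
    secs=${2:-5}
    before=$(ctxt_total "/proc/$pid/status")
    sleep "$secs"
    after=$(ctxt_total "/proc/$pid/status")
    echo $(( (after - before) / secs ))
}
```

Something like "ctxt_rate 1234 10" (1234 being a placeholder PID of an imap
process) could then be run once under 1.2 and once under 2.0. Reading the two
fields separately may also help: voluntary switches usually mean the task
blocked in a syscall, nonvoluntary ones that it was preempted. As far as I
know the counters are per task, so a multi-threaded process would need the
files under /proc/<pid>/task/ instead.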

--
Ralf Hildebrandt
Geschäftsbereich IT | Abteilung Netzwerk
Charité - Universitätsmedizin Berlin
Campus Benjamin Franklin
Hindenburgdamm 30 | D-12203 Berlin
Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
http://www.charite.de


2010-12-12 19:17:26

by Andres Freund

Subject: Re: Costly Context Switches

On Sunday 12 December 2010 16:11:12 Ralf Hildebrandt wrote:
> I recently made a parallel installation of dovecot-2.0 on my mailbox
> server, which is running dovecot-1.2 without any problems whatsoever.
>
> Using dovecot-2.0 on the same hardware, same kernel, with the same users
> and same mailboxes and usage behaviour results in an immense increase in
> the load numbers.
>
> Switching back to 1.2 results in an immediate decrease of the load back
> to "normal" numbers.
>
> This is mainly due to a 10-20 fold increase of the number of context
> switches. The same problem has been reported independently by Cor
> Bosman of XS4All, on different hardware (64bit instead of 32bit, real
> hardware instead of virtual hardware).
>
> So, now the kernel related question: How can I find out WHY the
> context switches are happening? Are there any "in kernel" statistics I
> could look at?
"strace" or "perf trace syscall-counts" would be a good start.
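
A sketch of the strace variant, assuming strace and coreutils timeout are
installed and that it is run with enough privileges to attach. The helper
name syscall_counts is hypothetical, and DRY_RUN is only there so the command
line can be inspected without running it:

```shell
# Attach to a running process for a fixed time and print a per-syscall
# summary table when strace is interrupted.
#   $1 = PID to attach to, $2 = seconds to trace (default 30)
syscall_counts() {
    pid=$1
    secs=${2:-30}
    # -c: count calls/time per syscall, -f: follow children forked
    # after attaching, -p: attach to an existing PID.
    # timeout -s INT mimics pressing Ctrl-C after $secs seconds.
    ${DRY_RUN:+echo} timeout -s INT "$secs" strace -c -f -p "$pid"
}
```

Running it against the same kind of process under both dovecot versions
should show whether the number or mix of syscalls changed. Note that -f only
follows children created after strace attached; already running children
would each need their own -p.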

Andres

2010-12-13 13:51:17

by Arnaldo Carvalho de Melo

Subject: Re: Costly Context Switches

Em Sun, Dec 12, 2010 at 08:07:07PM +0100, Andres Freund escreveu:
> "strace" or "perf trace syscall-counts" would be a good start.

Better to record just "cs" (Context Switches) events and also to collect
callchains when those events take place:

[acme@felicio ~]$ perf record -e cs -g chromium-browser
^C
[acme@felicio ~]$ perf report
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......
#
91.32% chromium-browse [kernel.kallsyms] [k] perf_event_task_sched_out
|
--- perf_event_task_sched_out
|
|--37.80%-- sysret_careful
| |
| |--85.11%-- 0x365300e42d
| |
| --14.89%-- 0x365300e24a
|
|--30.29%-- retint_careful
| |
| |--29.20%-- 0x3652809658
| |
| |--28.32%-- 0x7fcb90463603
| |
| |--15.04%-- 0x7fcb8bca2e20
| |
| |--14.16%-- 0x7fcb903ff36e
| |
| --13.27%-- 0x3652809d2f
|
|--23.86%-- schedule_timeout
| |
| |--83.15%-- sys_epoll_wait
| | system_call_fastpath
| | 0x3652ce5013
| |
| --16.85%-- __skb_recv_datagram
| skb_recv_datagram
| unix_dgram_recvmsg
| __sock_recvmsg
| sock_recvmsg
| __sys_recvmsg
| sys_recvmsg
| system_call_fastpath
| __recvmsg
|
|--4.02%-- __cond_resched
| _cond_resched
| might_fault
| memcpy_toiovec
| unix_stream_recvmsg
| __sock_recvmsg
| sock_aio_read
| do_sync_read
| vfs_read
| sys_read
| system_call_fastpath
| 0x365300e48d
| (nil)
|
--4.02%-- futex_wait_queue_me
futex_wait
do_futex
sys_futex
system_call_fastpath
__pthread_cond_timedwait

7.05% chrome-sandbox [kernel.kallsyms] [k] perf_event_task_sched_out
|
--- perf_event_task_sched_out
__cond_resched
_cond_resched
might_fault
filldir
proc_fill_cache
proc_readfd_common
proc_readfd
vfs_readdir
sys_getdents
system_call_fastpath
__getdents64

1.34% gconftool-2 [kernel.kallsyms] [k] perf_event_task_sched_out
|
--- perf_event_task_sched_out
sysret_careful
|
|--51.61%-- __recv
|
--48.39%-- __recvmsg


- Arnaldo

2010-12-13 14:09:22

by Ralf Hildebrandt

Subject: Re: Costly Context Switches

* Arnaldo Carvalho de Melo:

> Better to record just "cs" (Context Switches) events and also to collect
> callchains when those events take place:
>
> [acme@felicio ~]$ perf record -e cs -g chromium-browser


OK, I will try this. The application is dovecot-2.0, so I need to perf
record it and all its children (imap, pop3, auth, etc.).

Is this done by default?
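
As far as I know, perf record does follow children forked by the command it
starts (there is a --no-inherit option to turn that off), but processes that
are already running are not picked up that way. For a dovecot that is already
serving users, a system-wide recording may be simpler. A minimal sketch,
assuming perf is installed and run as root; record_cs_systemwide is a
made-up helper name and DRY_RUN only prints the command instead of running it:

```shell
# Record context-switch events with call chains on all CPUs for a
# fixed number of seconds (default 60). Output goes to ./perf.data.
record_cs_systemwide() {
    secs=${1:-60}
    # -e cs: context-switch software event, -g: collect call chains,
    # -a: system-wide; "sleep" only bounds the recording time.
    ${DRY_RUN:+echo} perf record -e cs -g -a sleep "$secs"
}
```

Afterwards "perf report --sort comm" should group the samples by command
name, so imap, pop3, auth and the other dovecot processes show up separately
from unrelated system activity.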

--
Ralf Hildebrandt
Geschäftsbereich IT | Abteilung Netzwerk
Charité - Universitätsmedizin Berlin
Campus Benjamin Franklin
Hindenburgdamm 30 | D-12203 Berlin
Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
http://www.charite.de

2010-12-13 14:26:00

by Andres Freund

Subject: Re: Costly Context Switches

On Monday 13 December 2010 14:51:04 Arnaldo Carvalho de Melo wrote:
> Better to record just "cs" (Context Switches) events and also to collect
> callchains when those events take place:
Hm. It's also a good starting point, but it may be harder to see the differences
between dovecot-2.0 and dovecot-1.2 that way, because it's harder to see if it's
the usage being different that causes the problem (i.e. calling in a different
order, thrashing caches) or if it's the number of syscalls that changed. But I
agree that both are very useful for analyzing problems like that.

Andres

2010-12-13 14:29:18

by Arnaldo Carvalho de Melo

Subject: Re: Costly Context Switches

Em Mon, Dec 13, 2010 at 03:25:54PM +0100, Andres Freund escreveu:

> Hm. It's also a good starting point but it may be harder to see the differences
> between dovecot-2.0 and dovecot-1.2 that way because it's harder to see if it's

Well, for that he can try 'perf diff', i.e.:

1. run perf record on dovecot-1.2
2. run perf record on dovecot-2.0
3. perf diff
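
The three steps could be sketched roughly like this, assuming a perf build
that includes "perf diff" and that each recording is taken while the
respective dovecot version is serving a comparable load. The helper names and
the .data file names are made up for illustration, and DRY_RUN only prints
the commands instead of running them:

```shell
# Record context switches system-wide into a named data file.
#   $1 = output file, $2 = seconds to record (default 60)
record_cs_to() {
    out=$1
    secs=${2:-60}
    ${DRY_RUN:+echo} perf record -e cs -g -a -o "$out" sleep "$secs"
}

# Compare two recordings: $1 is the baseline, $2 the new data file.
diff_cs() {
    ${DRY_RUN:+echo} perf diff "$1" "$2"
}

# Intended use (run each recording under the matching dovecot version):
#   record_cs_to perf-1.2.data
#   record_cs_to perf-2.0.data
#   diff_cs perf-1.2.data perf-2.0.data
```

Writing to separate files with -o avoids relying on perf.data and
perf.data.old, which is what perf diff compares when no files are given.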

> the usage being different causing the problem (i.e. calling in a different
> order, thrashing caches) or if it's the amount of syscalls that changed. But I
> agree that both are very useful to analyze problems like that.

Well, I tried to answer his question:

"How can I find out WHY the context switches are happening?"

:-)

- Arnaldo

2010-12-13 14:30:50

by Andres Freund

Subject: Re: Costly Context Switches

On Monday 13 December 2010 15:29:11 Arnaldo Carvalho de Melo wrote:
> Well, for that he can try 'perf diff', i.e.:
>
> 1. run perf record on dovecot-1.2
> 2. run perf record on dovecot-2.0
> 3. perf diff
Wow. That one I didn't know about yet.

Too bad that perf docs/examples are so spread out...

Greetings,

Andres

2010-12-13 14:36:37

by Arnaldo Carvalho de Melo

Subject: Re: Costly Context Switches

Em Mon, Dec 13, 2010 at 03:30:46PM +0100, Andres Freund escreveu:
> On Monday 13 December 2010 15:29:11 Arnaldo Carvalho de Melo wrote:
> > Em Mon, Dec 13, 2010 at 03:25:54PM +0100, Andres Freund escreveu:
> > > Hm. It's also a good starting point but it may be harder to see the
> > > differences between dovecot-2.0 and dovecot-1.2 that way because it's
> > > harder to see if it's

> > Well, for that he can try 'perf diff', i.e.:

> > 1. run perf record on dovecot-1.2
> > 2. run perf record on dovecot-2.0
> > 3. perf diff
> Wow. That one I didn't know about yet.
>
> Too bad that perf docs/examples are so spread out...

There is a bit of info on these places:

http://vger.kernel.org/~acme/perf/lk2010-perf-paper.pdf
http://vger.kernel.org/~acme/perf/perf-plumbers2010.pdf

Also please consider helping to organize the existing documentation in a
way that makes it more easily accessible :-)

- Arnaldo