2009-04-28 17:11:15

by Styner, Douglas W

Subject: Mainline kernel OLTP performance update

Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).

The regression for 2.6.30-rc3 against the 2.6.24.2 baseline is 1.91%. Oprofile reports 71.1626% user, 28.8295% system.

Linux OLTP Performance summary
Kernel        Speedup(x)   Intr/s   CtxSw/s   us%   sys%   idle%   iowait%
2.6.24.2      1.000        22106    43709     75    24     0       0
2.6.30-rc3    0.981        30645    43027     75    25     0       0

Server configurations:
Intel Xeon Quad-core 2.0GHz 2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)


======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2 Cycles% 2.6.30-rc3
74.8578 <database> 69.1925 <database>
1.0500 qla24xx_start_scsi 1.1314 qla24xx_intr_handler
0.8089 schedule 1.0031 qla24xx_start_scsi
0.5864 kmem_cache_alloc 0.8476 __schedule
0.4989 __blockdev_direct_IO 0.6532 kmem_cache_alloc
0.4357 __sigsetjmp 0.4490 __blockdev_direct_IO
0.4152 copy_user_generic_string 0.4199 __sigsetjmp
0.3953 qla24xx_intr_handler 0.3946 __switch_to
0.3850 memcpy 0.3538 __list_add
0.3596 scsi_request_fn 0.3499 task_rq_lock
0.3188 __switch_to 0.3402 scsi_request_fn
0.2889 lock_timer_base 0.3382 rb_get_reader_page
0.2750 memmove 0.3363 copy_user_generic_string
0.2519 task_rq_lock 0.3324 aio_complete
0.2474 aio_complete 0.3110 try_to_wake_up
0.2460 scsi_alloc_sgtable 0.2877 ring_buffer_consume
0.2445 generic_make_request 0.2683 mod_timer
0.2263 qla2x00_process_completed_re 0.2605 qla2x00_process_completed_re
0.2118 blk_queue_end_tag 0.2566 blk_queue_end_tag
0.2085 dio_bio_complete 0.2566 generic_make_request
0.2021 e1000_xmit_frame 0.2547 tcp_sendmsg
0.2006 __end_that_request_first 0.2372 lock_timer_base
0.1954 generic_file_aio_read 0.2333 memmove
0.1949 kfree 0.2294 memset_c
0.1915 tcp_sendmsg 0.2080 mempool_free
0.1901 try_to_wake_up 0.2022 generic_file_aio_read
0.1895 kref_get 0.1963 scsi_device_unbusy
0.1864 __mod_timer 0.1963 plist_del
0.1863 thread_return 0.1944 dequeue_rt_stack
0.1854 math_state_restore 0.1924 e1000_xmit_frame

Thanks
Doug


2009-04-29 07:40:53

by Andrew Morton

Subject: Re: Mainline kernel OLTP performance update

On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W" <[email protected]> wrote:

> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
>
> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%. Oprofile reports 71.1626% user, 28.8295% system.
>
> Linux OLTP Performance summary
> Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle% iowait%
> 2.6.24.2 1.000 22106 43709 75 24 0 0
> 2.6.30-rc3 0.981 30645 43027 75 25 0 0

The main difference there is the interrupt frequency. Do we know which
interrupt source(s) caused this?
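
One way to attribute them, sketched here with placeholder /tmp paths, is to snapshot /proc/interrupts around the run and diff the per-source counts:

# cat /proc/interrupts > /tmp/irq.before
  ... run the workload ...
# cat /proc/interrupts > /tmp/irq.after
# diff /tmp/irq.before /tmp/irq.after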

> Server configurations:
> Intel Xeon Quad-core 2.0GHz 2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
>
>
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2 Cycles% 2.6.30-rc3
> 74.8578 <database> 69.1925 <database>

ouch, that's a large drop in userspace CPU occupancy. It seems
inconsistent with the 1.91% above.

> 1.0500 qla24xx_start_scsi 1.1314 qla24xx_intr_handler
> 0.8089 schedule 1.0031 qla24xx_start_scsi
> 0.5864 kmem_cache_alloc 0.8476 __schedule
> 0.4989 __blockdev_direct_IO 0.6532 kmem_cache_alloc
> 0.4357 __sigsetjmp 0.4490 __blockdev_direct_IO
> 0.4152 copy_user_generic_string 0.4199 __sigsetjmp
> 0.3953 qla24xx_intr_handler 0.3946 __switch_to
> 0.3850 memcpy 0.3538 __list_add
> 0.3596 scsi_request_fn 0.3499 task_rq_lock
> 0.3188 __switch_to 0.3402 scsi_request_fn
> 0.2889 lock_timer_base 0.3382 rb_get_reader_page
> 0.2750 memmove 0.3363 copy_user_generic_string
> 0.2519 task_rq_lock 0.3324 aio_complete
> 0.2474 aio_complete 0.3110 try_to_wake_up
> 0.2460 scsi_alloc_sgtable 0.2877 ring_buffer_consume
> 0.2445 generic_make_request 0.2683 mod_timer
> 0.2263 qla2x00_process_completed_re 0.2605 qla2x00_process_completed_re
> 0.2118 blk_queue_end_tag 0.2566 blk_queue_end_tag
> 0.2085 dio_bio_complete 0.2566 generic_make_request
> 0.2021 e1000_xmit_frame 0.2547 tcp_sendmsg
> 0.2006 __end_that_request_first 0.2372 lock_timer_base
> 0.1954 generic_file_aio_read 0.2333 memmove
> 0.1949 kfree 0.2294 memset_c
> 0.1915 tcp_sendmsg 0.2080 mempool_free
> 0.1901 try_to_wake_up 0.2022 generic_file_aio_read
> 0.1895 kref_get 0.1963 scsi_device_unbusy
> 0.1864 __mod_timer 0.1963 plist_del
> 0.1863 thread_return 0.1944 dequeue_rt_stack
> 0.1854 math_state_restore 0.1924 e1000_xmit_frame

2009-04-29 08:28:48

by Andi Kleen

Subject: Re: Mainline kernel OLTP performance update

Andrew Morton <[email protected]> writes:

>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
>> Cycles% 2.6.24.2 Cycles% 2.6.30-rc3
>> 74.8578 <database> 69.1925 <database>
>
> ouch, that's a large drop in userspace CPU occupancy. It seems
> inconsistent with the 1.91% above.

That was determined to be an oprofile artifact/regression (see Doug's
other email+thread). The 2.6.30 oprofile seems to be less accurate than
the one in 2.6.24. Of course the question is, if it can't get
the user space right, is the kernel data accurate? But I believe
Doug verified with vtune that the kernel data is roughly correct,
and just the user space profiling was slightly bogus (right, Doug, or
do I misrepresent that?)

-Andi

--
[email protected] -- Speaking for myself only.

2009-04-29 15:48:37

by Styner, Douglas W

Subject: RE: Mainline kernel OLTP performance update

Our analysis of the interrupts shows that rescheduling interrupts are up 2.2x from 2.6.24.2 --> 2.6.30-rc3. Qla2xxx interrupts are roughly the same.

Doug

>-----Original Message-----
>From: Andrew Morton [mailto:[email protected]]
>Sent: Wednesday, April 29, 2009 12:30 AM
>To: Styner, Douglas W
>Cc: [email protected]; Tripathi, Sharad C;
>[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
>Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
>Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
>Anil K; Chilukuri, Harita; [email protected]
>Subject: Re: Mainline kernel OLTP performance update
>
>On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
><[email protected]> wrote:
>
>> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
>>
>> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
>Oprofile reports 71.1626% user, 28.8295% system.
>>
>> Linux OLTP Performance summary
>> Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle%
>iowait%
>> 2.6.24.2 1.000 22106 43709 75 24 0 0
>> 2.6.30-rc3 0.981 30645 43027 75 25 0 0
>
>The main difference there is the interrupt frequency. Do we know which
>interrupt source(s) caused this?
>
>> Server configurations:
>> Intel Xeon Quad-core 2.0GHz 2 cpus/8 cores/8 threads
>> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
>>
>>
>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
>> Cycles% 2.6.24.2 Cycles% 2.6.30-rc3
>> 74.8578 <database> 69.1925 <database>
>
>ouch, that's a large drop in userspace CPU occupancy. It seems
>inconsistent with the 1.91% above.
>
>> 1.0500 qla24xx_start_scsi 1.1314 qla24xx_intr_handler
>> 0.8089 schedule 1.0031 qla24xx_start_scsi
>> 0.5864 kmem_cache_alloc 0.8476 __schedule
>> 0.4989 __blockdev_direct_IO 0.6532 kmem_cache_alloc
>> 0.4357 __sigsetjmp 0.4490 __blockdev_direct_IO
>> 0.4152 copy_user_generic_string 0.4199 __sigsetjmp
>> 0.3953 qla24xx_intr_handler 0.3946 __switch_to
>> 0.3850 memcpy 0.3538 __list_add
>> 0.3596 scsi_request_fn 0.3499 task_rq_lock
>> 0.3188 __switch_to 0.3402 scsi_request_fn
>> 0.2889 lock_timer_base 0.3382 rb_get_reader_page
>> 0.2750 memmove 0.3363 copy_user_generic_string
>> 0.2519 task_rq_lock 0.3324 aio_complete
>> 0.2474 aio_complete 0.3110 try_to_wake_up
>> 0.2460 scsi_alloc_sgtable 0.2877 ring_buffer_consume
>> 0.2445 generic_make_request 0.2683 mod_timer
>> 0.2263 qla2x00_process_completed_re 0.2605 qla2x00_process_completed_re
>> 0.2118 blk_queue_end_tag 0.2566 blk_queue_end_tag
>> 0.2085 dio_bio_complete 0.2566 generic_make_request
>> 0.2021 e1000_xmit_frame 0.2547 tcp_sendmsg
>> 0.2006 __end_that_request_first 0.2372 lock_timer_base
>> 0.1954 generic_file_aio_read 0.2333 memmove
>> 0.1949 kfree 0.2294 memset_c
>> 0.1915 tcp_sendmsg 0.2080 mempool_free
>> 0.1901 try_to_wake_up 0.2022 generic_file_aio_read
>> 0.1895 kref_get 0.1963 scsi_device_unbusy
>> 0.1864 __mod_timer 0.1963 plist_del
>> 0.1863 thread_return 0.1944 dequeue_rt_stack
>> 0.1854 math_state_restore 0.1924 e1000_xmit_frame

2009-04-29 16:00:42

by Styner, Douglas W

Subject: RE: Mainline kernel OLTP performance update

What we showed was that vmstat and vtune agreed with respect to system/user time. Oprofile is off by ~4% (4% too low for user, 4% too high for system).
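
For reference, that cross-check amounts to comparing the kernel's own us/sy accounting against oprofile's per-image sample split over the same window, roughly:

# vmstat 10 6        # us/sy columns give the kernel's own accounting
# opreport | head    # oprofile's per-image sample breakdown

The interval is arbitrary; any window that covers the measured run works.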

>-----Original Message-----
>From: Andi Kleen [mailto:[email protected]]
>Sent: Wednesday, April 29, 2009 1:28 AM
>To: Andrew Morton
>Cc: Styner, Douglas W; [email protected]; Tripathi, Sharad C;
>[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
>Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
>Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
>Anil K; Chilukuri, Harita; [email protected]
>Subject: Re: Mainline kernel OLTP performance update
>
>Andrew Morton <[email protected]> writes:
>
>>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
>>> Cycles% 2.6.24.2 Cycles% 2.6.30-rc3
>>> 74.8578 <database> 69.1925 <database>
>>
>> ouch, that's a large drop in userspace CPU occupancy. It seems
>> inconsistent with the 1.91% above.
>
>That was determined to be an oprofile artifact/regression (see Doug's
>other email+thread) The 2.6.30 oprofile seems to be less accurate than
>the one in 2.6.24. Of course the question is if it can't get
>the user space right, is the kernel data accurate. But I believe
>Doug verified with vtune that the kernel data is roughly correct,
>just user space profiling was slightly bogus (right, Doug, or
>do I misrepresent that?)
>
>-Andi
>
>--
>[email protected] -- Speaking for myself only.

2009-04-29 16:07:37

by Matthew Wilcox

Subject: RE: Mainline kernel OLTP performance update

Is it possible that's simply 'oprofile has a 4% overhead'?

> -----Original Message-----
> From: Styner, Douglas W
> Sent: Wednesday, April 29, 2009 9:00 AM
> To: Andi Kleen; Andrew Morton
> Cc: [email protected]; Tripathi, Sharad C;
> [email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> Anil K; Chilukuri, Harita; [email protected]
> Subject: RE: Mainline kernel OLTP performance update
>
> What we showed was that vmstat and vtune agreed wrt system/user time.
> Oprofile is off by ~4% (4% too low for user. 4% too high for system)
>
> >-----Original Message-----
> >From: Andi Kleen [mailto:[email protected]]
> >Sent: Wednesday, April 29, 2009 1:28 AM
> >To: Andrew Morton
> >Cc: Styner, Douglas W; [email protected]; Tripathi, Sharad C;
> >[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F;
> Nelson,
> >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi;
> Garg,
> >Anil K; Chilukuri, Harita; [email protected]
> >Subject: Re: Mainline kernel OLTP performance update
> >
> >Andrew Morton <[email protected]> writes:
> >
> >>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> >>> Cycles% 2.6.24.2 Cycles% 2.6.30-rc3
> >>> 74.8578 <database> 69.1925 <database>
> >>
> >> ouch, that's a large drop in userspace CPU occupancy. It seems
> >> inconsistent with the 1.91% above.
> >
> >That was determined to be an oprofile artifact/regression (see Doug's
> >other email+thread) The 2.6.30 oprofile seems to be less accurate than
> >the one in 2.6.24. Of course the question is if it can't get
> >the user space right, is the kernel data accurate. But I believe
> >Doug verified with vtune that the kernel data is roughly correct,
> >just user space profiling was slightly bogus (right, Doug, or
> >do I misrepresent that?)
> >
> >-Andi
> >
> >--
> >[email protected] -- Speaking for myself only.

2009-04-29 16:15:25

by Andi Kleen

Subject: Re: Mainline kernel OLTP performance update

On Wed, Apr 29, 2009 at 10:06:44AM -0600, Wilcox, Matthew R wrote:
> Is it possible that's simply 'oprofile has a 4% overhead'?

We would expect that overhead to be spread between the kernel and the
database then, not only the database. Maybe that's part of it, but it's
probably not the complete answer.

Also, at least the user space/context part of oprofile is itself
profiled by oprofile, just not the NMI handler, so if these parts are
expensive it should be visible.

That the lowering of the period made a difference was interesting.
It might be that oprofile is just getting more and more inaccurate.
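
For the record, the period here is just the sample count fed to the legacy opcontrol setup, along the lines of (the count and vmlinux path are placeholders):

# opcontrol --deinit
# opcontrol --setup --event=CPU_CLK_UNHALTED:100000:0:1:1 --vmlinux=/path/to/vmlinux
# opcontrol --start

where the event spec is event:count:unitmask:kernel:user, and a smaller count samples more often.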

-Andi

2009-04-29 16:17:36

by Andrew Morton

Subject: Re: Mainline kernel OLTP performance update

On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <[email protected]> wrote:

> >-----Original Message-----
> >From: Andrew Morton [mailto:[email protected]]
> >Sent: Wednesday, April 29, 2009 12:30 AM
> >To: Styner, Douglas W
> >Cc: [email protected]; Tripathi, Sharad C;
> >[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> >Anil K; Chilukuri, Harita; [email protected]
> >Subject: Re: Mainline kernel OLTP performance update
> >
> >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> ><[email protected]> wrote:
> >
> >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> >>
> >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> >Oprofile reports 71.1626% user, 28.8295% system.
> >>
> >> Linux OLTP Performance summary
> >> Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle%
> >iowait%
> >> 2.6.24.2 1.000 22106 43709 75 24 0 0
> >> 2.6.30-rc3 0.981 30645 43027 75 25 0 0
> >
> >The main difference there is the interrupt frequency. Do we know which
> >interrupt source(s) caused this?
>
> Our analysis of the interrupts shows that rescheduling interrupts are
> up 2.2x from 2.6.24.2 --> 2.6.30-rc3. Qla2xxx interrupts are roughly
> the same.

(top-posting repaired)

OK, thanks. Seems odd that the rescheduling interrupt rate increased
while the context-switch rate actually fell a couple of percent.

This came up a few weeks ago and iirc Peter was mainly involved, and I
don't believe that anything conclusive ended up happening. Peter,
could you please remind us of (and summarise) the story here?

2009-04-29 16:26:12

by Peter Zijlstra

Subject: Re: Mainline kernel OLTP performance update

On Wed, 2009-04-29 at 09:07 -0700, Andrew Morton wrote:
> On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <[email protected]> wrote:
>
> > >-----Original Message-----
> > >From: Andrew Morton [mailto:[email protected]]
> > >Sent: Wednesday, April 29, 2009 12:30 AM
> > >To: Styner, Douglas W
> > >Cc: [email protected]; Tripathi, Sharad C;
> > >[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> > >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> > >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> > >Anil K; Chilukuri, Harita; [email protected]
> > >Subject: Re: Mainline kernel OLTP performance update
> > >
> > >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> > ><[email protected]> wrote:
> > >
> > >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> > >>
> > >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> > >Oprofile reports 71.1626% user, 28.8295% system.
> > >>
> > >> Linux OLTP Performance summary
> > >> Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle%
> > >iowait%
> > >> 2.6.24.2 1.000 22106 43709 75 24 0 0
> > >> 2.6.30-rc3 0.981 30645 43027 75 25 0 0
> > >
> > >The main difference there is the interrupt frequency. Do we know which
> > >interrupt source(s) caused this?
> >
> > Our analysis of the interrupts shows that rescheduling interrupts are
> > up 2.2x from 2.6.24.2 --> 2.6.30-rc3. Qla2xxx interrupts are roughly
> > the same.
>
> (top-posting repaired)
>
> OK, thanks. Seems odd that the rescheduling interrupt rate increased
> while the context-switch rate actually fell a couple of percent.
>
> This came up a few weeks ago and iirc Peter was mainly involved, and I
> don't believe that anything conclusive ended up happening. Peter,
> could you please remind us of (and summarise) the story here?

I've had several reports about the resched-ipi going in overdrive, but
nobody bothered to bisect it, nor have I yet done so -- no clear ideas
on why it is doing so.

I'll put it somewhere higher on the todo list.
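
For whoever picks it up, a bisect between these two kernels would start roughly as follows (using the v2.6.24 mainline tag to stand in for the 2.6.24.2 stable baseline):

# git bisect start
# git bisect bad v2.6.30-rc3
# git bisect good v2.6.24
  ... build and boot each candidate, measure the resched IPI rate ...
# git bisect good     (or: git bisect bad, as measured)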

2009-04-29 17:48:25

by Chris Mason

Subject: Re: Mainline kernel OLTP performance update

On Wed, 2009-04-29 at 18:25 +0200, Peter Zijlstra wrote:
> On Wed, 2009-04-29 at 09:07 -0700, Andrew Morton wrote:
> > On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <[email protected]> wrote:
> >
> > > >-----Original Message-----
> > > >From: Andrew Morton [mailto:[email protected]]
> > > >Sent: Wednesday, April 29, 2009 12:30 AM
> > > >To: Styner, Douglas W
> > > >Cc: [email protected]; Tripathi, Sharad C;
> > > >[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> > > >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> > > >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> > > >Anil K; Chilukuri, Harita; [email protected]
> > > >Subject: Re: Mainline kernel OLTP performance update
> > > >
> > > >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> > > ><[email protected]> wrote:
> > > >
> > > >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> > > >>
> > > >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> > > >Oprofile reports 71.1626% user, 28.8295% system.
> > > >>
> > > >> Linux OLTP Performance summary
> > > >> Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle%
> > > >iowait%
> > > >> 2.6.24.2 1.000 22106 43709 75 24 0 0
> > > >> 2.6.30-rc3 0.981 30645 43027 75 25 0 0
> > > >
> > > >The main difference there is the interrupt frequency. Do we know which
> > > >interrupt source(s) caused this?
> > >
> > > Our analysis of the interrupts shows that rescheduling interrupts are
> > > up 2.2x from 2.6.24.2 --> 2.6.30-rc3. Qla2xxx interrupts are roughly
> > > the same.
> >
> > (top-posting repaired)
> >
> > OK, thanks. Seems odd that the rescheduling interrupt rate increased
> > while the context-switch rate actually fell a couple of percent.
> >
> > This came up a few weeks ago and iirc Peter was mainly involved, and I
> > don't believe that anything conclusive ended up happening. Peter,
> > could you please remind us of (and summarise) the story here?
>
> I've had several reports about the resched-ipi going in overdrive, but
> nobody bothered to bisect it, nor have I yet done so -- no clear ideas
> on why it is doing so.
>
> I'll put it somewhere higher on the todo list.
>

One cause of them in the past was the ondemand cpufreq module. It got
fixed up for my laptop workload at least starting w/2.6.29, but it might
make sense to try without ondemand if you're running it.
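
Assuming the usual cpufreq sysfs layout, checking the governor and forcing it off for a test run is quick:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $g; done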

-chris

2009-04-29 17:52:40

by Styner, Douglas W

Subject: RE: Mainline kernel OLTP performance update

Peter Zijlstra writes:
>
>I've had several reports about the resched-ipi going in overdrive, but
>nobody bothered to bisect it, nor have I yet done so -- no clear ideas
>on why it is doing so.
>
>I'll put it somewhere higher on the todo list.

FWIW, here is the interrupt data I was referring to. The kernel delta refers to the difference in /proc/interrupts between the start and end of the run. All database processes are running SCHED_RR.

                            2.6.24.2    2.6.30-rc3
                            delta       delta       % change
PCI-MSI-edge qla2xxx        5270060     6118088       16.1%
PCI-MSI-edge qla2xxx        5630742     5439656       -3.4%
PCI-MSI-edge qla2xxx        5836425     5938014        1.7%
PCI-MSI-edge qla2xxx        5774269     6007126        4.0%
PCI-MSI-edge qla2xxx        5239457     5774888       10.2%
PCI-MSI-edge qla2xxx        5965193     5424013       -9.1%
PCI-MSI-edge eth0           31404141    32443614       3.3%
PCI-MSI-edge eth1           1754        1453         -17.2%
Non-maskable interrupts     14041623    12980424      -7.6%
Local timer interrupts      27948168    28911532       3.4%
Rescheduling interrupts     1905119     4226516      121.9%
Function call interrupts    210         49           -76.7%
TLB shootdowns              684         1455         112.7%
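
A sketch of how a delta like the above can be produced, summing the per-CPU columns for each source in two snapshots and diffing them (file names are placeholders):

# awk '{ n = 0; for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) n += $i; print $1, n }' /proc/interrupts > /tmp/irq.start
  ... run the workload ...
# awk '{ n = 0; for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) n += $i; print $1, n }' /proc/interrupts > /tmp/irq.end
# paste /tmp/irq.start /tmp/irq.end | awk '{ print $1, $4 - $2 }'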

2009-04-29 18:10:22

by Pallipadi, Venkatesh

Subject: Re: Mainline kernel OLTP performance update

On Wed, 2009-04-29 at 10:46 -0700, Chris Mason wrote:
> On Wed, 2009-04-29 at 18:25 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-04-29 at 09:07 -0700, Andrew Morton wrote:
> > > On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <[email protected]> wrote:
> > >
> > > > >-----Original Message-----
> > > > >From: Andrew Morton [mailto:[email protected]]
> > > > >Sent: Wednesday, April 29, 2009 12:30 AM
> > > > >To: Styner, Douglas W
> > > > >Cc: [email protected]; Tripathi, Sharad C;
> > > > >[email protected]; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> > > > >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> > > > >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> > > > >Anil K; Chilukuri, Harita; [email protected]
> > > > >Subject: Re: Mainline kernel OLTP performance update
> > > > >
> > > > >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> > > > ><[email protected]> wrote:
> > > > >
> > > > >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> > > > >>
> > > > >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> > > > >Oprofile reports 71.1626% user, 28.8295% system.
> > > > >>
> > > > >> Linux OLTP Performance summary
> > > > >> Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle%
> > > > >iowait%
> > > > >> 2.6.24.2 1.000 22106 43709 75 24 0 0
> > > > >> 2.6.30-rc3 0.981 30645 43027 75 25 0 0
> > > > >
> > > > >The main difference there is the interrupt frequency. Do we know which
> > > > >interrupt source(s) caused this?
> > > >
> > > > Our analysis of the interrupts shows that rescheduling interrupts are
> > > > up 2.2x from 2.6.24.2 --> 2.6.30-rc3. Qla2xxx interrupts are roughly
> > > > the same.
> > >
> > > (top-posting repaired)
> > >
> > > OK, thanks. Seems odd that the rescheduling interrupt rate increased
> > > while the context-switch rate actually fell a couple of percent.
> > >
> > > This came up a few weeks ago and iirc Peter was mainly involved, and I
> > > don't believe that anything conclusive ended up happening. Peter,
> > > could you please remind us of (and summarise) the story here?
> >
> > I've had several reports about the resched-ipi going in overdrive, but
> > nobody bothered to bisect it, nor have I yet done so -- no clear ideas
> > on why it is doing so.
> >
> > I'll put it somewhere higher on the todo list.
> >
>
> One cause of them in the past was the ondemand cpufreq module. It got
> fixed up for my laptop workload at least starting w/2.6.29, but it might
> make sense to try without ondemand if you're running it.
>

Output of
# grep . /sys/devices/system/cpu/cpu0/cpufreq/*
can tell us whether P-state software coordination is the reason behind
the excessive resched IPIs. Look for ondemand as the scaling_governor and
affected_cpus listing more than one CPU.
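
Extended across all CPUs, the relevant pieces of that check are:

# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# grep . /sys/devices/system/cpu/cpu*/cpufreq/affected_cpus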

Thanks,
Venki

2009-04-29 18:25:57

by Styner, Douglas W

Subject: RE: Mainline kernel OLTP performance update

Pallipadi, Venkatesh writes:
>Output of
># grep . /sys/devices/system/cpu/cpu0/cpufreq/*
>can tell us whether P-state software coordination is the reason behind
>excessive resched IPIs. Look for ondemand being the current_governor and
>affected_cpus containing more than one CPU in it.

On these setups, we disable frequency scaling in the BIOS.