2006-10-12 23:25:56

by Chen, Kenneth W

Subject: [patch] rearrange kioctx and aio_ring_info

The following patch is the result of our kernel data-latency analysis
on a large database setup (128 logical CPUs, 1 TB of memory, a couple
thousand disks). The analysis was done at the individual cache-line
level through a hardware performance-monitoring unit. Essentially, we
collected data-access latency for every kernel memory address and
looked for the top 50 longest average access latencies. The hardware
performance unit also keeps track of the instruction address for each
collected data access, which allowed us to associate each of the top
50 cache lines with the instruction sequences that caused the long
latencies.

One structure that stood out was the AIO kioctx structure. In
particular, kioctx->users and kioctx->ctx_lock are the two hottest
variables within one cache line (there are multiple such cache lines,
since the db app uses multiple kioctx). What appears even more serious
is a cache-line conflict between the io_submit_one and aio_complete
paths: aio_complete() finishes processing an iocb, unlocks
ctx->ctx_lock (which means it has just lost ownership of that cache
line), then turns around and accesses wait->task_list and also performs
an atomic update to ctx->users. I think that causes the cache line to
bounce around whenever another process takes the ctx_lock in between.

So here we go. I'd like to take this opportunity to rearrange kioctx
and put variables that are used together in close proximity. The
rearrangement doesn't change the number of cache lines touched in
io_submit_one or io_getevents; however, it reduces the aio_complete
path by one cache line. I also think we ought to be careful about
updating variables that sit in the same cache line as ctx_lock.

Thoughts? Comments? Flames?


Signed-off-by: Ken Chen <[email protected]>


diff -Nurp linux-2.6.18/include/linux/aio.h linux-2.6.18.ken/include/linux/aio.h
--- linux-2.6.18/include/linux/aio.h 2006-09-19 20:42:06.000000000 -0700
+++ linux-2.6.18.ken/include/linux/aio.h 2006-10-12 13:28:17.000000000 -0700
@@ -156,41 +156,38 @@ struct aio_ring {

#define AIO_RING_PAGES 8
struct aio_ring_info {
- unsigned long mmap_base;
- unsigned long mmap_size;
-
- struct page **ring_pages;
spinlock_t ring_lock;
- long nr_pages;
-
+ int nr_pages;
unsigned nr, tail;

+ struct page **ring_pages;
struct page *internal_pages[AIO_RING_PAGES];
+
+ unsigned long mmap_base;
+ unsigned long mmap_size;
};

struct kioctx {
atomic_t users;
int dead;
- struct mm_struct *mm;
-
- /* This needs improving */
- unsigned long user_id;
- struct kioctx *next;
-
wait_queue_head_t wait;

spinlock_t ctx_lock;
-
int reqs_active;
struct list_head active_reqs; /* used for cancellation */
struct list_head run_list; /* used for kicked reqs */

- /* sys_io_setup currently limits this to an unsigned int */
- unsigned max_reqs;
-
struct aio_ring_info ring_info;

+ /* This needs improving */
+ unsigned long user_id;
+ struct kioctx *next;
+ struct mm_struct *mm;
+
struct work_struct wq;
+
+ /* sys_io_setup currently limits this to an unsigned int */
+ unsigned max_reqs;
};

/* prototypes */


2006-10-12 23:38:21

by Zach Brown

Subject: Re: [patch] rearrange kioctx and aio_ring_info


> Thoughts? Comments? Flames?

Well, to start, there should be a comment in the code warning future
readers about the specific rationale behind the resulting ordering.

I'll try to think hardware about it next week.

- z

2006-10-12 23:41:43

by Zach Brown

Subject: Re: [patch] rearrange kioctx and aio_ring_info


> I'll try to think hardware about it next week.

("harder", "hardware", who's to say?!)

2006-10-13 00:04:24

by Dave Jones

Subject: Re: [patch] rearrange kioctx and aio_ring_info

On Thu, Oct 12, 2006 at 04:25:54PM -0700, Chen, Kenneth W wrote:
> The following patch is the result of our kernel data-latency analysis
> on a large database setup (128 logical CPUs, 1 TB of memory, a couple
> thousand disks). The analysis was done at the individual cache-line
> level through a hardware performance-monitoring unit. Essentially, we
> collected data-access latency for every kernel memory address and
> looked for the top 50 longest average access latencies. The hardware
> performance unit also keeps track of the instruction address for each
> collected data access, which allowed us to associate each of the top
> 50 cache lines with the instruction sequences that caused the long
> latencies.

Do you have before/after performance numbers for this change?
How much is it worth?

Dave

--
http://www.codemonkey.org.uk

2006-10-13 00:20:56

by Chen, Kenneth W

Subject: RE: [patch] rearrange kioctx and aio_ring_info

Dave Jones wrote on Thursday, October 12, 2006 5:04 PM
> On Thu, Oct 12, 2006 at 04:25:54PM -0700, Chen, Kenneth W wrote:
> > The following patch is the result of our kernel data-latency analysis
> > on a large database setup (128 logical CPUs, 1 TB of memory, a couple
> > thousand disks). The analysis was done at the individual cache-line
> > level through a hardware performance-monitoring unit. Essentially, we
> > collected data-access latency for every kernel memory address and
> > looked for the top 50 longest average access latencies. The hardware
> > performance unit also keeps track of the instruction address for each
> > collected data access, which allowed us to associate each of the top
> > 50 cache lines with the instruction sequences that caused the long
> > latencies.
>
> Do you have performance numbers of before/after this change ?


Not yet; I just queued up an experiment on our internal benchmark setup.
I posted the patch here to get some early discussion on how to attack
the problem.