by Dan Carpenter

Subject: Re: [Patch 1/4] ipc/mqueue: improve performance of send/recv

On Tue, May 01, 2012 at 01:50:52PM -0400, Doug Ledford wrote:
> @@ -150,16 +241,25 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
> info->attr.mq_maxmsg = attr->mq_maxmsg;
> info->attr.mq_msgsize = attr->mq_msgsize;
> }
> - mq_msg_tblsz = info->attr.mq_maxmsg * sizeof(struct msg_msg *);
> - if (mq_msg_tblsz > PAGE_SIZE)
> - info->messages = vmalloc(mq_msg_tblsz);
> - else
> - info->messages = kmalloc(mq_msg_tblsz, GFP_KERNEL);
> - if (!info->messages)
> - goto out_inode;
> + /*
> + * We used to allocate a static array of pointers and account
> + * the size of that array as well as one msg_msg struct per
> + * possible message into the queue size. That's no longer
> + * accurate as the queue is now an rbtree and will grow and
> + * shrink depending on usage patterns. We can, however, still
> + * account one msg_msg struct per message, but the nodes are
> + * allocated depending on priority usage, and most programs
> + * only use one, or a handful, of priorities. However, since
> + * this is pinned memory, we need to assume worst case, so
> + * that means the min(mq_maxmsg, max_priorities) * struct
> + * posix_msg_tree_node.
> + */
> + mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
> + min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
> + sizeof(struct posix_msg_tree_node);

"info->attr.mq_maxmsg" is a long, but the min_t() truncates it to an
unsigned int. I'm not familiar with this code so I don't know if
that's a problem...
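
To make the concern concrete, here is a minimal userspace sketch of what the cast does. The `min_t()` below is a simplified stand-in for the kernel macro (it evaluates its arguments more than once, which the real one avoids), and `accounted_nodes()` and `truncation_examples()` are hypothetical helper names used only for illustration. The `MQ_PRIO_MAX` value is the one quoted elsewhere in this thread.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's min_t(): cast both sides, then compare. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

#define MQ_PRIO_MAX 32768 /* value quoted in this thread */

/* Hypothetical helper: number of tree nodes accounted, computed as in the patch. */
static unsigned int accounted_nodes(long mq_maxmsg)
{
	return min_t(unsigned int, mq_maxmsg, MQ_PRIO_MAX);
}

/* Hypothetical self-check showing where the cast is harmless and where it is not. */
static void truncation_examples(void)
{
	/* Small values pass through unchanged. */
	assert(accounted_nodes(10) == 10);
	/* Values above MQ_PRIO_MAX are clamped, as intended. */
	assert(accounted_nodes(65536) == MQ_PRIO_MAX);
	/*
	 * With a 64-bit long, 2^32 + 1 loses its upper bits in the cast and
	 * becomes 1, so the "minimum" is 1 instead of MQ_PRIO_MAX.
	 */
	assert(accounted_nodes((long)4294967297LL) == 1);
}
```

The last case only misbehaves if mq_maxmsg can actually exceed what an unsigned int holds, which is the open question here.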

We do the same thing in mqueue_evict_inode() and mq_attr_ok().

regards,
dan carpenter

2012-05-03 10:05:18

by Nicholas Piggin

Subject: Re: [Patch 1/4] ipc/mqueue: improve performance of send/recv

On 2 May 2012 03:50, Doug Ledford wrote:

>                                      before patch    after patch
> Avg time to send/recv (in nanoseconds per message)
>   when queue empty                   305/288         349/318
>   when queue full (65528 messages)
>     constant priority                526589/823      362/314
>     increasing priority              403105/916      495/445
>     decreasing priority              73420/594       482/409
>     random priority                  280147/920      546/436
>
> Time to fill/drain queue (65528 messages, in seconds)
>   constant priority                  17.37/.12       .13/.12
>   increasing priority                4.14/.14        .21/.18
>   decreasing priority                12.93/.13       .21/.18
>   random priority                    8.88/.16        .22/.17
>
> So, I think the results speak for themselves. It's possible this
> implementation could be improved by caching at least one priority
> level in the node tree (that would bring the queue empty performance
> more in line with the old implementation), but this works and is *so*
> much better than what we had, especially for the common case of a
> single priority in use, that further refinements can be in follow-on
> patches.

Nice work! Yeah I think if you cache a last unused entry, that
should mostly solve the empty queue regression.

I would imagine most users won't have huge queues, so the
empty case should be important too.
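
One way the spare-entry idea could look, as a rough userspace sketch rather than anything taken from the patch: keep at most one freed node around so that a send into an empty queue does not have to hit the allocator. All names here (`tree_node`, `queue_info`, `node_cache`, `node_get`, `node_put`) are hypothetical, and `malloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the per-priority tree node; the real struct has more fields. */
struct tree_node {
	int priority;
};

/* Stand-in for the per-queue info; holds at most one spare node. */
struct queue_info {
	struct tree_node *node_cache;
};

/* Take the cached node if there is one, otherwise allocate a fresh one. */
static struct tree_node *node_get(struct queue_info *info)
{
	struct tree_node *node = info->node_cache;

	if (node) {
		info->node_cache = NULL;
		return node;
	}
	return malloc(sizeof(*node));
}

/* Keep the node as the spare if the slot is empty, otherwise free it. */
static void node_put(struct queue_info *info, struct tree_node *node)
{
	if (!info->node_cache)
		info->node_cache = node;
	else
		free(node);
}

/* Hypothetical self-check: a put followed by a get reuses the same node. */
static void cache_example(void)
{
	struct queue_info info = { .node_cache = NULL };
	struct tree_node *first = node_get(&info);

	assert(first != NULL);
	node_put(&info, first);
	assert(node_get(&info) == first);
	free(first);
}
```

With a single priority in use, the queue going empty and then non-empty again would cycle the same node through the cache, which is the case the empty-queue numbers measure.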

2012-05-03 13:03:59

by Doug Ledford

Subject: Re: [Patch 1/4] ipc/mqueue: improve performance of send/recv

On 5/3/2012 5:21 AM, Dan Carpenter wrote:
> On Tue, May 01, 2012 at 01:50:52PM -0400, Doug Ledford wrote:
>> @@ -150,16 +241,25 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
>> info->attr.mq_maxmsg = attr->mq_maxmsg;
>> info->attr.mq_msgsize = attr->mq_msgsize;
>> }
>> - mq_msg_tblsz = info->attr.mq_maxmsg * sizeof(struct msg_msg *);
>> - if (mq_msg_tblsz > PAGE_SIZE)
>> - info->messages = vmalloc(mq_msg_tblsz);
>> - else
>> - info->messages = kmalloc(mq_msg_tblsz, GFP_KERNEL);
>> - if (!info->messages)
>> - goto out_inode;
>> + /*
>> + * We used to allocate a static array of pointers and account
>> + * the size of that array as well as one msg_msg struct per
>> + * possible message into the queue size. That's no longer
>> + * accurate as the queue is now an rbtree and will grow and
>> + * shrink depending on usage patterns. We can, however, still
>> + * account one msg_msg struct per message, but the nodes are
>> + * allocated depending on priority usage, and most programs
>> + * only use one, or a handful, of priorities. However, since
>> + * this is pinned memory, we need to assume worst case, so
>> + * that means the min(mq_maxmsg, max_priorities) * struct
>> + * posix_msg_tree_node.
>> + */
>> + mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
>> + min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
>> + sizeof(struct posix_msg_tree_node);
>
> "info->attr.mq_maxmsg" is a long, but the min_t() truncates it to an
> unsigned int. I'm not familiar with this code so I don't know if
> that's a problem...

It's fine. We currently cap mq_maxmsg at a hard limit of 65536, and
MQ_PRIO_MAX is 32768, so both are well within the range that survives
truncating a long to unsigned int. For this to ever be a problem, we
would first have to change the accounting of mq bytes in the user struct
from a 32-bit type to a 64-bit type. As long as it's still 32 bits, and
as long as mq_maxmsg * (sizeof(struct msg_msg) + mq_msgsize) must fit
within that 32-bit type, we will never have an mq_maxmsg large enough
for the truncation to matter.
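
The arithmetic behind that argument can be sketched in userspace as follows. This is an illustration only: `ASSUMED_MSG_MSG_SIZE` is a guessed placeholder for sizeof(struct msg_msg), and the function names are hypothetical. The point is just that any mq_maxmsg whose byte total fits in 32 bits is itself small enough to survive a cast to a 32-bit unsigned value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration; the real sizeof(struct msg_msg) depends on the build. */
#define ASSUMED_MSG_MSG_SIZE 48u

/* Largest mq_maxmsg whose worst-case byte total still fits a 32-bit counter. */
static uint64_t largest_accountable_maxmsg(uint32_t mq_msgsize)
{
	uint64_t per_msg = (uint64_t)ASSUMED_MSG_MSG_SIZE + mq_msgsize;

	return UINT32_MAX / per_msg;
}

/* True when casting the value to a 32-bit unsigned type loses no bits. */
static bool survives_truncation(uint64_t mq_maxmsg)
{
	return mq_maxmsg == (uint64_t)(uint32_t)mq_maxmsg;
}

/* Hypothetical self-check of the bound for the smallest message size. */
static void bound_example(void)
{
	/* Even with 1-byte messages the bound is far below 2^32. */
	assert(survives_truncation(largest_accountable_maxmsg(1)));
	/* The current hard cap quoted above is smaller still. */
	assert(survives_truncation(65536));
	/* A value of 2^32 would not survive, but accounting rules it out. */
	assert(!survives_truncation((uint64_t)1 << 32));
}
```

If the accounting type were widened to 64 bits, this bound would no longer hold, and the min_t() cast would need to be revisited.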

> We do the same thing in mqueue_evict_inode() and mq_attr_ok().

All of the math in here would need an audit if we increased the maximum
mq bytes from 32-bit to 64-bit.

--
Doug Ledford
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
