Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755891AbbHCX4Z (ORCPT ); Mon, 3 Aug 2015 19:56:25 -0400
Received: from dcvr.yhbt.net ([64.71.152.64]:55369 "EHLO dcvr.yhbt.net"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1754435AbbHCX4Y (ORCPT ); Mon, 3 Aug 2015 19:56:24 -0400
X-Greylist: delayed 461 seconds by postgrey-1.27 at vger.kernel.org;
        Mon, 03 Aug 2015 19:56:24 EDT
Date: Mon, 3 Aug 2015 23:48:42 +0000
From: Eric Wong
To: Madars Vitolins
Cc: linux-kernel@vger.kernel.org, Jason Baron
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
Message-ID: <20150803234842.GA21995@dcvr.yhbt.net>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2701
Lines: 63

Madars Vitolins wrote:
> Hi Folks,
>
> I am developing kind of open systems application, which uses
> multiple processes/executables where each of them monitors some set
> of resources (in this case POSIX Queues) via epoll interface. For
> example when 10 processes on same queue are in state of epoll_wait()
> and one message arrives, all 10 processes gets woken up and all of
> them tries to read the message from Q. One succeeds, the others gets
> EAGAIN error. The problem is with those others, which generates
> extra context switches - useless CPU usage. With more processes
> inefficiency gets higher.
>
> I tried to use EPOLLONESHOT, but no help. Seems this is suitable for
> multi-threaded application and not for multi-process application.

Correct.  Most FDs are not shared across processes.

> Ideal mechanism for this would be:
> 1. If multiple epoll sets in kernel matches same event and one or
> more processes are in state of epoll_wait() - then send event only
> to one waiter.
> 2. If none of processes are in wait state, then send the event to
> all epoll sets (as it is currently). Then the first free process
> will grab the event.

Jason Baron was working on this (search LKML archives for
EPOLLEXCLUSIVE, EPOLLROUNDROBIN, EPOLL_ROTATE).

However, I was unconvinced about modifying epoll.

Perhaps I may be more easily convinced about your mqueue case than his
case for listen sockets, though[*]

Typical applications have few (probably only one) listen sockets or
POSIX mqueues; so I would rather use dedicated threads to issue
blocking syscalls (accept4 or mq_timedreceive).

Making blocking syscalls allows exclusive wakeups to avoid thundering
herds.

> How do you think, would it be real to implement this? How about
> concurrency?
> Can you please give me some hints from which points in code to start
> to implement these changes?

For now, I suggest dedicating a thread in each process to do
mq_timedreceive/mq_receive, assuming you only have a small number of
queues in your system.

[*] mq_timedreceive may copy a largish buffer which benefits from
    staying on the same CPU as much as possible.
    In contrast, accept4 only creates a client socket.  With a C10K+
    socket server (e.g. http/memcached/DB), a typical new client
    socket spends a fair amount of time idle.  Thus I don't believe
    memory locality inside the kernel is much of a concern when there
    are thousands of accepted client sockets.
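
A minimal sketch of the dedicated-thread suggestion above, assuming Linux
with POSIX mqueues and pthreads.  Each process would run one such thread,
blocked in mq_receive() on a queue opened without O_NONBLOCK, instead of
polling the queue through epoll.  The names mq_receiver, mq_receiver_start,
count_handler and demo_roundtrip, and the "empty message means stop"
convention, are hypothetical and exist only for illustration; the mq_* and
pthread_* calls are the standard ones.  Older glibc may need -lrt -pthread
at link time.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <mqueue.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical callback: what the application does with one message. */
typedef void (*msg_handler)(const char *buf, size_t len, void *ctx);

struct mq_receiver {
    mqd_t       mq;      /* must be opened WITHOUT O_NONBLOCK */
    msg_handler handle;
    void       *ctx;
    pthread_t   tid;
};

/* Thread body: sleep in mq_receive() until a message arrives. */
static void *receiver_main(void *arg)
{
    struct mq_receiver *r = arg;
    struct mq_attr attr;

    if (mq_getattr(r->mq, &attr) != 0)
        return NULL;
    /* The receive buffer must hold at least mq_msgsize bytes. */
    char *buf = malloc((size_t)attr.mq_msgsize);
    if (buf == NULL)
        return NULL;

    for (;;) {
        ssize_t n = mq_receive(r->mq, buf, (size_t)attr.mq_msgsize, NULL);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            break;               /* any other error ends the thread */
        }
        if (n == 0)
            break;               /* assumed convention: empty message = stop */
        r->handle(buf, (size_t)n, r->ctx);
    }
    free(buf);
    return NULL;
}

/* Start the dedicated receiver thread; returns 0 on success. */
int mq_receiver_start(struct mq_receiver *r, mqd_t mq,
                      msg_handler h, void *ctx)
{
    assert(r != NULL && h != NULL);
    r->mq = mq;
    r->handle = h;
    r->ctx = ctx;
    return pthread_create(&r->tid, NULL, receiver_main, r);
}

/* Example handler: count the messages seen. */
static void count_handler(const char *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)len;
    ++*(int *)ctx;
}

/* Send one message and a stop marker through a scratch queue.
 * Returns the number of messages handled, or -1 if mqueues are
 * unavailable or a send failed. */
int demo_roundtrip(const char *name)
{
    struct mq_attr attr = { .mq_maxmsg = 4, .mq_msgsize = 64 };
    struct mq_receiver r;
    int handled = 0;

    mq_unlink(name);             /* drop any stale queue; errors ignored */
    mqd_t mq = mq_open(name, O_CREAT | O_RDWR, 0600, &attr);
    if (mq == (mqd_t)-1)
        return -1;
    if (mq_receiver_start(&r, mq, count_handler, &handled) != 0) {
        mq_close(mq);
        mq_unlink(name);
        return -1;
    }
    int ok = mq_send(mq, "hello", 5, 0) == 0;
    if (mq_send(mq, "", 0, 0) != 0)
        pthread_cancel(r.tid);   /* cannot signal stop; buffer leaks here */
    pthread_join(r.tid, NULL);
    mq_close(mq);
    mq_unlink(name);
    return ok ? handled : -1;
}
```

Calling demo_roundtrip("/some_queue_name") from main() exercises the whole
path once.  In a real multi-process setup every process would open the same
queue name and start its own receiver; note that a single stop marker ends
only one receiver, so shutdown would need one marker per process or some
other mechanism.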