Date: Sat, 05 Dec 2015 13:47:10 +0200
From: Madars Vitolins
To: Jason Baron
Cc: Eric Wong, linux-kernel@vger.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
In-Reply-To: <565DFF0A.4060901@akamai.com>
References: <565CA764.1090805@akamai.com> <565DFF0A.4060901@akamai.com>
Message-ID: <9664870cea1bbe5938ac40ff2c161be6@silodev.com>

Hi Jason,

I did the testing and wrote a blog article about it:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/

In summary:

Test case:
- One multi-threaded binary with 10 threads makes a total of 1'000'000
  calls to 250 single-threaded processes doing epoll() on a POSIX queue.
- A 'call' basically sends a message to the shared queue (serviced by
  those 250 load-balanced processes), and the server sends the reply back
  to the client thread's private queue (sketches of both sides are
  included below).

Tests were done on the following system:
- Host system: Linux Mint Mate 17.2 64bit, kernel 3.13.0-24-generic
- CPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz (two cores)
- RAM: 16 GB
- Virtualization platform: Oracle VirtualBox 4.3.28
- Guest OS: Gentoo Linux 2015.03, kernel 4.3.0-gentoo, 64 bit
- CPU for guest: two cores
- RAM for guest: 5 GB (no swap usage, about 4 GB free)
- Enduro/X version: 2.3.2

Results with the original kernel (no EPOLLEXCLUSIVE):

$ time ./bankcl
...
real    14m20.561s
user    0m21.823s
sys     10m49.821s

Patched kernel with the EPOLLEXCLUSIVE flag in use:

$ time ./bankcl
...
real    0m24.953s
user    0m17.497s
sys     0m4.445s

That is 14 minutes versus 24 seconds: the EPOLLEXCLUSIVE flag makes the
application run *35 times faster*! Guys, this is a MUST HAVE patch!

Thanks,
Madars
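P.S. For anyone who wants to see what the test actually exercises, here is
a minimal sketch of one of the 250 worker processes. This is NOT the
Enduro/X source: the queue name "/bankq", the buffer size and the (absent)
error handling are invented for illustration, and EPOLLEXCLUSIVE is defined
by hand because unpatched kernel headers do not provide it. On Linux an
mqd_t is really a file descriptor, so it can be registered with epoll
directly.

/* worker.c - sketch of one single-threaded worker; build: gcc worker.c -lrt */
#include <fcntl.h>
#include <mqueue.h>
#include <sys/epoll.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)  /* value from the patch quoted below */
#endif

int main(void)
{
        /* O_NONBLOCK matters: even with exclusive wake-ups another worker
         * can grab the message first, so mq_receive() must be able to
         * return EAGAIN instead of blocking. */
        mqd_t q = mq_open("/bankq", O_RDONLY | O_NONBLOCK);
        int epfd = epoll_create1(0);

        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
        ev.data.fd = q;
        epoll_ctl(epfd, EPOLL_CTL_ADD, q, &ev);

        char buf[8192];  /* must be >= the queue's mq_msgsize */
        for (;;) {
                struct epoll_event out;
                /* Unpatched kernel: every message wakes all 250 sleeping
                 * workers. With EPOLLEXCLUSIVE: only one of them. */
                if (epoll_wait(epfd, &out, 1, -1) < 1)
                        continue;
                if (mq_receive(q, buf, sizeof(buf), NULL) < 0)
                        continue;  /* lost the race to another worker */
                /* ... process request, reply to the client's private queue ... */
        }
}

The difference in the 'sys' times above is exactly those extra wake-ups: on
the unpatched kernel each one costs an ep_poll_callback() call plus a
futile scheduler round trip, as described further down in this thread.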
Jason Baron @ 2015-12-01 22:11 wrote:
> Hi Madars,
>
> On 11/30/2015 04:28 PM, Madars Vitolins wrote:
>> Hi Jason,
>>
>> Today I searched the mail archive and checked the patch you offered
>> back in February; it basically does the same thing (a flag for
>> add_wait_queue_exclusive() + balance).
>>
>> So I plan to run some tests with your patch, flag on/off, and will
>> provide results. I guess if I bring up 250 or 500 processes (which
>> would be realistic for a production environment) waiting on one queue,
>> there could be a notable difference in performance with EPOLLEXCLUSIVE
>> set or not.
>>
>
> Sounds good. Below is an updated patch if you want to try it - it only
> adds the 'EPOLLEXCLUSIVE' flag.
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e009ca..265fa7b 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -92,7 +92,7 @@
>   */
>
>  /* Epoll private bits inside the event mask */
> -#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
> +#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
>
>  /* Maximum number of nesting allowed inside epoll sets */
>  #define EP_MAX_NESTS 4
> @@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>         unsigned long flags;
>         struct epitem *epi = ep_item_from_wait(wait);
>         struct eventpoll *ep = epi->ep;
> +       int ewake = 0;
>
>         if ((unsigned long)key & POLLFREE) {
>                 ep_pwq_from_wait(wait)->whead = NULL;
> @@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>          * Wake up ( if active ) both the eventpoll wait list and the ->poll()
>          * wait list.
>          */
> -       if (waitqueue_active(&ep->wq))
> +       if (waitqueue_active(&ep->wq)) {
> +               ewake = 1;
>                 wake_up_locked(&ep->wq);
> +       }
>         if (waitqueue_active(&ep->poll_wait))
>                 pwake++;
>
> @@ -1078,6 +1081,9 @@ out_unlock:
>         if (pwake)
>                 ep_poll_safewake(&ep->poll_wait);
>
> +       if (epi->event.events & EPOLLEXCLUSIVE)
> +               return ewake;
> +
>         return 1;
>  }
>
> @@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
>                 init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
>                 pwq->whead = whead;
>                 pwq->base = epi;
> -               add_wait_queue(whead, &pwq->wait);
> +               if (epi->event.events & EPOLLEXCLUSIVE)
> +                       add_wait_queue_exclusive(whead, &pwq->wait);
> +               else
> +                       add_wait_queue(whead, &pwq->wait);
>                 list_add_tail(&pwq->llink, &epi->pwqlist);
>                 epi->nwait++;
>         } else {
> @@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
>         if (f.file == tf.file || !is_file_epoll(f.file))
>                 goto error_tgt_fput;
>
> +       if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
> +               (op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
> +               goto error_tgt_fput;
> +
>         /*
>          * At this point it is safe to assume that the "private_data" contains
>          * our own data structure.
> diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
> index bc81fb2..925bbfb 100644
> --- a/include/uapi/linux/eventpoll.h
> +++ b/include/uapi/linux/eventpoll.h
> @@ -26,6 +26,9 @@
>  #define EPOLL_CTL_DEL 2
>  #define EPOLL_CTL_MOD 3
>
> +/* Add exclusively */
> +#define EPOLLEXCLUSIVE (1 << 28)
> +
>  /*
>   * Request the handling of system wakeup events so as to prevent system suspends
>   * from happening while those events are being processed.
>
>
>> While hacking the kernel with debug prints, with 10 processes waiting
>> on one event source, on the original kernel I saw a lot of unneeded
>> processing inside eventpoll.c: a single event triggered 10 calls to
>> ep_poll_callback() and related work, and ended up waking several
>> processes in user space (the count probably varies randomly with
>> concurrency).
>>
>> Meanwhile, we are not the only ones talking about this patch; others
>> are asking too, see:
>> http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
>>
>> So what is the current situation with your patch? What is blocking it
>> from getting into mainline?
>>
>
> If we can show some good test results here, I will re-submit it.
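The numbers at the top of this mail are from exactly such a run, so I hope
they qualify.

One more note from testing: the flag can only be supplied at EPOLL_CTL_ADD
time. EPOLL_CTL_MOD with EPOLLEXCLUSIVE set (and EPOLL_CTL_ADD targeting
another epoll fd) takes the error path in your patch; judging by where the
check sits in epoll_ctl(), that should surface as EINVAL. A quick
stand-alone check I used (my own fragment, not part of the patch):

/* ctl_check.c - EPOLLEXCLUSIVE accepted on ADD, rejected on MOD */
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)  /* value from the patch above */
#endif

int main(void)
{
        int p[2];
        if (pipe(p) == -1)
                return 1;

        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
        ev.data.fd = p[0];

        /* Allowed: adding an ordinary fd with the flag set. */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
                perror("EPOLL_CTL_ADD");

        /* Rejected by the patch: modifying an entry that carries
         * EPOLLEXCLUSIVE - expect EINVAL here. */
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == -1)
                perror("EPOLL_CTL_MOD");

        return 0;
}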
>
> Thanks,
>
> -Jason
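P.P.S. For completeness, the client side that produced the 'time ./bankcl'
figures is shaped roughly like this. Again a sketch, not the real bankcl
source: the queue names, the message format and the per-thread call count
are invented to match the test description (10 threads, 1'000'000 calls
total).

/* client.c - test driver sketch; build: gcc -std=gnu99 client.c -lrt -pthread */
#include <fcntl.h>
#include <mqueue.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 10
#define CALLS_PER_THREAD 100000  /* 10 x 100'000 = 1'000'000 calls */

static void *run(void *arg)
{
        long id = (long)arg;
        char qname[64], buf[8192];  /* >= the queues' mq_msgsize */

        /* Each client thread owns a private reply queue... */
        snprintf(qname, sizeof(qname), "/bankcl.reply.%ld", id);
        mqd_t reply = mq_open(qname, O_RDONLY | O_CREAT, 0600, NULL);
        /* ...while all requests go to the one queue shared by the workers. */
        mqd_t req = mq_open("/bankq", O_WRONLY);

        for (int i = 0; i < CALLS_PER_THREAD; i++) {
                snprintf(buf, sizeof(buf), "req %ld/%d reply-to %s", id, i, qname);
                mq_send(req, buf, strlen(buf) + 1, 0);
                mq_receive(reply, buf, sizeof(buf), NULL);  /* wait for answer */
        }
        return NULL;
}

int main(void)
{
        pthread_t t[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
                pthread_create(&t[i], NULL, run, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);
        return 0;
}

Each reply queue is private to one thread, so the only contended wait queue
in the whole test is the shared "/bankq" that all 250 workers sleep on,
which is exactly where EPOLLEXCLUSIVE makes the difference.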