From: Hagen Paul Pfeifer
To: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org, Hagen Paul Pfeifer, Davide Libenzi, Eric Dumazet
Subject: [PATCH net-next] epoll: add EPOLLEXCLUSIVE support
Date: Tue, 14 Feb 2012 21:48:04 +0100
Message-Id: <1329252484-6309-1-git-send-email-hagen@jauu.net>
X-Mailer: git-send-email 1.7.9
X-Mailing-List: linux-kernel@vger.kernel.org

High-performance servers sometimes create one listening socket (e.g. on
port 80), create an epoll file descriptor and add the socket to it. They
then create one thread per online CPU (sysconf(_SC_NPROCESSORS_ONLN))
and let every thread wait for events. This often results in a thundering
herd problem because all CPUs are scheduled.

This patch adds an additional flag to epoll_ctl(2) called
EPOLLEXCLUSIVE. If a descriptor is added with this flag, only one CPU is
scheduled in.

Signed-off-by: Hagen Paul Pfeifer
Reported-by: Li Yu
Cc: Davide Libenzi
Cc: Eric Dumazet
---
 fs/eventpoll.c            |    7 +++++--
 include/linux/eventpoll.h |    3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index aabdfc3..bb442b1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -88,7 +88,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -913,7 +913,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (unlikely(epi->event.events & EPOLLEXCLUSIVE))
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..d334389 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Set Exclusive wake up behaviour for the target file descriptor */
+#define EPOLLEXCLUSIVE (1 << 29)
+
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 << 30)
 
-- 
1.7.9
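
For illustration only, and not part of the patch: a minimal userspace sketch
of how a server might request the proposed behaviour when registering its
listening socket. The helper names listener_events(), epoll_add_listener()
and accept_worker() are hypothetical. The fallback value (1 << 29) is the one
proposed in this patch and only has the described meaning on a kernel that
carries it. Mainline Linux later gained (in 4.5) a flag of the same name with
a different bit value, so if <sys/epoll.h> already defines EPOLLEXCLUSIVE,
that definition is used instead.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: value proposed in this patch; used only if the C library
 * headers do not already provide a flag with this name. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 29)
#endif

/* Build the event mask for a listening socket. */
uint32_t listener_events(int exclusive)
{
	uint32_t events = EPOLLIN;

	if (exclusive)
		events |= EPOLLEXCLUSIVE;
	return events;
}

/* Register fd with the epoll instance epfd.
 * Returns 0 on success, -1 with errno set on failure. */
int epoll_add_listener(int epfd, int fd, int exclusive)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = listener_events(exclusive);
	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Thread body: block in epoll_wait() on the shared epoll descriptor and
 * accept one connection per reported event. arg points to the epoll fd. */
void *accept_worker(void *arg)
{
	int epfd = *(const int *)arg;
	struct epoll_event ev;

	for (;;) {
		int n = epoll_wait(epfd, &ev, 1, -1);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			break;
		}
		if (n == 0)
			continue;

		int conn = accept(ev.data.fd, NULL, NULL);
		if (conn >= 0)
			close(conn);	/* placeholder for real request handling */
	}
	return NULL;
}
```

A server following the pattern from the changelog would call
epoll_add_listener(epfd, listen_fd, 1) once and then start one
accept_worker() thread per online CPU, for example with pthread_create().
On a kernel that does not know the flag, epoll_ctl() may reject or ignore
the extra bit, so callers should check the return value.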