2018-07-27 05:37:44

by Davidlohr Bueso

Subject: [PATCH -next 0/2] epoll: loosen irq safety in ep_poll()

Hi,

Along the same lines as the previous work. Details are in patch 1.
Patch 2 is an add-on that came up while eyeballing the code. As with
the previous patches, this has survived ltp testcases and various
workloads.

Thanks,
Davidlohr

Davidlohr Bueso (2):
  fs/epoll: loosen irq safety in ep_poll()
  fs/eventpoll: simplify ep_is_linked callers

 fs/eventpoll.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

--
2.16.4



2018-07-27 05:36:17

by Davidlohr Bueso

Subject: [PATCH 1/2] fs/epoll: loosen irq safety in ep_poll()

Similar to other calls, ep_poll() is not called with interrupts
disabled, and we can therefore avoid the irq save/restore dance
and just disable local irqs. In fact, the function should never be
called in irq context at all, considering that the only path is

   epoll_wait(2) -> do_epoll_wait() -> ep_poll().

When running a common pipe-based epoll_wait(2) microbenchmark on a
2-socket, 40-core (HT) IvyBridge, the following performance
improvements are seen:

# threads     vanilla       dirty
        1     1805587     2106412
        2     1854064     2090762
        4     1805484     2017436
        8     1751222     1974475
       16     1725299     1962104
       32     1378463     1571233
       64      787368      900784

This is a fairly consistent improvement, roughly 12% to 17% across
all thread counts.

Also add a lockdep check such that we detect any mischief
before deadlocking.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/eventpoll.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b5e43e11f1e3..88473e6271ef 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1746,11 +1746,12 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		   int maxevents, long timeout)
 {
 	int res = 0, eavail, timed_out = 0;
-	unsigned long flags;
 	u64 slack = 0;
 	wait_queue_entry_t wait;
 	ktime_t expires, *to = NULL;

+	lockdep_assert_irqs_enabled();
+
 	if (timeout > 0) {
 		struct timespec64 end_time = ep_set_mstimeout(timeout);

@@ -1763,7 +1764,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * caller specified a non blocking operation.
 		 */
 		timed_out = 1;
-		spin_lock_irqsave(&ep->wq.lock, flags);
+		spin_lock_irq(&ep->wq.lock);
 		goto check_events;
 	}

@@ -1772,7 +1773,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	if (!ep_events_available(ep))
 		ep_busy_loop(ep, timed_out);

-	spin_lock_irqsave(&ep->wq.lock, flags);
+	spin_lock_irq(&ep->wq.lock);

 	if (!ep_events_available(ep)) {
 		/*
@@ -1814,11 +1815,11 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 				break;
 			}

-			spin_unlock_irqrestore(&ep->wq.lock, flags);
+			spin_unlock_irq(&ep->wq.lock);
 			if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
 				timed_out = 1;

-			spin_lock_irqsave(&ep->wq.lock, flags);
+			spin_lock_irq(&ep->wq.lock);
 		}

 		__remove_wait_queue(&ep->wq, &wait);
@@ -1828,7 +1829,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	/* Is it worth to try to dig for events ? */
 	eavail = ep_events_available(ep);

-	spin_unlock_irqrestore(&ep->wq.lock, flags);
+	spin_unlock_irq(&ep->wq.lock);

 	/*
 	 * Try to transfer events to user space. In case we get 0 events and
--
2.16.4


2018-07-27 05:36:18

by Davidlohr Bueso

Subject: [PATCH 2/2] fs/eventpoll: simplify ep_is_linked callers

Instead of having each caller pass the rdllink explicitly, have
ep_is_linked() take the epi pointer and look up the rdllink
itself. This helper is all about the rdllink, and the change
also makes the function more self-documenting.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/eventpoll.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 88473e6271ef..42bbe6824b4b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -336,9 +336,9 @@ static inline int ep_cmp_ffd(struct epoll_filefd *p1,
 }

 /* Tells us if the item is currently linked */
-static inline int ep_is_linked(struct list_head *p)
+static inline int ep_is_linked(struct epitem *epi)
 {
-	return !list_empty(p);
+	return !list_empty(&epi->rdllink);
 }

 static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_entry_t *p)
@@ -721,7 +721,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 		 * queued into ->ovflist but the "txlist" might already
 		 * contain them, and the list_splice() below takes care of them.
 		 */
-		if (!ep_is_linked(&epi->rdllink)) {
+		if (!ep_is_linked(epi)) {
 			list_add_tail(&epi->rdllink, &ep->rdllist);
 			ep_pm_stay_awake(epi);
 		}
@@ -790,7 +790,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	rb_erase_cached(&epi->rbn, &ep->rbr);

 	spin_lock_irq(&ep->wq.lock);
-	if (ep_is_linked(&epi->rdllink))
+	if (ep_is_linked(epi))
 		list_del_init(&epi->rdllink);
 	spin_unlock_irq(&ep->wq.lock);

@@ -1171,7 +1171,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	}

 	/* If this file is already in the ready list we exit soon */
-	if (!ep_is_linked(&epi->rdllink)) {
+	if (!ep_is_linked(epi)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
 		ep_pm_stay_awake_rcu(epi);
 	}
@@ -1495,7 +1495,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	ep_set_busy_poll_napi_id(epi);

 	/* If the file is already "ready" we drop it inside the ready list */
-	if (revents && !ep_is_linked(&epi->rdllink)) {
+	if (revents && !ep_is_linked(epi)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
 		ep_pm_stay_awake(epi);

@@ -1533,7 +1533,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	 * And ep_insert() is called with "mtx" held.
 	 */
 	spin_lock_irq(&ep->wq.lock);
-	if (ep_is_linked(&epi->rdllink))
+	if (ep_is_linked(epi))
 		list_del_init(&epi->rdllink);
 	spin_unlock_irq(&ep->wq.lock);

@@ -1601,7 +1601,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 	 */
 	if (ep_item_poll(epi, &pt, 1)) {
 		spin_lock_irq(&ep->wq.lock);
-		if (!ep_is_linked(&epi->rdllink)) {
+		if (!ep_is_linked(epi)) {
 			list_add_tail(&epi->rdllink, &ep->rdllist);
 			ep_pm_stay_awake(epi);

--
2.16.4