From: Roman Penyaev
Cc: Roman Penyaev, Andrew Morton, Davidlohr Bueso, Jason Baron, Al Viro,
    "Paul E. McKenney", Linus Torvalds, Andrea Parri,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 14/15] epoll: support polling from userspace for ep_poll()
Date: Wed, 9 Jan 2019 17:40:24 +0100
Message-Id: <20190109164025.24554-15-rpenyaev@suse.de>
In-Reply-To: <20190109164025.24554-1-rpenyaev@suse.de>
References: <20190109164025.24554-1-rpenyaev@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

When epfd is polled from userspace and the user calls epoll_wait():

 1. If the user ring is not fully consumed (i.e. head != tail), return
    -ESTALE, indicating that some action on the user side is required.

 2. If events were routed to klists, memory was probably expanded or a
    shrink is still required. Shrink if needed and transfer all
    collected events from the kernel lists to the uring.

 3. Ensure with WARN_ON() that ep_send_events() is not called from
    ep_poll() when epfd is pollable from userspace.

 4. Wait for events on the wait queue; always return -ESTALE when
    awakened, indicating that events have to be consumed from the
    user ring.

Signed-off-by: Roman Penyaev
Cc: Andrew Morton
Cc: Davidlohr Bueso
Cc: Jason Baron
Cc: Al Viro
Cc: "Paul E. McKenney"
Cc: Linus Torvalds
Cc: Andrea Parri
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 46 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2b38a3d884e8..5de640fcf28b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -523,7 +523,8 @@ static inline bool ep_user_ring_events_available(struct eventpoll *ep)
 static inline int ep_events_available(struct eventpoll *ep)
 {
 	return !list_empty_careful(&ep->rdllist) ||
-		READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
+		READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR ||
+		ep_user_ring_events_available(ep);
 }
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -2411,6 +2412,8 @@ static int ep_send_events(struct eventpoll *ep,
 {
 	struct ep_send_events_data esed;
 
+	WARN_ON(ep_polled_by_user(ep));
+
 	esed.maxevents = maxevents;
 	esed.events = events;
 
@@ -2607,6 +2610,24 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	lockdep_assert_irqs_enabled();
 
+	if (ep_polled_by_user(ep)) {
+		if (ep_user_ring_events_available(ep))
+			/* Firstly all events from ring have to be consumed */
+			return -ESTALE;
+
+		if (ep_events_routed_to_klists(ep)) {
+			res = ep_transfer_events_and_shrink_uring(ep);
+			if (unlikely(res < 0))
+				return res;
+			if (res)
+				/*
+				 * Events were transferred from klists to
+				 * user ring
+				 */
+				return -ESTALE;
+		}
+	}
+
 	if (timeout > 0) {
 		struct timespec64 end_time = ep_set_mstimeout(timeout);
 
@@ -2695,14 +2716,21 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	__set_current_state(TASK_RUNNING);
 
 send_events:
-	/*
-	 * Try to transfer events to user space. In case we get 0 events and
-	 * there's still timeout left over, we go trying again in search of
-	 * more luck.
-	 */
-	if (!res && eavail &&
-	    !(res = ep_send_events(ep, events, maxevents)) && !timed_out)
-		goto fetch_events;
+	if (!res && eavail) {
+		if (!ep_polled_by_user(ep)) {
+			/*
+			 * Try to transfer events to user space. In case we get
+			 * 0 events and there's still timeout left over, we go
+			 * trying again in search of more luck.
+			 */
+			res = ep_send_events(ep, events, maxevents);
+			if (!res && !timed_out)
+				goto fetch_events;
+		} else {
+			/* User has to deal with the ring himself */
+			res = -ESTALE;
+		}
+	}
 
 	if (waiter) {
 		spin_lock_irq(&ep->wq.lock);
-- 
2.19.1
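
For readers who want to reason about the return values without the rest of the series, the control flow described in the commit message can be approximated by a small userspace model. This is only a sketch under stated assumptions: struct model_ep and all model_* functions are hypothetical names invented here, not kernel code. The model ignores locking, timeouts, the error path of the transfer step, and the retry via fetch_events.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eventpoll; not the kernel layout. */
struct model_ep {
	bool polled_by_user;        /* epfd is polled from userspace */
	unsigned int head;          /* ring consumer index (user side) */
	unsigned int tail;          /* ring producer index (kernel side) */
	unsigned int klist_events;  /* events parked on kernel lists */
};

/* Mirrors the "head != tail" test from the commit message. */
static bool model_ring_events_available(const struct model_ep *ep)
{
	return ep->head != ep->tail;
}

/*
 * Assumed behaviour of the transfer step: move every parked event into
 * the ring and report how many were moved. The real helper can also fail
 * with a negative value, which this model does not reproduce.
 */
static int model_transfer_events(struct model_ep *ep)
{
	int moved = (int)ep->klist_events;

	ep->tail += ep->klist_events;
	ep->klist_events = 0;
	return moved;
}

/* Checks done on entry, before sleeping. 0 means "go on and wait". */
static int model_poll_entry(struct model_ep *ep)
{
	assert(ep != NULL);

	if (!ep->polled_by_user)
		return 0;
	if (model_ring_events_available(ep))
		return -ESTALE;	/* user must consume the ring first */
	if (ep->klist_events && model_transfer_events(ep))
		return -ESTALE;	/* events were moved into the ring */
	return 0;
}

/*
 * Decision after wakeup. "delivered" models the value ep_send_events()
 * would have returned for a classic, kernel-delivered epfd.
 */
static int model_poll_exit(const struct model_ep *ep, bool eavail,
			   int delivered)
{
	if (!eavail)
		return 0;
	return ep->polled_by_user ? -ESTALE : delivered;
}
```

In this model a user-polled epfd never gets a positive event count back from the wait path: it sees either 0 (nothing happened) or -ESTALE (go and read the ring). That matches the intent stated in points 1 and 4 above.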