From: Roman Penyaev
Cc: Roman Penyaev, Andrew Morton, Al Viro, Linus Torvalds,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 04/14] epoll: some sanity flags checks for epoll syscalls for polling from userspace
Date: Tue, 11 Jun 2019 16:54:48 +0200
Message-Id: <20190611145458.9540-5-rpenyaev@suse.de>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190611145458.9540-1-rpenyaev@suse.de>
References: <20190611145458.9540-1-rpenyaev@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

There are several limitations when epfd is polled from userspace:

 1. The EPOLLET flag (edge-triggered behavior) is always expected.

 2. No support for EPOLLWAKEUP:
    events are consumed from userspace, thus there is no way to call
    __pm_relax().

 3. No support for EPOLLEXCLUSIVE:
    if a device does not pass pollflags to wake_up(), there is no way
    to call poll() from the context under the spinlock, thus special
    work is scheduled to offload polling.  In this specific case we
    can't support exclusive wakeups, because we do not know the actual
    result of the scheduled work.

 4. epoll_wait() for an epfd created with the EPOLL_USERPOLL flag accepts
    events as NULL and maxevents as 0.  No other values are accepted.

Signed-off-by: Roman Penyaev
Cc: Andrew Morton
Cc: Al Viro
Cc: Linus Torvalds
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 46 insertions(+), 22 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3a5c4d641ff0..529573266ff5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -425,6 +425,11 @@ static inline unsigned int ep_to_items_bm_length(unsigned int nr)
 	return PAGE_ALIGN(ALIGN(nr, 8) >> 3);
 }
 
+static inline bool ep_polled_by_user(struct eventpoll *ep)
+{
+	return !!ep->user_header;
+}
+
 /**
  * ep_events_available - Checks if ready events might be available.
  *
@@ -520,13 +525,17 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
 #ifdef CONFIG_PM_SLEEP
-static inline void ep_take_care_of_epollwakeup(struct epoll_event *epev)
+static inline void ep_take_care_of_epollwakeup(struct eventpoll *ep,
+					       struct epoll_event *epev)
 {
-	if ((epev->events & EPOLLWAKEUP) && !capable(CAP_BLOCK_SUSPEND))
-		epev->events &= ~EPOLLWAKEUP;
+	if (epev->events & EPOLLWAKEUP) {
+		if (!capable(CAP_BLOCK_SUSPEND) || ep_polled_by_user(ep))
+			epev->events &= ~EPOLLWAKEUP;
+	}
 }
 #else
-static inline void ep_take_care_of_epollwakeup(struct epoll_event *epev)
+static inline void ep_take_care_of_epollwakeup(struct eventpoll *ep,
+					       struct epoll_event *epev)
 {
 	epev->events &= ~EPOLLWAKEUP;
 }
@@ -2278,10 +2287,6 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	if (!file_can_poll(tf.file))
 		goto error_tgt_fput;
 
-	/* Check if EPOLLWAKEUP is allowed */
-	if (ep_op_has_event(op))
-		ep_take_care_of_epollwakeup(&epds);
-
 	/*
 	 * We have to check that the file structure underneath the file descriptor
 	 * the user passed to us _is_ an eventpoll file. And also we do not permit
@@ -2291,10 +2296,18 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	if (f.file == tf.file || !is_file_epoll(f.file))
 		goto error_tgt_fput;
 
+	/*
+	 * At this point it is safe to assume that the "private_data" contains
+	 * our own data structure.
+	 */
+	ep = f.file->private_data;
+
 	/*
 	 * epoll adds to the wakeup queue at EPOLL_CTL_ADD time only,
 	 * so EPOLLEXCLUSIVE is not allowed for a EPOLL_CTL_MOD operation.
-	 * Also, we do not currently supported nested exclusive wakeups.
+	 * Also, we do not currently supported nested exclusive wakeups
+	 * and EPOLLEXCLUSIVE is not supported for epoll which is polled
+	 * from userspace.
 	 */
 	if (ep_op_has_event(op) && (epds.events & EPOLLEXCLUSIVE)) {
 		if (op == EPOLL_CTL_MOD)
@@ -2302,13 +2315,18 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 		if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
 				(epds.events & ~EPOLLEXCLUSIVE_OK_BITS)))
 			goto error_tgt_fput;
+		if (ep_polled_by_user(ep))
+			goto error_tgt_fput;
 	}
 
-	/*
-	 * At this point it is safe to assume that the "private_data" contains
-	 * our own data structure.
-	 */
-	ep = f.file->private_data;
+	if (ep_op_has_event(op)) {
+		if (ep_polled_by_user(ep) && !(epds.events & EPOLLET))
+			/* Polled by user has only edge triggered behaviour */
+			goto error_tgt_fput;
+
+		/* Check if EPOLLWAKEUP is allowed */
+		ep_take_care_of_epollwakeup(ep, &epds);
+	}
 
 	/*
 	 * When we insert an epoll file descriptor, inside another epoll file
@@ -2410,14 +2428,6 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	struct fd f;
 	struct eventpoll *ep;
 
-	/* The maximum number of event must be greater than zero */
-	if (maxevents <= 0 || maxevents > EP_MAX_EVENTS)
-		return -EINVAL;
-
-	/* Verify that the area passed by the user is writeable */
-	if (!access_ok(events, maxevents * sizeof(struct epoll_event)))
-		return -EFAULT;
-
 	/* Get the "struct file *" for the eventpoll file */
 	f = fdget(epfd);
 	if (!f.file)
@@ -2436,6 +2446,20 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	 * our own data structure.
 	 */
 	ep = f.file->private_data;
+	if (!ep_polled_by_user(ep)) {
+		/* The maximum number of event must be greater than zero */
+		if (maxevents <= 0 || maxevents > EP_MAX_EVENTS)
+			goto error_fput;
+
+		/* Verify that the area passed by the user is writeable */
+		error = -EFAULT;
+		if (!access_ok(events, maxevents * sizeof(struct epoll_event)))
+			goto error_fput;
+	} else {
+		/* Use ring instead */
+		if (maxevents != 0 || events != NULL)
+			goto error_fput;
+	}
 
 	/* Time to fish for events ... */
 	error = ep_poll(ep, events, maxevents, timeout);
-- 
2.21.0
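
For illustration only: the four rules listed in the commit message can be
modelled as plain userspace C.  This is a minimal sketch and not kernel code.
The MODEL_* constants mirror the Linux UAPI epoll bit values, every function
name here is hypothetical, MODEL_MAX_EVENTS is only a stand-in for
EP_MAX_EVENTS, and returning -EINVAL assumes that "error" still holds -EINVAL
at the goto sites the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror the Linux UAPI epoll flags; the MODEL_ names are hypothetical. */
#define MODEL_EPOLLIN        0x001u
#define MODEL_EPOLLEXCLUSIVE (1u << 28)
#define MODEL_EPOLLWAKEUP    (1u << 29)
#define MODEL_EPOLLET        (1u << 31)

/* Stand-in for EP_MAX_EVENTS (assumption; the real bound depends on the ABI). */
#define MODEL_MAX_EVENTS     (INT_MAX / 16)

/*
 * Models the epoll_ctl() checks for an op that carries an event.
 * Returns 0 (possibly after clearing EPOLLWAKEUP in *events) or -EINVAL.
 */
int model_ctl_check(bool polled_by_user, uint32_t *events)
{
	if (!polled_by_user)
		return 0;	/* pre-existing checks are not modelled here */

	/* Rule 3: no exclusive wakeups when polled from userspace. */
	if (*events & MODEL_EPOLLEXCLUSIVE)
		return -EINVAL;

	/* Rule 1: only edge-triggered behaviour is accepted. */
	if (!(*events & MODEL_EPOLLET))
		return -EINVAL;

	/* Rule 2: EPOLLWAKEUP is silently dropped, not rejected. */
	*events &= ~MODEL_EPOLLWAKEUP;
	return 0;
}

/* Models the argument checks at the top of do_epoll_wait(). */
int model_wait_check(bool polled_by_user, const void *events, int maxevents)
{
	/* Rule 4: the ring is used instead, so only NULL/0 is accepted. */
	if (polled_by_user)
		return (events == NULL && maxevents == 0) ? 0 : -EINVAL;

	if (maxevents <= 0 || maxevents > MODEL_MAX_EVENTS)
		return -EINVAL;
	return 0;		/* the access_ok() check is not modelled here */
}

/* Exercises each rule once. */
void model_selftest(void)
{
	uint32_t ev = MODEL_EPOLLIN | MODEL_EPOLLET | MODEL_EPOLLWAKEUP;
	int dummy;

	assert(model_ctl_check(true, &ev) == 0);
	assert(ev == (MODEL_EPOLLIN | MODEL_EPOLLET));

	ev = MODEL_EPOLLIN;
	assert(model_ctl_check(true, &ev) == -EINVAL);

	ev = MODEL_EPOLLIN | MODEL_EPOLLET | MODEL_EPOLLEXCLUSIVE;
	assert(model_ctl_check(true, &ev) == -EINVAL);

	assert(model_wait_check(true, NULL, 0) == 0);
	assert(model_wait_check(true, &dummy, 1) == -EINVAL);
	assert(model_wait_check(false, &dummy, 1) == 0);
	assert(model_wait_check(false, &dummy, 0) == -EINVAL);
}
```

Calling model_selftest() from a test harness runs through each rule; note that
EPOLLWAKEUP is the only flag that is stripped rather than rejected.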