From: Roman Penyaev
Cc: Roman Penyaev, Andrew Morton, Davidlohr Bueso, Jason Baron, Al Viro,
    "Paul E. McKenney", Linus Torvalds, Andrea Parri,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 10/15] epoll: support polling from userspace for ep_insert()
Date: Wed, 9 Jan 2019 17:40:20 +0100
Message-Id: <20190109164025.24554-11-rpenyaev@suse.de>
In-Reply-To: <20190109164025.24554-1-rpenyaev@suse.de>
References: <20190109164025.24554-1-rpenyaev@suse.de>

When epfd is polled by userspace and a new item is inserted:

1. Get a free bit for the new item.

2. If expansion of the user items or the user index is required, route
   all events to kernel lists and do the expand.

3. If events are ready for the newly inserted item, add the event to
   the uring; if events have just been routed to klists, add the item
   to rdllist.

4. On the error path, mark the user item as freed and route events to
   klists if the ready event has not yet been observed by userspace.
   That is needed to postpone the bit put, otherwise a newly allocated
   bit would corrupt the user item.

Signed-off-by: Roman Penyaev
Cc: Andrew Morton
Cc: Davidlohr Bueso
Cc: Jason Baron
Cc: Al Viro
Cc: "Paul E. McKenney"
Cc: Linus Torvalds
Cc: Andrea Parri
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 74 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 9 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 36c451c26681..4618db9c077c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1977,6 +1977,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	struct epitem *epi;
 	struct ep_pqueue epq;
 
+	lockdep_assert_held(&ep->mtx);
 	lockdep_assert_irqs_enabled();
 
 	user_watches = atomic_long_read(&ep->user->epoll_watches);
@@ -2002,6 +2003,43 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 		RCU_INIT_POINTER(epi->ws, NULL);
 	}
 
+	if (ep_polled_by_user(ep)) {
+		struct user_epitem *uitem;
+		int bit;
+
+		bit = ep_get_bit(ep);
+		if (unlikely(bit < 0)) {
+			error = bit;
+			goto error_get_bit;
+		}
+		epi->bit = bit;
+		ep->items_nr++;
+
+		if (ep_expand_user_is_required(ep)) {
+			/*
+			 * Expand of user header or user index is required,
+			 * thus reroute all events to klists and then safely
+			 * vrealloc() the memory.
+			 */
+			write_lock_irq(&ep->lock);
+			ep_route_events_to_klists(ep);
+			write_unlock_irq(&ep->lock);
+
+			error = ep_expand_user_items(ep);
+			if (unlikely(error))
+				goto error_expand;
+
+			error = ep_expand_user_index(ep);
+			if (unlikely(error))
+				goto error_expand;
+		}
+
+		/* Now fill-in user item */
+		uitem = &ep->user_header->items[epi->bit];
+		uitem->ready_events = 0;
+		uitem->event = *event;
+	}
+
 	/* Initialize the poll table using the queue callback */
 	epq.epi = epi;
 	init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
@@ -2046,16 +2084,23 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	/* record NAPI ID of new item if present */
 	ep_set_busy_poll_napi_id(epi);
 
-	/* If the file is already "ready" we drop it inside the ready list */
-	if (revents && !ep_is_linked(epi)) {
-		list_add_tail(&epi->rdllink, &ep->rdllist);
-		ep_pm_stay_awake(epi);
+	if (revents) {
+		bool added = false;
 
-		/* Notify waiting tasks that events are available */
-		if (waitqueue_active(&ep->wq))
-			wake_up(&ep->wq);
-		if (waitqueue_active(&ep->poll_wait))
-			pwake++;
+		if (ep_events_routed_to_uring(ep))
+			added = ep_add_event_to_uring(epi, revents);
+		else if (!ep_is_linked(epi)) {
+			list_add_tail(&epi->rdllink, &ep->rdllist);
+			ep_pm_stay_awake(epi);
+			added = true;
+		}
+		if (added) {
+			/* Notify waiting tasks that events are available */
+			if (waitqueue_active(&ep->wq))
+				wake_up(&ep->wq);
+			if (waitqueue_active(&ep->poll_wait))
+				pwake++;
+		}
 	}
 
 	write_unlock_irq(&ep->lock);
@@ -2089,6 +2134,17 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 		list_del_init(&epi->rdllink);
 	write_unlock_irq(&ep->lock);
 
+	if (ep_polled_by_user(ep)) {
+error_expand:
+		/*
+		 * No need to check return value: if events are routed to
+		 * klists, that is done by code above, where we've expanded
+		 * memory, but here, on rollback, we do not care.
+		 */
+		(void)ep_free_user_item(epi);
+	}
+
+error_get_bit:
 	wakeup_source_unregister(ep_wakeup_source(epi));
 
 error_create_wakeup_source:
-- 
2.19.1
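
For readers following the insert path described in the commit message, the slot handling can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not the kernel code: every name prefixed with model_ or MODEL_ is hypothetical, a byte array stands in for the bitmap, realloc() stands in for vrealloc(), and a flag stands in for routing events to klists. Unlike the patch, the rollback here releases the slot immediately instead of postponing the bit put.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ITEMS 8	/* hypothetical upper bound on slots */

/* Hypothetical stand-in for one user-visible item. */
struct model_item {
	uint32_t ready_events;
	uint32_t events;
	uint64_t data;
};

/* Hypothetical stand-in for the per-epfd state touched on insert. */
struct model_ep {
	struct model_item *items;		/* grows on demand */
	size_t capacity;			/* slots currently allocated */
	size_t items_nr;			/* items inserted so far */
	unsigned char used[MODEL_MAX_ITEMS];	/* 1 = slot taken */
	int routed_to_klists;			/* set before memory may move */
};

/* Step 1: take the lowest free slot, or fail with -ENOSPC. */
static int model_get_bit(struct model_ep *ep)
{
	for (int i = 0; i < MODEL_MAX_ITEMS; i++) {
		if (!ep->used[i]) {
			ep->used[i] = 1;
			return i;
		}
	}
	return -ENOSPC;
}

/* Step 2: grow the item array (doubling) until `need` slots fit. */
static int model_expand(struct model_ep *ep, size_t need)
{
	size_t cap = ep->capacity ? ep->capacity : 1;
	struct model_item *p;

	while (cap < need)
		cap *= 2;
	p = realloc(ep->items, cap * sizeof(*p));
	if (!p)
		return -ENOMEM;
	ep->items = p;
	ep->capacity = cap;
	return 0;
}

/* Returns the slot index on success or a negative errno on failure. */
static int model_insert(struct model_ep *ep, uint32_t events, uint64_t data)
{
	int bit, error;

	assert(ep != NULL);
	bit = model_get_bit(ep);
	if (bit < 0)
		return bit;
	ep->items_nr++;

	if ((size_t)bit >= ep->capacity) {
		/* Reroute first, because the array may move in memory. */
		ep->routed_to_klists = 1;
		error = model_expand(ep, (size_t)bit + 1);
		if (error) {
			/* Simplified rollback: release the slot right away. */
			ep->used[bit] = 0;
			ep->items_nr--;
			return error;
		}
	}

	/* Fill in the item only after the slot is known to exist. */
	ep->items[bit].ready_events = 0;
	ep->items[bit].events = events;
	ep->items[bit].data = data;
	return bit;
}
```

A caller would zero-initialise a struct model_ep, call model_insert() once per new item, and free() the items array when done. The ordering is the point of the sketch: the slot is reserved first, the reroute happens before any reallocation, and the item is written last.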