Date: Tue, 05 May 2020 10:42:05 +0200
From: Roman Penyaev
To: Andrew Morton
Cc: Jason Baron, Khazhismel Kumykov, Alexander Viro,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org
Subject: Re: [PATCH 1/1] epoll: call final ep_events_available() check under the lock
In-Reply-To: <20200505084049.1779243-1-rpenyaev@suse.de>
References: <20200505084049.1779243-1-rpenyaev@suse.de>

Hi Andrew,

May I ask you to remove "epoll: ensure ep_poll() doesn't miss wakeup
events" from your -mm queue?
Jason recently found out that the patch does not fully solve the problem,
and this patch is a second attempt to do things correctly in a different
way, namely to do the final check under the lock.  The previous changes
are not needed.

Thanks.

--
Roman

On 2020-05-05 10:40, Roman Penyaev wrote:
> The original problem was described here:
> https://lkml.org/lkml/2020/4/27/1121
>
> There is a possible race when ep_scan_ready_list() leaves ->rdllist
> and ->ovflist empty for a short period of time although some events
> are pending.  It is quite likely that ep_events_available() observes
> the empty lists and goes to sleep.  Since 339ddb53d373 ("fs/epoll:
> remove unnecessary wakeups of nested epoll") we are conservative in
> wakeups (there is only one place for wakeup and this is
> ep_poll_callback()), thus ep_events_available() must always observe
> the correct state of the two lists.  The simplest correct way is to
> do the final check under the lock.  This does not impact performance,
> since the lock is taken anyway for adding a wait entry to the wait
> queue.
>
> In this patch the barrierless __set_current_state() is used.  This is
> safe because waitqueue_active() is called under the same lock on the
> wakeup side.
>
> The short-circuit for fatal signals (i.e. the fatal_signal_pending()
> check) is moved to just before the actual event-harvesting routine.
> This is fully consistent with the comment of the patch that added the
> original fatal_signal_pending() check: c257a340ede0 ("fs, epoll: short
> circuit fetching events if thread has been killed").
>
> Signed-off-by: Roman Penyaev
> Reported-by: Jason Baron
> Cc: Andrew Morton
> Cc: Khazhismel Kumykov
> Cc: Alexander Viro
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  fs/eventpoll.c | 48 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 28 insertions(+), 20 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index aba03ee749f8..8453e5403283 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1879,34 +1879,33 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  	 * event delivery.
>  	 */
>  	init_wait(&wait);
> -	write_lock_irq(&ep->lock);
> -	__add_wait_queue_exclusive(&ep->wq, &wait);
> -	write_unlock_irq(&ep->lock);
>
> +	write_lock_irq(&ep->lock);
>  	/*
> -	 * We don't want to sleep if the ep_poll_callback() sends us
> -	 * a wakeup in between. That's why we set the task state
> -	 * to TASK_INTERRUPTIBLE before doing the checks.
> +	 * Barrierless variant, waitqueue_active() is called under
> +	 * the same lock on wakeup ep_poll_callback() side, so it
> +	 * is safe to avoid an explicit barrier.
>  	 */
> -	set_current_state(TASK_INTERRUPTIBLE);
> +	__set_current_state(TASK_INTERRUPTIBLE);
> +
>  	/*
> -	 * Always short-circuit for fatal signals to allow
> -	 * threads to make a timely exit without the chance of
> -	 * finding more events available and fetching
> -	 * repeatedly.
> +	 * Do the final check under the lock. ep_scan_ready_list()
> +	 * plays with two lists (->rdllist and ->ovflist) and there
> +	 * is always a race when both lists are empty for short
> +	 * period of time although events are pending, so lock is
> +	 * important.
>  	 */
> -	if (fatal_signal_pending(current)) {
> -		res = -EINTR;
> -		break;
> +	eavail = ep_events_available(ep);
> +	if (!eavail) {
> +		if (signal_pending(current))
> +			res = -EINTR;
> +		else
> +			__add_wait_queue_exclusive(&ep->wq, &wait);
>  	}
> +	write_unlock_irq(&ep->lock);
>
> -	eavail = ep_events_available(ep);
> -	if (eavail)
> -		break;
> -	if (signal_pending(current)) {
> -		res = -EINTR;
> +	if (eavail || res)
>  		break;
> -	}
>
>  	if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
>  		timed_out = 1;
> @@ -1927,6 +1926,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  	}
>
>  send_events:
> +	if (fatal_signal_pending(current))
> +		/*
> +		 * Always short-circuit for fatal signals to allow
> +		 * threads to make a timely exit without the chance of
> +		 * finding more events available and fetching
> +		 * repeatedly.
> +		 */
> +		res = -EINTR;
> +
>  	/*
>  	 * Try to transfer events to user space. In case we get 0 events and
>  	 * there's still timeout left over, we go trying again in search of
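
For completeness, here is a minimal, self-contained sketch of the
sleep/wake protocol the patch switches to.  Everything named my_* below,
along with events_pending, is illustrative and not an epoll internal:
epoll tracks readiness via ->rdllist/->ovflist and protects them with an
rwlock, while the sketch uses a single bool and a plain spinlock for
brevity.  It assumes ctx->lock and ctx->wq have been initialized with
spin_lock_init() and init_waitqueue_head().

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct my_ctx {
	spinlock_t lock;		/* protects events_pending and wq */
	wait_queue_head_t wq;
	bool events_pending;
};

/* Sleeper side: the shape of the patched ep_poll() loop body. */
static void my_wait_for_events(struct my_ctx *ctx)
{
	struct wait_queue_entry wait;
	bool eavail;

	init_wait(&wait);

	spin_lock_irq(&ctx->lock);
	/*
	 * Barrierless __set_current_state() is enough here: the
	 * waker inspects waitqueue_active() under the same
	 * ctx->lock, so ordering is provided by the lock itself.
	 */
	__set_current_state(TASK_INTERRUPTIBLE);
	/*
	 * The final check is done under the lock: either we see
	 * the pending event here, or we are already queued on
	 * ctx->wq by the time the waker looks at it.  There is
	 * no window in between where a wakeup can be lost.
	 */
	eavail = ctx->events_pending;
	if (!eavail)
		__add_wait_queue_exclusive(&ctx->wq, &wait);
	spin_unlock_irq(&ctx->lock);

	if (!eavail)
		schedule();
	__set_current_state(TASK_RUNNING);

	/*
	 * autoremove_wake_function() (set up by init_wait()) may
	 * have already unlinked us; only dequeue if still queued.
	 */
	if (!list_empty_careful(&wait.entry)) {
		spin_lock_irq(&ctx->lock);
		__remove_wait_queue(&ctx->wq, &wait);
		spin_unlock_irq(&ctx->lock);
	}
}

/* Waker side: the shape of ep_poll_callback(). */
static void my_signal_event(struct my_ctx *ctx)
{
	unsigned long flags;

	spin_lock_irqsave(&ctx->lock, flags);
	ctx->events_pending = true;
	if (waitqueue_active(&ctx->wq))
		wake_up(&ctx->wq);
	spin_unlock_irqrestore(&ctx->lock, flags);
}

Because both sides serialize on ctx->lock, my_wait_for_events() can never
observe "no events" and then miss the wakeup: my_signal_event() either
sees the sleeper already on the waitqueue, or the sleeper sees
events_pending set before it ever sleeps.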