Subject: Re: [PATCH 2/2] epoll: atomically remove wait entry on wake up
To: Roman Penyaev
Cc: Andrew Morton, Khazhismel Kumykov, Alexander Viro, Heiher,
    stable@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org
References: <20200430130326.1368509-1-rpenyaev@suse.de>
    <20200430130326.1368509-2-rpenyaev@suse.de>
From: Jason Baron
Message-ID: <6cb1fc30-d4a1-f483-48b7-9fa594d9e46f@akamai.com>
Date: Fri, 1 May 2020 15:27:17 -0400
In-Reply-To: <20200430130326.1368509-2-rpenyaev@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/30/20 9:03 AM, Roman Penyaev wrote:
> This patch does two things:
>
> 1. fixes lost wakeup introduced by:
>    339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
>
> 2. improves performance for events delivery.
>
> The description of the problem is the following: if N (>1) threads
> are waiting on ep->wq for new events and M (>1) events come, it is
> quite likely that >1 wakeups hit the same wait queue entry, because
> there is quite a big window between __add_wait_queue_exclusive() and
> the following __remove_wait_queue() calls in ep_poll() function. This
> can lead to lost wakeups, because thread, which was woken up, can
> handle not all the events in ->rdllist. (in better words the problem
> is described here: https://lkml.org/lkml/2019/10/7/905)
>
> The idea of the current patch is to use init_wait() instead of
> init_waitqueue_entry(). Internally init_wait() sets
> autoremove_wake_function as a callback, which removes the wait entry
> atomically (under the wq locks) from the list, thus the next coming
> wakeup hits the next wait entry in the wait queue, thus preventing
> lost wakeups.
>
> Problem is very well reproduced by the epoll60 test case [1].
>
> Wait entry removal on wakeup has also performance benefits, because
> there is no need to take a ep->lock and remove wait entry from the
> queue after the successful wakeup. Here is the timing output of
> the epoll60 test case:
>
> With explicit wakeup from ep_scan_ready_list() (the state of the
> code prior 339ddb53d373):
>
>   real 0m6.970s
>   user 0m49.786s
>   sys  0m0.113s
>
> After this patch:
>
>   real 0m5.220s
>   user 0m36.879s
>   sys  0m0.019s
>
> The other testcase is the stress-epoll [2], where one thread consumes
> all the events and other threads produce many events:
>
> With explicit wakeup from ep_scan_ready_list() (the state of the
> code prior 339ddb53d373):
>
>   threads  events/ms  run-time ms
>         8       5427         1474
>        16       6163         2596
>        32       6824         4689
>        64       7060         9064
>       128       6991        18309
>
> After this patch:
>
>   threads  events/ms  run-time ms
>         8       5598         1429
>        16       7073         2262
>        32       7502         4265
>        64       7640         8376
>       128       7634        16767
>
> (number of "events/ms" represents event bandwidth, thus higher is
> better; number of "run-time ms" represents overall time spent
> doing the benchmark, thus lower is better)
>
> [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
> [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
>
> Signed-off-by: Roman Penyaev
> Cc: Andrew Morton
> Cc: Khazhismel Kumykov
> Cc: Alexander Viro
> Cc: Heiher
> Cc: Jason Baron
> Cc: stable@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  fs/eventpoll.c | 43 ++++++++++++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 19 deletions(-)
>

Looks good to me and nice speedups.

Reviewed-by: Jason Baron

Thanks,

-Jason