Received: by 10.223.176.46 with SMTP id f43csp24670wra; Thu, 18 Jan 2018 13:20:44 -0800 (PST) X-Google-Smtp-Source: ACJfBovYP1r9pWycmsZ2k8JivONdGRObUZGcUJqKY2tzXeZbfk5JdsehPReK3kRRrdK1l82rWJ1R X-Received: by 10.101.78.207 with SMTP id w15mr15514310pgq.349.1516310444624; Thu, 18 Jan 2018 13:20:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516310444; cv=none; d=google.com; s=arc-20160816; b=LgGyWEsKZC42+0AykY9LTq7FMojk6NqrUt+1OkQgBurpnS4rHX9qEhWthpeLt7TwB1 4eGLTidkH/IX72zR5ehTsPKvMJnul99C2Fymmih4+ZnCjXGASt9ACsRVkB/EUAEPYG8t LLXFmqeSrdRExINkFLRr/nQggeTJCRWthiOwbXpZE4a/mdvLw26KAmwpAZ3NrIROxn2M B2mRJEBoLnnEbaH00PaCFkN/awgdQkhLMWwl3FZCPhj2ZgV42el172CVK81wOeO6/9YU RHWM2TRRexTcXQYOcbQh1u+aoGzGurOgsCP3Dl7FAf4Vs/swz6AWbvxhCSSHr4R2n49d ZCYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=+v4xlGUV4Mpc+r1aFssrquNuO+y03tyxKjNss8sZ6bU=; b=S7rIVeFD8I15TMl0AuhetJIT5vIiyaQ0oy4ePEqLffkQZkC7i4H5MGcjvBul1eyGVw NC56ZeclPYTdL/aqzWIVvicnca8ABNO9OwllOC6A12n+VqqOU1VRnqdXy8m0ggCvIun6 cBRSLJX1FfPnicgLIM6GPhnpDa34+2gZUXhRrL+V8smhXFBxrQow6bbwszDFni63jLqE TeJ/q/LHh+RQdrGREGXFR4aZUaApPVJOj/o0xyB2sh6r+T4F1NEFvZoWAcfUTNhNyhgZ rjoRaCdREuIwy4jjTvjBqTDnophL2GZyiWdhkmVId2/ezYySKuWUasG9jaWj/bmRruEz 605Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@akamai.com header.s=jan2016.eng header.b=glCA0wYA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=akamai.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b14si4761800pgu.764.2018.01.18.13.20.30; Thu, 18 Jan 2018 13:20:44 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@akamai.com header.s=jan2016.eng header.b=glCA0wYA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=akamai.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754597AbeARVSz (ORCPT + 99 others); Thu, 18 Jan 2018 16:18:55 -0500 Received: from mx0a-00190b01.pphosted.com ([67.231.149.131]:37048 "EHLO mx0a-00190b01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752840AbeARVSs (ORCPT ); Thu, 18 Jan 2018 16:18:48 -0500 Received: from pps.filterd (m0050093.ppops.net [127.0.0.1]) by mx0a-00190b01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w0ILI2KQ006880; Thu, 18 Jan 2018 21:18:42 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=akamai.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=jan2016.eng; bh=+v4xlGUV4Mpc+r1aFssrquNuO+y03tyxKjNss8sZ6bU=; b=glCA0wYA9B46eKfabSwYLXJTeB82PtdgGv8K4bVkD+FuU8sNfWFyZwhFjJrjavkiP4uj 7ICmBzLjUTYaGz6jkfVNjcfk3cvHr+KLsSBzFnpjaweeofwnbI/SWVA7V5R+Peb4dGEu w68zSikz+qhcPw8TG4Zecul0F38eNsvBnw+vORqfcHyHVHI0h1JKed7KZLMoCd4mXLdz krJWiVaRKtkrwEqmngJ5Fz5K0xHmHP1RIaEPlcXbyU6C1y5lwmeo+0NmQr7agxRmQoxu C4YBEEG5ONmK/tPsBqqTG0mkQoZFWqhQbAE8hGaQrfGIqvv7aKbvvfN41Uw/ik0r+5X4 pA== Received: from prod-mail-ppoint2 (prod-mail-ppoint2.akamai.com [184.51.33.19]) by m0050093.ppops.net-00190b01. with ESMTP id 2fhpteneyp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 18 Jan 2018 21:18:41 +0000 Received: from pps.filterd (prod-mail-ppoint2.akamai.com [127.0.0.1]) by prod-mail-ppoint2.akamai.com (8.16.0.21/8.16.0.21) with SMTP id w0ILHQcJ005215; Thu, 18 Jan 2018 16:18:40 -0500 Received: from prod-mail-relay10.akamai.com ([172.27.118.251]) by prod-mail-ppoint2.akamai.com with ESMTP id 2ffeaytaud-1; Thu, 18 Jan 2018 16:18:40 -0500 Received: from [172.28.12.191] (bos-lpjec.kendall.corp.akamai.com [172.28.12.191]) by prod-mail-relay10.akamai.com (Postfix) with ESMTP id 7D9D82807F; Thu, 18 Jan 2018 21:18:40 +0000 (GMT) Subject: Re: [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake() To: Hou Tao , Davidlohr Bueso Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Alexander Viro , Salman Qazi References: <1507920533-8812-1-git-send-email-jbaron@akamai.com> <20171017153737.kvrzzhvy55ocgb7u@linux-n805> <05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com> From: Jason Baron Message-ID: <4f13de48-1fef-ca42-06f7-76995729e490@akamai.com> Date: Thu, 18 Jan 2018 16:18:40 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-01-18_09:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801180277 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-01-18_09:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801180278 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/18/2018 06:00 AM, Hou Tao wrote: > Hi Jason, > > On 2017/10/18 22:03, Jason Baron wrote: >> >> >> On 10/17/2017 11:37 AM, Davidlohr Bueso wrote: >>> On Fri, 13 Oct 2017, Jason Baron wrote: >>> >>>> The ep_poll_safewake() function is used to wakeup potentially nested >>>> epoll >>>> file descriptors. The function uses ep_call_nested() to prevent entering >>>> the same wake up queue more than once, and to prevent excessively deep >>>> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary >>>> since we are already preventing these conditions during EPOLL_CTL_ADD. >>>> This >>>> saves extra function calls, and avoids taking a global lock during the >>>> ep_call_nested() calls. >>> >>> This makes sense. >>> >>>> >>>> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC >>>> case, since ep_call_nested() keeps track of the nesting level, and >>>> this is >>>> required by the call to spin_lock_irqsave_nested(). It would be nice to >>>> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as >>>> well, however its not clear how to simply pass the nesting level through >>>> multiple wake_up() levels without more surgery. In any case, I don't >>>> think >>>> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch, >>>> also >>>> apparently fixes a workload at Google that Salman Qazi reported by >>>> completely removing the poll_safewake_ncalls->lock from wakeup paths. >>> >>> I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as >>> I was tackling the nested epoll scaling issue with loop and readywalk lists >>> in mind. >>>> >> >> I'm not sure the details of the workload - perhaps Salman can elaborate >> further about it. >> >> It would seem that the safewake would potentially be the most contended >> in general in the nested case, because generally you have a few epoll >> fds attached to lots of sources doing wakeups. So those sources are all >> going to conflict on the safewake lock. The readywalk is used when >> performing a 'nested' poll and in general this is likely going to be >> called on a few epoll fds. That said, we should remove it too. I will >> post a patch to remove it. >> >> The loop lock is used during EPOLL_CTL_ADD to check for loops and deep >> wakeup paths and so I would expect this to be less common, but I >> wouldn't doubt there are workloads impacted by it. We can potentially, I >> think remove this one too - and the global 'epmutex'. I posted some >> ideas a while ago on it: >> >> http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html >> >> We can work through these ideas or others... > > I have tested these patches on a simple non-nested epoll workload and found that > there are performance regression (-0.5% which is better than -1.0% from my patch set) > under normal nginx conf and performance improvement (+3%, and the number of my > patch set is +4%) under the one-fd-per-request conf. The reason for performance > improvement is clear, however, I haven't figured out the reason for the performance > regression (maybe cache related ?) > > It seems that your patch set will work fine both under normal and nested epoll workload, > so I don't plan to continue my patch set which only works for normal epoll workload. > I have one question about your patch set. It is about the newly added fields in > struct file. I had tried to move these fields into struct eventpoll, so only epoll fds > will be added into a disjoint set and the target fd will be linked to a disjoint set > dynamically, but that method doesn't work out because there are multiple target fds and > the order in which these target fds are added to a disjoint set will lead to deadlock. > So do you have other ideas to reduce the size of these fields ? > > Thanks, > Tao Hi, I agree that increasing the size of struct file is no go. One idea is to simply have a single pointer in 'struct file' for epoll usage, that could be dynamically allocated when a file is added to an epoll set. That would reduce the size of things from where we currently are. I'm also not sure that the target fds (or non-epoll fds) need to be added to the disjoint sets. The sets are to serialize loop checks of which the target fds can not be a part of. If there is interest in resurrecting this patchset, I can pick it back up. I hadn't been pushing it forward because I wasn't convinced it mattered on real workloads... Thanks, -Jason > > The following lines are the result of performance result: > > baseline: result for v4.14 > ep_cc: result for v4.14 + ep_cc patch set > my: result for v4.14 + my patch set > > The first 15 columns are the results from wrk and their unit is Requests/sec, > the lats two columns are their average value and standard derivation. > > * normal scenario: nginx + wrk > > baseline > 376704.74 377223.88 378096.67 379484.45 379199.04 379526.21 378008.82 \ > 379959.98 377634.47 379127.27 377622.92 379442.12 378994.44 376000.08 377046.58 \ > 378271.4447 1210.673592 > > ep_cc > 373376.08 376897.65 373493.65 376154.44 377374.36 379080.88 375124.44 \ > 376404.16 376539.28 375141.84 379075.32 377214.39 374524.52 375605.87 372167.8 \ > 375878.312 1983.190539 > > my > 373067.96 372369 375317.61 375564.66 373841.88 373699.88 376802.91 369988.97 \ > 376080.45 371580.64 374265.34 376370.34 377033.75 372786.91 375521.61 \ > 374286.1273 2059.92093 > > > * one-fd-per-request scenario: nginx + wrk > baseline > 124509.77 106542.46 118074.46 111434.02 117628.9 108260.56 119037.91 \ > 114413.75 108706.82 116277.99 111588.4 118216.56 121267.63 110950.4 116455.43 \ > 114891.004 5168.30288 > > ep_cc > 124209.45 118297.8 120258.1 122785.95 118744.64 118471.76 117621.07 \ > 117525.14 114797.14 115348.47 117154.02 117208.37 118618.92 118578.8 118945.52 \ > 118571.01 2437.050182 > > my > 125090.99 121408.92 116444.09 120055.4 119399.24 114938.04 122889.56 \ > 114363.49 120934.15 119052.03 118060.47 129976.45 121108.08 114849.93 114810.64 \ > 119558.7653 4338.18401 > >> Thanks, >> >> -Jason >> >> >>>> Signed-off-by: Jason Baron >>>> Cc: Alexander Viro >>>> Cc: Andrew Morton >>>> Cc: Salman Qazi >>> >>> Acked-by: Davidlohr Bueso >>> >>>> --- >>>> fs/eventpoll.c | 47 ++++++++++++++++++----------------------------- >>>> 1 file changed, 18 insertions(+), 29 deletions(-) >>> >>> Yay for getting rid of some of the callback hell. >>> >>> Thanks, >>> Davidlohr >> >> . >> >