From: Miklos Szeredi
Date: Thu, 3 May 2018 10:18:01 +0200
Subject: Re: [PATCH] dcache: fix quadratic behavior with parallel shrinkers
To: Al Viro
Cc: Miklos Szeredi, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
References: <20180502222635.1862-1-mszeredi@redhat.com> <20180502224533.GW30522@ZenIV.linux.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 3, 2018 at 9:44 AM, Miklos Szeredi wrote:
> On Thu, May 3, 2018 at 12:45 AM, Al Viro wrote:
>> On Thu, May 03, 2018 at 12:26:35AM +0200, Miklos Szeredi wrote:
>>> When multiple shrinkers are operating on a directory containing many
>>> dentries, it takes much longer than if only one shrinker is operating
>>> on the directory.
>>>
>>> Call the shrinker instances A and B, which shrink DIR containing NUM
>>> dentries.
>>>
>>> Assume A wins the race for locking DIR's d_lock; it then goes on to
>>> move all unlinked dentries to its dispose list. When it's done, B will
>>> scan the directory once again, but will find that all dentries are
>>> already being shrunk, so it will end up with an empty dispose list.
>>> Both A and B will have found NUM dentries (data.found == NUM).
>>>
>>> Now comes the interesting part: A will proceed to shrink its dispose
>>> list by killing individual dentries and decrementing the refcount of
>>> the parent (which is DIR). NB: decrementing DIR's refcount will block
>>> if DIR's d_lock is held. B will shrink a zero-size list and then
>>> immediately restart scanning the directory, where it will lock DIR's
>>> d_lock, scan the remaining dentries and find no dentry to dispose.
>>>
>>> This results in B doing the directory scan over and over again while
>>> holding DIR's d_lock, while A is waiting for a chance to decrement
>>> DIR's refcount and is making very slow progress because of this. B is
>>> wasting time and holding up A's progress at the same time.
>>>
>>> The proposed fix is to detect this situation in B (some dentries were
>>> found, but all are already being shrunk) and simply sleep for a while
>>> before retrying the scan. The sleep is proportional to the number of
>>> found dentries.
>>
>> The thing is, the majority of massive shrink_dcache_parent() calls can
>> be killed. Let's do that first and see if anything else is really
>> needed.
>>
>> As it is, rmdir() and rename() are ridiculously bad - they should only
>> call shrink_dcache_parent() after a successful ->rmdir() or ->rename().
>> Sure, there are other places where we do large shrink_dcache_parent()
>> runs, but those won't trigger in parallel on the same tree.
>
> I think we can hit this also with the LRU pruners (prune_dcache_sb(),
> shrink_dcache_sb()) running in parallel with shrink_dcache_parent().
> Although shrink_dcache_sb() looks better in this regard, since it will
> only hold up to 1024 dentries in its dispose list at a time.
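To make the retry pattern concrete, the loop in shrink_dcache_parent()
looks roughly like this (a simplified sketch from memory, not the exact
fs/dcache.c source, and not the proposed patch); the long comment marks
the point where the back-off described above would go:

/* Roughly the current loop (simplified; details from memory, untested). */
void shrink_dcache_parent(struct dentry *parent)
{
	for (;;) {
		struct select_data data;	/* { start, dispose, found } */

		INIT_LIST_HEAD(&data.dispose);
		data.start = parent;
		data.found = 0;

		/*
		 * Walk the subtree, taking d_lock along the way, counting
		 * candidates in data.found and collecting the ones not
		 * already on someone else's shrink list onto data.dispose.
		 */
		d_walk(parent, &data, select_collect, NULL);
		if (!data.found)
			break;

		/*
		 * B's problem case: data.found != 0 but data.dispose is
		 * empty, because another shrinker already owns all of the
		 * found dentries. The proposed fix would back off here
		 * (sleeping proportionally to data.found) instead of going
		 * straight back to rescanning with DIR's d_lock held.
		 */

		shrink_dentry_list(&data.dispose);
		cond_resched();
	}
}

So B keeps going around this loop with an empty dispose list, re-walking
the whole directory with DIR's d_lock on every pass, while A's
shrink_dentry_list() is stuck waiting to drop DIR's refcount.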
Looking more closely, prune_dcache_sb() will also batch with a max of 1024
objects, which mitigates the problem but doesn't make it go away.

Killing 1024 dentries still takes on the order of 100us even without
contention on d_lock. If shrink_dcache_parent() is busy looping on those
dentries, contention will make this much worse.

Thanks,
Miklos
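P.S. On where the 1024 figure comes from: prune_dcache_sb() itself just
walks however many objects the shrinker core asks for, roughly as below
(a simplified sketch from memory, not the exact source). The cap comes
from the superblock shrinker being registered with a batch size of 1024
in alloc_super(), so sc->nr_to_scan handed to it stays at or below 1024
per call.

/*
 * Simplified sketch (from memory): per-call work is bounded by
 * sc->nr_to_scan, which the shrinker core keeps at or below the
 * shrinker's batch size (1024 for superblocks, set in alloc_super()).
 */
long prune_dcache_sb(struct super_block *sb, struct shrink_control *sc)
{
	LIST_HEAD(dispose);
	long freed;

	/* Move up to sc->nr_to_scan unused dentries off the LRU... */
	freed = list_lru_shrink_walk(&sb->s_dentry_lru, sc,
				     dentry_lru_isolate, &dispose);
	/* ...and kill them; at most ~1024 dentries sit here at a time. */
	shrink_dentry_list(&dispose);
	return freed;
}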