From: Doug Anderson
Date: Mon, 20 Apr 2020 07:51:48 -0700
Subject: Re: [PATCH v2] bdev: Reduce time holding bd_mutex in sync in blkdev_close()
To: Alexander Viro
Cc: Salman Qazi, Guenter Roeck, Paolo Valente, Christoph Hellwig, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe
In-Reply-To: <20200324144754.v2.1.I9df0264e151a740be292ad3ee3825f31b5997776@changeid>

Hi Alexander,

On Tue, Mar 24, 2020 at 2:48 PM Douglas Anderson wrote:
>
> While trying to "dd" to the block device for a USB stick, I
> encountered a hung task warning (blocked for > 120 seconds).
> I managed to come up with an easy way to reproduce this on my system
> (where /dev/sdb is the block device for my USB stick) with:
>
>   while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done
>
> With my reproduction here are the relevant bits from the hung task
> detector:
>
>  INFO: task udevd:294 blocked for more than 122 seconds.
>  ...
>  udevd D 0 294 1 0x00400008
>  Call trace:
>   ...
>   mutex_lock_nested+0x40/0x50
>   __blkdev_get+0x7c/0x3d4
>   blkdev_get+0x118/0x138
>   blkdev_open+0x94/0xa8
>   do_dentry_open+0x268/0x3a0
>   vfs_open+0x34/0x40
>   path_openat+0x39c/0xdf4
>   do_filp_open+0x90/0x10c
>   do_sys_open+0x150/0x3c8
>  ...
>
>  ...
>  Showing all locks held in the system:
>  ...
>  1 lock held by dd/2798:
>   #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
>  ...
>  dd D 0 2798 2764 0x00400208
>  Call trace:
>   ...
>   schedule+0x8c/0xbc
>   io_schedule+0x1c/0x40
>   wait_on_page_bit_common+0x238/0x338
>   __lock_page+0x5c/0x68
>   write_cache_pages+0x194/0x500
>   generic_writepages+0x64/0xa4
>   blkdev_writepages+0x24/0x30
>   do_writepages+0x48/0xa8
>   __filemap_fdatawrite_range+0xac/0xd8
>   filemap_write_and_wait+0x30/0x84
>   __blkdev_put+0x88/0x204
>   blkdev_put+0xc4/0xe4
>   blkdev_close+0x28/0x38
>   __fput+0xe0/0x238
>   ____fput+0x1c/0x28
>   task_work_run+0xb0/0xe4
>   do_notify_resume+0xfc0/0x14bc
>   work_pending+0x8/0x14
>
> The problem appears related to the fact that my USB disk is terribly
> slow and that I have a lot of RAM in my system to cache things.
> Specifically my writes seem to be happening at ~15 MB/s and I've got
> ~4 GB of RAM in my system that can be used for buffering.  To write 4
> GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.
>
> The 267 second number is a problem because in __blkdev_put() we call
> sync_blockdev() while holding the bd_mutex.  Any other callers who
> want the bd_mutex will be blocked for the whole time.
>
> The problem is made worse because I believe blkdev_put() specifically
> tells other tasks (namely udev) to go try to access the device at right
> around the same time we're going to hold the mutex for a long time.
>
> Putting some traces around this (after disabling the hung task detector),
> I could confirm:
>  dd:    437.608600: __blkdev_put() right before sync_blockdev() for sdb
>  udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
>  dd:    661.468451: __blkdev_put() right after sync_blockdev() for sdb
>  udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb
>
> A simple fix for this is to realize that sync_blockdev() works fine if
> you're not holding the mutex.  Also, it's not the end of the world if
> you sync a little early (though it can have performance impacts).
> Thus we can make a guess that we're going to need to do the sync and
> then do it without holding the mutex.  We still do one last sync with
> the mutex but it should be much, much faster.
>
> With this, my hung task warnings for my test case are gone.
>
> Signed-off-by: Douglas Anderson
> ---
> I didn't put a "Fixes" annotation here because, as far as I can tell,
> this issue has been here "forever" unless someone knows of something
> else that changed that made this possible to hit.  This could probably
> get picked back to any stable tree that anyone is still maintaining.
>
> Changes in v2:
> - Don't bother holding the mutex when checking "bd_openers".
>
>  fs/block_dev.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9501880dff5e..40c57a9cc91a 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1892,6 +1892,16 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
>  	struct gendisk *disk = bdev->bd_disk;
>  	struct block_device *victim = NULL;
>
> +	/*
> +	 * Sync early if it looks like we're the last one.  If someone else
> +	 * opens the block device between now and the decrement of bd_openers
> +	 * then we did a sync that we didn't need to, but that's not the end
> +	 * of the world and we want to avoid long (could be several minute)
> +	 * syncs while holding the mutex.
> +	 */
> +	if (bdev->bd_openers == 1)
> +		sync_blockdev(bdev);
> +
>  	mutex_lock_nested(&bdev->bd_mutex, for_part);
>  	if (for_part)
>  		bdev->bd_part_count--;
> --
> 2.25.1.696.g5e7596f4ac-goog

Are you the right person to land this patch?  If so, is there anything
else that needs to be done?

Jens: if you should be the person to land (as suggested by "git log"
but not by "get_maintainer") I'm happy to repost with collected tags.
Originally I trusted "get_maintainer" to help point me to the right
person.

Thanks!

-Doug
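
For readers skimming the thread, the locking idea in the quoted patch can be
modelled in user space. The following is only a rough sketch, not kernel
code: struct toy_bdev, toy_bdev_init(), toy_sync() and toy_put() are
hypothetical names invented for illustration, a pthread mutex stands in for
bd_mutex, and "syncing" is reduced to moving a counter. It shows the shape of
the change: do the expensive flush before taking the mutex when we look like
the last opener, then repeat the flush under the mutex, where it should find
little or nothing left to write.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for a block device (illustration only). */
struct toy_bdev {
	pthread_mutex_t mutex;	/* plays the role of bd_mutex */
	int openers;		/* plays the role of bd_openers */
	long dirty_pages;	/* pages still waiting to be written */
	long flushed_unlocked;	/* pages written without the mutex held */
	long flushed_locked;	/* pages written with the mutex held */
};

void toy_bdev_init(struct toy_bdev *b, int openers, long dirty_pages)
{
	pthread_mutex_init(&b->mutex, NULL);
	b->openers = openers;
	b->dirty_pages = dirty_pages;
	b->flushed_unlocked = 0;
	b->flushed_locked = 0;
}

/* Write out all dirty pages and record whether the mutex was held. */
void toy_sync(struct toy_bdev *b, int mutex_held)
{
	if (mutex_held)
		b->flushed_locked += b->dirty_pages;
	else
		b->flushed_unlocked += b->dirty_pages;
	b->dirty_pages = 0;
}

/* Drop one reference, mirroring the structure of the patched close path. */
void toy_put(struct toy_bdev *b)
{
	/*
	 * Unlocked read, used only as a hint (as in the patch). A wrong
	 * guess costs an unneeded early sync, nothing worse.
	 */
	if (b->openers == 1)
		toy_sync(b, 0);

	pthread_mutex_lock(&b->mutex);
	assert(b->openers > 0);
	if (--b->openers == 0)
		toy_sync(b, 1);	/* final sync; cheap if the early one ran */
	pthread_mutex_unlock(&b->mutex);
}
```

With a single opener and 1000 dirty pages, toy_put() accounts all 1000 pages
to the unlocked flush and none to the locked one, so anything waiting on the
mutex is held up only briefly. With two openers, the first toy_put() flushes
nothing at all.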