From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Nikolay Borisov, Filipe Manana, David Sterba
Subject: [PATCH 4.20 094/145] Btrfs: fix deadlock with memory reclaim during scrub
Date: Mon, 7 Jan 2019 13:32:11 +0100
Message-Id: <20190107104449.511419128@linuxfoundation.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190107104437.308206189@linuxfoundation.org>
References: <20190107104437.308206189@linuxfoundation.org>

4.20-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana

commit a5fb11429167ee6ddeeacc554efaf5776b36433a upstream.

When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
otherwise we risk getting into a deadlock with reclaim.

Checking for scrub pause requests is done early at the beginning of the
while loop of scrub_stripe() and later in the loop, scrub_extent() and
scrub_raid56_parity() are called, which in turn call scrub_pages() and
scrub_pages_for_parity() respectively. These last two functions do memory
allocations using GFP_KERNEL. Same problem could happen while scrubbing
the super blocks, since it calls scrub_pages().

We also can not have any of the worker tasks, created by the scrub task,
doing GFP_KERNEL allocations, because before pausing, the scrub task waits
for all the worker tasks to complete (also done at scrub_stripe()).

So make sure GFP_NOFS is used for the memory allocations because at any
time a scrub pause request can happen from another task that started to
commit a transaction.

Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
CC: stable@vger.kernel.org # 4.6+
Reviewed-by: Nikolay Borisov
Signed-off-by: Filipe Manana
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

---
 fs/btrfs/scrub.c |   35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -322,6 +322,7 @@ static struct full_stripe_lock *insert_f
 	struct rb_node *parent = NULL;
 	struct full_stripe_lock *entry;
 	struct full_stripe_lock *ret;
+	unsigned int nofs_flag;
 
 	lockdep_assert_held(&locks_root->lock);
 
@@ -339,8 +340,17 @@ static struct full_stripe_lock *insert_f
 		}
 	}
 
-	/* Insert new lock */
+	/*
+	 * Insert new lock.
+	 *
+	 * We must use GFP_NOFS because the scrub task might be waiting for a
+	 * worker task executing this function and in turn a transaction commit
+	 * might be waiting the scrub task to pause (which needs to wait for all
+	 * the worker tasks to complete before pausing).
+	 */
+	nofs_flag = memalloc_nofs_save();
 	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
+	memalloc_nofs_restore(nofs_flag);
 	if (!ret)
 		return ERR_PTR(-ENOMEM);
 	ret->logical = fstripe_logical;
@@ -1620,8 +1630,19 @@ static int scrub_add_page_to_wr_bio(stru
 	mutex_lock(&sctx->wr_lock);
 again:
 	if (!sctx->wr_curr_bio) {
+		unsigned int nofs_flag;
+
+		/*
+		 * We must use GFP_NOFS because the scrub task might be waiting
+		 * for a worker task executing this function and in turn a
+		 * transaction commit might be waiting the scrub task to pause
+		 * (which needs to wait for all the worker tasks to complete
+		 * before pausing).
+		 */
+		nofs_flag = memalloc_nofs_save();
 		sctx->wr_curr_bio = kzalloc(sizeof(*sctx->wr_curr_bio),
 					      GFP_KERNEL);
+		memalloc_nofs_restore(nofs_flag);
 		if (!sctx->wr_curr_bio) {
 			mutex_unlock(&sctx->wr_lock);
 			return -ENOMEM;
@@ -3772,6 +3793,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info
 	struct scrub_ctx *sctx;
 	int ret;
 	struct btrfs_device *dev;
+	unsigned int nofs_flag;
 
 	if (btrfs_fs_closing(fs_info))
 		return -EINVAL;
@@ -3875,6 +3897,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info
 	atomic_inc(&fs_info->scrubs_running);
 	mutex_unlock(&fs_info->scrub_lock);
 
+	/*
+	 * In order to avoid deadlock with reclaim when there is a transaction
+	 * trying to pause scrub, make sure we use GFP_NOFS for all the
+	 * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
+	 * invoked by our callees. The pausing request is done when the
+	 * transaction commit starts, and it blocks the transaction until scrub
+	 * is paused (done at specific points at scrub_stripe() or right above
+	 * before incrementing fs_info->scrubs_running).
+	 */
+	nofs_flag = memalloc_nofs_save();
 	if (!is_dev_replace) {
 		/*
 		 * by holding device list mutex, we can
@@ -3887,6 +3919,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info
 
 	if (!ret)
 		ret = scrub_enumerate_chunks(sctx, dev, start, end);
+	memalloc_nofs_restore(nofs_flag);
 
 	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
 	atomic_dec(&fs_info->scrubs_running);
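
Illustration only, not part of the patch: the hunks above leave the
GFP_KERNEL arguments in place and instead wrap the allocations in a
memalloc_nofs_save()/memalloc_nofs_restore() scope. As I understand it,
the save call sets a per-task flag and returns its previous state, and
allocations made inside the scope are treated as if they had been
requested without the filesystem-reclaim bit. The sketch below is a rough
userspace model of that save/restore pattern. Every model_* name and
MODEL_* value is made up for this example and does not exist in the
kernel, and a single static variable stands in for the per-task flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for GFP bits; the values are illustrative only. */
#define MODEL_GFP_IO		0x1u
#define MODEL_GFP_FS		0x2u
#define MODEL_GFP_KERNEL	(MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS		(MODEL_GFP_IO)

/* Hypothetical stand-in for the per-task "no FS reclaim" flag. */
#define MODEL_PF_NOFS		0x1u

/* Models the current task's flags (single task, single thread). */
static unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous state of the flag. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope by putting back the state that save returned. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* What an allocation would effectively use, given the current scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/*
 * Mirrors the shape of the patched call sites: the caller still passes
 * the KERNEL mask, but inside the scope the FS bit is dropped.
 */
static unsigned int model_scrub_alloc_demo(void)
{
	unsigned int nofs_flag;
	unsigned int effective;

	nofs_flag = model_nofs_save();
	effective = model_effective_gfp(MODEL_GFP_KERNEL);
	model_nofs_restore(nofs_flag);

	/* Outside the scope the mask is unchanged again. */
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
	return effective;
}
```

Because save returns the previous state rather than clearing the flag
unconditionally on exit, scopes nest: an inner restore does not end an
outer scope. That is presumably why wrapping the callers in
btrfs_scrub_dev() is enough to cover allocations made further down the
call chain by the same task, while the worker-side allocations get their
own scopes.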