From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, David Rientjes, Guenter Roeck, Douglas Anderson, Mike Snitzer
Subject: [PATCH 4.9 44/52] dm bufio: avoid sleeping while holding the dm_bufio lock
Date: Tue, 10 Jul 2018 20:25:12 +0200
Message-Id: <20180710182453.462571488@linuxfoundation.org>
In-Reply-To: <20180710182449.285532226@linuxfoundation.org>
References: <20180710182449.285532226@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Douglas Anderson

commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream.

We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm
   from http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
   nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Log in to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and the stack trace always shows we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.
  You want to pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer()
  when holding a mutex that can be contended by a concurrent slab
  shrinker (if count_objects didn't use a trylock, this pattern would
  trivially deadlock).

This change significantly increases responsiveness of the system while
in this state.  It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:

   kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
   Call trace:
   [] __switch_to+0x9c/0xa8
   [] __schedule+0x440/0x6d8
   [] schedule+0x94/0xb4
   [] schedule_preempt_disabled+0x28/0x44
   [] __mutex_lock_slowpath+0x120/0x1ac
   [] mutex_lock+0x4c/0x68
   [] dm_bufio_shrink_count+0x38/0x78
   [] shrink_slab.part.54.constprop.65+0x100/0x464
   [] shrink_zone+0xa8/0x198
   [] balance_pgdat+0x328/0x508
   [] kswapd+0x424/0x51c
   [] kthread+0x10c/0x114
   [] ret_from_fork+0x10/0x40

By unblocking kswapd memory pressure should be reduced.

Suggested-by: David Rientjes
Reviewed-by: Guenter Roeck
Signed-off-by: Douglas Anderson
Signed-off-by: Mike Snitzer
Signed-off-by: Greg Kroah-Hartman
---
 drivers/md/dm-bufio.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -824,7 +824,8 @@ static struct dm_buffer *__alloc_buffer_
 	 * dm-bufio is resistant to allocation failures (it just keeps
 	 * one buffer reserved in cases all the allocations fail).
 	 * So set flags to not try too hard:
-	 *	GFP_NOIO: don't recurse into the I/O layer
+	 *	GFP_NOWAIT: don't wait; if we need to sleep we'll release our
+	 *		    mutex and wait ourselves.
 	 *	__GFP_NORETRY: don't retry and rather return failure
 	 *	__GFP_NOMEMALLOC: don't use emergency reserves
 	 *	__GFP_NOWARN: don't print a warning in case of failure
@@ -834,7 +835,7 @@ static struct dm_buffer *__alloc_buffer_
 	 */
 	while (1) {
 		if (dm_bufio_cache_size_latch != 1) {
-			b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 			if (b)
 				return b;
 		}
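
For readers who want the locking rule in isolation, here is a rough
userspace sketch of the pattern the new comment describes: try a
non-sleeping allocation while holding the mutex, and if that fails, drop
the mutex before doing anything that may block.  This is not the dm-bufio
code.  struct cache, alloc_nowait(), alloc_may_sleep(),
cache_alloc_locked() and cache_count_objects() are all hypothetical names,
the "nowait" failure is simulated with a counter, and pthreads stands in
for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache guarded by one mutex (stand-in for the dm_bufio lock). */
struct cache {
	pthread_mutex_t lock;
	int nowait_budget;	/* simulated: how many non-sleeping allocations succeed */
	long nr_buffers;	/* buffers handed out so far */
	int slow_allocs;	/* allocations that had to drop the lock */
};

/* Stand-in for a GFP_NOWAIT-style allocation: never sleeps, may fail. */
static void *alloc_nowait(struct cache *c, size_t size)
{
	if (c->nowait_budget <= 0)
		return NULL;
	c->nowait_budget--;
	return malloc(size);
}

/* Stand-in for an allocation that is allowed to block. */
static void *alloc_may_sleep(size_t size)
{
	return malloc(size);
}

/* Called with c->lock held; returns with c->lock held. */
static void *cache_alloc_locked(struct cache *c, size_t size)
{
	void *buf;

	assert(size > 0);

	/* Fast path: do not sleep while the lock is held. */
	buf = alloc_nowait(c, size);
	if (!buf) {
		/* Slow path: release the lock before anything that may block. */
		pthread_mutex_unlock(&c->lock);
		buf = alloc_may_sleep(size);
		pthread_mutex_lock(&c->lock);
		c->slow_allocs++;
	}
	if (buf)
		c->nr_buffers++;
	return buf;
}

/* Shrinker-like counter: uses trylock so it never waits on the cache lock. */
static bool cache_count_objects(struct cache *c, long *count)
{
	if (pthread_mutex_trylock(&c->lock) != 0)
		return false;	/* contended: report nothing rather than block */
	*count = c->nr_buffers;
	pthread_mutex_unlock(&c->lock);
	return true;
}
```

The point of the sketch is only the ordering: the holder of the lock never
blocks inside the allocator, so a reclaim-side caller such as
cache_count_objects() is not stuck behind a sleeping allocator.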