From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, David Rientjes, Guenter Roeck, Douglas Anderson, Mike Snitzer
Subject: [PATCH 3.18 17/23] dm bufio: avoid sleeping while holding the dm_bufio lock
Date: Tue, 10 Jul 2018 20:24:50 +0200
Message-Id: <20180710182309.616208199@linuxfoundation.org>
In-Reply-To: <20180710182308.877332304@linuxfoundation.org>
References: <20180710182308.877332304@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Douglas Anderson

commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream.

We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm
   from http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
   nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing it.
4. Log in to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and the stack trace always shows we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.
  You want to pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer()
  when holding a mutex that can be contended by a concurrent slab
  shrinker (if count_objects didn't use a trylock, this pattern would
  trivially deadlock).

This change significantly increases responsiveness of the system while
in this state.  It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:

   kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
   Call trace:
   [] __switch_to+0x9c/0xa8
   [] __schedule+0x440/0x6d8
   [] schedule+0x94/0xb4
   [] schedule_preempt_disabled+0x28/0x44
   [] __mutex_lock_slowpath+0x120/0x1ac
   [] mutex_lock+0x4c/0x68
   [] dm_bufio_shrink_count+0x38/0x78
   [] shrink_slab.part.54.constprop.65+0x100/0x464
   [] shrink_zone+0xa8/0x198
   [] balance_pgdat+0x328/0x508
   [] kswapd+0x424/0x51c
   [] kthread+0x10c/0x114
   [] ret_from_fork+0x10/0x40

By unblocking kswapd, memory pressure should be reduced.

Suggested-by: David Rientjes
Reviewed-by: Guenter Roeck
Signed-off-by: Douglas Anderson
Signed-off-by: Mike Snitzer
Signed-off-by: Greg Kroah-Hartman

---
 drivers/md/dm-bufio.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -766,7 +766,8 @@ static struct dm_buffer *__alloc_buffer_
 	 * dm-bufio is resistant to allocation failures (it just keeps
 	 * one buffer reserved in cases all the allocations fail).
 	 * So set flags to not try too hard:
-	 * GFP_NOIO: don't recurse into the I/O layer
+	 * GFP_NOWAIT: don't wait; if we need to sleep we'll release our
+	 *	       mutex and wait ourselves.
 	 * __GFP_NORETRY: don't retry and rather return failure
 	 * __GFP_NOMEMALLOC: don't use emergency reserves
 	 * __GFP_NOWARN: don't print a warning in case of failure
@@ -776,7 +777,7 @@ static struct dm_buffer *__alloc_buffer_
 	 */
 	while (1) {
 		if (dm_bufio_cache_size_latch != 1) {
-			b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 			if (b)
 				return b;
 		}
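
For readers who want the general idea outside the kernel, here is a minimal user-space sketch of the rule the new comment describes: only make a non-blocking attempt while the lock is held, and release the lock before doing anything that may sleep. It is not the dm-bufio code. Every name in it (cache_lock, try_alloc_nowait(), alloc_may_sleep(), alloc_buffer_locked(), nowait_budget, slow_path_calls) is hypothetical, and a pthread mutex with malloc() merely stands in for the dm_bufio lock and the page allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in dm-bufio. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static bool lock_held;       /* tracked by hand so the invariant can be checked */
static int nowait_budget;    /* how many non-blocking attempts will succeed */
static int slow_path_calls;  /* how often the sleeping path was taken */

/* Non-blocking attempt: fails at once instead of waiting (GFP_NOWAIT analogue). */
static void *try_alloc_nowait(size_t size)
{
	if (nowait_budget <= 0)
		return NULL;
	nowait_budget--;
	return malloc(size);
}

/* Allocation that is allowed to sleep; must never run with the lock held. */
static void *alloc_may_sleep(size_t size)
{
	assert(!lock_held);
	slow_path_calls++;
	return malloc(size);
}

/* Called with cache_lock held; returns with cache_lock held. */
static void *alloc_buffer_locked(size_t size)
{
	void *b = try_alloc_nowait(size);

	if (b)
		return b;

	/* Fast path failed: release the lock before anything that may sleep. */
	lock_held = false;
	pthread_mutex_unlock(&cache_lock);

	b = alloc_may_sleep(size);

	pthread_mutex_lock(&cache_lock);
	lock_held = true;
	return b;
}
```

A caller takes cache_lock, sets lock_held, and calls alloc_buffer_locked(). While the lock is dropped, other threads (the shrinker, in the scenario above) can make progress, so the caller has to recheck any state it read before the call.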