Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3329401imu; Sun, 11 Nov 2018 12:30:20 -0800 (PST) X-Google-Smtp-Source: AJdET5fL5XirnZN4VmRpU3jUHsKBv5DBOx++k3MByr1Bie4gMCe+ds9ogkWQNNp/2NqpoJasiEdz X-Received: by 2002:a17:902:b592:: with SMTP id a18-v6mr16858244pls.248.1541968220168; Sun, 11 Nov 2018 12:30:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541968220; cv=none; d=google.com; s=arc-20160816; b=R13hLPp+TVDr1Ao9LbaibhC31QLyLtj3rUZuEOWVSqf5WANKKalqwfuoDqQv9x6Zde WrnHvc4y9/+0+r2Ud7g9stSWsKv0tFwof4/W5Ooo/FtDXt11BK1gmGwz7SXAiHET7T23 v8kN3LOKVyTgG+Vb8riZ/DYfdKMzBMAC+YNcN0hrd4xo0teV46zSQeoDLfBAsIA/RLna +PWxUroErNbD7yrJW2OhO+gcfW6jsJIFcwYICjUsgZVHNrFN4gq30GxB55KKz9RuVZib M6wYsg1sLLdSsiHoR4NoYYbuJjhXo+5sXj/g1ype7K7Bl0IDVOQzZowKSwLx0qrP787A qkBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:subject:message-id:date:cc:to :from:mime-version:content-transfer-encoding:content-disposition; bh=oytWhXF864outO43vy4uVpQ7oYDjfZ90/i35tfAjcmw=; b=arMRIGQqMPgUNYzfjVzf0PiXYktcsaVtg9g9Bnp9J2VKKtgYiKp0gj1R090h5WEYtE oXfB8lYW7cfFfFJ+tSBmx4AggAznVQ9Uqzqp59JTwfvQ2jPgUMPR3RPpi2QOeWueoC+L us5aYqC4mFFII8UQ68iWg0a0FFEuY8p3q8IgGNoT+wMnKmIu2O9uOyCqxyJY2zC4DtEc 99UKhqBoEUgvQ6zjUhW5OF+i4cWRTI2/KnIDbFos/0XQqNdhcXRMmj9L8+Pl+lPKqMe1 +rInGJhlvqa8z3epK8XbYgSm0FWYAFFRSuux5KB26k5EaFmkHeajkJCQiVTFV6BKTSEj Iskg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y5-v6si16174128pfg.52.2018.11.11.12.30.04; Sun, 11 Nov 2018 12:30:20 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730547AbeKLFsX (ORCPT + 99 others); Mon, 12 Nov 2018 00:48:23 -0500 Received: from shadbolt.e.decadent.org.uk ([88.96.1.126]:50592 "EHLO shadbolt.e.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730488AbeKLFsX (ORCPT ); Mon, 12 Nov 2018 00:48:23 -0500 Received: from [192.168.4.242] (helo=deadeye) by shadbolt.decadent.org.uk with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1gLvsi-0000lG-Lw; Sun, 11 Nov 2018 19:58:52 +0000 Received: from ben by deadeye with local (Exim 4.91) (envelope-from ) id 1gLvsZ-0001qU-Qw; Sun, 11 Nov 2018 19:58:43 +0000 Content-Type: text/plain; charset="UTF-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0 From: Ben Hutchings To: linux-kernel@vger.kernel.org, stable@vger.kernel.org CC: akpm@linux-foundation.org, "David Rientjes" , "Mikulas Patocka" , "Douglas Anderson" , "Guenter Roeck" , "Mike Snitzer" Date: Sun, 11 Nov 2018 19:49:05 +0000 Message-ID: X-Mailer: LinuxStableQueue (scripts by bwh) Subject: [PATCH 3.16 328/366] dm bufio: avoid sleeping while holding the dm_bufio lock In-Reply-To: X-SA-Exim-Connect-IP: 192.168.4.242 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on shadbolt.decadent.org.uk); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 3.16.61-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: Douglas Anderson commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream. We've seen in-field reports showing _lots_ (18 in one case, 41 in another) of tasks all sitting there blocked on: mutex_lock+0x4c/0x68 dm_bufio_shrink_count+0x38/0x78 shrink_slab.part.54.constprop.65+0x100/0x464 shrink_zone+0xa8/0x198 In the two cases analyzed, we see one task that looks like this: Workqueue: kverityd verity_prefetch_io __switch_to+0x9c/0xa8 __schedule+0x440/0x6d8 schedule+0x94/0xb4 schedule_timeout+0x204/0x27c schedule_timeout_uninterruptible+0x44/0x50 wait_iff_congested+0x9c/0x1f0 shrink_inactive_list+0x3a0/0x4cc shrink_lruvec+0x418/0x5cc shrink_zone+0x88/0x198 try_to_free_pages+0x51c/0x588 __alloc_pages_nodemask+0x648/0xa88 __get_free_pages+0x34/0x7c alloc_buffer+0xa4/0x144 __bufio_new+0x84/0x278 dm_bufio_prefetch+0x9c/0x154 verity_prefetch_io+0xe8/0x10c process_one_work+0x240/0x424 worker_thread+0x2fc/0x424 kthread+0x10c/0x114 ...and that looks to be the one holding the mutex. The problem has been reproduced on fairly easily: 0. Be running Chrome OS w/ verity enabled on the root filesystem 1. Pick test patch: http://crosreview.com/412360 2. Install launchBalloons.sh and balloon.arm from http://crbug.com/468342 ...that's just a memory stress test app. 3. On a 4GB rk3399 machine, run nice ./launchBalloons.sh 4 900 100000 ...that tries to eat 4 * 900 MB of memory and keep accessing. 4. Login to the Chrome web browser and restore many tabs With that, I've seen printouts like: DOUG: long bufio 90758 ms ...and stack trace always show's we're in dm_bufio_prefetch(). The problem is that we try to allocate memory with GFP_NOIO while we're holding the dm_bufio lock. Instead we should be using GFP_NOWAIT. Using GFP_NOIO can cause us to sleep while holding the lock and that causes the above problems. The current behavior explained by David Rientjes: It will still try reclaim initially because __GFP_WAIT (or __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO. This is the cause of contention on dm_bufio_lock() that the thread holds. You want to pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a mutex that can be contended by a concurrent slab shrinker (if count_objects didn't use a trylock, this pattern would trivially deadlock). This change significantly increases responsiveness of the system while in this state. It makes a real difference because it unblocks kswapd. In the bug report analyzed, kswapd was hung: kswapd0 D ffffffc000204fd8 0 72 2 0x00000000 Call trace: [] __switch_to+0x9c/0xa8 [] __schedule+0x440/0x6d8 [] schedule+0x94/0xb4 [] schedule_preempt_disabled+0x28/0x44 [] __mutex_lock_slowpath+0x120/0x1ac [] mutex_lock+0x4c/0x68 [] dm_bufio_shrink_count+0x38/0x78 [] shrink_slab.part.54.constprop.65+0x100/0x464 [] shrink_zone+0xa8/0x198 [] balance_pgdat+0x328/0x508 [] kswapd+0x424/0x51c [] kthread+0x10c/0x114 [] ret_from_fork+0x10/0x40 By unblocking kswapd memory pressure should be reduced. Suggested-by: David Rientjes Reviewed-by: Guenter Roeck Signed-off-by: Douglas Anderson Signed-off-by: Mike Snitzer Cc: Mikulas Patocka Signed-off-by: Ben Hutchings --- drivers/md/dm-bufio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -778,7 +778,8 @@ static struct dm_buffer *__alloc_buffer_ * dm-bufio is resistant to allocation failures (it just keeps * one buffer reserved in cases all the allocations fail). * So set flags to not try too hard: - * GFP_NOIO: don't recurse into the I/O layer + * GFP_NOWAIT: don't wait; if we need to sleep we'll release our + * mutex and wait ourselves. * __GFP_NORETRY: don't retry and rather return failure * __GFP_NOMEMALLOC: don't use emergency reserves * __GFP_NOWARN: don't print a warning in case of failure @@ -788,7 +789,7 @@ static struct dm_buffer *__alloc_buffer_ */ while (1) { if (dm_bufio_cache_size_latch != 1) { - b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); + b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); if (b) return b; }