From: Ben Hutchings
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
CC: akpm@linux-foundation.org, "Dennis Yang", "Mike Snitzer"
Date: Sun, 11 Nov 2018 19:49:05 +0000
Subject: [PATCH 3.16 207/366] dm thin: handle running out of data space vs concurrent discard
3.16.61-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Snitzer

commit a685557fbbc3122ed11e8ad3fa63a11ebc5de8c3 upstream.

Discards issued to a DM thin device can complete to userspace (via
fstrim) _before_ the metadata changes associated with the discards are
reflected in the thinp superblock (e.g. free blocks).  As such, if a
user constructs a test that loops repeatedly over these steps, block
allocation can fail due to discards not having completed yet:

1) fill thin device via filesystem file
2) remove file
3) fstrim

From the initial report, here:
https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html

"The root cause of this issue is that dm-thin will first remove the
mapping and increase the corresponding blocks' reference count to
prevent them from being reused before the DISCARD bios get processed by
the underlying layers.  However, increasing the blocks' reference count
could also increase nr_allocated_this_transaction in struct sm_disk,
which makes smd->old_ll.nr_allocated + smd->nr_allocated_this_transaction
bigger than smd->old_ll.nr_blocks.  In this case, alloc_data_block()
will never commit metadata to reset the begin pointer of struct sm_disk,
because sm_disk_get_nr_free() always returns an underflowed value."

While there is room for improvement in the space-map accounting that
thinp makes use of, the reality is that this test is inherently racy:
the previous iteration's fstrim discard(s) will still be completing
while the next iteration of the loop allocates blocks concurrently via
dd.  No amount of space-map accounting improvement will allow users to
use a block before a discard of that block has completed.

So the best we can really do is allow DM thinp to gracefully handle such
aggressive use of all of the pool's data by degrading the pool into
out-of-data-space (OODS) mode.  We _should_ get that behaviour already
(if space-map accounting didn't falsely cause alloc_data_block() to
believe free space was available), but short of that we handle the
current reality that dm_pool_alloc_data_block() can return -ENOSPC.

Reported-by: Dennis Yang
Signed-off-by: Mike Snitzer
Signed-off-by: Ben Hutchings
---
 drivers/md/dm-thin.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -938,6 +938,8 @@ static void schedule_zero(struct thin_c
 
 static void set_pool_mode(struct pool *pool, enum pool_mode new_mode);
 
+static void requeue_bios(struct pool *pool);
+
 static void check_for_space(struct pool *pool)
 {
 	int r;
@@ -950,8 +952,10 @@ static void check_for_space(struct pool
 	if (r)
 		return;
 
-	if (nr_free)
+	if (nr_free) {
 		set_pool_mode(pool, PM_WRITE);
+		requeue_bios(pool);
+	}
 }
 
 /*
@@ -1028,7 +1032,10 @@ static int alloc_data_block(struct thin_
 
 	r = dm_pool_alloc_data_block(pool->pmd, result);
 	if (r) {
-		metadata_operation_failed(pool, "dm_pool_alloc_data_block", r);
+		if (r == -ENOSPC)
+			set_pool_mode(pool, PM_OUT_OF_DATA_SPACE);
+		else
+			metadata_operation_failed(pool, "dm_pool_alloc_data_block", r);
 		return r;
 	}
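
The accounting problem quoted in the report above can be illustrated with a
small, self-contained sketch (hypothetical names and values only; this is not
the actual dm-space-map-disk code): once blocks pinned by in-flight discards
push the allocated count past the pool's total block count, the unsigned
subtraction used to compute the free-block count wraps around, so a caller
like alloc_data_block() is told free space exists when allocation will in
fact fail.

/*
 * Illustrative sketch only -- not the real sm_disk_get_nr_free().
 * All struct/field names and values below are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

struct sm_counts {
	uint64_t nr_blocks;			/* total data blocks in the pool */
	uint64_t nr_allocated;			/* allocated as of the last commit */
	uint64_t nr_allocated_this_transaction;	/* includes refs held for in-flight discards */
};

static uint64_t sketch_get_nr_free(const struct sm_counts *smd)
{
	/*
	 * If discards pin enough blocks, nr_allocated +
	 * nr_allocated_this_transaction can exceed nr_blocks, and this
	 * unsigned subtraction wraps to a huge value instead of zero.
	 */
	return smd->nr_blocks -
	       (smd->nr_allocated + smd->nr_allocated_this_transaction);
}

int main(void)
{
	struct sm_counts smd = {
		.nr_blocks = 1000,
		.nr_allocated = 990,
		/* 20 blocks still referenced by not-yet-completed discards */
		.nr_allocated_this_transaction = 20,
	};

	/* Prints an enormous "free" count due to the wrap-around. */
	printf("reported free blocks: %llu\n",
	       (unsigned long long)sketch_get_nr_free(&smd));
	return 0;
}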