From: Ben Hutchings
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
CC: akpm@linux-foundation.org, "Andreas Rohner", "Ryusuke Konishi",
 "Linus Torvalds"
Date: Sun, 11 Feb 2018 04:20:06 +0000
X-Mailer: LinuxStableQueue (scripts by bwh)
Subject: [PATCH 3.2 53/79] nilfs2: fix race condition that causes file
 system corruption
X-Mailing-List: linux-kernel@vger.kernel.org

3.2.99-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Andreas Rohner

commit 31ccb1f7ba3cfe29631587d451cf5bb8ab593550 upstream.

There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode.  It calls __nilfs_mark_inode_dirty() in a
separate transaction.  __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.

Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field.
If there is a cached buffer_head in i_bh it is not marked as dirty
again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in-between the two.  If this happens the inode is not
on the ns_dirty_files list, but its ifile block is still marked as dirty
and written out.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree.  Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.

As a result the bmap update can be lost, which leads to file system
corruption.  Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.

The error can remain undetected for a long time.  A typical error
message would be one of the "bad btree" errors or a warning that a DAT
entry could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.
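
The interleaving described above can be replayed in a small userspace
model.  This is only a sketch: every toy_* name is hypothetical, the
structures are simplified stand-ins rather than nilfs2 code, and locking
and transactions are left out.  The always_redirty flag selects between
the behaviour before and after this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; none of this is nilfs2 code. */
struct toy_buffer {
	bool dirty;			/* models the buffer_head dirty bit */
};

struct toy_inode {
	struct toy_buffer *i_bh;	/* cached ifile block, like ii->i_bh */
	bool queued;			/* models membership of ns_dirty_files */
};

/* Models nilfs_dirty_inode(): cache the ifile block and dirty it. */
static void toy_dirty_inode(struct toy_inode *ii, struct toy_buffer *ibh)
{
	ii->i_bh = ibh;
	ibh->dirty = true;
}

/* Models nilfs_set_file_dirty(): queue the inode for the next segment. */
static void toy_set_file_dirty(struct toy_inode *ii)
{
	ii->queued = true;
}

/* Models a segment construction that writes out the ifile block. */
static void toy_write_ifile(struct toy_buffer *ibh)
{
	ibh->dirty = false;
}

/* Models nilfs_segctor_collect_dirty_files() for a single inode. */
static void toy_collect_dirty_files(struct toy_inode *ii,
				    struct toy_buffer *ibh,
				    bool always_redirty)
{
	if (!ii->queued)
		return;

	if (!ii->i_bh) {
		/* Old behaviour: dirty only when the block is first cached. */
		if (!always_redirty)
			ibh->dirty = true;
		ii->i_bh = ibh;
	}

	if (always_redirty) {
		/* Patched behaviour: redirty the cached block every time. */
		assert(ii->i_bh != NULL);
		ii->i_bh->dirty = true;
	}

	ii->queued = false;
}

/*
 * Replays the racy ordering and reports whether the ifile block is
 * dirty at the point where the bmap root would be stored into it.
 */
bool toy_replay_race(bool always_redirty)
{
	struct toy_buffer ibh = { .dirty = false };
	struct toy_inode ii = { .i_bh = NULL, .queued = false };

	toy_dirty_inode(&ii, &ibh);	/* open(): timestamp update */
	toy_write_ifile(&ibh);		/* segment construction in between */
	toy_set_file_dirty(&ii);	/* data written to the file */
	toy_collect_dirty_files(&ii, &ibh, always_redirty);

	return ibh.dirty;
}
```

In this model toy_replay_race(false) leaves the block clean, so a bmap
root stored into it would never be written back, while
toy_replay_race(true) leaves it dirty.
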

Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Andreas Rohner
Signed-off-by: Ryusuke Konishi
Tested-by: Andreas Rohner
Tested-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Ben Hutchings
---
 fs/nilfs2/segment.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1880,8 +1880,6 @@ static int nilfs_segctor_collect_dirty_f
 					   "failed to get inode block.\n");
 				return err;
 			}
-			mark_buffer_dirty(ibh);
-			nilfs_mdt_mark_dirty(ifile);
 			spin_lock(&nilfs->ns_inode_lock);
 			if (likely(!ii->i_bh))
 				ii->i_bh = ibh;
@@ -1890,6 +1888,10 @@ static int nilfs_segctor_collect_dirty_f
 			goto retry;
 		}
 
+		// Always redirty the buffer to avoid race condition
+		mark_buffer_dirty(ii->i_bh);
+		nilfs_mdt_mark_dirty(ifile);
+
 		clear_bit(NILFS_I_QUEUED, &ii->i_state);
 		set_bit(NILFS_I_BUSY, &ii->i_state);
 		list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);