From: Mauricio Faria de Oliveira
To: linux-ext4@vger.kernel.org, ocfs2-devel@oss.oracle.com
Cc: Jan Kara, Andreas Dilger, dann frazier, Joseph Qi
Subject: [PATCH v5 4/4] ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
Date: Mon, 5 Oct 2020 21:48:41 -0300
Message-Id: <20201006004841.600488-5-mfo@canonical.com>
In-Reply-To: <20201006004841.600488-1-mfo@canonical.com>
References: <20201006004841.600488-1-mfo@canonical.com>
X-Mailing-List: linux-ext4@vger.kernel.org

This implements the journal callbacks j_submit|finish_inode_data_buffers()
with different behavior for data=journal: write-protect pages under
commit, preventing changes to buffers that are writeably mapped to
userspace.

If a buffer's content changes between the commit's checksum calculation
and its write-out to disk, journal recovery (and therefore mount) can
fail after a kernel crash or power loss:

    [ 27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [ 27.339492] JBD2: Invalid checksum recovering data block 8705 in log
    [ 27.342716] JBD2: recovery failed
    [ 27.343316] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.
In j_submit_inode_data_buffers() we write-protect the inode's pages
with write_cache_pages() and redirty them in the writepage callback
if needed.

In j_finish_inode_data_buffers() there is nothing to do.

To use the callbacks, inodes are added to the transaction's inode list
in __ext4_journalled_writepage() and ext4_page_mkwrite().

In ext4_page_mkwrite() we must make sure that the buffers are attached
to the transaction as jbddirty with write_end_fn(), as already done in
__ext4_journalled_writepage().

Signed-off-by: Mauricio Faria de Oliveira
Reported-by: Dann Frazier
Reported-by: kernel test robot # wbc.nr_to_write
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
---
 fs/ext4/inode.c | 25 +++++++++-----
 fs/ext4/super.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ac153e340a6f..af5de62c1214 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1910,6 +1910,9 @@ static int __ext4_journalled_writepage(struct page *page,
 		err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
 					     write_end_fn);
 	}
+	if (ret == 0)
+		ret = err;
+	err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
 	if (ret == 0)
 		ret = err;
 	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
@@ -6052,10 +6055,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	size = i_size_read(inode);
 	/* Page got truncated from under us? */
 	if (page->mapping != mapping || page_offset(page) > size) {
-		unlock_page(page);
 		ret = VM_FAULT_NOPAGE;
-		ext4_journal_stop(handle);
-		goto out;
+		goto out_error;
 	}
 
 	if (page->index == size >> PAGE_SHIFT)
@@ -6065,13 +6066,15 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 
 	err = __block_write_begin(page, 0, len, ext4_get_block);
 	if (!err) {
+		ret = VM_FAULT_SIGBUS;
 		if (ext4_walk_page_buffers(handle, page_buffers(page),
-				0, len, NULL, do_journal_get_write_access)) {
-			unlock_page(page);
-			ret = VM_FAULT_SIGBUS;
-			ext4_journal_stop(handle);
-			goto out;
-		}
+				0, len, NULL, do_journal_get_write_access))
+			goto out_error;
+		if (ext4_walk_page_buffers(handle, page_buffers(page),
+				0, len, NULL, write_end_fn))
+			goto out_error;
+		if (ext4_jbd2_inode_add_write(handle, inode, 0, len))
+			goto out_error;
 		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
 	} else {
 		unlock_page(page);
@@ -6086,6 +6089,10 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	up_read(&EXT4_I(inode)->i_mmap_sem);
 	sb_end_pagefault(inode->i_sb);
 	return ret;
+out_error:
+	unlock_page(page);
+	ext4_journal_stop(handle);
+	goto out;
 }
 
 vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a14c1ed39aa3..a2fc62a6d3b7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -472,6 +472,89 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 	spin_unlock(&sbi->s_md_lock);
 }
 
+/*
+ * This writepage callback for write_cache_pages()
+ * takes care of a few cases after page cleaning.
+ *
+ * write_cache_pages() already checks for dirty pages
+ * and calls clear_page_dirty_for_io(), which we want,
+ * to write protect the pages.
+ *
+ * However, we may have to redirty a page (see below.)
+ */
+static int ext4_journalled_writepage_callback(struct page *page,
+					      struct writeback_control *wbc,
+					      void *data)
+{
+	transaction_t *transaction = (transaction_t *) data;
+	struct buffer_head *bh, *head;
+	struct journal_head *jh;
+
+	bh = head = page_buffers(page);
+	do {
+		/*
+		 * We have to redirty a page in these cases:
+		 * 1) If buffer is dirty, it means the page was dirty because it
+		 * contains a buffer that needs checkpointing. So the dirty bit
+		 * needs to be preserved so that checkpointing writes the buffer
+		 * properly.
+		 * 2) If buffer is not part of the committing transaction
+		 * (we may have just accidentally come across this buffer because
+		 * inode range tracking is not exact) or if the currently running
+		 * transaction already contains this buffer as well, dirty bit
+		 * needs to be preserved so that the buffer gets writeprotected
+		 * properly on running transaction's commit.
+		 */
+		jh = bh2jh(bh);
+		if (buffer_dirty(bh) ||
+		    (jh && (jh->b_transaction != transaction ||
+			    jh->b_next_transaction))) {
+			redirty_page_for_writepage(wbc, page);
+			goto out;
+		}
+	} while ((bh = bh->b_this_page) != head);
+
+out:
+	return AOP_WRITEPAGE_ACTIVATE;
+}
+
+static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = LONG_MAX,
+		.range_start = jinode->i_dirty_start,
+		.range_end = jinode->i_dirty_end,
+	};
+
+	return write_cache_pages(mapping, &wbc,
+				 ext4_journalled_writepage_callback,
+				 jinode->i_transaction);
+}
+
+static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	int ret;
+
+	if (ext4_should_journal_data(jinode->i_vfs_inode))
+		ret = ext4_journalled_submit_inode_data_buffers(jinode);
+	else
+		ret = jbd2_journal_submit_inode_data_buffers(jinode);
+
+	return ret;
+}
+
+static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	int ret = 0;
+
+	if (!ext4_should_journal_data(jinode->i_vfs_inode))
+		ret = jbd2_journal_finish_inode_data_buffers(jinode);
+
+	return ret;
+}
+
 static bool system_going_down(void)
 {
 	return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF
@@ -4647,9 +4730,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 
 	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
 	sbi->s_journal->j_submit_inode_data_buffers =
-		jbd2_journal_submit_inode_data_buffers;
+		ext4_journal_submit_inode_data_buffers;
 	sbi->s_journal->j_finish_inode_data_buffers =
-		jbd2_journal_finish_inode_data_buffers;
+		ext4_journal_finish_inode_data_buffers;
 
 no_journal:
 	if (!test_opt(sb, NO_MBCACHE)) {
-- 
2.17.1