From: Jan Kara
Subject: Re: [PATCH 4/5] jbd: fix error handling for checkpoint io
Date: Mon, 23 Jun 2008 14:22:40 +0200
Message-ID: <20080623122240.GJ26743@duck.suse.cz>
References: <4843CE15.6080506@hitachi.com> <4843CFBD.7040706@hitachi.com>
 <20080602124409.GL30613@duck.suse.cz> <4844CB39.6060409@hitachi.com>
 <20080603080219.GA17936@duck.suse.cz> <485F85AE.1010704@hitachi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: akpm@linux-foundation.org, sct@redhat.com, adilger@clusterfs.com,
 linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
 jbacik@redhat.com, cmm@us.ibm.com, tytso@mit.edu, sugita,
 Satoshi OSHIMA
To: Hidehiro Kawai
Received: from styx.suse.cz ([82.119.242.94]:40382 "EHLO mail.suse.cz"
 rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
 id S1753360AbYFWMWl (ORCPT ); Mon, 23 Jun 2008 08:22:41 -0400
Content-Disposition: inline
In-Reply-To: <485F85AE.1010704@hitachi.com>
Sender: linux-ext4-owner@vger.kernel.org

On Mon 23-06-08 20:14:54, Hidehiro Kawai wrote:
> Hi,
>
> I noticed a problem of this patch.  Please see below.
>
> Jan Kara wrote:
>
> > On Tue 03-06-08 13:40:25, Hidehiro Kawai wrote:
> >
> >>Subject: [PATCH 4/5] jbd: fix error handling for checkpoint io
> >>
> >>When a checkpointing IO fails, current JBD code doesn't check the
> >>error and continue journaling.  This means latest metadata can be
> >>lost from both the journal and filesystem.
> >>
> >>This patch leaves the failed metadata blocks in the journal space
> >>and aborts journaling in the case of log_do_checkpoint().
> >>To achieve this, we need to do:
> >>
> >>1. don't remove the failed buffer from the checkpoint list where in
> >>   the case of __try_to_free_cp_buf() because it may be released or
> >>   overwritten by a later transaction
> >>2. log_do_checkpoint() is the last chance, remove the failed buffer
> >>   from the checkpoint list and abort the journal
> >>3. when checkpointing fails, don't update the journal super block to
> >>   prevent the journaled contents from being cleaned.  For safety,
> >>   don't update j_tail and j_tail_sequence either
>
> 3. is implemented as described below.
>   (1) if log_do_checkpoint() detects an I/O error during
>       checkpointing, it calls journal_abort() to abort the journal
>   (2) if the journal has aborted, don't update s_start and s_sequence
>       in the on-disk journal superblock
>
> So, if the journal aborts, journaled data will be replayed on the
> next mount.
>
> Now, please remember that some dirty metadata buffers are written
> back to the filesystem without journaling if the journal aborted.
> We are happy if all dirty metadata buffers are written to the disk,
> the integrity of the filesystem will be kept.  However, replaying
> the journaled data can overwrite the latest on-disk metadata blocks
> partly with old data.  It would break the filesystem.
  Yes, it would. But how do you think it can happen that a metadata
buffer will be written back to the filesystem when it is part of a
running transaction? Note that the checkpointing code specifically checks
whether the buffer being written back is part of a running transaction
and, if so, it waits for the commit before writing back the buffer. So I
don't think this can happen, but maybe I'm missing something...

> My idea to resolve this problem is that we don't write out metadata
> buffers which belong to uncommitted transactions if journal has
> aborted.  Although the latest filesystem updates will be lost,
> we can ensure the integrity.  It will also be effective for the
> kernel panic in the middle of writing metadata buffers without
> journaling (this would occur in the `mount -o errors=panic' case.)
>
> Which integrity or latest state should we choose?

								Honza
--
Jan Kara
SUSE Labs, CR
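
For readers following the thread, the two rules discussed above can be
modelled in a few lines of user-space C. This is only a toy sketch, not
the JBD code: every type and function name below (toy_buffer,
toy_journal, toy_checkpoint_buffer, toy_update_superblock) is
hypothetical and chosen for illustration. It assumes that a buffer owned
by a running transaction is never written back by checkpointing before
commit, and that once a checkpoint write fails the journal is marked
aborted and the on-disk tail is no longer advanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
enum txn_state { TXN_RUNNING, TXN_COMMITTED };

struct toy_buffer {
    enum txn_state owner_state; /* state of the transaction that last touched the buffer */
    bool io_failed;             /* simulates a write error during checkpointing */
};

struct toy_journal {
    bool aborted;          /* set once a checkpoint write has failed */
    unsigned int tail_seq; /* stand-in for s_sequence in the on-disk superblock */
};

enum cp_result { CP_WAIT_FOR_COMMIT, CP_WRITTEN, CP_ABORTED };

/* Decide what checkpointing does with one buffer. */
enum cp_result toy_checkpoint_buffer(struct toy_journal *j,
                                     const struct toy_buffer *b)
{
    assert(j != NULL && b != NULL);

    /* A buffer in a running transaction is not written back yet. */
    if (b->owner_state == TXN_RUNNING)
        return CP_WAIT_FOR_COMMIT;

    /* A failed write aborts the journal instead of being ignored. */
    if (b->io_failed) {
        j->aborted = true;
        return CP_ABORTED;
    }
    return CP_WRITTEN;
}

/* Advance the recorded tail only while the journal is healthy. */
bool toy_update_superblock(struct toy_journal *j, unsigned int new_seq)
{
    assert(j != NULL);

    if (j->aborted)
        return false; /* keep the old tail so the journal is replayed on next mount */
    j->tail_seq = new_seq;
    return true;
}
```

Calling toy_checkpoint_buffer() on a committed buffer with io_failed set
marks the journal aborted, after which toy_update_superblock() refuses to
move tail_seq, which mirrors points (1) and (2) of the quoted
description.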