Resending, since there were no responses to the earlier post.
Hi,
I was working on a reiserfs panic with 2.6.17-rc3, while running fs
stress tests.
The panic message looked like :
" REISERFS: panic (device Null superblock): reiserfs[4248]: assertion
!(truncate && (REISERFS_I(inode)->i_flags & i_link_saved_truncate_mask)
) failed at fs/reiserfs/super.c:328:add_save_link: saved link already re
exists for truncated inode 13b5a "
------ Summary of the problem -----------
Reiserfs uses "safe links" ( directory entries with some special key
value) to keep track of "truncated" or "unlinked" files to ensure
integrity across crashes.
Whenever there is a truncate/unlink on a file, Reiserfs creates a safe
link for the same and deletes the same once the operation is complete.
If the machine crashes before committing the operation, whenever the fs
is mounted next time, the fs will look for the saved links ( easy to
find out, since they have special key) and commit the operation that was
unfinished.
The problem here occurs as follows:
Whenever there is an extending DIO write operation, the fs would
create a safe link so as to ensure the file size consistent, if there is
crash in between the DIO. This will be deleted once the write operation
finishes.
If the DIO write happens to go through a "HOLE" region in the file, it
will fall into normal "buffered write", which is done through the
address space operations prepare_write() & commit_write(). Now, the
prepare_write() might allocate blocks for the file (if needed). So if
there is some error at a later point (say ENOSPC) in prepare_write(), we
need to discard the allocated blocks. This is done by calling
"vmtruncate()" on the file. This call leads to reiserfs specific
truncate, which would try to add a save link for the file.
This addition causes a reiserfs_panic, since there is already a "save
link" stored for the file.
I have a simple testcase to reproduce the problem, which does the same
as described above. I will attach it if required.
Any thoughts on how to fix this ?
thanks,
Suzuki K P
Linux Technology Centre,
IBM Software Labs.
Andrew Morton wrote:
> On Mon, 08 May 2006 17:03:41 +0530
> Suzuki <[email protected]> wrote:
>
>
>>Resending, since there were no responses to the earlier post.
>>
>>Hi,
>>
>>
>>I was working on a reiserfs panic with 2.6.17-rc3, while running fs
>>stress tests.
>>
>>The panic message looked like :
>>
>>" REISERFS: panic (device Null superblock): reiserfs[4248]: assertion
>>!(truncate && (REISERFS_I(inode)->i_flags & i_link_saved_truncate_mask)
>>) failed at fs/reiserfs/super.c:328:add_save_link: saved link already re
>>exists for truncated inode 13b5a "
>>
>>------ Summary of the problem -----------
>>
>>Reiserfs uses "safe links" ( directory entries with some special key
>>value) to keep track of "truncated" or "unlinked" files to ensure
>>integrity across crashes.
>>
>>Whenever there is a truncate/unlink on a file, Reiserfs creates a safe
>>link for the same and deletes the same once the operation is complete.
>>If the machine crashes before committing the operation, whenever the fs
>>is mounted next time, the fs will look for the saved links ( easy to
>>find out, since they have special key) and commit the operation that was
>>unfinished.
>>
>>
>>The problem here occurs as follows:
>>
>> Whenever there is an extending DIO write operation, the fs would
>>create a safe link so as to ensure the file size consistent, if there is
>>crash in between the DIO. This will be deleted once the write operation
>>finishes.
>>
>> If the DIO write happens to go through a "HOLE" region in the file, it
>>will fall into normal "buffered write", which is done through the
>>address space operations prepare_write() & commit_write(). Now, the
>>prepare_write() might allocate blocks for the file (if needed). So if
>>there is some error at a later point (say ENOSPC) in prepare_write(), we
>>need to discard the allocated blocks. This is done by calling
>>"vmtruncate()" on the file. This call leads to reiserfs specific
>>truncate, which would try to add a save link for the file.
>>
>>This addition causes a reiserfs_panic, since there is already a "save
>>link" stored for the file.
>>
>>
>>I have a simple testcase to reproduce the problem, which does the same
>>as described above. I will attach it if required.
>>
>>Any thoughts on how to fix this ?
>>
>
>
> Did we end up fixing this?
Finally, we were able to verify this on 2.6.18-rc7. Sorry for the delay.
Vladimir wrote :
"I am not sure why safe link is needed for write. Maybe one who
added that still remembers why that was done and can explain, please? "
The patch attached by Vladimir has been modified to fit into the current
level.
The patch is on 2.6.19-rc1.
Thanks,
-Suzuki
>
> Thanks.
On Mon, 27 Nov 2006 15:37:49 -0800
Suzuki <[email protected]> wrote:
> * Do not add save links for O_DIRECT writes.
>
> We add a save link for O_DIRECT writes to protect the i_size against the crashes before we actually finish the I/O. If we hit an -ENOSPC in aops->prepare_write(), we would do a truncate() to release the blocks which might have got initialized. Now the truncate would add another save link for the same inode causing a reiserfs panic for having multiple save links for the same inode.
>
>
OK...
But how does this patch fix it? It removes a lot of code - how come we
don't need it any more?
>
>
> Index: linux-2.6.19-rc1/fs/reiserfs/file.c
> ===================================================================
> --- linux-2.6.19-rc1.orig/fs/reiserfs/file.c 2006-10-10 05:54:30.000000000 -0700
> +++ linux-2.6.19-rc1/fs/reiserfs/file.c 2006-11-21 17:17:36.000000000 -0800
> @@ -1306,56 +1306,8 @@
> count = MAX_NON_LFS - (unsigned long)*ppos;
> }
>
> - if (file->f_flags & O_DIRECT) { // Direct IO needs treatment
> - ssize_t result, after_file_end = 0;
> - if ((*ppos + count >= inode->i_size)
> - || (file->f_flags & O_APPEND)) {
> - /* If we are appending a file, we need to put this savelink in here.
> - If we will crash while doing direct io, finish_unfinished will
> - cut the garbage from the file end. */
> - reiserfs_write_lock(inode->i_sb);
> - err =
> - journal_begin(&th, inode->i_sb,
> - JOURNAL_PER_BALANCE_CNT);
> - if (err) {
> - reiserfs_write_unlock(inode->i_sb);
> - return err;
> - }
> - reiserfs_update_inode_transaction(inode);
> - add_save_link(&th, inode, 1 /* Truncate */ );
> - after_file_end = 1;
> - err =
> - journal_end(&th, inode->i_sb,
> - JOURNAL_PER_BALANCE_CNT);
> - reiserfs_write_unlock(inode->i_sb);
> - if (err)
> - return err;
> - }
> - result = do_sync_write(file, buf, count, ppos);
> -
> - if (after_file_end) { /* Now update i_size and remove the savelink */
> - struct reiserfs_transaction_handle th;
> - reiserfs_write_lock(inode->i_sb);
> - err = journal_begin(&th, inode->i_sb, 1);
> - if (err) {
> - reiserfs_write_unlock(inode->i_sb);
> - return err;
> - }
> - reiserfs_update_inode_transaction(inode);
> - mark_inode_dirty(inode);
> - err = journal_end(&th, inode->i_sb, 1);
> - if (err) {
> - reiserfs_write_unlock(inode->i_sb);
> - return err;
> - }
> - err = remove_save_link(inode, 1 /* truncate */ );
> - reiserfs_write_unlock(inode->i_sb);
> - if (err)
> - return err;
> - }
> -
> - return result;
> - }
> + if (file->f_flags & O_DIRECT)
> + return do_sync_write(file, buf, count, ppos);
>
> if (unlikely((ssize_t) count < 0))
> return -EINVAL;
Andrew Morton wrote:
> On Mon, 27 Nov 2006 15:37:49 -0800
> Suzuki <[email protected]> wrote:
>
>
>>* Do not add save links for O_DIRECT writes.
>>
>>We add a save link for O_DIRECT writes to protect the i_size against the crashes before we actually finish the I/O. If we hit an -ENOSPC in aops->prepare_write(), we would do a truncate() to release the blocks which might have got initialized. Now the truncate would add another save link for the same inode causing a reiserfs panic for having multiple save links for the same inode.
>>
>>
>
>
> OK...
>
> But how does this patch fix it? It removes a lot of code - how come we
> don't need it any more?
We were adding save links for appending writes only. The links were
removed once we finish the write operation successfully.
Now we don't add the save links at all.
May be Valdimir has better answers for this.
Thanks,
Suzuki
>
>
>>
>>Index: linux-2.6.19-rc1/fs/reiserfs/file.c
>>===================================================================
>>--- linux-2.6.19-rc1.orig/fs/reiserfs/file.c 2006-10-10 05:54:30.000000000 -0700
>>+++ linux-2.6.19-rc1/fs/reiserfs/file.c 2006-11-21 17:17:36.000000000 -0800
>>@@ -1306,56 +1306,8 @@
>> count = MAX_NON_LFS - (unsigned long)*ppos;
>> }
>>
>>- if (file->f_flags & O_DIRECT) { // Direct IO needs treatment
>>- ssize_t result, after_file_end = 0;
>>- if ((*ppos + count >= inode->i_size)
>>- || (file->f_flags & O_APPEND)) {
>>- /* If we are appending a file, we need to put this savelink in here.
>>- If we will crash while doing direct io, finish_unfinished will
>>- cut the garbage from the file end. */
>>- reiserfs_write_lock(inode->i_sb);
>>- err =
>>- journal_begin(&th, inode->i_sb,
>>- JOURNAL_PER_BALANCE_CNT);
>>- if (err) {
>>- reiserfs_write_unlock(inode->i_sb);
>>- return err;
>>- }
>>- reiserfs_update_inode_transaction(inode);
>>- add_save_link(&th, inode, 1 /* Truncate */ );
>>- after_file_end = 1;
>>- err =
>>- journal_end(&th, inode->i_sb,
>>- JOURNAL_PER_BALANCE_CNT);
>>- reiserfs_write_unlock(inode->i_sb);
>>- if (err)
>>- return err;
>>- }
>>- result = do_sync_write(file, buf, count, ppos);
>>-
>>- if (after_file_end) { /* Now update i_size and remove the savelink */
>>- struct reiserfs_transaction_handle th;
>>- reiserfs_write_lock(inode->i_sb);
>>- err = journal_begin(&th, inode->i_sb, 1);
>>- if (err) {
>>- reiserfs_write_unlock(inode->i_sb);
>>- return err;
>>- }
>>- reiserfs_update_inode_transaction(inode);
>>- mark_inode_dirty(inode);
>>- err = journal_end(&th, inode->i_sb, 1);
>>- if (err) {
>>- reiserfs_write_unlock(inode->i_sb);
>>- return err;
>>- }
>>- err = remove_save_link(inode, 1 /* truncate */ );
>>- reiserfs_write_unlock(inode->i_sb);
>>- if (err)
>>- return err;
>>- }
>>-
>>- return result;
>>- }
>>+ if (file->f_flags & O_DIRECT)
>>+ return do_sync_write(file, buf, count, ppos);
>>
>> if (unlikely((ssize_t) count < 0))
>> return -EINVAL;
Hello
On Tuesday 28 November 2006 05:57, suzuki wrote:
> Andrew Morton wrote:
> > On Mon, 27 Nov 2006 15:37:49 -0800
> > Suzuki <[email protected]> wrote:
> >
> >
> >>* Do not add save links for O_DIRECT writes.
> >>
> >>We add a save link for O_DIRECT writes to protect the i_size against the crashes before we actually finish the I/O. If we hit an -ENOSPC in aops->prepare_write(), we would do a truncate() to release the blocks which might have got initialized. Now the truncate would add another save link for the same inode causing a reiserfs panic for having multiple save links for the same inode.
> >>
> >>
> >
> >
> > OK...
> >
> > But how does this patch fix it? It removes a lot of code - how come we
> > don't need it any more?
>
> We were adding save links for appending writes only. The links were
> removed once we finish the write operation successfully.
>
> Now we don't add the save links at all.
>
The removed code was to return file to a length it had before direct writing far past end started
if that direct write did not complete due to unclean shutdown.
I am not sure why it was needed. I guess that if long append (direct of bufferred) did not complete - it is ok if file becomes longer than it was before.
However, reiserfs seems to fail to update ondisk inode file size when appending big holes. The included patch fixes that.
Please, apply
From: Vladimir Saveliev <[email protected]>
To write far past the end of a file reiserfs appends a hole first.
While the hole grows file size in ondisk inode has to be updated
so that in case of unclean shutdown it matches real file size.
Signed-off-by: Vladimir Saveliev <[email protected]>
diff -puN fs/reiserfs/file.c~reiserfs-update-file-size-on-hole-creation fs/reiserfs/file.c
--- linux-2.6.19-rc6-mm2/fs/reiserfs/file.c~reiserfs-update-file-size-on-hole-creation 2006-11-29 17:54:03.000000000 +0300
+++ linux-2.6.19-rc6-mm2-vs/fs/reiserfs/file.c 2006-11-29 17:54:03.000000000 +0300
@@ -406,6 +406,8 @@ static int reiserfs_allocate_blocks_for_
we restart it. This will also free the path. */
if (journal_transaction_should_end
(th, th->t_blocks_allocated)) {
+ inode->i_size = cpu_key_k_offset(&key) +
+ (to_paste << inode->i_blkbits);
res =
restart_transaction(th, inode,
&path);
_