LinuxLists.cc - [EXT3/JBD] Periodic journal flush not enough?

2004-03-26 23:20:14

Subject: [EXT3/JBD] Periodic journal flush not enough?

Hi:

I've encountered a problem with the journal flush timer. The problem
is that when a filesystem is short on space, relying on a timer-based
flushing mechanism is no longer adequate. For example, on my P4 2GHz
I can trigger an ENOSPC error by doing

while :; do echo test > a; [ -s a ] || break; rm a; done; echo Out of space

on an ext3 file system with 12Mb of free space using the usual 5s
journal flush timer.

Of course, when you extend the flushing period as you do with laptop-mode,
this problem becomes a lot worse.

So would it be possible to have the flushing activated on demand?

Thanks,
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2004-03-26 23:46:51

by Andrew Morton

Subject: Re: [EXT3/JBD] Periodic journal flush not enough?

Herbert Xu <[email protected]> wrote:
>
> I've encountered a problem with the journal flush timer. The problem
> is that when a filesystem is short on space, relying on a timer-based
> flushing mechanism is no longer adequate. For example, on my P4 2GHz
> I can trigger an ENOSPC error by doing
>
> while :; do echo test > a; [ -s a ] || break; rm a; done; echo Out of space
>
> on an ext3 file system with 12Mb of free space using the usual 5s
> journal flush timer.

I cannot reproduce this. Please send more details. Journalling mode,
kernel version, etc.

2004-03-26 23:55:25

by Andreas Dilger

Subject: Re: [EXT3/JBD] Periodic journal flush not enough?

On Mar 27, 2004 10:19 +1100, Herbert Xu wrote:
> I've encountered a problem with the journal flush timer. The problem
> is that when a filesystem is short on space, relying on a timer-based
> flushing mechanism is no longer adequate. For example, on my P4 2GHz
> I can trigger an ENOSPC error by doing
>
> while :; do echo test > a; [ -s a ] || break; rm a; done; echo Out of space
>
> on an ext3 file system with 12Mb of free space using the usual 5s
> journal flush timer.
>
> Of course, when you extend the flushing period as you do with laptop-mode,
> this problem becomes a lot worse.
>
> So would it be possible to have the flushing activated on demand?

I had created a patch a while ago, but never really got any testing on it.
It would be great to get this problem fixed. Patch against a 2.4.x kernel.

--- fs/ext3/balloc.c.orig Fri Jul 25 19:55:34 2003
+++ fs/ext3/balloc.c Tue Sep 2 16:27:51 2003
@@ -547,6 +547,8 @@ int ext3_new_block (handle_t *handle, st
 #ifdef EXT3FS_DEBUG
 	static int goal_hits = 0, goal_attempts = 0;
 #endif
+	int tried_commit = 0;
+
 	*errp = -ENOSPC;
 	sb = inode->i_sb;
 	if (!sb) {
@@ -643,6 +645,26 @@ repeat:
 		}
 	}

+	/* We can only try to commit the previous transaction, or we will
+	 * deadlock because the current op has a transaction handle open.
+	 * We also can't restart the current handle in a new transaction as
+	 * that might break the atomicity guarantees of this transaction.
+	 * Set current handle h_sync to allow it to be committed ASAP. */
+	if (!tried_commit) {
+		journal_t *journal = handle->h_transaction->t_journal;
+		transaction_t *prev_trans = journal->j_committing_transaction;
+
+		if (prev_trans) {
+			tid_t prev_tid = prev_trans->t_tid;
+			log_start_commit(journal, prev_trans);
+			log_wait_commit(journal, prev_tid);
+		}
+		handle->h_sync = 1;
+		tried_commit = 1;
+
+		goto repeat;
+	}
+
 	/* No space left on the device */
 	goto out;

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2004-03-27 00:02:05

by Herbert Xu

Subject: Re: [EXT3/JBD] Periodic journal flush not enough?

On Fri, Mar 26, 2004 at 03:48:51PM -0800, Andrew Morton wrote:
> Herbert Xu <[email protected]> wrote:
> >
> > I've encountered a problem with the journal flush timer. The problem
> > is that when a filesystem is short on space, relying on a timer-based
> > flushing mechanism is no longer adequate. For example, on my P4 2GHz
> > I can trigger an ENOSPC error by doing
> >
> > while :; do echo test > a; [ -s a ] || break; rm a; done; echo Out of space
> >
> > on an ext3 file system with 12Mb of free space using the usual 5s
> > journal flush timer.
>
> I cannot reproduce this. Please send more details. Journalling mode,
> kernel version, etc.

OK. I can reproduce this under both 2.4.25 and 2.6.4.

To prepare for the test, you need to arrange for an ext3 file system
which is short on space. In the following example, I'm using one with
around 12Mb of free space.

You'll also need a way to write/delete files quickly. The above shell
fragment probably doesn't work in bash, since bash is very slow.

So I've attached a C program which does a similar thing.
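
The attached b.c is not reproduced in this archive. As a rough idea of
what such a program could look like, here is a minimal sketch that
mirrors the shell loop (write "test", check the write succeeded, unlink,
repeat). The function name churn() and its parameters are made up for
this illustration and are not taken from the attachment.

```c
/* Hypothetical stand-in for b.c; NOT the original attachment. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Repeatedly create, write and unlink one small file.
 * Returns 0 if max_iters rounds completed, otherwise the errno value of
 * the first failure (ENOSPC on a full filesystem). *done receives the
 * number of rounds that completed.
 */
static int churn(const char *path, long max_iters, long *done)
{
	static const char msg[] = "test\n";
	const ssize_t len = (ssize_t)(sizeof(msg) - 1);
	long i;

	assert(path != NULL && done != NULL);
	*done = 0;

	for (i = 0; i < max_iters; i++) {
		int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		ssize_t n;
		int err = 0;

		if (fd < 0)
			return errno;

		n = write(fd, msg, (size_t)len);
		if (n < 0)
			err = errno;
		else if (n != len)
			err = ENOSPC;	/* short write: treat as out of space */

		if (close(fd) < 0 && !err)
			err = errno;

		if (err) {
			unlink(path);	/* best-effort cleanup */
			return err;
		}
		if (unlink(path) < 0)
			return errno;

		*done = i + 1;
	}
	return 0;
}
```

A main() would call churn("a", some_large_count, &n) in a directory on
the nearly full ext3 filesystem and report strerror() of the return
value together with n.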

I've also attached the script output of it running in my VMware machine
running 2.6.4.

Cheers,
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Attachments:

(No filename) (1.36 kB)
b.c (444.00 B)
c (319.00 B)

2004-03-29 18:56:35

by Stephen C. Tweedie

Subject: Re: [EXT3/JBD] Periodic journal flush not enough?

Hi,

On Fri, 2004-03-26 at 23:48, Andrew Morton wrote:
> Herbert Xu <[email protected]> wrote:
> >
> > I've encountered a problem with the journal flush timer. The problem
> > is that when a filesystem is short on space, relying on a timer-based
> > flushing mechanism is no longer adequate. For example, on my P4 2GHz
> > I can trigger an ENOSPC error by doing
> >
> > while :; do echo test > a; [ -s a ] || break; rm a; done; echo Out of space
> >
> > on an ext3 file system with 12Mb of free space using the usual 5s
> > journal flush timer.
>
> I cannot reproduce this. Please send more details. Journalling mode,
> kernel version, etc.

Sounds like it's due to the "b_committed_data" avoidance code. Ext3
cannot immediately reuse disk space after a delete, because of lazy
writeback --- until the final writeback of the delete hits disk, we have
to be able to undo it. And because in non-data-journaled modes we allow
new disk writes to hit disk before a transaction commit, that means we
can't reuse deleted blocks until after they are committed.
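
To make the rule concrete, here is a toy userspace model, not ext3
code. It assumes a block is allocatable only when it is free both in
the current bitmap and in a copy of the bitmap as of the last commit
(standing in for b_committed_data). All names (toy_fs, toy_free,
toy_commit, toy_alloc) are invented for the illustration.

```c
/* Toy model of the b_committed_data rule; names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8

struct toy_fs {
	bool in_use[NBLOCKS];        /* current in-memory bitmap */
	bool committed_use[NBLOCKS]; /* bitmap as of the last commit */
};

/* Free a block in the running transaction only. */
static void toy_free(struct toy_fs *fs, size_t blk)
{
	assert(blk < NBLOCKS);
	fs->in_use[blk] = false;
}

/* Commit: the committed copy catches up with the current bitmap. */
static void toy_commit(struct toy_fs *fs)
{
	for (size_t i = 0; i < NBLOCKS; i++)
		fs->committed_use[i] = fs->in_use[i];
}

/*
 * Allocate the first block that is free in BOTH bitmaps.
 * Returns the block index, or -1 if none qualifies.
 */
static int toy_alloc(struct toy_fs *fs)
{
	for (size_t i = 0; i < NBLOCKS; i++) {
		if (!fs->in_use[i] && !fs->committed_use[i]) {
			fs->in_use[i] = true;
			return (int)i;
		}
	}
	return -1;
}
```

On a full toy filesystem, toy_free() followed directly by toy_alloc()
fails, while the same toy_alloc() succeeds once toy_commit() has run in
between. That is the window the write/delete loop keeps hitting.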

I've never seen it reported as a problem outside of artificial test
scenarios, but if it is something we need to address, Andreas Dilger's
patch looks good.

Cheers,
Stephen

2004-03-29 19:17:59

by Chris Mason

Subject: Re: [EXT3/JBD] Periodic journal flush not enough?

On Mon, 2004-03-29 at 13:56, Stephen C. Tweedie wrote:

> Sounds like it's due to the "b_committed_data" avoidance code. Ext3
> cannot immediately reuse disk space after a delete, because of lazy
> writeback --- until the final writeback of the delete hits disk, we have
> to be able to undo it. And because in non-data-journaled modes we allow
> new disk writes to hit disk before a transaction commit, that means we
> can't reuse deleted blocks until after they are committed.
>
> I've never seen it reported as a problem outside of artificial test
> scenarios, but if it is something we need to address, Andreas Dilger's
> patch looks good.

Just FYI, reiserfs does something slightly different. When
reiserfs_file_write and get_block routines see -ENOSPC, they get things
into a consistent state, commit the running transaction and try again
(once). It didn't end up very complex...
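
The retry-once pattern described here can be sketched in plain C as
follows. This is not reiserfs code: the callback table, the mock
filesystem and every name below are assumptions made up for the
illustration, and the step of getting things into a consistent state
is left out.

```c
/* Sketch of "commit and retry once on -ENOSPC"; all names hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct retry_ops {
	int (*try_alloc)(void *ctx);       /* returns 0 or -ENOSPC */
	void (*commit_running)(void *ctx); /* commit the running transaction */
};

/* Try once; on -ENOSPC commit and try exactly one more time. */
static int alloc_retry_once(const struct retry_ops *ops, void *ctx)
{
	int err;

	assert(ops != NULL);
	err = ops->try_alloc(ctx);
	if (err != -ENOSPC)
		return err;

	ops->commit_running(ctx);
	return ops->try_alloc(ctx);
}

/* Mock filesystem: blocks freed but not yet committed are "pending". */
struct mock_fs {
	int free_now; /* blocks allocatable right now */
	int pending;  /* blocks that become allocatable after a commit */
	int commits;  /* number of commits performed */
};

static int mock_try_alloc(void *ctx)
{
	struct mock_fs *m = ctx;

	if (m->free_now > 0) {
		m->free_now--;
		return 0;
	}
	return -ENOSPC;
}

static void mock_commit(void *ctx)
{
	struct mock_fs *m = ctx;

	m->free_now += m->pending;
	m->pending = 0;
	m->commits++;
}
```

With one pending block and nothing free, the first call commits once
and succeeds; a second call commits again and still returns -ENOSPC,
which shows the cost of retrying unconditionally.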

-chris

2004-03-29 20:04:11

by Stephen C. Tweedie

Subject: Re: [EXT3/JBD] Periodic journal flush not enough?

Hi,

On Fri, 2004-03-26 at 23:55, Andreas Dilger wrote:

> I had created a patch a while ago, but never really got any testing on it.
> It would be great to get this problem fixed.

We don't really want to turn ENOSPC into an operation which always
requires synchronous IO, though, which your patch would do. We can
probably detect the case where we've failed the allocate due to a
b_committed_data collision, though, and perform the retry only in that
case.
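
A toy userspace sketch of that idea, under the assumption that the
allocator can report whether it skipped any block solely because of
the committed copy of the bitmap. It is not ext3 code, all names are
invented, and it ignores the open-handle deadlock constraint described
in the patch comment.

```c
/* Toy sketch of "retry only after a committed-data collision". */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SEL_NBLOCKS 8

struct sel_fs {
	bool in_use[SEL_NBLOCKS];        /* current bitmap */
	bool committed_use[SEL_NBLOCKS]; /* bitmap as of the last commit */
	int commits;                     /* forced commits so far */
};

/*
 * Scan for a block free in both bitmaps. *collided is set when a block
 * was free in the current bitmap but still in use in the committed one.
 */
static int sel_scan(struct sel_fs *fs, bool *collided)
{
	*collided = false;
	for (size_t i = 0; i < SEL_NBLOCKS; i++) {
		if (fs->in_use[i])
			continue;
		if (fs->committed_use[i]) {
			*collided = true;
			continue;
		}
		fs->in_use[i] = true;
		return (int)i;
	}
	return -ENOSPC;
}

static void sel_commit(struct sel_fs *fs)
{
	for (size_t i = 0; i < SEL_NBLOCKS; i++)
		fs->committed_use[i] = fs->in_use[i];
	fs->commits++;
}

/* Force a commit and retry only if a collision caused the failure. */
static int sel_alloc(struct sel_fs *fs)
{
	bool collided;
	int blk = sel_scan(fs, &collided);

	if (blk != -ENOSPC || !collided)
		return blk;	/* success, or genuinely out of space */

	sel_commit(fs);
	return sel_scan(fs, &collided);
}
```

A genuinely full filesystem then fails immediately with no synchronous
commit, and only the delete-then-allocate case pays for one.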

Cheers,
Stephen