2003-06-15 10:47:31

by Roy Sigurd Karlsbakk

Subject: [PATCH] O_DIRECT for ext3 (2.4.21)

hi all

I've been waiting for the official O_DIRECT on ext3 for some time now, so I
thought perhaps it's time to get it into 2.4.22 or so. The patch I've used is
the one below (for 2.4.21):

Please apply

roy


diff -urN linux/fs/ext3/inode.c prontux-1.1.0/fs/ext3/inode.c
--- linux/fs/ext3/inode.c Sun Jun 15 12:55:34 2003
+++ prontux-1.1.0/fs/ext3/inode.c Sun Jun 15 12:53:15 2003
@@ -27,6 +27,7 @@
 #include <linux/ext3_jbd.h>
 #include <linux/jbd.h>
 #include <linux/locks.h>
+#include <linux/iobuf.h>
 #include <linux/smp_lock.h>
 #include <linux/highuid.h>
 #include <linux/quotaops.h>
@@ -732,9 +733,9 @@
  * The BKL may not be held on entry here. Be sure to take it early.
  */

-static int ext3_get_block_handle(handle_t *handle, struct inode *inode,
-					long iblock,
-					struct buffer_head *bh_result, int create)
+static int
+ext3_get_block_handle(handle_t *handle, struct inode *inode, long iblock,
+		struct buffer_head *bh_result, int create, int extend_disksize)
 {
 	int err = -EIO;
 	int offsets[4];
@@ -814,16 +815,18 @@
 	if (err)
 		goto cleanup;

-	new_size = inode->i_size;
-	/*
-	 * This is not racy against ext3_truncate's modification of i_disksize
-	 * because VM/VFS ensures that the file cannot be extended while
-	 * truncate is in progress. It is racy between multiple parallel
-	 * instances of get_block, but we have the BKL.
-	 */
-	if (new_size > inode->u.ext3_i.i_disksize)
-		inode->u.ext3_i.i_disksize = new_size;
-
+	if (extend_disksize) {
+		/*
+		 * This is not racy against ext3_truncate's modification of
+		 * i_disksize because VM/VFS ensures that the file cannot be
+		 * extended while truncate is in progress. It is racy between
+		 * multiple parallel instances of get_block, but we have BKL.
+		 */
+		struct ext3_inode_info *ei = EXT3_I(inode);
+		new_size = inode->i_size;
+		if (new_size > ei->i_disksize)
+			ei->i_disksize = new_size;
+	}
 	bh_result->b_state |= (1UL << BH_New);
 	goto got_it;

@@ -850,10 +853,41 @@
 		handle = ext3_journal_current_handle();
 		J_ASSERT(handle != 0);
 	}
-	ret = ext3_get_block_handle(handle, inode, iblock, bh_result, create);
+	ret = ext3_get_block_handle(handle, inode, iblock,
+				bh_result, create, 1);
 	return ret;
 }

+#define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
+
+static int
+ext3_direct_io_get_block(struct inode *inode, long iblock,
+		struct buffer_head *bh_result, int create)
+{
+	handle_t *handle = journal_current_handle();
+	int ret = 0;
+
+	lock_kernel();
+	if (handle && handle->h_buffer_credits <= EXT3_RESERVE_TRANS_BLOCKS) {
+		/*
+		 * Getting low on buffer credits...
+		 */
+		if (!ext3_journal_extend(handle, DIO_CREDITS)) {
+			/*
+			 * Couldn't extend the transaction. Start a new one
+			 */
+			ret = ext3_journal_restart(handle, DIO_CREDITS);
+		}
+	}
+	if (ret == 0)
+		ret = ext3_get_block_handle(handle, inode, iblock,
+					bh_result, create, 0);
+	if (ret == 0)
+		bh_result->b_size = (1 << inode->i_blkbits);
+	unlock_kernel();
+	return ret;
+}
+
 /*
  * `handle' can be NULL if create is zero
  */
@@ -868,7 +902,7 @@
 	dummy.b_state = 0;
 	dummy.b_blocknr = -1000;
 	buffer_trace_init(&dummy.b_history);
-	*errp = ext3_get_block_handle(handle, inode, block, &dummy, create);
+	*errp = ext3_get_block_handle(handle, inode, block, &dummy, create, 1);
 	if (!*errp && buffer_mapped(&dummy)) {
 		struct buffer_head *bh;
 		bh = sb_getblk(inode->i_sb, dummy.b_blocknr);
@@ -1374,6 +1408,67 @@
 	return journal_try_to_free_buffers(journal, page, wait);
 }

+static int
+ext3_direct_IO(int rw, struct inode *inode, struct kiobuf *iobuf,
+		unsigned long blocknr, int blocksize)
+{
+	struct ext3_inode_info *ei = EXT3_I(inode);
+	handle_t *handle = NULL;
+	int ret;
+	int orphan = 0;
+	loff_t offset = blocknr << inode->i_blkbits; /* ugh */
+	ssize_t count = iobuf->length; /* ditto */
+
+	if (rw == WRITE) {
+		loff_t final_size = offset + count;
+
+		lock_kernel();
+		handle = ext3_journal_start(inode, DIO_CREDITS);
+		unlock_kernel();
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
+			goto out;
+		}
+		if (final_size > inode->i_size) {
+			lock_kernel();
+			ret = ext3_orphan_add(handle, inode);
+			unlock_kernel();
+			if (ret)
+				goto out_stop;
+			orphan = 1;
+			ei->i_disksize = inode->i_size;
+		}
+	}
+
+	ret = generic_direct_IO(rw, inode, iobuf, blocknr,
+				blocksize, ext3_direct_io_get_block);
+
+out_stop:
+	if (handle) {
+		int err;
+
+		lock_kernel();
+		if (orphan)
+			ext3_orphan_del(handle, inode);
+		if (orphan && ret > 0) {
+			loff_t end = offset + ret;
+			if (end > inode->i_size) {
+				ei->i_disksize = end;
+				inode->i_size = end;
+				err = ext3_mark_inode_dirty(handle, inode);
+				if (!ret)
+					ret = err;
+			}
+		}
+		err = ext3_journal_stop(handle, inode);
+		if (ret == 0)
+			ret = err;
+		unlock_kernel();
+	}
+out:
+	return ret;
+
+}

 struct address_space_operations ext3_aops = {
 	readpage: ext3_readpage, /* BKL not held. Don't need */
@@ -1384,6 +1479,7 @@
 	bmap: ext3_bmap, /* BKL held */
 	flushpage: ext3_flushpage, /* BKL not held. Don't need */
 	releasepage: ext3_releasepage, /* BKL not held. Don't need */
+	direct_IO: ext3_direct_IO, /* BKL not held. Don't need */
 };

 /*


2003-06-15 11:18:38

by Matti Aarnio

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

On Sun, Jun 15, 2003 at 01:01:06PM +0200, Roy Sigurd Karlsbakk wrote:
> hi all
>
> I've been waiting for the official O_DIRECT on ext3 for some time now, so I
> thought perhaps it's time to get it into 2.4.22 or so. The patch I've used is
> the one below (for 2.4.21):

There is O_DIRECT support for ext3 in 2.5.
How does this patch relate to the 2.5 version?

> Please apply
>
> roy
....

/Matti Aarnio

2003-06-16 08:33:19

by Roy Sigurd Karlsbakk

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

On Sunday 15 June 2003 13:32, Matti Aarnio wrote:
> There is O_DIRECT support for ext3 in 2.5.
> How does this patch relate to the 2.5 version?

I don't know. Someone (sorry, I don't remember who) sent me this patch, and it
works fine with my application (which is video streaming). Anyway, we need
O_DIRECT for ext3 somehow, and IMHO it should have been there already.
--
Roy Sigurd Karlsbakk, Datavaktmester
ProntoTV AS - http://www.pronto.tv/
Tel: +47 9801 3356

Computers are like air conditioners.
They stop working when you open Windows.

2003-06-17 14:35:29

by Stephen C. Tweedie

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

Hi,

On Sun, 2003-06-15 at 12:01, Roy Sigurd Karlsbakk wrote:
> hi all
>
> I've been waiting for the official O_DIRECT on ext3 for some time now, so I
> thought perhaps it's time to get it into 2.4.22 or so. The patch I've used is
> the one below (for 2.4.21):

This is Andrea's patch, and it has a few problems which I've been fixing
(like allowing direct IO in journaled data mode --- bad move --- and a
couple of casting errors, nothing hugely problematic.) Please don't
apply this, I'll send updated code later today.
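
The message does not say which casts were wrong. As an illustration only, here is the kind of casting problem that an expression like `blocknr << inode->i_blkbits` can run into when `unsigned long` is 32 bits wide: the shift is done in 32 bits and only widened afterwards. The helper names are hypothetical, and `uint32_t` stands in for a 32-bit `unsigned long`.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: shift performed in 32 bits, then widened.
 * High bits are lost before the result reaches the 64-bit type.
 */
int64_t offset_narrow_shift(uint32_t blocknr, unsigned int blkbits)
{
	return (int64_t)(uint32_t)(blocknr << blkbits);
}

/*
 * Widen first, then shift: the full byte offset is preserved.
 */
int64_t offset_wide_shift(uint32_t blocknr, unsigned int blkbits)
{
	return (int64_t)blocknr << blkbits;
}
```

With 4096-byte blocks (`blkbits` of 12), the two helpers agree for small block numbers and diverge once the byte offset no longer fits in 32 bits.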

Cheers,
Stephen

2003-06-17 19:47:07

by Andrew Morton

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

"Stephen C. Tweedie" wrote:
>
> Hi,
>
> On Sun, 2003-06-15 at 12:01, Roy Sigurd Karlsbakk wrote:
> > hi all
> >
> > I've been waiting for the official O_DIRECT on ext3 for some time now, so I
> > thought perhaps it's time to get it into 2.4.22 or so. The patch I've used is
> > the one below (for 2.4.21):
>
> This is Andrea's patch,

Actually I'm the culprit.

> (like allowing direct IO in journaled data mode --- bad move ---

hmm, OK, it doesn't even vaguely work in journalled mode either...

I think the check should be implemented in (the new) ext3_open(), because
checking the return value of open() is how a well-behaved application would
determine whether the underlying fs supports O_DIRECT.

Unfortunately O_DIRECT can also be set with fcntl(F_SETFL), and we seem to
have forgotten to provide a way for the fs to be told about fcntl.
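
From the application side, the open()-time probe described above might look roughly like the following. This is a minimal userspace sketch, not code from the patch; the helper names `open_prefer_direct` and `try_set_direct` are made up for illustration, and it assumes a Linux libc where `_GNU_SOURCE` exposes `O_DIRECT`.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on Linux */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only, preferring O_DIRECT.
 * Sets *direct to 1 if O_DIRECT was accepted, 0 if we fell back to
 * buffered I/O. Returns the descriptor, or -1 with errno set.
 */
int open_prefer_direct(const char *path, int *direct)
{
	int fd = open(path, O_RDONLY | O_DIRECT);

	if (fd >= 0) {
		*direct = 1;
		return fd;
	}
	if (errno != EINVAL)
		return -1;	/* some other failure, not "unsupported" */

	/* The fs refused O_DIRECT: fall back to a buffered open. */
	*direct = 0;
	return open(path, O_RDONLY);
}

/*
 * Hypothetical helper: try to turn on O_DIRECT for an already open
 * descriptor via fcntl(F_SETFL). Returns 0 on success, -1 on failure.
 */
int try_set_direct(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_DIRECT) < 0 ? -1 : 0;
}
```

The second helper is the path the paragraph above worries about: unless the kernel also checks at F_SETFL time, the flag can be set without the filesystem ever seeing an open() call it could reject.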


2003-06-17 20:04:51

by Andrew Morton

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

Andrew Morton wrote:
>
> I think the check should be implemented in (the new) ext3_open(), because
> checking the return value of open() is how a well-behaved application would
> determine whether the underlying fs supports O_DIRECT.
>
> Unfortunately O_DIRECT can also be set with fcntl(F_SETFL), and we seem to
> have forgotten to provide a way for the fs to be told about fcntl.

It works out OK in 2.5, and we should do it this way in 2.4 too:

- dentry_open() checks for inode->i_mapping->a_ops->direct_IO

- setfl() checks for inode->i_mapping->a_ops->direct_IO

- the a_ops for data-journalled inodes have a null ->direct_IO.
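
The three points above can be modelled in a few lines of plain C. This is a userspace sketch, not kernel code: `struct model_aops`, `MODEL_O_DIRECT` and `model_check_flags` are hypothetical stand-ins for the real a_ops table, the O_DIRECT flag, and the test that dentry_open() and setfl() would share.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_O_DIRECT 040000	/* stand-in flag value, for this model only */

/* Stand-in for an address_space_operations table. */
struct model_aops {
	int (*direct_IO)(void);	/* NULL means direct I/O is not offered */
};

static int model_direct_IO(void)
{
	return 0;
}

/* Inodes that can do direct I/O get a table with the hook filled in. */
const struct model_aops model_default_aops = { .direct_IO = model_direct_IO };

/* Data-journalled inodes get a table with a null direct_IO. */
const struct model_aops model_journalled_aops = { .direct_IO = NULL };

/*
 * The single test both the open path and the F_SETFL path would apply:
 * refuse O_DIRECT when the mapping has no direct_IO method.
 */
int model_check_flags(const struct model_aops *aops, int flags)
{
	if ((flags & MODEL_O_DIRECT) && (aops == NULL || aops->direct_IO == NULL))
		return -EINVAL;
	return 0;
}
```

Because both entry points consult the same pointer, the filesystem does not need to be told about fcntl at all; leaving the hook null is enough to make either path fail.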


2003-06-17 20:33:29

by Stephen C. Tweedie

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

Hi,

On Tue, 2003-06-17 at 21:19, Andrew Morton wrote:

> It works out OK in 2.5, and we should do it this way in 2.4 too:
>
> - dentry_open() checks for inode->i_mapping->a_ops->direct_IO
>
> - setfl() checks for inode->i_mapping->a_ops->direct_IO
>
> - the a_ops for data-journalled inodes have a null ->direct_IO.

That's what the -aa patches do, and I've got those queued in my local
O_DIRECT stuff already. ext3 will just expose a different a_ops for
data-journaled files.
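
Exposing a different a_ops per journaling mode could be sketched as below. Again this is a plain C model under assumed names (`sketch_aops`, `sketch_pick_aops`, the `SKETCH_DATA_*` modes), not the queued code referred to above.

```c
#include <stddef.h>

/* Hypothetical journaling modes for the model. */
enum sketch_data_mode {
	SKETCH_DATA_JOURNAL,
	SKETCH_DATA_ORDERED,
	SKETCH_DATA_WRITEBACK
};

/* Stand-in for an address_space_operations table. */
struct sketch_aops {
	const char *name;
	int (*direct_IO)(void);
};

static int sketch_direct_IO(void)
{
	return 0;
}

static const struct sketch_aops sketch_aops_default = {
	"default", sketch_direct_IO
};

/* Same operations, minus direct_IO, for data-journaled files. */
static const struct sketch_aops sketch_aops_journal = {
	"data-journal", NULL
};

/* Pick the table once, when the inode's mapping is set up. */
const struct sketch_aops *sketch_pick_aops(enum sketch_data_mode mode)
{
	if (mode == SKETCH_DATA_JOURNAL)
		return &sketch_aops_journal;
	return &sketch_aops_default;
}
```

Selecting the table up front keeps the mode test out of the I/O path: callers only ever look at whether `direct_IO` is set.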

Cheers,
Stephen

2003-06-19 10:35:45

by Roy Sigurd Karlsbakk

Subject: Re: [PATCH] O_DIRECT for ext3 (2.4.21)

On Tuesday 17 June 2003 22:46, Stephen C. Tweedie wrote:
> That's what the -aa patches do, and I've got those queued in my local
> O_DIRECT stuff already. ext3 will just expose a different a_ops for
> data-journaled files.

Is ext3 O_DIRECT support in -aa?