2003-07-09 12:16:59

by Christoph Hellwig

Subject: ->direct_IO API change in current 2.4 BK

I just got a nice XFS oops due to the direct_IO API change in
2.4. Guys, this is a STABLE series and APIs are supposed to be exactly
that, _STABLE_. If you really think O_DIRECT on NFS is so important
please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack.

But what's the use of it anyway? AFAIK it's mostly for whoracle setups
that have their data on netapps but that needs a certified vendor kernel
not mainline..


2003-07-09 16:50:06

by Andreas Dilger

Subject: Re: ->direct_IO API change in current 2.4 BK

On Jul 09, 2003 13:31 +0100, Christoph Hellwig wrote:
> I just got a nice XFS oops due to the direct_IO API change in
> 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly
> that, _STABLE_. If you really think O_DIRECT on NFS is so important
> please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack.

I would have to agree with that sentiment - we shouldn't change the
API in an "almost compatible" way, although I would have hoped that
compile warnings and/or module symbol versioning would have avoided
a crash.

> But what's the use of it anyway? AFAIK it's mostly for whoracle setups
> that have their data on netapps but that needs a certified vendor kernel
> not mainline..

Actually, it is useful for Lustre to do this, because it allows us to have
a file handle (which, naturally, holds per-file data) at the time the IO is
sent over the wire, instead of the "anonymous" writes that happen now.
This helps us with readahead on the server and other minor improvements.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2003-07-09 17:13:40

by Marcelo Tosatti

Subject: Re: ->direct_IO API change in current 2.4 BK



On Wed, 9 Jul 2003, Andreas Dilger wrote:

> On Jul 09, 2003 13:31 +0100, Christoph Hellwig wrote:
> > I just got a nice XFS oops due to the direct_IO API change in
> > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly
> > that, _STABLE_. If you really think O_DIRECT on NFS is so important
> > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack.
>
> I would have to agree with that sentiment - we shouldn't change the
> API in an "almost compatible" way, although I would have hoped that
> compile warnings and/or module symbol versioning would have avoided
> a crash.
>
> > But what's the use of it anyway? AFAIK it's mostly for whoracle setups
> > that have their data on netapps but that needs a certified vendor kernel
> > not mainline..
>
> Actually, it is useful for Lustre to do this, because it allows us to have
> a file handle (which, naturally, holds per-file data) at the time the IO is
> sent over the wire, instead of the "anonymous" writes that happen now.
> This helps us with readahead on the server and other minor improvements.

Fine, I agree. Trond, I'll have to revert your direct IO patches because
they break the _stable_ API, indeed.

Please come up with another solution for the problem (->direct_IO2? It's
ugly, but...).

2003-07-09 17:30:01

by Marc-Christian Petersen

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wednesday 09 July 2003 19:24, Marcelo Tosatti wrote:

Hi,

> > > I just got a nice XFS oops due to the direct_IO API change in
> > > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly
> > > that, _STABLE_. If you really think O_DIRECT on NFS is so important
> > > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack.
I wonder why -aa and -wolk don't have these problems with O_DIRECT vs. XFS.

ciao, Marc

2003-07-09 17:40:46

by Marc-Christian Petersen

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wednesday 09 July 2003 19:46, Marcelo Tosatti wrote:

Hi Marcelo,

> > > > > I just got a nice XFS oops due to the direct_IO API change in
> > > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be
> > > > > exactly that, _STABLE_. If you really think O_DIRECT on NFS is so
> > > > > important please add a ->direct_IO2 for NFS like the reiserfs
> > > > > read_inode2 hack.
> >
> > I wonder why -aa and -wolk don't have these problems with O_DIRECT vs.
> > XFS.
> Do they have the NFS DIRECT IO patch?
Yes. If not, my sentence would be superfluous ;)

ciao, Marc

2003-07-09 18:00:25

by Marcelo Tosatti

Subject: Re: ->direct_IO API change in current 2.4 BK



On Wed, 9 Jul 2003, Marc-Christian Petersen wrote:

> On Wednesday 09 July 2003 19:46, Marcelo Tosatti wrote:
>
> Hi Marcelo,
>
> > > > > > I just got a nice XFS oops due to the direct_IO API change in
> > > > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be
> > > > > > exactly that, _STABLE_. If you really think O_DIRECT on NFS is so
> > > > > > important please add a ->direct_IO2 for NFS like the reiserfs
> > > > > > read_inode2 hack.
> > >
> > > I wonder why -aa and -wolk don't have these problems with O_DIRECT vs.
> > > XFS.
> > Do they have the NFS DIRECT IO patch?
> Yes. If not, my sentence would be superfluous ;)

OK, right. Well, I don't know why it doesn't happen there. Maybe not enough
testing?

Anyway, I'm going to revert the NFS DIRECT IO patch because, as Christoph
mentioned, it breaks the API.

I expect another solution from Trond (maybe ->direct_IO2).

2003-07-09 17:55:55

by Marcelo Tosatti

Subject: Re: ->direct_IO API change in current 2.4 BK



On Wed, 9 Jul 2003, Marc-Christian Petersen wrote:

> On Wednesday 09 July 2003 19:24, Marcelo Tosatti wrote:
>
> Hi,
>
> > > > I just got a nice XFS oops due to the direct_IO API change in
> > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly
> > > > that, _STABLE_. If you really think O_DIRECT on NFS is so important
> > > > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack.
> I wonder why -aa and -wolk don't have these problems with O_DIRECT vs. XFS.

Do they have the NFS DIRECT IO patch?

2003-07-09 18:08:55

by Marc-Christian Petersen

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wednesday 09 July 2003 20:08, Marcelo Tosatti wrote:

Hi Marcelo,

> OK, right. Well, I don't know why it doesn't happen there. Maybe not enough
> testing?
ahem, I'll bet -aa is tested _a lot_ and -wolk also has tons of testers, but
well, that's a different topic ;)

> Anyway, I'm going to revert the NFS DIRECT IO patch because, as Christoph
> mentioned, it breaks the API.
> I expect another solution from Trond (maybe ->direct_IO2).
I wonder why you merged such stuff in the first place, if you agree with
hch's opinion that an API should not change in a stable kernel series?
... Did you temporarily forget it? ;)

ciao, Marc

2003-07-09 18:17:34

by Trond Myklebust

Subject: Re: ->direct_IO API change in current 2.4 BK

>>>>> " " == Marcelo Tosatti <[email protected]> writes:

> Fine, I agree. Trond, I'll have to revert your direct IO
> patches because they break the _stable_ API, indeed.

> Please come up with another solution for the problem
> (->direct_IO2? It's ugly, but...).

Like this? Blech...

How about instead following Alan's suggestion to replace
KERNEL_HAS_O_DIRECT with a KERNEL_HAS_O_DIRECT2 that can be used to
switch between the old and new O_DIRECT format in the XFS patches?

Cheers,
Trond

diff -u --recursive --new-file linux-2.4.22-nodirect/Documentation/Configure.help linux-2.4.22-blech/Documentation/Configure.help
--- linux-2.4.22-nodirect/Documentation/Configure.help 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/Documentation/Configure.help 2003-07-09 20:07:36.000000000 +0200
@@ -15938,6 +15938,30 @@

If unsure, say N.

+Allow direct I/O on files in NFS
+CONFIG_NFS_DIRECTIO
+ There are important applications whose performance or correctness
+ depends on uncached access to file data. Database clusters (multiple
+ copies of the same instance running on separate hosts) implement their
+ own cache coherency protocol that subsumes the NFS cache protocols.
+ Applications that process datasets considerably larger than the client's
+ memory do not always benefit from a local cache. A streaming video
+ server, for instance, has no need to cache the contents of a file.
+
+ This option enables applications to perform direct I/O on files in NFS
+ file systems using the O_DIRECT open() flag. When O_DIRECT is set for
+ files, their data is not cached in the system's page cache. Direct
+ read and write operations are aligned to block boundaries. Data is
+ moved to and from user-level application buffers directly.
+
+ Unless your program is designed to use O_DIRECT properly, you are much
+ better off allowing the NFS client to manage caching for you. Misusing
+ O_DIRECT can cause poor server performance or network storms. This
+ kernel build option defaults OFF to avoid exposing system administrators
+ unwittingly to a potentially hazardous feature.
+
+ If unsure, say N.
+
Root file system on NFS
CONFIG_ROOT_NFS
If you want your Linux box to mount its whole root file system (the
diff -u --recursive --new-file linux-2.4.22-nodirect/fs/Config.in linux-2.4.22-blech/fs/Config.in
--- linux-2.4.22-nodirect/fs/Config.in 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/fs/Config.in 2003-07-09 20:08:20.000000000 +0200
@@ -106,6 +106,7 @@
dep_tristate 'InterMezzo file system support (replicating fs) (EXPERIMENTAL)' CONFIG_INTERMEZZO_FS $CONFIG_INET $CONFIG_EXPERIMENTAL
dep_tristate 'NFS file system support' CONFIG_NFS_FS $CONFIG_INET
dep_mbool ' Provide NFSv3 client support' CONFIG_NFS_V3 $CONFIG_NFS_FS
+ dep_mbool ' Allow direct I/O on NFS files (EXPERIMENTAL)' CONFIG_NFS_DIRECTIO $CONFIG_NFS_FS $CONFIG_EXPERIMENTAL
dep_bool ' Root file system on NFS' CONFIG_ROOT_NFS $CONFIG_NFS_FS $CONFIG_IP_PNP

dep_tristate 'NFS server support' CONFIG_NFSD $CONFIG_INET
diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/direct.c linux-2.4.22-blech/fs/nfs/direct.c
--- linux-2.4.22-nodirect/fs/nfs/direct.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.4.22-blech/fs/nfs/direct.c 2003-07-08 11:52:37.000000000 +0200
@@ -0,0 +1,382 @@
+/*
+ * linux/fs/nfs/direct.c
+ *
+ * High-performance direct I/O for the NFS client
+ *
+ * When an application requests uncached I/O, all read and write requests
+ * are made directly to the server; data stored or fetched via these
+ * requests is not cached in the Linux page cache. The client does not
+ * correct unaligned requests from applications. All requested bytes are
+ * held on permanent storage before a direct write system call returns to
+ * an application. Applications that manage their own data caching, such
+ * as databases, make very good use of direct I/O on local file systems.
+ *
+ * Solaris implements an uncached I/O facility called directio() that
+ * is used for backups and sequential I/O to very large files. Solaris
+ * also supports uncaching whole NFS partitions with "-o forcedirectio,"
+ * an undocumented mount option.
+ *
+ * Note that I/O to read in executables (e.g. kernel_read) cannot use
+ * direct (kiobuf) reads because there is no vma backing the passed-in
+ * data buffer.
+ *
+ * Designed by Jeff Kimmel, Chuck Lever, and Trond Myklebust.
+ *
+ * Initial implementation: 12/2001 by Chuck Lever <[email protected]>
+ *
+ * TODO:
+ *
+ * 1. Use concurrent asynchronous network requests rather than
+ * serialized synchronous network requests for normal (non-sync)
+ * direct I/O.
+ */
+
+#include <linux/config.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/file.h>
+#include <linux/errno.h>
+#include <linux/nfs_fs.h>
+#include <linux/smp_lock.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/iobuf.h>
+
+#include <asm/system.h>
+#include <asm/uaccess.h>
+
+#define NFSDBG_FACILITY (NFSDBG_PAGECACHE | NFSDBG_VFS)
+#define VERF_SIZE (2 * sizeof(__u32))
+
+static inline int
+nfs_direct_read_rpc(struct file *file, struct nfs_readargs *arg)
+{
+ int result;
+ struct inode * inode = file->f_dentry->d_inode;
+ struct nfs_fattr fattr;
+ struct rpc_message msg;
+ struct nfs_readres res = { &fattr, arg->count, 0 };
+
+#ifdef CONFIG_NFS_V3
+ msg.rpc_proc = (NFS_PROTO(inode)->version == 3) ?
+ NFS3PROC_READ : NFSPROC_READ;
+#else
+ msg.rpc_proc = NFSPROC_READ;
+#endif
+ msg.rpc_argp = arg;
+ msg.rpc_resp = &res;
+
+ lock_kernel();
+ msg.rpc_cred = nfs_file_cred(file);
+ fattr.valid = 0;
+ result = rpc_call_sync(NFS_CLIENT(inode), &msg, 0);
+ nfs_refresh_inode(inode, &fattr);
+ unlock_kernel();
+
+ return result;
+}
+
+static inline int
+nfs_direct_write_rpc(struct file *file, struct nfs_writeargs *arg,
+ struct nfs_writeverf *verf)
+{
+ int result;
+ struct inode *inode = file->f_dentry->d_inode;
+ struct nfs_fattr fattr;
+ struct rpc_message msg;
+ struct nfs_writeres res = { &fattr, verf, 0 };
+
+#ifdef CONFIG_NFS_V3
+ msg.rpc_proc = (NFS_PROTO(inode)->version == 3) ?
+ NFS3PROC_WRITE : NFSPROC_WRITE;
+#else
+ msg.rpc_proc = NFSPROC_WRITE;
+#endif
+ msg.rpc_argp = arg;
+ msg.rpc_resp = &res;
+
+ lock_kernel();
+ msg.rpc_cred = get_rpccred(nfs_file_cred(file));
+ fattr.valid = 0;
+ result = rpc_call_sync(NFS_CLIENT(inode), &msg, 0);
+ nfs_write_attributes(inode, &fattr);
+ put_rpccred(msg.rpc_cred);
+ unlock_kernel();
+
+#ifdef CONFIG_NFS_V3
+ if (NFS_PROTO(inode)->version == 3) {
+ if (result > 0) {
+ if ((arg->stable == NFS_FILE_SYNC) &&
+ (verf->committed != NFS_FILE_SYNC)) {
+ printk(KERN_ERR
+ "%s: server didn't sync stable write request\n",
+ __FUNCTION__);
+ return -EIO;
+ }
+
+ if (result != arg->count) {
+ printk(KERN_INFO
+ "%s: short write, count=%u, result=%d\n",
+ __FUNCTION__, arg->count, result);
+ }
+ }
+ return result;
+ } else {
+#endif
+ verf->committed = NFS_FILE_SYNC; /* NFSv2 always syncs data */
+ if (result == 0)
+ return arg->count;
+ return result;
+#ifdef CONFIG_NFS_V3
+ }
+#endif
+}
+
+#ifdef CONFIG_NFS_V3
+static inline int
+nfs_direct_commit_rpc(struct inode *inode, loff_t offset, size_t count,
+ struct nfs_writeverf *verf)
+{
+ int result;
+ struct nfs_fattr fattr;
+ struct nfs_writeargs arg = { NFS_FH(inode), offset, count, 0, 0,
+ NULL };
+ struct nfs_writeres res = { &fattr, verf, 0 };
+ struct rpc_message msg = { NFS3PROC_COMMIT, &arg, &res, NULL };
+
+ fattr.valid = 0;
+
+ lock_kernel();
+ result = rpc_call_sync(NFS_CLIENT(inode), &msg, 0);
+ nfs_write_attributes(inode, &fattr);
+ unlock_kernel();
+
+ return result;
+}
+#else
+static inline int
+nfs_direct_commit_rpc(struct inode *inode, loff_t offset, size_t count,
+ struct nfs_writeverf *verf)
+{
+ return 0;
+}
+#endif
+
+/*
+ * Walk through the iobuf and create an iovec for each "rsize" bytes.
+ */
+static int
+nfs_direct_read(struct file *file, struct kiobuf *iobuf, loff_t offset,
+ size_t count)
+{
+ int curpage, total;
+ int result = 0;
+ struct inode *inode = file->f_dentry->d_inode;
+ int rsize = NFS_SERVER(inode)->rsize;
+ struct page *pages[NFS_READ_MAXIOV];
+ struct nfs_readargs args = { NFS_FH(inode), offset, 0, iobuf->offset,
+ pages };
+
+ total = 0;
+ curpage = 0;
+ while (count) {
+ int len, request;
+ struct page **dest = pages;
+
+ request = count;
+ if (count > rsize)
+ request = rsize;
+ args.count = request;
+ args.offset = offset;
+ args.pgbase = (iobuf->offset + total) & ~PAGE_MASK;
+ len = PAGE_SIZE - args.pgbase;
+
+ do {
+ struct page *page = iobuf->maplist[curpage];
+
+ if (curpage >= iobuf->nr_pages || !page) {
+ result = -EFAULT;
+ goto out_err;
+ }
+
+ *dest++ = page;
+ /* zero after the first iov */
+ if (request < len)
+ break;
+ request -= len;
+ len = PAGE_SIZE;
+ curpage++;
+ } while (request != 0);
+
+ result = nfs_direct_read_rpc(file, &args);
+
+ if (result < 0)
+ break;
+
+ total += result;
+ if (result < args.count) /* NFSv2ism */
+ break;
+ count -= result;
+ offset += result;
+ };
+out_err:
+ if (!total)
+ return result;
+ return total;
+}
+
+/*
+ * Walk through the iobuf and create an iovec for each "wsize" bytes.
+ * If only one network write is necessary, or if the O_SYNC flag or
+ * 'sync' mount option are present, or if this is a V2 inode, use
+ * FILE_SYNC. Otherwise, use UNSTABLE and finish with a COMMIT.
+ *
+ * The mechanics of this function are much the same as nfs_direct_read,
+ * with the added complexity of committing unstable writes.
+ */
+static int
+nfs_direct_write(struct file *file, struct kiobuf *iobuf,
+ loff_t offset, size_t count)
+{
+ int curpage, total;
+ int need_commit = 0;
+ int result = 0;
+ loff_t save_offset = offset;
+ struct inode *inode = file->f_dentry->d_inode;
+ int wsize = NFS_SERVER(inode)->wsize;
+ struct nfs_writeverf first_verf, ret_verf;
+ struct page *pages[NFS_WRITE_MAXIOV];
+ struct nfs_writeargs args = { NFS_FH(inode), 0, 0, NFS_FILE_SYNC, 0,
+ pages };
+
+#ifdef CONFIG_NFS_V3
+ if ((NFS_PROTO(inode)->version == 3) && (count > wsize) &&
+ (!IS_SYNC(inode)))
+ args.stable = NFS_UNSTABLE;
+#endif
+
+retry:
+ total = 0;
+ curpage = 0;
+ while (count) {
+ int len, request;
+ struct page **dest = pages;
+
+ request = count;
+ if (count > wsize)
+ request = wsize;
+ args.count = request;
+ args.offset = offset;
+ args.pgbase = (iobuf->offset + total) & ~PAGE_MASK;
+ len = PAGE_SIZE - args.pgbase;
+
+ do {
+ struct page *page = iobuf->maplist[curpage];
+
+ if (curpage >= iobuf->nr_pages || !page) {
+ result = -EFAULT;
+ goto out_err;
+ }
+
+ *dest++ = page;
+ /* zero after the first iov */
+ if (request < len)
+ break;
+ request -= len;
+ len = PAGE_SIZE;
+ curpage++;
+ } while (request != 0);
+
+ result = nfs_direct_write_rpc(file, &args, &ret_verf);
+
+ if (result < 0)
+ break;
+
+ if (!total)
+ memcpy(&first_verf.verifier, &ret_verf.verifier,
+ VERF_SIZE);
+ if (ret_verf.committed != NFS_FILE_SYNC) {
+ need_commit = 1;
+ if (memcmp(&first_verf.verifier, &ret_verf.verifier,
+ VERF_SIZE))
+ goto print_retry;
+ }
+
+ total += result;
+ count -= result;
+ offset += result;
+ };
+
+out_err:
+ /*
+ * Commit data written so far, even in the event of an error
+ */
+ if (need_commit) {
+ if (nfs_direct_commit_rpc(inode, save_offset,
+ iobuf->length - count, &ret_verf))
+ goto print_retry;
+ if (memcmp(&first_verf.verifier, &ret_verf.verifier,
+ VERF_SIZE))
+ goto print_retry;
+ }
+
+ if (!total)
+ return result;
+ return total;
+
+print_retry:
+ printk(KERN_INFO "%s: detected server restart; retrying with FILE_SYNC\n",
+ __FUNCTION__);
+ args.stable = NFS_FILE_SYNC;
+ offset = save_offset;
+ count = iobuf->length;
+ goto retry;
+}
+
+/*
+ * Read or write data, moving the data directly to/from the
+ * application's buffer without caching in the page cache.
+ *
+ * Rules for direct I/O
+ *
+ * 1. block size = 512 bytes or more
+ * 2. file byte offset is block aligned
+ * 3. byte count is a multiple of block size
+ * 4. user buffer is not aligned
+ * 5. user buffer is faulted in and pinned
+ *
+ * These are verified before we get here.
+ */
+int
+nfs_direct_IO(int rw, struct file *file, struct kiobuf *iobuf,
+ unsigned long blocknr, int blocksize)
+{
+ int result = -EINVAL;
+ size_t count = iobuf->length;
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ loff_t offset = blocknr << inode->i_blkbits;
+
+ switch (rw) {
+ case READ:
+ dfprintk(VFS,
+ "NFS: direct_IO(READ) (%s/%s) off/cnt(%Lu/%d)\n",
+ dentry->d_parent->d_name.name,
+ dentry->d_name.name, offset, count);
+
+ result = nfs_direct_read(file, iobuf, offset, count);
+ break;
+ case WRITE:
+ dfprintk(VFS,
+ "NFS: direct_IO(WRITE) (%s/%s) off/cnt(%Lu/%d)\n",
+ dentry->d_parent->d_name.name,
+ dentry->d_name.name, offset, count);
+
+ result = nfs_direct_write(file, iobuf, offset, count);
+ break;
+ default:
+ break;
+ }
+
+ dfprintk(VFS, "NFS: direct_IO result = %d\n", result);
+ return result;
+}
diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/file.c linux-2.4.22-blech/fs/nfs/file.c
--- linux-2.4.22-nodirect/fs/nfs/file.c 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/fs/nfs/file.c 2003-07-09 20:10:21.000000000 +0200
@@ -16,6 +16,7 @@
* nfs regular file handling functions
*/

+#include <linux/config.h>
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/errno.h>
@@ -199,6 +200,9 @@
readpage: nfs_readpage,
sync_page: nfs_sync_page,
writepage: nfs_writepage,
+#ifdef CONFIG_NFS_DIRECTIO
+ direct_fileIO: nfs_direct_IO,
+#endif
prepare_write: nfs_prepare_write,
commit_write: nfs_commit_write
};
diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/Makefile linux-2.4.22-blech/fs/nfs/Makefile
--- linux-2.4.22-nodirect/fs/nfs/Makefile 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/fs/nfs/Makefile 2003-07-09 20:08:47.000000000 +0200
@@ -14,6 +14,7 @@

obj-$(CONFIG_ROOT_NFS) += nfsroot.o mount_clnt.o
obj-$(CONFIG_NFS_V3) += nfs3proc.o nfs3xdr.o
+obj-$(CONFIG_NFS_DIRECTIO) += direct.o

obj-m := $(O_TARGET)

diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/write.c linux-2.4.22-blech/fs/nfs/write.c
--- linux-2.4.22-nodirect/fs/nfs/write.c 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/fs/nfs/write.c 2003-07-09 20:11:04.000000000 +0200
@@ -123,23 +123,6 @@
}

/*
- * This function will be used to simulate weak cache consistency
- * under NFSv2 when the NFSv3 attribute patch is included.
- * For the moment, we just call nfs_refresh_inode().
- */
-static __inline__ int
-nfs_write_attributes(struct inode *inode, struct nfs_fattr *fattr)
-{
- if ((fattr->valid & NFS_ATTR_FATTR) && !(fattr->valid & NFS_ATTR_WCC)) {
- fattr->pre_size = NFS_CACHE_ISIZE(inode);
- fattr->pre_mtime = NFS_CACHE_MTIME(inode);
- fattr->pre_ctime = NFS_CACHE_CTIME(inode);
- fattr->valid |= NFS_ATTR_WCC;
- }
- return nfs_refresh_inode(inode, fattr);
-}
-
-/*
* Write a page synchronously.
* Offset is the data offset within the page.
*/
diff -u --recursive --new-file linux-2.4.22-nodirect/include/linux/fs.h linux-2.4.22-blech/include/linux/fs.h
--- linux-2.4.22-nodirect/include/linux/fs.h 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/include/linux/fs.h 2003-07-09 19:42:37.000000000 +0200
@@ -396,6 +396,8 @@
int (*releasepage) (struct page *, int);
#define KERNEL_HAS_O_DIRECT /* this is for modules out of the kernel */
int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int);
+#define KERNEL_HAS_O_DIRECT2 /* Unfortunate kludge due to lack of foresight */
+ int (*direct_fileIO)(int, struct file *, struct kiobuf *, unsigned long, int);
void (*removepage)(struct page *); /* called when page gets removed from the inode */
};

diff -u --recursive --new-file linux-2.4.22-nodirect/include/linux/nfs_fs.h linux-2.4.22-blech/include/linux/nfs_fs.h
--- linux-2.4.22-nodirect/include/linux/nfs_fs.h 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/include/linux/nfs_fs.h 2003-07-09 20:12:27.000000000 +0200
@@ -274,6 +274,11 @@
#define NFS_TestClearPageSync(page) test_and_clear_bit(PG_fs_1, &(page)->flags)

/*
+ * linux/fs/nfs/direct.c
+ */
+extern int nfs_direct_IO(int, struct file *, struct kiobuf *, unsigned long, int);
+
+/*
* linux/fs/mount_clnt.c
* (Used only by nfsroot module)
*/
@@ -302,6 +307,23 @@
return __nfs_refresh_inode(inode,fattr);
}

+/*
+ * This function will be used to simulate weak cache consistency
+ * under NFSv2 when the NFSv3 attribute patch is included.
+ * For the moment, we just call nfs_refresh_inode().
+ */
+static __inline__ int
+nfs_write_attributes(struct inode *inode, struct nfs_fattr *fattr)
+{
+ if ((fattr->valid & NFS_ATTR_FATTR) && !(fattr->valid & NFS_ATTR_WCC)) {
+ fattr->pre_size = NFS_CACHE_ISIZE(inode);
+ fattr->pre_mtime = NFS_CACHE_MTIME(inode);
+ fattr->pre_ctime = NFS_CACHE_CTIME(inode);
+ fattr->valid |= NFS_ATTR_WCC;
+ }
+ return nfs_refresh_inode(inode, fattr);
+}
+
static inline loff_t
nfs_size_to_loff_t(__u64 size)
{
diff -u --recursive --new-file linux-2.4.22-nodirect/include/linux/nfs_xdr.h linux-2.4.22-blech/include/linux/nfs_xdr.h
--- linux-2.4.22-nodirect/include/linux/nfs_xdr.h 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/include/linux/nfs_xdr.h 2003-07-09 20:13:22.000000000 +0200
@@ -59,7 +59,7 @@
/* Arguments to the read call.
* Note that NFS_READ_MAXIOV must be <= (MAX_IOVEC-2) from sunrpc/xprt.h
*/
-#define NFS_READ_MAXIOV 8
+#define NFS_READ_MAXIOV (9)

struct nfs_readargs {
struct nfs_fh * fh;
@@ -78,7 +78,7 @@
/* Arguments to the write call.
* Note that NFS_WRITE_MAXIOV must be <= (MAX_IOVEC-2) from sunrpc/xprt.h
*/
-#define NFS_WRITE_MAXIOV 8
+#define NFS_WRITE_MAXIOV (9)
struct nfs_writeargs {
struct nfs_fh * fh;
__u64 offset;
diff -u --recursive --new-file linux-2.4.22-nodirect/mm/filemap.c linux-2.4.22-blech/mm/filemap.c
--- linux-2.4.22-nodirect/mm/filemap.c 2003-07-09 19:39:39.000000000 +0200
+++ linux-2.4.22-blech/mm/filemap.c 2003-07-09 19:57:09.000000000 +0200
@@ -1651,6 +1651,95 @@
return retval;
}

+/* Differs from the above in that it uses the a_ops->direct_fileIO() call */
+static ssize_t direct_fileIO(int rw, struct file * filp, char * buf, size_t count, loff_t offset)
+{
+ ssize_t retval;
+ int new_iobuf, chunk_size, blocksize_mask, blocksize, blocksize_bits, iosize, progress;
+ struct kiobuf * iobuf;
+ struct address_space * mapping = filp->f_dentry->d_inode->i_mapping;
+ struct inode * inode = mapping->host;
+ loff_t size = inode->i_size;
+
+ new_iobuf = 0;
+ iobuf = filp->f_iobuf;
+ if (test_and_set_bit(0, &filp->f_iobuf_lock)) {
+ /*
+ * A parallel read/write is using the preallocated iobuf
+ * so just run slow and allocate a new one.
+ */
+ retval = alloc_kiovec(1, &iobuf);
+ if (retval)
+ goto out;
+ new_iobuf = 1;
+ }
+
+ blocksize = 1 << inode->i_blkbits;
+ blocksize_bits = inode->i_blkbits;
+ blocksize_mask = blocksize - 1;
+ chunk_size = KIO_MAX_ATOMIC_IO << 10;
+
+ retval = -EINVAL;
+ if ((offset & blocksize_mask) || (count & blocksize_mask) || ((unsigned long) buf & blocksize_mask))
+ goto out_free;
+ if (!mapping->a_ops->direct_fileIO)
+ goto out_free;
+
+ if ((rw == READ) && (offset + count > size))
+ count = size - offset;
+
+ /*
+ * Flush to disk exclusively the _data_, metadata must remain
+ * completely asynchronous or performance will go to /dev/null.
+ */
+ retval = filemap_fdatasync(mapping);
+ if (retval == 0)
+ retval = fsync_inode_data_buffers(inode);
+ if (retval == 0)
+ retval = filemap_fdatawait(mapping);
+ if (retval < 0)
+ goto out_free;
+
+ progress = retval = 0;
+ while (count > 0) {
+ iosize = count;
+ if (iosize > chunk_size)
+ iosize = chunk_size;
+
+ retval = map_user_kiobuf(rw, iobuf, (unsigned long) buf, iosize);
+ if (retval)
+ break;
+
+ retval = mapping->a_ops->direct_fileIO(rw, filp, iobuf, (offset+progress) >> blocksize_bits, blocksize);
+
+ if (rw == READ && retval > 0)
+ mark_dirty_kiobuf(iobuf, retval);
+
+ if (retval >= 0) {
+ count -= retval;
+ buf += retval;
+ /* warning: weird semantics here, we're reporting a read behind the end of the file */
+ progress += retval;
+ }
+
+ unmap_kiobuf(iobuf);
+
+ if (retval != iosize)
+ break;
+ }
+
+ if (progress)
+ retval = progress;
+
+ out_free:
+ if (!new_iobuf)
+ clear_bit(0, &filp->f_iobuf_lock);
+ else
+ free_kiovec(1, &iobuf);
+ out:
+ return retval;
+}
+
int file_read_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
{
char *kaddr;
@@ -1721,7 +1810,10 @@
down(&inode->i_sem);
size = inode->i_size;
if (pos < size) {
- retval = generic_file_direct_IO(READ, filp, buf, count, pos);
+ if (mapping->a_ops->direct_fileIO)
+ retval = direct_fileIO(READ, filp, buf, count, pos);
+ else
+ retval = generic_file_direct_IO(READ, filp, buf, count, pos);
if (retval > 0)
*ppos = pos + retval;
}
@@ -3230,7 +3322,10 @@
inode->i_ctime = inode->i_mtime = CURRENT_TIME;
mark_inode_dirty_sync(inode);

- written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos);
+ if (mapping->a_ops->direct_fileIO)
+ written = direct_fileIO(WRITE, file, (char *) buf, count, pos);
+ else
+ written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos);
if (written > 0) {
loff_t end = pos + written;
if (end > inode->i_size && !S_ISBLK(inode->i_mode)) {
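
As an aside: the generic direct_fileIO() helper above rejects any request
whose file offset, byte count, or user buffer address is not block-aligned.
A minimal user-space sketch of a well-formed O_DIRECT read, to show what
callers must do on their side (the path is hypothetical and error handling
is abbreviated):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	size_t blksz = 4096;    /* assume 4K filesystem blocks */
	void *buf;
	int fd;
	ssize_t n;

	/* The buffer itself must be block-aligned, not just the offset. */
	if (posix_memalign(&buf, blksz, blksz))
		return 1;

	fd = open("/mnt/nfs/datafile", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Offset 0 and count == blksz satisfy the alignment checks. */
	n = read(fd, buf, blksz);
	if (n < 0)
		perror("read");
	else
		printf("read %zd bytes, bypassing the page cache\n", n);

	close(fd);
	free(buf);
	return 0;
}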

2003-07-09 18:24:30

by Trond Myklebust

Subject: Re: ->direct_IO API change in current 2.4 BK

>>>>> " " == Marcelo Tosatti <[email protected]> writes:

> OK, right. Well, I don't know why it doesn't happen there. Maybe
> not enough testing?

Probably nobody is applying the XFS patches on those kernels?

Cheers,
Trond

2003-07-09 18:27:30

by Marc-Christian Petersen

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wednesday 09 July 2003 20:33, Trond Myklebust wrote:

Hi Trond,

> >>>>> " " == Marcelo Tosatti <[email protected]> writes:
> > Ok, right. Well, I dont know why it doesnt happen there. Maybe
> > not enough testing?
> Probably nobody is applying the XFS patches on those kernel?
err, -aa has XFS by default, -wolk has XFS by default. So ... ;)

ciao, Marc

2003-07-09 18:41:50

by Marc-Christian Petersen

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wednesday 09 July 2003 20:50, Trond Myklebust wrote:

Hi Trond,

> So they have both XFS + NFS O_DIRECT?
Yes.

> The answer to your question is then that somebody made the trivial
> conversion on XFS... It's just a question of replacing the second
> argument of the direct_IO() method with a filp, then extracting the
> inode from that. A 2-liner patch at most...
EXACTLY! That was the point of my small first post ;)

> The point here is that Marcelo's tree does not include XFS, so my
> patch can't fix it up...
> As I said, I suggest replacing KERNEL_HAS_O_DIRECT with
> KERNEL_HAS_O_DIRECT2 so that the XFS patches can switch on that, and
> hence provide the 2-liner on newer kernels...
It's fine with me.

ciao, Marc

2003-07-09 18:37:32

by Trond Myklebust

Subject: Re: ->direct_IO API change in current 2.4 BK

>>>>> " " == Marc-Christian Petersen <[email protected]> writes:

> err, -aa has XFS by default, -wolk has XFS by default. So
> ... ;)

So they have both XFS + NFS O_DIRECT?

The answer to your question is then that somebody made the trivial
conversion on XFS... It's just a question of replacing the second
argument of the direct_IO() method with a filp, then extracting the
inode from that. A 2-liner patch at most...

The point here is that Marcelo's tree does not include XFS, so my
patch can't fix it up...
As I said, I suggest replacing KERNEL_HAS_O_DIRECT with
KERNEL_HAS_O_DIRECT2 so that the XFS patches can switch on that, and
hence provide the 2-liner on newer kernels...

Cheers,
Trond
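
For illustration, a minimal sketch of the conversion Trond describes; the
linvfs_direct_IO name and the stub body are illustrative, not the actual
XFS patch:

/*
 * Old 2.4 prototype:
 *
 *	int (*direct_IO)(int, struct inode *, struct kiobuf *,
 *			 unsigned long, int);
 *
 * New prototype: the second argument becomes a struct file *, and the
 * inode is recovered from it; that is the whole "2-liner":
 */
static int linvfs_direct_IO(int rw, struct file *filp,
			    struct kiobuf *iobuf,
			    unsigned long blocknr, int blocksize)
{
	struct inode *inode = filp->f_dentry->d_inode;

	/* ... body unchanged: everything below keeps using inode ... */
	return 0;
}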

2003-07-09 18:47:18

by Andreas Dilger

Subject: Re: ->direct_IO API change in current 2.4 BK

On Jul 09, 2003 20:29 +0200, Trond Myklebust wrote:
> How about instead following Alan's suggestion to replace
> KERNEL_HAS_O_DIRECT with a KERNEL_HAS_O_DIRECT2 that can be used to
> switch between the old and new O_DIRECT format in the XFS patches?

Actually, I like that a lot more, as it allows out-of-tree code to
know which interface to use, and we don't have to key on kernel version
(which is bogus if compiling against a vendor kernel that back-ports
this fix).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
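
To make the contrast concrete, a sketch of the two approaches for an
out-of-tree module (the version cutoff shown is hypothetical):

/* Keying on the kernel version breaks on vendor kernels that
 * back-port the change: */
#include <linux/version.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,23)	/* hypothetical */
	/* use the new prototype */
#endif

/* Testing the feature macro tracks the interface itself: */
#include <linux/fs.h>
#ifdef KERNEL_HAS_O_DIRECT2
	/* use the new prototype */
#endif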

2003-07-09 18:50:56

by Jeff Garzik

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wed, Jul 09, 2003 at 08:50:59PM +0200, Trond Myklebust wrote:
> >>>>> " " == Marc-Christian Petersen <[email protected]> writes:
>
> > err, -aa has XFS by default, -wolk has XFS by default. So
> > ... ;)
>
> So they have both XFS + NFS O_DIRECT?
>
> The answer to your question is then that somebody made the trivial
> conversion on XFS... It's just a question of replacing the second
> argument of the direct_IO() method with a filp, then extracting the
> inode from that. A 2-liner patch at most...
>
> The point here is that Marcelo's tree does not include XFS, so my
> patch can't fix it up...
> As I said, I suggest replacing KERNEL_HAS_O_DIRECT with
> KERNEL_HAS_O_DIRECT2 so that the XFS patches can switch on that, and
> hence provide the 2-liner on newer kernels...

s/replacing/adding/

A new ->direct_IO2 hook would be an addition, so you really want to
simply add another feature flag.

Since the 2.5 direct_IO API is already different from current 2.4, I
would also suggest considering KERNEL24_HAS_O_DIRECT2 as the name, to
indicate that the feature is specific to 2.4.

Jeff



2003-07-09 18:54:31

by Trond Myklebust

Subject: Re: ->direct_IO API change in current 2.4 BK

>>>>> " " == Jeff Garzik <[email protected]> writes:

> s/replacing/adding/

Revert ;-)

> A new ->direct_IO2 hook would be an addition, so you really
> want to simply add another feature flag.

You missed the point. This is *instead* of ->direct_IO2. Its only
purpose would be to distinguish between the old

int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int);


and the new

int (*direct_IO)(int, struct file *, struct kiobuf *, unsigned long, int);

Cheers,
Trond
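
For an out-of-tree filesystem, that one feature-test macro would be used
roughly like this (a sketch; the myfs_* names are hypothetical):

#include <linux/fs.h>

/* Common worker shared by both entry points (hypothetical). */
static int myfs_do_direct_io(int rw, struct inode *inode, struct file *filp,
			     struct kiobuf *iobuf, unsigned long blocknr,
			     int blocksize);

#ifdef KERNEL_HAS_O_DIRECT2
/* Kernels with the new calling convention: a struct file * is passed,
 * so per-open state (credentials, flags) is available to the fs. */
static int myfs_direct_IO(int rw, struct file *filp, struct kiobuf *iobuf,
			  unsigned long blocknr, int blocksize)
{
	struct inode *inode = filp->f_dentry->d_inode;

	return myfs_do_direct_io(rw, inode, filp, iobuf, blocknr, blocksize);
}
#else
/* Older 2.4 kernels: only the inode is available at this point. */
static int myfs_direct_IO(int rw, struct inode *inode, struct kiobuf *iobuf,
			  unsigned long blocknr, int blocksize)
{
	return myfs_do_direct_io(rw, inode, NULL, iobuf, blocknr, blocksize);
}
#endif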

2003-07-09 19:03:08

by Jeff Garzik

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wed, Jul 09, 2003 at 09:08:53PM +0200, Trond Myklebust wrote:
> >>>>> " " == Jeff Garzik <[email protected]> writes:
>
> > s/replacing/adding/
>
> Revert ;-)
>
> > A new ->direct_IO2 hook would be an addition, so you really
> > want to simply add another feature flag.
>
> You missed the point. This is *instead* of ->direct_IO2. Its only
> purpose would be to distinguish between the old
>
> int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int);
>
>
> and the new
>
> int (*direct_IO)(int, struct file *, struct kiobuf *, unsigned long, int);

I respectfully disagree, then.

The 2.5 direct_IO API is already different from 2.4, so changing
the 2.4 stable API implies creating yet another different version of
the API.

When this situation presented itself earlier, with reiserfs, the
solution was ->read_inode2. I think that same situation applies here.

Having the stable API change, conditional on a define, is really
nasty and IMO will create maintenance and support headaches down
the line. I do not recall Linux VFS _ever_ having a hook's definition
conditional. We should not start now...

We need a 2.4-specific solution for this. ->direct_IO2 should suffice,
and it follows historical example.

XFS, ocfs, and others need only test the HAVE_NEW_24_DIRECT_IO
define.

In the core kernel implementation, it is trivial to say "if direct_IO2
is non-NULL, use that, otherwise use direct_IO", without needing to make
any code conditional at all.

Jeff
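
In core-kernel terms that fallback is only a few lines; a sketch against
the 2.4 address_space_operations, assuming a hypothetical direct_IO2
member next to the existing direct_IO:

static ssize_t call_direct_IO(int rw, struct file *filp,
			      struct kiobuf *iobuf,
			      unsigned long blocknr, int blocksize)
{
	struct address_space *mapping = filp->f_dentry->d_inode->i_mapping;

	/* Prefer the new file-based hook when the fs provides it... */
	if (mapping->a_ops->direct_IO2)
		return mapping->a_ops->direct_IO2(rw, filp, iobuf,
						  blocknr, blocksize);

	/* ...otherwise fall back to the unchanged inode-based hook. */
	return mapping->a_ops->direct_IO(rw, mapping->host, iobuf,
					 blocknr, blocksize);
}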



2003-07-09 19:04:12

by Jeff Garzik

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wed, Jul 09, 2003 at 11:51:48AM -0700, Andreas Dilger wrote:
> On Jul 09, 2003 20:29 +0200, Trond Myklebust wrote:
> > How about instead following Alan's suggestion to replace
> > KERNEL_HAS_O_DIRECT with a KERNEL_HAS_O_DIRECT2 that can be used to
> > switch between the old and new O_DIRECT format in the XFS patches?
>
> Actually, I like that a lot more, as it allows out-of-tree code to
> know which interface to use, and we don't have to key on kernel version
> (which is bogus if compiling against a vendor kernel that back-ports
> this fix).

Feature test macros are definitely quite useful...

...but making the stable series VFS API definition _conditional_
is quite unprecedented, and IMO wrong.

Jeff



2003-07-09 19:31:17

by Marc-Christian Petersen

Subject: Re: ->direct_IO API change in current 2.4 BK

On Wednesday 09 July 2003 21:13, Marcelo Tosatti wrote:

Hi Marcelo,

> No, I did not.
> I applied it because, in my ignorance, I did not notice it would break
> the stable API.
> I applied it because I wanted useful comments from people (like hch and
> others provided).
> Thank you very much for your flaming.
Somewhere deep in my head I knew you'd say that :-( .. When I flame, it looks
very different.

It was not meant as flaming or anything bad, just a
simple question because I was curious!

ciao, Marc

2003-07-09 19:37:36

by Trond Myklebust

Subject: Re: ->direct_IO API change in current 2.4 BK

>>>>> " " == Jeff Garzik <[email protected]> writes:

> Having the stable API change, conditional on a define, is
> really nasty and IMO will create maintenance and support
> headaches down the line. I do not recall Linux VFS _ever_
> having a hook's definition conditional. We should not start
> now...

direct_IO() was precisely such a conditional hook definition. It
appeared in 2.4.15, and anybody who does not check for
KERNEL_HAS_O_DIRECT is not backward compatible.

To comment further: There is at least one example I can think of which
was exactly equivalent to the proposed change, namely the redefinition
of the filldir_t type in 2.4.9. It was admittedly not documented using
a define...

Note: We could at the same time replace the name direct_IO() with
direct_IO2() (that has several precedents). There are currently only
a small number of filesystems that provide O_DIRECT, and converting
them all is (as has been pointed out before) trivial...

The problem with read_inode2() was rather that it overloaded the
existing iget4() interface...

Cheers,
Trond


2003-07-09 19:52:09

by Marcelo Tosatti

Subject: Re: ->direct_IO API change in current 2.4 BK



On Wed, 9 Jul 2003, Marc-Christian Petersen wrote:

> On Wednesday 09 July 2003 20:08, Marcelo Tosatti wrote:
>
> Hi Marcelo,
>
> > OK, right. Well, I don't know why it doesn't happen there. Maybe not enough
> > testing?
> ahem, I'll bet -aa is tested _a lot_ and -wolk also has tons of testers, but
> well, that's a different topic ;)
>
> > Anyway, I'm going to revert the NFS DIRECT IO patch because, as Christoph
> > mentioned, it breaks the API.
> > I expect another solution from Trond (maybe ->direct_IO2).

> I wonder why you merged such stuff in the first place, if you agree with
> hch's opinion that an API should not change in a stable kernel series?
> ... Did you temporarily forget it? ;)

No, I did not.

I applied it because, in my ignorance, I did not notice it would break
the stable API.

I applied it because I wanted useful comments from people (like hch and
others provided).

Thank you very much for your flaming.

2003-07-09 21:29:25

by Jeff Garzik

Subject: Re: ->direct_IO API change in current 2.4 BK

Trond Myklebust wrote:
>>>>>>" " == Jeff Garzik <[email protected]> writes:
>
>
> > Having the stable API change, conditional on a define, is
> > really nasty and IMO will create maintenance and support
> > headaches down the line. I do not recall Linux VFS _ever_
> > having a hook's definition conditional. We should not start
> > now...
>
> direct_IO() was precisely such a conditional hook definition. It
> appeared in 2.4.15, and anybody who does not check for
> KERNEL_HAS_O_DIRECT is not backward compatible.

You misunderstand. The 2.4.15 direct_IO hook was _not_ conditionally
defined. It appeared in the middle of a stable series, yes. It has a
feature macro, yes. But the definition of the hook in
include/linux/fs.h does not _change_ based on a define. That is what I
mean by a conditional hook definition.

It is far less trouble for everyone to add a new hook than to change
an existing hook in the middle of a stable series.


> To comment further: There is at least one example I can think of which
> was exactly equivalent to the proposed change, namely the redefinition
> of the filldir_t type in 2.4.9. It was admittedly not documented using
> a define...

No doubt you can find more :) That doesn't make it the right thing to do,
though :)


> Note: We could at the same time replace the name direct_IO() with
> direct_IO2() (that has several precedents). There are currently only
> a small number of filesystems that provide O_DIRECT, and converting
> them all is (as has been pointed out before) trivial...

We cannot just-fix-up filesystems which are not in-tree, which is what
the KERNEL_HAS_O_DIRECT2 define would be mainly used for. In-tree
filesystems would just unconditionally use the new, or old, interface as
they chose.


> The problem with read_inode2() was rather that it overloaded the
> existing iget4() interface...

The higher-level problem was that we didn't want to change the VFS
API... otherwise we could have simply used the new interface, and
converted all in-tree filesystems.

Jeff



2003-07-09 23:32:44

by Alan

Subject: Re: ->direct_IO API change in current 2.4 BK

On Mer, 2003-07-09 at 20:13, Marcelo Tosatti wrote:
> I applied it because, in my ignorance, I did not notice it would break
> the stable API.
>
> I applied it because I wanted useful comments from people (like hch and
> others provided).

I'm not sure I see what the fuss is about a slight API change that is
safe since it spews warnings/breaks existing code that isn't fixed. At
least one vendor kernel also has the changed API anyway.

2003-07-09 23:29:06

by Alan

Subject: Re: ->direct_IO API change in current 2.4 BK

On Mer, 2003-07-09 at 19:50, Trond Myklebust wrote:
> >>>>> " " == Marc-Christian Petersen <[email protected]> writes:
>
> > err, -aa has XFS by default, -wolk has XFS by default. So
> > ... ;)
>
> So they have both XFS + NFS O_DIRECT?

-ac has both


2003-07-09 23:30:25

by Alan

Subject: Re: ->direct_IO API change in current 2.4 BK

On Mer, 2003-07-09 at 20:17, Jeff Garzik wrote:
> Having the stable API change, conditional on a define, is really
> nasty and IMO will create maintenance and support headaches down
> the line. I do not recall Linux VFS _ever_ having a hook's definition
> conditional. We should not start now...

The new quota code in 2.4.22pre already changed the rules slightly, as
do the shmemfs fixes if you are pedantic.


2003-07-10 00:12:16

by Jeff Garzik

Subject: Re: ->direct_IO API change in current 2.4 BK

Alan Cox wrote:
> On Mer, 2003-07-09 at 20:17, Jeff Garzik wrote:
>
>>Having the stable API change, conditional on a define, is really
>>nasty and IMO will create maintenance and support headaches down
>>the line. I do not recall Linux VFS _ever_ having a hook's definition
>>conditional. We should not start now...
>
>
> The new quota code in 2.4.22pre already changed the rules slightly, as
> do the shmemfs fixes if you are pedantic.

I am ;-)

2003-07-10 00:12:17

by Jeff Garzik

Subject: Re: ->direct_IO API change in current 2.4 BK

Alan Cox wrote:
> On Mer, 2003-07-09 at 20:13, Marcelo Tosatti wrote:
>
>>I applied it because, in my ignorance, I did not notice it would break
>>the stable API.
>>
>>I applied it because I wanted useful comments from people (like hch and
>>others provided).
>
>
> I'm not sure I see what the fuss is about a slight API change that is
> safe since it spews warnings/breaks existing code that isn't fixed. At
> least one vendor kernel also has the changed API anyway.


"safe" ignores the pain of people trying to support multiple kernels.
Each API change like the direct_IO one introduces ifdefs. Changing a
function prototype is particularly annoying because you can't create a
backwards-compat wrapper

I disagree with the AC97 codec changes being merged into 2.4, too, for
the same reason. Yes I recognize it is required to support new
hardware. Yes I realize it vastly simplifies supporting some existing
hardware. But I don't think you realize (or perhaps don't care about)
the maintenance pain created by the change. If a vendor wishes their driver
to support 2.4.21 _and_ 2.4.22 (not a lot to ask), they must add a bunch
of ifdef crud in their OSS driver.

Feature and API additions are _far_ less painful than API changes in the
middle of a stable series.

Overall, I think we are looking at a question which needs to be answered
by the community: what constitutes a stable series? when do we stop
changing the API and let it stabilize? ... and I am writing a mail
right now to ask that question (as requested by Marcelo and a couple
others, though I have wanted to do it for a while now).

Jeff