2007-01-16 02:03:30

by Nate Diller

Subject: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

This series is an attempt to generalize the async I/O paths to be
implementation-agnostic. It completely eliminates knowledge of
the kiocb structure in the generic code and makes it private within the
current aio code. Things get noticeably cleaner without that layering
violation.

The new interface takes a file_endio_t function pointer, and a private data
pointer, which would normally be aio_complete and a kiocb pointer,
respectively. If the aio submission function gets back EIOCBQUEUED, that is
a guarantee that the endio function will be called, or *already has been
called*. If the file_endio_t pointer provided to aio_[read|write] is NULL,
the FS must block on I/O completion, then return either the number of bytes
read, or an error.
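
A minimal userspace sketch of this contract follows. Only the
file_endio_t typedef is taken from the series (patch 4/10); the demo_*
names, the request struct, and the "zero length is invalid" rule are
hypothetical and exist only for illustration. EIOCBQUEUED is a
kernel-internal errno, so it is defined by hand here. The sketch also
assumes that any return value other than -EIOCBQUEUED means the endio
function is not called, which the description above implies but does not
state outright.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529	/* kernel-internal errno, not visible to userspace */
#endif

/* Completion callback type, as defined in patch 4/10. */
typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

/* Hypothetical per-request state that a caller passes as endio_data. */
struct demo_request {
	int	done;
	ssize_t	count;
	int	err;
};

/* Hypothetical completion handler standing in for aio_complete(). */
static void demo_endio(void *endio_data, ssize_t count, int err)
{
	struct demo_request *req = endio_data;

	assert(req != NULL);
	req->count = count;
	req->err = err;
	req->done = 1;
}

/*
 * Hypothetical submission function following the contract:
 *  - endio == NULL: finish the I/O before returning, then return the
 *    byte count or a negative errno.
 *  - endio != NULL: return -EIOCBQUEUED only if endio will be called,
 *    or already has been called, for this request.
 */
static ssize_t demo_aio_read(size_t len, file_endio_t *endio, void *endio_data)
{
	if (len == 0)
		return -EINVAL;		/* rejected before queueing: no endio call */

	if (!endio)
		return (ssize_t)len;	/* sync: pretend we blocked and read it all */

	/* This toy completes before returning, which the contract permits. */
	endio(endio_data, (ssize_t)len, 0);
	return -EIOCBQUEUED;
}
```

A caller that sees -EIOCBQUEUED must therefore be prepared for the
request state to be complete already by the time the submission call
returns, as it is in this toy.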

I had to touch more areas than I had originally expected, so there are
changes in a corner of the socket code, and a slight behavior change in the
direct-io completion path which affects XFS and OCFS2. I would appreciate
further review there, so I copied some extra people I hope can help.

This patch is against 2.6.20-rc4-mm1. It has been compile-tested at each
stage. It needs some runtime testing yet, but I prefer to get it out for
commentary and test later.

These patches are for RFC only and have not yet been signed off.

NATE

---

Documentation/filesystems/Locking | 11 +
Documentation/filesystems/vfs.txt | 11 +
arch/s390/hypfs/inode.c | 16 +-
drivers/net/pppoe.c | 8 -
drivers/net/tun.c | 13 +-
drivers/usb/gadget/inode.c | 239 +-------------------------------------
fs/aio.c | 74 ++++++-----
fs/bad_inode.c | 10 -
fs/block_dev.c | 109 +++++++++++------
fs/cifs/cifsfs.c | 10 -
fs/compat.c | 56 --------
fs/direct-io.c | 92 ++++++++------
fs/ecryptfs/file.c | 16 +-
fs/ext2/inode.c | 12 -
fs/ext3/file.c | 9 -
fs/ext3/inode.c | 11 -
fs/ext4/file.c | 9 -
fs/ext4/inode.c | 11 -
fs/fat/inode.c | 12 -
fs/fuse/dev.c | 13 +-
fs/gfs2/ops_address.c | 14 +-
fs/hfs/inode.c | 13 --
fs/hfsplus/inode.c | 13 --
fs/jfs/inode.c | 12 -
fs/nfs/direct.c | 92 +++++++-------
fs/nfs/file.c | 62 +++++----
fs/ntfs/file.c | 71 ++---------
fs/ocfs2/aops.c | 24 +--
fs/ocfs2/aops.h | 8 -
fs/ocfs2/file.c | 44 +++---
fs/ocfs2/inode.h | 2
fs/pipe.c | 12 -
fs/read_write.c | 225 ++++++++++++-----------------------
fs/read_write.h | 8 -
fs/reiserfs/inode.c | 13 --
fs/smbfs/file.c | 28 ++--
fs/udf/file.c | 13 +-
fs/xfs/linux-2.6/xfs_aops.c | 44 +++---
fs/xfs/linux-2.6/xfs_file.c | 58 +++++----
fs/xfs/linux-2.6/xfs_lrw.c | 29 ++--
fs/xfs/linux-2.6/xfs_lrw.h | 10 -
fs/xfs/linux-2.6/xfs_vnode.h | 20 +--
include/linux/aio.h | 11 -
include/linux/fs.h | 114 +++++++++---------
include/linux/net.h | 18 +-
include/linux/nfs_fs.h | 12 -
include/net/bluetooth/bluetooth.h | 2
include/net/inet_common.h | 3
include/net/scm.h | 2
include/net/sock.h | 45 +------
include/net/tcp.h | 6
include/net/udp.h | 3
mm/filemap.c | 109 ++++++++---------
net/appletalk/ddp.c | 5
net/atm/common.c | 6
net/atm/common.h | 7 -
net/ax25/af_ax25.c | 7 -
net/bluetooth/af_bluetooth.c | 4
net/bluetooth/hci_sock.c | 7 -
net/bluetooth/l2cap.c | 2
net/bluetooth/rfcomm/sock.c | 8 -
net/bluetooth/sco.c | 3
net/core/sock.c | 12 -
net/dccp/dccp.h | 8 -
net/dccp/probe.c | 3
net/dccp/proto.c | 7 -
net/decnet/af_decnet.c | 7 -
net/econet/af_econet.c | 7 -
net/ipv4/af_inet.c | 5
net/ipv4/raw.c | 8 -
net/ipv4/tcp.c | 7 -
net/ipv4/tcp_probe.c | 3
net/ipv4/udp.c | 9 -
net/ipv4/udp_impl.h | 2
net/ipv6/raw.c | 6
net/ipv6/udp.c | 10 -
net/ipv6/udp_impl.h | 6
net/ipx/af_ipx.c | 7 -
net/irda/af_irda.c | 29 ++--
net/key/af_key.c | 6
net/llc/af_llc.c | 7 -
net/netlink/af_netlink.c | 24 +--
net/netrom/af_netrom.c | 7 -
net/packet/af_packet.c | 11 -
net/rose/af_rose.c | 7 -
net/sctp/socket.c | 9 -
net/socket.c | 199 +++++++++----------------------
net/tipc/socket.c | 28 +---
net/unix/af_unix.c | 116 +++++++-----------
net/wanrouter/af_wanpipe.c | 7 -
net/x25/af_x25.c | 6
sound/core/pcm_native.c | 15 +-
92 files changed, 1009 insertions(+), 1500 deletions(-)


2007-01-16 02:03:37

by Nate Diller

Subject: [PATCH -mm 1/10][RFC] aio: scm remove struct siocb

This patch removes struct sock_iocb.

Its purpose seems to have dwindled to a mere container for struct
scm_cookie, and all of the users of scm_cookie seem to require
re-initializing it each time anyway. Besides, keeping such data around from
one call to the next seems to me like a layering violation, if not a bug,
considering that the sync IO code can use this call path too.

All scm_cookie users are converted to unconditionally allocate on the stack,
and sock_iocb and all its helpers are removed. This also simplifies the
socket aio submission path (is that even used?).

---

include/net/scm.h | 2
include/net/sock.h | 26 ---------
net/netlink/af_netlink.c | 18 ++----
net/socket.c | 131 +++++++++++------------------------------------
net/unix/af_unix.c | 77 ++++++++++-----------------
5 files changed, 68 insertions(+), 186 deletions(-)

---

diff -urpN -X dontdiff a/include/net/scm.h b/include/net/scm.h
--- a/include/net/scm.h 2006-11-29 13:57:37.000000000 -0800
+++ b/include/net/scm.h 2007-01-10 12:10:19.000000000 -0800
@@ -23,7 +23,6 @@ struct scm_cookie
#ifdef CONFIG_SECURITY_NETWORK
u32 secid; /* Passed security ID */
#endif
- unsigned long seq; /* Connection seqno */
};

extern void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
@@ -56,7 +55,6 @@ static __inline__ int scm_send(struct so
scm->creds.gid = p->gid;
scm->creds.pid = p->tgid;
scm->fp = NULL;
- scm->seq = 0;
unix_get_peersec_dgram(sock, scm);
if (msg->msg_controllen <= 0)
return 0;
diff -urpN -X dontdiff a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h 2007-01-10 11:50:54.000000000 -0800
+++ b/include/net/sock.h 2007-01-10 12:15:35.000000000 -0800
@@ -75,10 +75,9 @@
* between user contexts and software interrupt processing, whereas the
* mini-semaphore synchronizes multiple users amongst themselves.
*/
-struct sock_iocb;
typedef struct {
spinlock_t slock;
- struct sock_iocb *owner;
+ void *owner;
wait_queue_head_t wq;
/*
* We express the mutex-alike socket_lock semantics
@@ -656,29 +655,6 @@ static inline void __sk_prot_rehash(stru
#define SOCK_BINDADDR_LOCK 4
#define SOCK_BINDPORT_LOCK 8

-/* sock_iocb: used to kick off async processing of socket ios */
-struct sock_iocb {
- struct list_head list;
-
- int flags;
- int size;
- struct socket *sock;
- struct sock *sk;
- struct scm_cookie *scm;
- struct msghdr *msg, async_msg;
- struct kiocb *kiocb;
-};
-
-static inline struct sock_iocb *kiocb_to_siocb(struct kiocb *iocb)
-{
- return (struct sock_iocb *)iocb->private;
-}
-
-static inline struct kiocb *siocb_to_kiocb(struct sock_iocb *si)
-{
- return si->kiocb;
-}
-
struct socket_alloc {
struct socket socket;
struct inode vfs_inode;
diff -urpN -X dontdiff a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c 2007-01-10 11:53:12.000000000 -0800
+++ b/net/netlink/af_netlink.c 2007-01-10 12:10:19.000000000 -0800
@@ -1106,7 +1106,6 @@ static inline void netlink_rcv_wake(stru
static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
struct msghdr *msg, size_t len)
{
- struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
struct sock *sk = sock->sk;
struct netlink_sock *nlk = nlk_sk(sk);
struct sockaddr_nl *addr=msg->msg_name;
@@ -1119,9 +1118,7 @@ static int netlink_sendmsg(struct kiocb
if (msg->msg_flags&MSG_OOB)
return -EOPNOTSUPP;

- if (NULL == siocb->scm)
- siocb->scm = &scm;
- err = scm_send(sock, msg, siocb->scm);
+ err = scm_send(sock, msg, &scm);
if (err < 0)
return err;

@@ -1155,7 +1152,7 @@ static int netlink_sendmsg(struct kiocb
NETLINK_CB(skb).dst_group = dst_group;
NETLINK_CB(skb).loginuid = audit_get_loginuid(current->audit_context);
selinux_get_task_sid(current, &(NETLINK_CB(skb).sid));
- memcpy(NETLINK_CREDS(skb), &siocb->scm->creds, sizeof(struct ucred));
+ memcpy(NETLINK_CREDS(skb), &scm.creds, sizeof(struct ucred));

/* What can I do? Netlink is asynchronous, so that
we will have to save current capabilities to
@@ -1189,7 +1186,6 @@ static int netlink_recvmsg(struct kiocb
struct msghdr *msg, size_t len,
int flags)
{
- struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
struct scm_cookie scm;
struct sock *sk = sock->sk;
struct netlink_sock *nlk = nlk_sk(sk);
@@ -1230,17 +1226,15 @@ static int netlink_recvmsg(struct kiocb
if (nlk->flags & NETLINK_RECV_PKTINFO)
netlink_cmsg_recv_pktinfo(msg, skb);

- if (NULL == siocb->scm) {
- memset(&scm, 0, sizeof(scm));
- siocb->scm = &scm;
- }
- siocb->scm->creds = *NETLINK_CREDS(skb);
+ memset(&scm, 0, sizeof(scm));
+
+ scm.creds = *NETLINK_CREDS(skb);
skb_free_datagram(sk, skb);

if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2)
netlink_dump(sk);

- scm_recv(sock, msg, siocb->scm, flags);
+ scm_recv(sock, msg, &scm, flags);

out:
netlink_rcv_wake(sk);
diff -urpN -X dontdiff a/net/socket.c b/net/socket.c
--- a/net/socket.c 2007-01-10 11:51:10.000000000 -0800
+++ b/net/socket.c 2007-01-10 12:10:19.000000000 -0800
@@ -551,14 +551,8 @@ void sock_release(struct socket *sock)
static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
struct msghdr *msg, size_t size)
{
- struct sock_iocb *si = kiocb_to_siocb(iocb);
int err;

- si->sock = sock;
- si->scm = NULL;
- si->msg = msg;
- si->size = size;
-
err = security_socket_sendmsg(sock, msg, size);
if (err)
return err;
@@ -569,15 +563,9 @@ static inline int __sock_sendmsg(struct
int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
struct kiocb iocb;
- struct sock_iocb siocb;
- int ret;

init_sync_kiocb(&iocb, NULL);
- iocb.private = &siocb;
- ret = __sock_sendmsg(&iocb, sock, msg, size);
- if (-EIOCBQUEUED == ret)
- ret = wait_on_sync_kiocb(&iocb);
- return ret;
+ return __sock_sendmsg(&iocb, sock, msg, size);
}

int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
@@ -602,13 +590,6 @@ static inline int __sock_recvmsg(struct
struct msghdr *msg, size_t size, int flags)
{
int err;
- struct sock_iocb *si = kiocb_to_siocb(iocb);
-
- si->sock = sock;
- si->scm = NULL;
- si->msg = msg;
- si->size = size;
- si->flags = flags;

err = security_socket_recvmsg(sock, msg, size, flags);
if (err)
@@ -621,15 +602,9 @@ int sock_recvmsg(struct socket *sock, st
size_t size, int flags)
{
struct kiocb iocb;
- struct sock_iocb siocb;
- int ret;

init_sync_kiocb(&iocb, NULL);
- iocb.private = &siocb;
- ret = __sock_recvmsg(&iocb, sock, msg, size, flags);
- if (-EIOCBQUEUED == ret)
- ret = wait_on_sync_kiocb(&iocb);
- return ret;
+ return __sock_recvmsg(&iocb, sock, msg, size, flags);
}

int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
@@ -649,11 +624,6 @@ int kernel_recvmsg(struct socket *sock,
return result;
}

-static void sock_aio_dtor(struct kiocb *iocb)
-{
- kfree(iocb->private);
-}
-
static ssize_t sock_sendpage(struct file *file, struct page *page,
int offset, size_t size, loff_t *ppos, int more)
{
@@ -669,47 +639,13 @@ static ssize_t sock_sendpage(struct file
return sock->ops->sendpage(sock, page, offset, size, flags);
}

-static struct sock_iocb *alloc_sock_iocb(struct kiocb *iocb,
- struct sock_iocb *siocb)
-{
- if (!is_sync_kiocb(iocb)) {
- siocb = kmalloc(sizeof(*siocb), GFP_KERNEL);
- if (!siocb)
- return NULL;
- iocb->ki_dtor = sock_aio_dtor;
- }
-
- siocb->kiocb = iocb;
- iocb->private = siocb;
- return siocb;
-}
-
-static ssize_t do_sock_read(struct msghdr *msg, struct kiocb *iocb,
- struct file *file, const struct iovec *iov,
- unsigned long nr_segs)
-{
- struct socket *sock = file->private_data;
- size_t size = 0;
- int i;
-
- for (i = 0; i < nr_segs; i++)
- size += iov[i].iov_len;
-
- msg->msg_name = NULL;
- msg->msg_namelen = 0;
- msg->msg_control = NULL;
- msg->msg_controllen = 0;
- msg->msg_iov = (struct iovec *)iov;
- msg->msg_iovlen = nr_segs;
- msg->msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
-
- return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags);
-}
-
static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t pos)
{
- struct sock_iocb siocb, *x;
+ struct msghdr msg;
+ struct socket *sock = iocb->ki_filp->private_data;
+ size_t size = 0;
+ int i;

if (pos != 0)
return -ESPIPE;
@@ -717,41 +653,27 @@ static ssize_t sock_aio_read(struct kioc
if (iocb->ki_left == 0) /* Match SYS5 behaviour */
return 0;

-
- x = alloc_sock_iocb(iocb, &siocb);
- if (!x)
- return -ENOMEM;
- return do_sock_read(&x->async_msg, iocb, iocb->ki_filp, iov, nr_segs);
-}
-
-static ssize_t do_sock_write(struct msghdr *msg, struct kiocb *iocb,
- struct file *file, const struct iovec *iov,
- unsigned long nr_segs)
-{
- struct socket *sock = file->private_data;
- size_t size = 0;
- int i;
-
for (i = 0; i < nr_segs; i++)
size += iov[i].iov_len;

- msg->msg_name = NULL;
- msg->msg_namelen = 0;
- msg->msg_control = NULL;
- msg->msg_controllen = 0;
- msg->msg_iov = (struct iovec *)iov;
- msg->msg_iovlen = nr_segs;
- msg->msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
- if (sock->type == SOCK_SEQPACKET)
- msg->msg_flags |= MSG_EOR;
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ msg.msg_control = NULL;
+ msg.msg_controllen = 0;
+ msg.msg_iov = (struct iovec *)iov;
+ msg.msg_iovlen = nr_segs;
+ msg.msg_flags = (iocb->ki_filp->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;

- return __sock_sendmsg(iocb, sock, msg, size);
+ return __sock_recvmsg(iocb, sock, &msg, size, msg.msg_flags);
}

static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t pos)
{
- struct sock_iocb siocb, *x;
+ struct msghdr msg;
+ struct socket *sock = iocb->ki_filp->private_data;
+ size_t size = 0;
+ int i;

if (pos != 0)
return -ESPIPE;
@@ -759,11 +681,20 @@ static ssize_t sock_aio_write(struct kio
if (iocb->ki_left == 0) /* Match SYS5 behaviour */
return 0;

- x = alloc_sock_iocb(iocb, &siocb);
- if (!x)
- return -ENOMEM;
+ for (i = 0; i < nr_segs; i++)
+ size += iov[i].iov_len;
+
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ msg.msg_control = NULL;
+ msg.msg_controllen = 0;
+ msg.msg_iov = (struct iovec *)iov;
+ msg.msg_iovlen = nr_segs;
+ msg.msg_flags = (iocb->ki_filp->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+ if (sock->type == SOCK_SEQPACKET)
+ msg.msg_flags |= MSG_EOR;

- return do_sock_write(&x->async_msg, iocb, iocb->ki_filp, iov, nr_segs);
+ return __sock_sendmsg(iocb, sock, &msg, size);
}

/*
diff -urpN -X dontdiff a/net/unix/af_unix.c b/net/unix/af_unix.c
--- a/net/unix/af_unix.c 2007-01-10 11:51:11.000000000 -0800
+++ b/net/unix/af_unix.c 2007-01-10 12:10:19.000000000 -0800
@@ -1267,7 +1267,6 @@ static void unix_attach_fds(struct scm_c
static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
struct msghdr *msg, size_t len)
{
- struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
struct sock *sk = sock->sk;
struct unix_sock *u = unix_sk(sk);
struct sockaddr_un *sunaddr=msg->msg_name;
@@ -1277,11 +1276,9 @@ static int unix_dgram_sendmsg(struct kio
unsigned hash;
struct sk_buff *skb;
long timeo;
- struct scm_cookie tmp_scm;
+ struct scm_cookie scm;

- if (NULL == siocb->scm)
- siocb->scm = &tmp_scm;
- err = scm_send(sock, msg, siocb->scm);
+ err = scm_send(sock, msg, &scm);
if (err < 0)
return err;

@@ -1314,10 +1311,10 @@ static int unix_dgram_sendmsg(struct kio
if (skb==NULL)
goto out;

- memcpy(UNIXCREDS(skb), &siocb->scm->creds, sizeof(struct ucred));
- if (siocb->scm->fp)
- unix_attach_fds(siocb->scm, skb);
- unix_get_secdata(siocb->scm, skb);
+ memcpy(UNIXCREDS(skb), &scm.creds, sizeof(struct ucred));
+ if (scm.fp)
+ unix_attach_fds(&scm, skb);
+ unix_get_secdata(&scm, skb);

skb->h.raw = skb->data;
err = memcpy_fromiovec(skb_put(skb,len), msg->msg_iov, len);
@@ -1401,7 +1398,7 @@ restart:
unix_state_runlock(other);
other->sk_data_ready(other, len);
sock_put(other);
- scm_destroy(siocb->scm);
+ scm_destroy(&scm);
return len;

out_unlock:
@@ -1411,7 +1408,7 @@ out_free:
out:
if (other)
sock_put(other);
- scm_destroy(siocb->scm);
+ scm_destroy(&scm);
return err;
}

@@ -1419,18 +1416,15 @@ out:
static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
struct msghdr *msg, size_t len)
{
- struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
struct sock *sk = sock->sk;
struct sock *other = NULL;
struct sockaddr_un *sunaddr=msg->msg_name;
int err,size;
struct sk_buff *skb;
int sent=0;
- struct scm_cookie tmp_scm;
+ struct scm_cookie scm;

- if (NULL == siocb->scm)
- siocb->scm = &tmp_scm;
- err = scm_send(sock, msg, siocb->scm);
+ err = scm_send(sock, msg, &scm);
if (err < 0)
return err;

@@ -1486,9 +1480,9 @@ static int unix_stream_sendmsg(struct ki
*/
size = min_t(int, size, skb_tailroom(skb));

- memcpy(UNIXCREDS(skb), &siocb->scm->creds, sizeof(struct ucred));
- if (siocb->scm->fp)
- unix_attach_fds(siocb->scm, skb);
+ memcpy(UNIXCREDS(skb), &scm.creds, sizeof(struct ucred));
+ if (scm.fp)
+ unix_attach_fds(&scm, skb);

if ((err = memcpy_fromiovec(skb_put(skb,size), msg->msg_iov, size)) != 0) {
kfree_skb(skb);
@@ -1507,9 +1501,7 @@ static int unix_stream_sendmsg(struct ki
sent+=size;
}

- scm_destroy(siocb->scm);
- siocb->scm = NULL;
-
+ scm_destroy(&scm);
return sent;

pipe_err_free:
@@ -1520,8 +1512,7 @@ pipe_err:
send_sig(SIGPIPE,current,0);
err = -EPIPE;
out_err:
- scm_destroy(siocb->scm);
- siocb->scm = NULL;
+ scm_destroy(&scm);
return sent ? : err;
}

@@ -1559,8 +1550,7 @@ static int unix_dgram_recvmsg(struct kio
struct msghdr *msg, size_t size,
int flags)
{
- struct sock_iocb *siocb = kiocb_to_siocb(iocb);
- struct scm_cookie tmp_scm;
+ struct scm_cookie scm;
struct sock *sk = sock->sk;
struct unix_sock *u = unix_sk(sk);
int noblock = flags & MSG_DONTWAIT;
@@ -1593,17 +1583,14 @@ static int unix_dgram_recvmsg(struct kio
if (err)
goto out_free;

- if (!siocb->scm) {
- siocb->scm = &tmp_scm;
- memset(&tmp_scm, 0, sizeof(tmp_scm));
- }
- siocb->scm->creds = *UNIXCREDS(skb);
- unix_set_secdata(siocb->scm, skb);
+ memset(&scm, 0, sizeof(scm));
+ scm.creds = *UNIXCREDS(skb);
+ unix_set_secdata(&scm, skb);

if (!(flags & MSG_PEEK))
{
if (UNIXCB(skb).fp)
- unix_detach_fds(siocb->scm, skb);
+ unix_detach_fds(&scm, skb);
}
else
{
@@ -1620,11 +1607,11 @@ static int unix_dgram_recvmsg(struct kio

*/
if (UNIXCB(skb).fp)
- siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+ scm.fp = scm_fp_dup(UNIXCB(skb).fp);
}
err = size;

- scm_recv(sock, msg, siocb->scm, flags);
+ scm_recv(sock, msg, &scm, flags);

out_free:
skb_free_datagram(sk,skb);
@@ -1672,8 +1659,7 @@ static int unix_stream_recvmsg(struct ki
struct msghdr *msg, size_t size,
int flags)
{
- struct sock_iocb *siocb = kiocb_to_siocb(iocb);
- struct scm_cookie tmp_scm;
+ struct scm_cookie scm;
struct sock *sk = sock->sk;
struct unix_sock *u = unix_sk(sk);
struct sockaddr_un *sunaddr=msg->msg_name;
@@ -1700,10 +1686,7 @@ static int unix_stream_recvmsg(struct ki
* while sleeps in memcpy_tomsg
*/

- if (!siocb->scm) {
- siocb->scm = &tmp_scm;
- memset(&tmp_scm, 0, sizeof(tmp_scm));
- }
+ memset(&scm, 0, sizeof(scm));

mutex_lock(&u->readlock);

@@ -1743,13 +1726,13 @@ static int unix_stream_recvmsg(struct ki

if (check_creds) {
/* Never glue messages from different writers */
- if (memcmp(UNIXCREDS(skb), &siocb->scm->creds, sizeof(siocb->scm->creds)) != 0) {
+ if (memcmp(UNIXCREDS(skb), &scm.creds, sizeof(scm.creds)) != 0) {
skb_queue_head(&sk->sk_receive_queue, skb);
break;
}
} else {
/* Copy credentials */
- siocb->scm->creds = *UNIXCREDS(skb);
+ scm.creds = *UNIXCREDS(skb);
check_creds = 1;
}

@@ -1776,7 +1759,7 @@ static int unix_stream_recvmsg(struct ki
skb_pull(skb, chunk);

if (UNIXCB(skb).fp)
- unix_detach_fds(siocb->scm, skb);
+ unix_detach_fds(&scm, skb);

/* put the skb back if we didn't use it up.. */
if (skb->len)
@@ -1787,7 +1770,7 @@ static int unix_stream_recvmsg(struct ki

kfree_skb(skb);

- if (siocb->scm->fp)
+ if (scm.fp)
break;
}
else
@@ -1795,7 +1778,7 @@ static int unix_stream_recvmsg(struct ki
/* It is questionable, see note in unix_dgram_recvmsg.
*/
if (UNIXCB(skb).fp)
- siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+ scm.fp = scm_fp_dup(UNIXCB(skb).fp);

/* put message back and return */
skb_queue_head(&sk->sk_receive_queue, skb);
@@ -1804,7 +1787,7 @@ static int unix_stream_recvmsg(struct ki
} while (size);

mutex_unlock(&u->readlock);
- scm_recv(sock, msg, siocb->scm, flags);
+ scm_recv(sock, msg, &scm, flags);
out:
return copied ? : err;
}

2007-01-16 02:04:18

by Nate Diller

Subject: [PATCH -mm 4/10][RFC] aio: convert aio_complete to file_endio_t

Define a new function typedef for I/O completion at the file/iovec level --

typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

and convert aio_complete and all its callers to this new prototype.
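
As a rough userspace illustration of how the converted aio_complete
folds its (count, err) arguments into the two result fields of an event,
consider the sketch below. The folding rule follows the fs/aio.c hunk in
this patch; struct demo_event and the demo_* names are hypothetical
stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

/* Hypothetical stand-in for the res/res2 fields of struct io_event. */
struct demo_event {
	long res;
	long res2;
};

/*
 * Same folding as the patched aio_complete(): a nonzero err becomes the
 * primary result, otherwise the byte count does. res2 always carries err.
 */
static void demo_complete(void *endio_data, ssize_t count, int err)
{
	struct demo_event *ev = endio_data;
	long result = (long)err;

	assert(ev != NULL);
	if (!result)
		result = (long)count;

	ev->res = result;
	ev->res2 = err;
}

/* Compile-time check that demo_complete matches the file_endio_t prototype. */
static file_endio_t *const demo_endio = demo_complete;
```

Calling demo_endio(&ev, 4096, 0) leaves res at 4096 and res2 at 0, while
demo_endio(&ev, 512, -EIO) leaves both at -EIO. Under this rule a
nonzero err takes precedence, so a partial byte count is not reported in
res.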

---

drivers/usb/gadget/inode.c | 24 +++++++-----------
fs/aio.c | 59 ++++++++++++++++++++++++---------------------
fs/block_dev.c | 8 +-----
fs/direct-io.c | 18 +++++--------
fs/nfs/direct.c | 9 ++----
include/linux/aio.h | 11 +++-----
include/linux/fs.h | 2 +
7 files changed, 61 insertions(+), 70 deletions(-)

---

diff -urpN -X dontdiff a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
--- a/drivers/usb/gadget/inode.c 2007-01-12 14:42:29.000000000 -0800
+++ b/drivers/usb/gadget/inode.c 2007-01-12 14:25:34.000000000 -0800
@@ -559,35 +559,32 @@ static int ep_aio_cancel(struct kiocb *i
return value;
}

-static ssize_t ep_aio_read_retry(struct kiocb *iocb)
+static int ep_aio_read_retry(struct kiocb *iocb)
{
struct kiocb_priv *priv = iocb->private;
- ssize_t len, total;
- int i;
+ ssize_t total;
+ int i, err = 0;

/* we "retry" to get the right mm context for this: */

/* copy stuff into user buffers */
total = priv->actual;
- len = 0;
for (i=0; i < priv->nr_segs; i++) {
ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);

if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
- if (len == 0)
- len = -EFAULT;
+ err = -EFAULT;
break;
}

total -= this;
- len += this;
if (total == 0)
break;
}
kfree(priv->buf);
kfree(priv);
aio_put_req(iocb);
- return len;
+ return err;
}

static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req)
@@ -610,9 +607,7 @@ static void ep_aio_complete(struct usb_e
if (unlikely(kiocbIsCancelled(iocb)))
aio_put_req(iocb);
else
- aio_complete(iocb,
- req->actual ? req->actual : req->status,
- req->status);
+ aio_complete(iocb, req->actual, req->status);
} else {
/* retry() won't report both; so we hide some faults */
if (unlikely(0 != req->status))
@@ -702,16 +697,17 @@ ep_aio_read(struct kiocb *iocb, const st
{
struct ep_data *epdata = iocb->ki_filp->private_data;
char *buf;
+ size_t len = iov_length(iov, nr_segs);

if (unlikely(epdata->desc.bEndpointAddress & USB_DIR_IN))
return -EINVAL;

- buf = kmalloc(iocb->ki_left, GFP_KERNEL);
+ buf = kmalloc(len, GFP_KERNEL);
if (unlikely(!buf))
return -ENOMEM;

iocb->ki_retry = ep_aio_read_retry;
- return ep_aio_rwtail(iocb, buf, iocb->ki_left, epdata, iov, nr_segs);
+ return ep_aio_rwtail(iocb, buf, len, epdata, iov, nr_segs);
}

static ssize_t
@@ -726,7 +722,7 @@ ep_aio_write(struct kiocb *iocb, const s
if (unlikely(!(epdata->desc.bEndpointAddress & USB_DIR_IN)))
return -EINVAL;

- buf = kmalloc(iocb->ki_left, GFP_KERNEL);
+ buf = kmalloc(iov_length(iov, nr_segs), GFP_KERNEL);
if (unlikely(!buf))
return -ENOMEM;

diff -urpN -X dontdiff a/fs/aio.c b/fs/aio.c
--- a/fs/aio.c 2007-01-12 14:42:29.000000000 -0800
+++ b/fs/aio.c 2007-01-12 14:29:20.000000000 -0800
@@ -658,16 +658,16 @@ static inline int __queue_kicked_iocb(st
* simplifies the coding of individual aio operations as
* it avoids various potential races.
*/
-static ssize_t aio_run_iocb(struct kiocb *iocb)
+static void aio_run_iocb(struct kiocb *iocb)
{
struct kioctx *ctx = iocb->ki_ctx;
- ssize_t (*retry)(struct kiocb *);
+ int (*retry)(struct kiocb *);
wait_queue_t *io_wait = current->io_wait;
- ssize_t ret;
+ int err;

if (!(retry = iocb->ki_retry)) {
printk("aio_run_iocb: iocb->ki_retry = NULL\n");
- return 0;
+ return;
}

/*
@@ -702,8 +702,8 @@ static ssize_t aio_run_iocb(struct kiocb

/* Quit retrying if the i/o has been cancelled */
if (kiocbIsCancelled(iocb)) {
- ret = -EINTR;
- aio_complete(iocb, ret, 0);
+ err = -EINTR;
+ aio_complete(iocb, iocb->ki_nbytes - iocb->ki_left, err);
/* must not access the iocb after this */
goto out;
}
@@ -720,17 +720,17 @@ static ssize_t aio_run_iocb(struct kiocb
*/
BUG_ON(!is_sync_wait(current->io_wait));
current->io_wait = &iocb->ki_wait.wait;
- ret = retry(iocb);
+ err = retry(iocb);
current->io_wait = io_wait;

- if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
+ if (err != -EIOCBRETRY && err != -EIOCBQUEUED) {
BUG_ON(!list_empty(&iocb->ki_wait.wait.task_list));
- aio_complete(iocb, ret, 0);
+ aio_complete(iocb, iocb->ki_nbytes - iocb->ki_left, err);
}
out:
spin_lock_irq(&ctx->ctx_lock);

- if (-EIOCBRETRY == ret) {
+ if (-EIOCBRETRY == err) {
/*
* OK, now that we are done with this iteration
* and know that there is more left to go,
@@ -754,7 +754,6 @@ out:
aio_queue_work(ctx);
}
}
- return ret;
}

/*
@@ -918,19 +917,25 @@ EXPORT_SYMBOL(kick_iocb);

/* aio_complete
* Called when the io request on the given iocb is complete.
- * Returns true if this is the last user of the request. The
+ * Frees ioctx if this is the last user of the request. The
* only other user of the request can be the cancellation code.
*/
-int fastcall aio_complete(struct kiocb *iocb, long res, long res2)
+void fastcall aio_complete(void *endio_data, ssize_t count, int err)
{
+ struct kiocb *iocb = endio_data;
struct kioctx *ctx = iocb->ki_ctx;
struct aio_ring_info *info;
struct aio_ring *ring;
struct io_event *event;
unsigned long flags;
unsigned long tail;
+ long result;
int ret;

+ result = (long) err;
+ if (!result)
+ result = (long) count;
+
/*
* Special case handling for sync iocbs:
* - events go directly into the iocb for fast handling
@@ -940,10 +945,10 @@ int fastcall aio_complete(struct kiocb *
*/
if (is_sync_kiocb(iocb)) {
BUG_ON(iocb->ki_users != 1);
- iocb->ki_user_data = res;
+ iocb->ki_user_data = result;
iocb->ki_users = 0;
wake_up_process(iocb->ki_obj.tsk);
- return 1;
+ return;
}

info = &ctx->ring_info;
@@ -975,12 +980,12 @@ int fastcall aio_complete(struct kiocb *

event->obj = (u64)(unsigned long)iocb->ki_obj.user;
event->data = iocb->ki_user_data;
- event->res = res;
- event->res2 = res2;
+ event->res = result;
+ event->res2 = err;

dprintk("aio_complete: %p[%lu]: %p: %p %Lx %lx %lx\n",
ctx, tail, iocb, iocb->ki_obj.user, iocb->ki_user_data,
- res, res2);
+ result, err);

/* after flagging the request as done, we
* must never even look at it again
@@ -1002,7 +1007,7 @@ put_rq:
wake_up(&ctx->wait);

spin_unlock_irqrestore(&ctx->ctx_lock, flags);
- return ret;
+ return;
}

/* aio_read_evt
@@ -1307,7 +1312,7 @@ static void aio_advance_iovec(struct kio
BUG_ON(ret > 0 && iocb->ki_left == 0);
}

-static ssize_t aio_rw_vect_retry(struct kiocb *iocb)
+static int aio_rw_vect_retry(struct kiocb *iocb)
{
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
@@ -1341,26 +1346,26 @@ static ssize_t aio_rw_vect_retry(struct

/* This means we must have transferred all that we could */
/* No need to retry anymore */
- if ((ret == 0) || (iocb->ki_left == 0))
- ret = iocb->ki_nbytes - iocb->ki_left;
+ if (iocb->ki_left == 0)
+ ret = 0;

- return ret;
+ return (int) ret;
}

-static ssize_t aio_fdsync(struct kiocb *iocb)
+static int aio_fdsync(struct kiocb *iocb)
{
struct file *file = iocb->ki_filp;
- ssize_t ret = -EINVAL;
+ int ret = -EINVAL;

if (file->f_op->aio_fsync)
ret = file->f_op->aio_fsync(iocb, 1);
return ret;
}

-static ssize_t aio_fsync(struct kiocb *iocb)
+static int aio_fsync(struct kiocb *iocb)
{
struct file *file = iocb->ki_filp;
- ssize_t ret = -EINVAL;
+ int ret = -EINVAL;

if (file->f_op->aio_fsync)
ret = file->f_op->aio_fsync(iocb, 0);
diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c
--- a/fs/block_dev.c 2007-01-12 11:19:45.000000000 -0800
+++ b/fs/block_dev.c 2007-01-12 14:42:05.000000000 -0800
@@ -147,12 +147,8 @@ static int blk_end_aio(struct bio *bio,
if (error)
iocb->ki_nbytes = -EIO;

- if (atomic_dec_and_test(bio_count)) {
- if ((long)iocb->ki_nbytes < 0)
- aio_complete(iocb, iocb->ki_nbytes, 0);
- else
- aio_complete(iocb, iocb->ki_left, 0);
- }
+ if (atomic_dec_and_test(bio_count))
+ aio_complete(iocb, iocb->ki_left, iocb->ki_nbytes);

return 0;
}
diff -urpN -X dontdiff a/fs/direct-io.c b/fs/direct-io.c
--- a/fs/direct-io.c 2007-01-12 14:42:29.000000000 -0800
+++ b/fs/direct-io.c 2007-01-12 14:25:34.000000000 -0800
@@ -224,8 +224,6 @@ static struct page *dio_get_page(struct
*/
static int dio_complete(struct dio *dio, loff_t offset, int ret)
{
- ssize_t transferred = 0;
-
/*
* AIO submission can race with bio completion to get here while
* expecting to have the last io completed by bio completion.
@@ -236,15 +234,13 @@ static int dio_complete(struct dio *dio,
ret = 0;

if (dio->result) {
- transferred = dio->result;
-
/* Check for short read case */
- if ((dio->rw == READ) && ((offset + transferred) > dio->i_size))
- transferred = dio->i_size - offset;
+ if ((dio->rw == READ) && ((offset + dio->result) > dio->i_size))
+ dio->result = dio->i_size - offset;
}

if (dio->end_io && dio->result)
- dio->end_io(dio->iocb, offset, transferred,
+ dio->end_io(dio->iocb, offset, dio->result,
dio->map_bh.b_private);
if (dio->lock_type == DIO_LOCKING)
/* lockdep: non-owner release */
@@ -254,8 +250,6 @@ static int dio_complete(struct dio *dio,
ret = dio->page_errors;
if (ret == 0)
ret = dio->io_error;
- if (ret == 0)
- ret = transferred;

return ret;
}
@@ -283,8 +277,8 @@ static int dio_bio_end_aio(struct bio *b
spin_unlock_irqrestore(&dio->bio_lock, flags);

if (remaining == 0) {
- int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
- aio_complete(dio->iocb, ret, 0);
+ int err = dio_complete(dio, dio->iocb->ki_pos, 0);
+ aio_complete(dio->iocb, dio->result, err);
kfree(dio);
}

@@ -1110,6 +1104,8 @@ direct_io_worker(int rw, struct kiocb *i
BUG_ON(!dio->is_async && ret2 != 0);
if (ret2 == 0) {
ret = dio_complete(dio, offset, ret);
+ if (ret == 0)
+ ret = dio->result; /* bytes transferred */
kfree(dio);
} else
BUG_ON(ret != -EIOCBQUEUED);
diff -urpN -X dontdiff a/fs/nfs/direct.c b/fs/nfs/direct.c
--- a/fs/nfs/direct.c 2007-01-12 11:18:52.000000000 -0800
+++ b/fs/nfs/direct.c 2007-01-12 14:39:48.000000000 -0800
@@ -200,12 +200,9 @@ out:
*/
static void nfs_direct_complete(struct nfs_direct_req *dreq)
{
- if (dreq->iocb) {
- long res = (long) dreq->error;
- if (!res)
- res = (long) dreq->count;
- aio_complete(dreq->iocb, res, 0);
- }
+ if (dreq->iocb)
+ aio_complete(dreq->iocb, dreq->count, dreq->error);
+
complete_all(&dreq->completion);

kref_put(&dreq->kref, nfs_direct_req_release);
diff -urpN -X dontdiff a/include/linux/aio.h b/include/linux/aio.h
--- a/include/linux/aio.h 2007-01-12 14:42:29.000000000 -0800
+++ b/include/linux/aio.h 2007-01-12 14:25:34.000000000 -0800
@@ -15,10 +15,9 @@
struct kioctx;

/* Notes on cancelling a kiocb:
- * If a kiocb is cancelled, aio_complete may return 0 to indicate
- * that cancel has not yet disposed of the kiocb. All cancel
- * operations *must* call aio_put_req to dispose of the kiocb
- * to guard against races with the completion code.
+ * If a kiocb is cancelled, aio_complete may not yet have disposed of
+ * the kiocb. All cancel operations *must* call aio_put_req to dispose
+ * of the kiocb to guard against races with the completion code.
*/
#define KIOCB_C_CANCELLED 0x01
#define KIOCB_C_COMPLETE 0x02
@@ -98,7 +97,7 @@ struct kiocb {
struct file *ki_filp;
struct kioctx *ki_ctx; /* may be NULL for sync ops */
int (*ki_cancel)(struct kiocb *, struct io_event *);
- ssize_t (*ki_retry)(struct kiocb *);
+ int (*ki_retry)(struct kiocb *);
void (*ki_dtor)(struct kiocb *);

union {
@@ -208,7 +207,7 @@ extern unsigned aio_max_size;
extern ssize_t FASTCALL(wait_on_sync_kiocb(struct kiocb *iocb));
extern int FASTCALL(aio_put_req(struct kiocb *iocb));
extern void FASTCALL(kick_iocb(struct kiocb *iocb));
-extern int FASTCALL(aio_complete(struct kiocb *iocb, long res, long res2));
+extern void FASTCALL(aio_complete(void *endio_data, ssize_t count, int err));
extern void FASTCALL(__put_ioctx(struct kioctx *ctx));
struct mm_struct;
extern void FASTCALL(exit_aio(struct mm_struct *mm));
diff -urpN -X dontdiff a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h 2007-01-12 14:42:29.000000000 -0800
+++ b/include/linux/fs.h 2007-01-12 14:25:34.000000000 -0800
@@ -1109,6 +1109,8 @@ typedef int (*read_actor_t)(read_descrip
#define HAVE_COMPAT_IOCTL 1
#define HAVE_UNLOCKED_IOCTL 1

+typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);
+
/*
* NOTE:
* read, write, poll, fsync, readv, writev, unlocked_ioctl and compat_ioctl

2007-01-16 02:04:28

by Nate Diller

Subject: [PATCH -mm 5/10][RFC] aio: make blk_directIO use file_endio_t

Convert the internals of blkdev_direct_IO to use a generic endio function
instead of calling aio_complete directly. This may also fix some bugs/races
in this code. For instance, the bio completion handler now checks
bio->bi_size instead of assuming it is zero, and it accumulates the
bytes_done counter atomically. (Assuming that the bio completion handler
can't race with itself *might* be valid here, but the direct-io code makes
no such assumption.) I'm also pretty sure that the address_space
->direct_IO functions aren't supposed to mess with iocb->ki_pos or
->ki_left.

---

diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c
--- a/fs/block_dev.c 2007-01-12 20:26:25.000000000 -0800
+++ b/fs/block_dev.c 2007-01-12 20:23:55.000000000 -0800
@@ -131,10 +131,32 @@ blkdev_get_block(struct inode *inode, se
return 0;
}

-static int blk_end_aio(struct bio *bio, unsigned int bytes_done, int error)
+struct bdev_aio {
+ atomic_t iocount; /* refcount */
+ atomic_t bytes_done; /* byte counter */
+ int err; /* error handling */
+ file_endio_t *endio; /* end I/O notify fn */
+ void *endio_data; /* notify fn private data */
+};
+
+static void blk_io_put(struct bdev_aio *io)
+{
+ if (!atomic_dec_and_test(&io->iocount))
+ return;
+
+ if (!io->endio)
+ return complete((struct completion*)io->endio_data);
+
+ io->endio(io->endio_data, atomic_read(&io->bytes_done), io->err);
+ kfree(io);
+}
+
+static int blk_bio_endio(struct bio *bio, unsigned int bytes_done, int error)
{
- struct kiocb *iocb = bio->bi_private;
- atomic_t *bio_count = &iocb->ki_bio_count;
+ struct bdev_aio *io = bio->bi_private;
+
+ if (bio->bi_size)
+ return 1;

if (bio_data_dir(bio) == READ)
bio_check_pages_dirty(bio);
@@ -143,16 +165,21 @@ static int blk_end_aio(struct bio *bio,
bio_put(bio);
}

- /* iocb->ki_nbytes stores error code from LLDD */
- if (error)
- iocb->ki_nbytes = -EIO;
-
- if (atomic_dec_and_test(bio_count))
- aio_complete(iocb, iocb->ki_left, iocb->ki_nbytes);
+ if (error)
+ io->err = error;
+ atomic_add(bytes_done, &io->bytes_done);

+ blk_io_put(io);
return 0;
}

+static void blk_io_init(struct bdev_aio *io)
+{
+ atomic_set(&io->iocount, 1);
+ atomic_set(&io->bytes_done, 0);
+ io->err = 0;
+}
+
#define VEC_SIZE 16
struct pvec {
unsigned short nr;
@@ -208,24 +235,33 @@ blkdev_direct_IO(int rw, struct kiocb *i

unsigned long addr; /* user iovec address */
size_t count; /* user iovec len */
- size_t nbytes = iocb->ki_nbytes = iocb->ki_left; /* total xfer size */
+ size_t nbytes; /* total xfer size */
loff_t size; /* size of block device */
struct bio *bio;
- atomic_t *bio_count = &iocb->ki_bio_count;
+ struct bdev_aio stack_io, *io;
+ file_endio_t *endio = aio_complete;
+ void *endio_data = iocb;
struct page *page;
struct pvec pvec;

pvec.nr = 0;
pvec.idx = 0;

+ io = &stack_io;
+ if (endio) {
+ io = kmalloc(sizeof(struct bdev_aio), GFP_KERNEL);
+ if (!io)
+ return -ENOMEM;
+ }
+ blk_io_init(io);
+
if (pos & blocksize_mask)
return -EINVAL;

+ nbytes = iov_length(iov, nr_segs);
size = i_size_read(inode);
- if (pos + nbytes > size) {
+ if (pos + nbytes > size)
nbytes = size - pos;
- iocb->ki_left = nbytes;
- }

/*
* check first non-zero iov alignment, the remaining
@@ -237,7 +273,6 @@ blkdev_direct_IO(int rw, struct kiocb *i
if (addr & blocksize_mask || count & blocksize_mask)
return -EINVAL;
} while (!count && ++seg < nr_segs);
- atomic_set(bio_count, 1);

while (nbytes) {
/* roughly estimate number of bio vec needed */
@@ -248,8 +283,8 @@ blkdev_direct_IO(int rw, struct kiocb *i
/* bio_alloc should not fail with GFP_KERNEL flag */
bio = bio_alloc(GFP_KERNEL, nvec);
bio->bi_bdev = I_BDEV(inode);
- bio->bi_end_io = blk_end_aio;
- bio->bi_private = iocb;
+ bio->bi_end_io = blk_bio_endio;
+ bio->bi_private = io;
bio->bi_sector = pos >> blkbits;
same_bio:
cur_off = addr & ~PAGE_MASK;
@@ -289,18 +324,27 @@ same_bio:
/* bio is ready, submit it */
if (rw == READ)
bio_set_pages_dirty(bio);
- atomic_inc(bio_count);
+ atomic_inc(&io->iocount);
submit_bio(rw, bio);
}

completion:
- iocb->ki_left -= nbytes;
- nbytes = iocb->ki_left;
- iocb->ki_pos += nbytes;
+ if (!endio) {
+ struct completion event;
+
+ init_completion(&event);
+ io->endio = NULL;
+ io->endio_data = &event;
+
+ if (!atomic_dec_and_test(&io->iocount))
+ wait_for_completion(&event);
+ return io->err ? io->err : atomic_read(&io->bytes_done);
+ }

- if (atomic_dec_and_test(bio_count))
- aio_complete(iocb, nbytes, 0);
+ io->endio = endio;
+ io->endio_data = endio_data;

+ blk_io_put(io);
return -EIOCBQUEUED;

backout:
@@ -316,7 +360,7 @@ backout:
* if no bio was submmitted, return the error code.
* otherwise, proceed with pending I/O completion.
*/
- if (atomic_read(bio_count) == 1)
+ if (atomic_read(&io->iocount) == 1)
return PTR_ERR(page);
goto completion;
}

2007-01-16 02:05:06

by Nate Diller

Subject: [PATCH -mm 10/10][RFC] aio: convert file aio to file_endio_t

Convert the file_operations function prototypes to the new file_endio_t
approach. This doesn't change much code in the file systems, just some
straightforward conversions in mm/filemap.c. It also fixes up callers of
the f_op interface, including some code restructuring in fs/read_write.c,
fs/compat.c, and fs/aio.c. Other than that, and removing ntfs's redundant
writev wrappers (ntfs_file_writev and ntfs_file_write), this should be pure
callsite fixes.

---

Documentation/filesystems/Locking | 6 -
Documentation/filesystems/vfs.txt | 6 -
arch/s390/hypfs/inode.c | 16 +-
drivers/net/tun.c | 13 +-
fs/aio.c | 15 +-
fs/bad_inode.c | 10 +
fs/cifs/cifsfs.c | 10 +
fs/compat.c | 56 ---------
fs/ecryptfs/file.c | 16 +-
fs/ext3/file.c | 9 -
fs/ext4/file.c | 9 -
fs/fuse/dev.c | 13 +-
fs/nfs/file.c | 56 +++++----
fs/ntfs/file.c | 71 ++---------
fs/ocfs2/file.c | 22 ++-
fs/pipe.c | 12 +-
fs/read_write.c | 225 ++++++++++++--------------------------
fs/read_write.h | 8 -
fs/smbfs/file.c | 22 ++-
fs/udf/file.c | 11 +
fs/xfs/linux-2.6/xfs_file.c | 58 +++++----
fs/xfs/linux-2.6/xfs_lrw.c | 25 +---
fs/xfs/linux-2.6/xfs_lrw.h | 10 +
fs/xfs/linux-2.6/xfs_vnode.h | 20 ++-
include/linux/fs.h | 27 ++--
mm/filemap.c | 75 ++++++------
net/socket.c | 32 +++--
sound/core/pcm_native.c | 15 +-
28 files changed, 377 insertions(+), 491 deletions(-)

---

diff -urpN -X dontdiff a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
--- a/arch/s390/hypfs/inode.c 2007-01-12 20:25:40.935112866 -0800
+++ b/arch/s390/hypfs/inode.c 2007-01-13 19:40:25.044918589 -0800
@@ -134,12 +134,13 @@ static int hypfs_open(struct inode *inod
return 0;
}

-static ssize_t hypfs_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t offset)
+static ssize_t hypfs_aio_read(struct file *filp, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
char *data;
size_t len;
- struct file *filp = iocb->ki_filp;
+ loff_t offset = *ppos;
/* XXX: temporary */
char __user *buf = iov[0].iov_base;
size_t count = iov[0].iov_len;
@@ -161,20 +162,21 @@ static ssize_t hypfs_aio_read(struct kio
count = -EFAULT;
goto out;
}
- iocb->ki_pos += count;
+ *ppos += count;
file_accessed(filp);
out:
return count;
}
-static ssize_t hypfs_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t offset)
+static ssize_t hypfs_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
int rc;
struct super_block *sb;
struct hypfs_sb_info *fs_info;
size_t count = iov_length(iov, nr_segs);

- sb = iocb->ki_filp->f_path.dentry->d_inode->i_sb;
+ sb = file->f_path.dentry->d_inode->i_sb;
fs_info = sb->s_fs_info;
/*
* Currently we only allow one update per second for two reasons:
diff -urpN -X dontdiff a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
--- a/Documentation/filesystems/Locking 2007-01-12 21:01:00.309877970 -0800
+++ b/Documentation/filesystems/Locking 2007-01-13 19:40:25.086904233 -0800
@@ -366,8 +366,10 @@ prototypes:
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
- ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
- ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
+ ssize_t (*aio_read) (struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
+ ssize_t (*aio_write) (struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *, unsigned int,
diff -urpN -X dontdiff a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
--- a/Documentation/filesystems/vfs.txt 2007-01-12 21:01:00.319874553 -0800
+++ b/Documentation/filesystems/vfs.txt 2007-01-13 19:40:25.100899447 -0800
@@ -701,8 +701,10 @@ struct file_operations {
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
- ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
- ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
+ ssize_t (*aio_read) (struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
+ ssize_t (*aio_write) (struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
diff -urpN -X dontdiff a/drivers/net/tun.c b/drivers/net/tun.c
--- a/drivers/net/tun.c 2007-01-12 20:25:40.954106374 -0800
+++ b/drivers/net/tun.c 2007-01-13 19:40:25.118893295 -0800
@@ -288,10 +288,11 @@ static inline size_t iov_total(const str
return len;
}

-static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
- unsigned long count, loff_t pos)
+static ssize_t tun_chr_aio_write(struct file *file, const struct iovec *iv,
+ unsigned long count, loff_t *ppos,
+ file_endio_t *endio, void *data)
{
- struct tun_struct *tun = iocb->ki_filp->private_data;
+ struct tun_struct *tun = file->private_data;

if (!tun)
return -EBADFD;
@@ -334,10 +335,10 @@ static __inline__ ssize_t tun_put_user(s
return total;
}

-static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
- unsigned long count, loff_t pos)
+static ssize_t tun_chr_aio_read(struct file *file, const struct iovec *iv,
+ unsigned long count, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct tun_struct *tun = file->private_data;
DECLARE_WAITQUEUE(wait, current);
struct sk_buff *skb;
diff -urpN -X dontdiff a/fs/aio.c b/fs/aio.c
--- a/fs/aio.c 2007-01-12 20:25:25.333445836 -0800
+++ b/fs/aio.c 2007-01-15 14:01:29.896558897 -0800
@@ -1317,8 +1317,8 @@ static int aio_rw_vect_retry(struct kioc
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
- ssize_t (*rw_op)(struct kiocb *, const struct iovec *,
- unsigned long, loff_t);
+ ssize_t (*rw_op)(struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
ssize_t ret = 0;
unsigned short opcode;

@@ -1332,9 +1332,9 @@ static int aio_rw_vect_retry(struct kioc
}

do {
- ret = rw_op(iocb, &iocb->ki_iovec[iocb->ki_cur_seg],
+ ret = rw_op(file, &iocb->ki_iovec[iocb->ki_cur_seg],
iocb->ki_nr_segs - iocb->ki_cur_seg,
- iocb->ki_pos);
+ &iocb->ki_pos, aio_complete, iocb);
if (ret > 0)
aio_advance_iovec(iocb, ret);

@@ -1376,9 +1376,12 @@ static ssize_t aio_setup_vectored_rw(int
{
ssize_t ret;

+ ret = iov_check_alloc(kiocb->ki_nbytes, 1, &kiocb->ki_iovec);
+ if (ret < 0)
+ goto out;
+
ret = rw_copy_check_uvector(type, (struct iovec __user *)kiocb->ki_buf,
- kiocb->ki_nbytes, 1,
- &kiocb->ki_inline_vec, &kiocb->ki_iovec);
+ kiocb->ki_nbytes, kiocb->ki_iovec);
if (ret < 0)
goto out;

diff -urpN -X dontdiff a/fs/bad_inode.c b/fs/bad_inode.c
--- a/fs/bad_inode.c 2007-01-12 20:25:25.345441736 -0800
+++ b/fs/bad_inode.c 2007-01-13 19:40:57.729742591 -0800
@@ -34,14 +34,16 @@ static ssize_t bad_file_write(struct fil
return -EIO;
}

-static ssize_t bad_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t bad_file_aio_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
return -EIO;
}

-static ssize_t bad_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t bad_file_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
return -EIO;
}
diff -urpN -X dontdiff a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
--- a/fs/cifs/cifsfs.c 2007-01-12 20:25:34.663256715 -0800
+++ b/fs/cifs/cifsfs.c 2007-01-15 14:04:03.132189370 -0800
@@ -495,13 +495,15 @@ cifs_get_sb(struct file_system_type *fs_
return simple_set_mnt(mnt, sb);
}

-static ssize_t cifs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t cifs_file_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+ struct inode *inode = file->f_path.dentry->d_inode;
ssize_t written;

- written = generic_file_aio_write(iocb, iov, nr_segs, pos);
+ written = generic_file_aio_write(file, iov, nr_segs, ppos,
+ endio, endio_data);
if (!CIFS_I(inode)->clientCanCacheAll)
filemap_fdatawrite(inode->i_mapping);
return written;
diff -urpN -X dontdiff a/fs/compat.c b/fs/compat.c
--- a/fs/compat.c 2007-01-12 20:25:18.270859974 -0800
+++ b/fs/compat.c 2007-01-15 13:50:40.019640964 -0800
@@ -1166,33 +1166,15 @@ static ssize_t compat_do_readv_writev(in
struct iovec *iov=iovstack, *vector;
ssize_t ret;
int seg;
- io_fn_t fn;
- iov_fn_t fnv;

- /*
- * SuS says "The readv() function *may* fail if the iovcnt argument
- * was less than or equal to 0, or greater than {IOV_MAX}. Linux has
- * traditionally returned zero for zero segments, so...
- */
- ret = 0;
- if (nr_segs == 0)
+ ret = iov_check_alloc(nr_segs, UIO_FASTIOV, &iov);
+ if (ret <= 0)
goto out;

/*
* First get the "struct iovec" from user memory and
* verify all the pointers
*/
- ret = -EINVAL;
- if ((nr_segs > UIO_MAXIOV) || (nr_segs <= 0))
- goto out;
- if (!file->f_op)
- goto out;
- if (nr_segs > UIO_FASTIOV) {
- ret = -ENOMEM;
- iov = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
- if (!iov)
- goto out;
- }
ret = -EFAULT;
if (!access_ok(VERIFY_READ, uvector, nr_segs*sizeof(*uvector)))
goto out;
@@ -1227,44 +1209,12 @@ static ssize_t compat_do_readv_writev(in
uvector++;
vector++;
}
- if (tot_len == 0) {
- ret = 0;
- goto out;
- }
-
- ret = rw_verify_area(type, file, pos, tot_len);
- if (ret < 0)
- goto out;
-
- ret = security_file_permission(file, type == READ ? MAY_READ:MAY_WRITE);
- if (ret)
- goto out;
-
- fnv = NULL;
- if (type == READ) {
- fn = file->f_op->read;
- fnv = file->f_op->aio_read;
- } else {
- fn = (io_fn_t)file->f_op->write;
- fnv = file->f_op->aio_write;
- }

- if (fnv)
- ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
- pos, fnv);
- else
- ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
+ ret = do_loop_readv_writev(type, file, iov, nr_segs, pos, tot_len);

out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0) {
- struct dentry *dentry = file->f_path.dentry;
- if (type == READ)
- fsnotify_access(dentry);
- else
- fsnotify_modify(dentry);
- }
return ret;
}

diff -urpN -X dontdiff a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
--- a/fs/ecryptfs/file.c 2007-01-12 20:25:40.984096123 -0800
+++ b/fs/ecryptfs/file.c 2007-01-13 19:17:04.574709230 -0800
@@ -100,27 +100,29 @@ out:
* returns without any errors. This is to be used only for file reads.
* The function to be used for directory reads is ecryptfs_read.
*/
-static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
+static ssize_t ecryptfs_read_update_atime(struct file *file,
const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
int rc;
struct dentry *lower_dentry;
struct vfsmount *lower_vfsmount;
- struct file *file = iocb->ki_filp;

- rc = generic_file_aio_read(iocb, iov, nr_segs, pos);
/*
* Even though this is a async interface, we need to wait
* for IO to finish to update atime
*/
- if (-EIOCBQUEUED == rc)
- rc = wait_on_sync_kiocb(iocb);
+ rc = generic_file_aio_read(file, iov, nr_segs, ppos, NULL, NULL);
if (rc >= 0) {
lower_dentry = ecryptfs_dentry_to_lower(file->f_path.dentry);
lower_vfsmount = ecryptfs_dentry_to_lower_mnt(file->f_path.dentry);
touch_atime(lower_vfsmount, lower_dentry);
- }
+ if (endio)
+ endio(endio_data, rc, 0);
+ } else if (endio)
+ endio(endio_data, 0, rc);
+
return rc;
}

diff -urpN -X dontdiff a/fs/ext3/file.c b/fs/ext3/file.c
--- a/fs/ext3/file.c 2007-01-12 20:25:34.687248514 -0800
+++ b/fs/ext3/file.c 2007-01-13 19:40:25.144884407 -0800
@@ -48,15 +48,16 @@ static int ext3_release_file (struct ino
}

static ssize_t
-ext3_file_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+ext3_file_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_path.dentry->d_inode;
ssize_t ret;
int err;

- ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+ ret = generic_file_aio_write(file, iov, nr_segs, ppos,
+ endio, endio_data);

/*
* Skip flushing if there was an error, or if nothing was written.
diff -urpN -X dontdiff a/fs/ext4/file.c b/fs/ext4/file.c
--- a/fs/ext4/file.c 2007-01-12 20:25:34.713239630 -0800
+++ b/fs/ext4/file.c 2007-01-13 19:40:25.165877229 -0800
@@ -48,15 +48,16 @@ static int ext4_release_file (struct ino
}

static ssize_t
-ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+ext4_file_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_path.dentry->d_inode;
ssize_t ret;
int err;

- ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+ ret = generic_file_aio_write(file, iov, nr_segs, ppos,
+ endio, endio_data);

/*
* Skip flushing if there was an error, or if nothing was written.
diff -urpN -X dontdiff a/fs/fuse/dev.c b/fs/fuse/dev.c
--- a/fs/fuse/dev.c 2007-01-12 20:25:40.991093731 -0800
+++ b/fs/fuse/dev.c 2007-01-13 19:40:25.185870393 -0800
@@ -680,15 +680,15 @@ static int fuse_read_interrupt(struct fu
* request_end(). Otherwise add it to the processing list, and set
* the 'sent' flag.
*/
-static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t fuse_dev_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
int err;
struct fuse_req *req;
struct fuse_in *in;
struct fuse_copy_state cs;
unsigned reqsize;
- struct file *file = iocb->ki_filp;
struct fuse_conn *fc = fuse_get_conn(file);
if (!fc)
return -EPERM;
@@ -806,15 +806,16 @@ static int copy_out_args(struct fuse_cop
* it from the list and copy the rest of the buffer to the request.
* The request is finished by calling request_end()
*/
-static ssize_t fuse_dev_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t fuse_dev_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
int err;
unsigned nbytes = iov_length(iov, nr_segs);
struct fuse_req *req;
struct fuse_out_header oh;
struct fuse_copy_state cs;
- struct fuse_conn *fc = fuse_get_conn(iocb->ki_filp);
+ struct fuse_conn *fc = fuse_get_conn(file);
if (!fc)
return -EPERM;

diff -urpN -X dontdiff a/fs/nfs/file.c b/fs/nfs/file.c
--- a/fs/nfs/file.c 2007-01-12 20:29:14.876994716 -0800
+++ b/fs/nfs/file.c 2007-01-13 19:40:25.207862873 -0800
@@ -41,10 +41,12 @@ static int nfs_file_release(struct inode
static loff_t nfs_file_llseek(struct file *file, loff_t offset, int origin);
static int nfs_file_mmap(struct file *, struct vm_area_struct *);
static ssize_t nfs_file_sendfile(struct file *, loff_t *, size_t, read_actor_t, void *);
-static ssize_t nfs_file_read(struct kiocb *, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos);
-static ssize_t nfs_file_write(struct kiocb *, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos);
+static ssize_t nfs_file_read(struct file *, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data);
+static ssize_t nfs_file_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data);
static int nfs_file_flush(struct file *, fl_owner_t id);
static int nfs_fsync(struct file *, struct dentry *dentry, int datasync);
static int nfs_check_flags(int flags);
@@ -198,28 +200,30 @@ nfs_file_flush(struct file *file, fl_own
}

static ssize_t
-nfs_file_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+nfs_file_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct dentry * dentry = iocb->ki_filp->f_path.dentry;
+ struct dentry * dentry = file->f_path.dentry;
struct inode * inode = dentry->d_inode;
ssize_t result;
size_t count = iov_length(iov, nr_segs);

#ifdef CONFIG_NFS_DIRECTIO
- if (iocb->ki_filp->f_flags & O_DIRECT)
- return nfs_file_direct_read(iocb->ki_filp, iov, nr_segs,
- &iocb->ki_pos, aio_complete, iocb);
+ if (file->f_flags & O_DIRECT)
+ return nfs_file_direct_read(file, iov, nr_segs,
+ ppos, endio, endio_data);
#endif

dfprintk(VFS, "nfs: read(%s/%s, %lu@%lu)\n",
dentry->d_parent->d_name.name, dentry->d_name.name,
- (unsigned long) count, (unsigned long) pos);
+ (unsigned long) count, (unsigned long) *ppos);

- result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping);
+ result = nfs_revalidate_mapping(inode, file->f_mapping);
nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, count);
if (!result)
- result = generic_file_aio_read(iocb, iov, nr_segs, pos);
+ result = generic_file_aio_read(file, iov, nr_segs, ppos,
+ endio, endio_data);
return result;
}

@@ -341,23 +345,24 @@ const struct address_space_operations nf
.launder_page = nfs_launder_page,
};

-static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t nfs_file_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct dentry * dentry = iocb->ki_filp->f_path.dentry;
+ struct dentry * dentry = file->f_path.dentry;
struct inode * inode = dentry->d_inode;
ssize_t result;
size_t count = iov_length(iov, nr_segs);

#ifdef CONFIG_NFS_DIRECTIO
- if (iocb->ki_filp->f_flags & O_DIRECT)
- return nfs_file_direct_write(iocb->ki_filp, iov, nr_segs,
- &iocb->ki_pos, aio_complete, iocb);
+ if (file->f_flags & O_DIRECT)
+ return nfs_file_direct_write(file, iov, nr_segs,
+ ppos, endio, endio_data);
#endif

dfprintk(VFS, "nfs: write(%s/%s(%ld), %lu@%Ld)\n",
dentry->d_parent->d_name.name, dentry->d_name.name,
- inode->i_ino, (unsigned long) count, (long long) pos);
+ inode->i_ino, (unsigned long) count, (long long) *ppos);

result = -EBUSY;
if (IS_SWAPFILE(inode))
@@ -365,8 +370,8 @@ static ssize_t nfs_file_write(struct kio
/*
* O_APPEND implies that we must revalidate the file length.
*/
- if (iocb->ki_filp->f_flags & O_APPEND) {
- result = nfs_revalidate_file_size(inode, iocb->ki_filp);
+ if (file->f_flags & O_APPEND) {
+ result = nfs_revalidate_file_size(inode, file);
if (result)
goto out;
}
@@ -376,10 +381,11 @@ static ssize_t nfs_file_write(struct kio
goto out;

nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
- result = generic_file_aio_write(iocb, iov, nr_segs, pos);
+ result = generic_file_aio_write(file, iov, nr_segs, ppos,
+ endio, endio_data);
/* Return error values for O_SYNC and IS_SYNC() */
- if (result >= 0 && (IS_SYNC(inode) || (iocb->ki_filp->f_flags & O_SYNC))) {
- int err = nfs_fsync(iocb->ki_filp, dentry, 1);
+ if (result >= 0 && (IS_SYNC(inode) || (file->f_flags & O_SYNC))) {
+ int err = nfs_fsync(file, dentry, 1);
if (err < 0)
result = err;
}
diff -urpN -X dontdiff a/fs/ntfs/file.c b/fs/ntfs/file.c
--- a/fs/ntfs/file.c 2007-01-12 20:25:34.804208535 -0800
+++ b/fs/ntfs/file.c 2007-01-13 19:40:25.229855353 -0800
@@ -1808,11 +1808,11 @@ err_out:
*
* Locking: The vfs is holding ->i_mutex on the inode.
*/
-static ssize_t ntfs_file_buffered_write(struct kiocb *iocb,
+static ssize_t ntfs_file_buffered_write(struct file *file,
const struct iovec *iov, unsigned long nr_segs,
- loff_t pos, loff_t *ppos, size_t count)
+ loff_t pos, loff_t *ppos, size_t count,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *vi = mapping->host;
ntfs_inode *ni = NTFS_I(vi);
@@ -2108,7 +2108,7 @@ err_out:
/* For now, when the user asks for O_SYNC, we actually give O_DSYNC. */
if (likely(!status)) {
if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(vi))) {
- if (!mapping->a_ops->writepage || !is_sync_kiocb(iocb))
+ if (!mapping->a_ops->writepage || endio)
status = generic_osync_inode(vi, mapping,
OSYNC_METADATA|OSYNC_DATA);
}
@@ -2123,10 +2123,10 @@ err_out:
/**
* ntfs_file_aio_write_nolock -
*/
-static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb,
- const struct iovec *iov, unsigned long nr_segs, loff_t *ppos)
+static ssize_t ntfs_file_aio_write_nolock(struct file *file,
+ const struct iovec *iov, unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
loff_t pos;
@@ -2166,8 +2166,8 @@ static ssize_t ntfs_file_aio_write_noloc
if (err)
goto out;
file_update_time(file);
- written = ntfs_file_buffered_write(iocb, iov, nr_segs, pos, ppos,
- count);
+ written = ntfs_file_buffered_write(file, iov, nr_segs, pos, ppos,
+ count, endio, endio_data);
out:
current->backing_dev_info = NULL;
return written ? written : err;
@@ -2176,46 +2176,17 @@ out:
/**
* ntfs_file_aio_write -
*/
-static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t ntfs_file_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
ssize_t ret;

- BUG_ON(iocb->ki_pos != pos);
-
- mutex_lock(&inode->i_mutex);
- ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
- mutex_unlock(&inode->i_mutex);
- if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
- int err = sync_page_range(inode, mapping, pos, ret);
- if (err < 0)
- ret = err;
- }
- return ret;
-}
-
-/**
- * ntfs_file_writev -
- *
- * Basically the same as generic_file_writev() except that it ends up calling
- * ntfs_file_aio_write_nolock() instead of __generic_file_aio_write_nolock().
- */
-static ssize_t ntfs_file_writev(struct file *file, const struct iovec *iov,
- unsigned long nr_segs, loff_t *ppos)
-{
- struct address_space *mapping = file->f_mapping;
- struct inode *inode = mapping->host;
- struct kiocb kiocb;
- ssize_t ret;
-
mutex_lock(&inode->i_mutex);
- init_sync_kiocb(&kiocb, file);
- ret = ntfs_file_aio_write_nolock(&kiocb, iov, nr_segs, ppos);
- if (ret == -EIOCBQUEUED)
- ret = wait_on_sync_kiocb(&kiocb);
+ ret = ntfs_file_aio_write_nolock(file, iov, nr_segs, ppos,
+ endio, endio_data);
mutex_unlock(&inode->i_mutex);
if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
int err = sync_page_range(inode, mapping, *ppos - ret, ret);
@@ -2226,18 +2197,6 @@ static ssize_t ntfs_file_writev(struct f
}

/**
- * ntfs_file_write - simple wrapper for ntfs_file_writev()
- */
-static ssize_t ntfs_file_write(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
-{
- struct iovec local_iov = { .iov_base = (void __user *)buf,
- .iov_len = count };
-
- return ntfs_file_writev(file, &local_iov, 1, ppos);
-}
-
-/**
* ntfs_file_fsync - sync a file to disk
* @filp: file to be synced
* @dentry: dentry describing the file to sync
@@ -2299,7 +2258,7 @@ const struct file_operations ntfs_file_o
.read = do_sync_read, /* Read from file. */
.aio_read = generic_file_aio_read, /* Async read from file. */
#ifdef NTFS_RW
- .write = ntfs_file_write, /* Write to file. */
+ .write = do_sync_write, /* Write to file. */
.aio_write = ntfs_file_aio_write, /* Async write to file. */
/*.release = ,*/ /* Last file is closed. See
fs/ext2/file.c::
diff -urpN -X dontdiff a/fs/ocfs2/file.c b/fs/ocfs2/file.c
--- a/fs/ocfs2/file.c 2007-01-12 20:57:42.316535054 -0800
+++ b/fs/ocfs2/file.c 2007-01-15 14:05:56.482450568 -0800
@@ -1141,13 +1141,14 @@ out:
return ret;
}

-static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
+static ssize_t ocfs2_file_aio_write(struct file *filp,
const struct iovec *iov,
unsigned long nr_segs,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
int ret, rw_level, have_alloc_sem = 0;
- struct file *filp = iocb->ki_filp;
struct inode *inode = filp->f_path.dentry->d_inode;
int appending = filp->f_flags & O_APPEND ? 1 : 0;

@@ -1176,14 +1177,15 @@ static ssize_t ocfs2_file_aio_write(stru
goto out;
}

- ret = ocfs2_prepare_inode_for_write(filp->f_path.dentry, &iocb->ki_pos,
+ ret = ocfs2_prepare_inode_for_write(filp->f_path.dentry, ppos,
iov_length(iov, nr_segs), appending);
if (ret < 0) {
mlog_errno(ret);
goto out;
}

- ret = generic_file_aio_write_nolock(iocb, iov, nr_segs, iocb->ki_pos);
+ ret = generic_file_aio_write_nolock(filp, iov, nr_segs, ppos,
+ endio, endio_data);

/* buffered aio wouldn't have proper lock coverage today */
BUG_ON(ret == -EIOCBQUEUED && !(filp->f_flags & O_DIRECT));
@@ -1286,13 +1288,14 @@ bail:
return ret;
}

-static ssize_t ocfs2_file_aio_read(struct kiocb *iocb,
+static ssize_t ocfs2_file_aio_read(struct file *filp,
const struct iovec *iov,
unsigned long nr_segs,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
int ret = 0, rw_level = -1, have_alloc_sem = 0, lock_level = 0;
- struct file *filp = iocb->ki_filp;
struct inode *inode = filp->f_path.dentry->d_inode;

mlog_entry("(0x%p, %u, '%.*s')\n", filp,
@@ -1338,7 +1341,8 @@ static ssize_t ocfs2_file_aio_read(struc
}
ocfs2_meta_unlock(inode, lock_level);

- ret = generic_file_aio_read(iocb, iov, nr_segs, iocb->ki_pos);
+ ret = generic_file_aio_read(filp, iov, nr_segs, ppos,
+ endio, endio_data);
if (ret == -EINVAL)
mlog(ML_ERROR, "generic_file_aio_read returned -EINVAL\n");

diff -urpN -X dontdiff a/fs/pipe.c b/fs/pipe.c
--- a/fs/pipe.c 2007-01-12 20:25:41.014085872 -0800
+++ b/fs/pipe.c 2007-01-13 19:40:25.273840313 -0800
@@ -218,10 +218,10 @@ static const struct pipe_buf_operations
};

static ssize_t
-pipe_read(struct kiocb *iocb, const struct iovec *_iov,
- unsigned long nr_segs, loff_t pos)
+pipe_read(struct file *filp, const struct iovec *_iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *filp = iocb->ki_filp;
struct inode *inode = filp->f_path.dentry->d_inode;
struct pipe_inode_info *pipe;
int do_wakeup;
@@ -331,10 +331,10 @@ redo:
}

static ssize_t
-pipe_write(struct kiocb *iocb, const struct iovec *_iov,
- unsigned long nr_segs, loff_t ppos)
+pipe_write(struct file *filp, const struct iovec *_iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *filp = iocb->ki_filp;
struct inode *inode = filp->f_path.dentry->d_inode;
struct pipe_inode_info *pipe;
ssize_t ret;
diff -urpN -X dontdiff a/fs/read_write.c b/fs/read_write.c
--- a/fs/read_write.c 2007-01-12 20:25:18.297850748 -0800
+++ b/fs/read_write.c 2007-01-15 14:01:11.288918160 -0800
@@ -217,37 +217,11 @@ Einval:
return -EINVAL;
}

-static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
-{
- set_current_state(TASK_UNINTERRUPTIBLE);
- if (!kiocbIsKicked(iocb))
- schedule();
- else
- kiocbClearKicked(iocb);
- __set_current_state(TASK_RUNNING);
-}
-
ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
{
struct iovec iov = { .iov_base = buf, .iov_len = len };
- struct kiocb kiocb;
- ssize_t ret;

- init_sync_kiocb(&kiocb, filp);
- kiocb.ki_pos = *ppos;
- kiocb.ki_left = len;
-
- for (;;) {
- ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
- if (ret != -EIOCBRETRY)
- break;
- wait_on_retry_sync_kiocb(&kiocb);
- }
-
- if (-EIOCBQUEUED == ret)
- ret = wait_on_sync_kiocb(&kiocb);
- *ppos = kiocb.ki_pos;
- return ret;
+ return filp->f_op->aio_read(filp, &iov, 1, ppos, NULL, NULL);
}

EXPORT_SYMBOL(do_sync_read);
@@ -288,24 +262,8 @@ EXPORT_SYMBOL(vfs_read);
ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
- struct kiocb kiocb;
- ssize_t ret;
-
- init_sync_kiocb(&kiocb, filp);
- kiocb.ki_pos = *ppos;
- kiocb.ki_left = len;
-
- for (;;) {
- ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
- if (ret != -EIOCBRETRY)
- break;
- wait_on_retry_sync_kiocb(&kiocb);
- }

- if (-EIOCBQUEUED == ret)
- ret = wait_on_sync_kiocb(&kiocb);
- *ppos = kiocb.ki_pos;
- return ret;
+ return filp->f_op->aio_write(filp, &iov, 1, ppos, NULL, NULL);
}

EXPORT_SYMBOL(do_sync_write);
@@ -450,37 +408,41 @@ unsigned long iov_shorten(struct iovec *
return seg;
}

-ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
- unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn)
+ssize_t do_loop_readv_writev(int type, struct file *filp, struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos, size_t count)
{
- struct kiocb kiocb;
- ssize_t ret;
+ struct iovec *vector = iov;
+ io_fn_t fn = NULL;
+ ssize_t ret = 0;

- init_sync_kiocb(&kiocb, filp);
- kiocb.ki_pos = *ppos;
- kiocb.ki_left = len;
- kiocb.ki_nbytes = len;
-
- for (;;) {
- ret = fn(&kiocb, iov, nr_segs, kiocb.ki_pos);
- if (ret != -EIOCBRETRY)
- break;
- wait_on_retry_sync_kiocb(&kiocb);
- }
+ if (count == 0)
+ goto out;

- if (ret == -EIOCBQUEUED)
- ret = wait_on_sync_kiocb(&kiocb);
- *ppos = kiocb.ki_pos;
- return ret;
-}
+ ret = rw_verify_area(type, filp, ppos, count);
+ if (ret < 0)
+ goto out;

-/* Do it by hand, with file-ops */
-ssize_t do_loop_readv_writev(struct file *filp, struct iovec *iov,
- unsigned long nr_segs, loff_t *ppos, io_fn_t fn)
-{
- struct iovec *vector = iov;
- ssize_t ret = 0;
+ ret = security_file_permission(filp, type == READ ? MAY_READ : MAY_WRITE);
+ if (ret)
+ goto out;
+
+ if (type == READ) {
+ if (filp->f_op->aio_read)
+ ret = filp->f_op->aio_read(filp, iov, nr_segs, ppos,
+ NULL, NULL);
+ else
+ fn = filp->f_op->read;
+ } else {
+ if (filp->f_op->aio_write)
+ ret = filp->f_op->aio_write(filp, iov, nr_segs, ppos,
+ NULL, NULL);
+ else
+ fn = (io_fn_t)filp->f_op->write;
+ }

+ if (!fn)
+ goto out;
+ /* There's no aio_* function, so do each vector by hand */
while (nr_segs > 0) {
void __user *base;
size_t len;
@@ -502,51 +464,54 @@ ssize_t do_loop_readv_writev(struct file
if (nr != len)
break;
}
-
+out:
+ if ((ret + (type == READ)) > 0) {
+ if (type == READ)
+ fsnotify_access(filp->f_path.dentry);
+ else
+ fsnotify_modify(filp->f_path.dentry);
+ }
return ret;
}

/* A write operation does a read from user space and vice versa */
#define vrfy_dir(type) ((type) == READ ? VERIFY_WRITE : VERIFY_READ)

-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer)
- {
- unsigned long seg;
- ssize_t ret;
- struct iovec *iov = fast_pointer;
+ssize_t iov_check_alloc(unsigned long nr_segs, unsigned long fast_segs,
+ struct iovec **ret_ptr)
+{
+ /*
+ * SuS says "The readv() function *may* fail if the iovcnt argument
+ * was less than or equal to 0, or greater than {IOV_MAX}." Linux has
+ * traditionally returned zero for zero segments, so...
+ */
+ if (nr_segs == 0)
+ return 0;

- /*
- * SuS says "The readv() function *may* fail if the iovcnt argument
- * was less than or equal to 0, or greater than {IOV_MAX}. Linux has
- * traditionally returned zero for zero segments, so...
- */
- if (nr_segs == 0) {
- ret = 0;
- goto out;
+ if (nr_segs > UIO_MAXIOV)
+ return -EINVAL;
+
+ if (nr_segs > fast_segs) {
+ *ret_ptr = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
+ if (*ret_ptr == NULL)
+ return -ENOMEM;
}
+ return 0;
+}
+
+
+ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
+ unsigned long nr_segs, struct iovec *iov)
+{
+ unsigned long seg;
+ ssize_t ret = 0;

/*
* First get the "struct iovec" from user memory and
* verify all the pointers
*/
- if (nr_segs > UIO_MAXIOV) {
- ret = -EINVAL;
- goto out;
- }
- if (nr_segs > fast_segs) {
- iov = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
- if (iov == NULL) {
- ret = -ENOMEM;
- goto out;
- }
- }
- if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
- ret = -EFAULT;
- goto out;
- }
+ if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector)))
+ return -EFAULT;

/*
* According to the Single Unix Specification we should return EINVAL
@@ -554,26 +519,20 @@ ssize_t rw_copy_check_uvector(int type,
* total length would overflow the ssize_t return value of the
* system call.
*/
- ret = 0;
for (seg = 0; seg < nr_segs; seg++) {
void __user *buf = iov[seg].iov_base;
ssize_t len = (ssize_t)iov[seg].iov_len;

/* see if we we're about to use an invalid len or if
* it's about to overflow ssize_t */
- if (len < 0 || (ret + len < ret)) {
- ret = -EINVAL;
- goto out;
- }
- if (unlikely(!access_ok(vrfy_dir(type), buf, len))) {
- ret = -EFAULT;
- goto out;
- }
+ if (len < 0 || (ret + len < ret))
+ return -EINVAL;
+
+ if (unlikely(!access_ok(vrfy_dir(type), buf, len)))
+ return -EFAULT;

ret += len;
}
-out:
- *ret_pointer = iov;
return ret;
}

@@ -581,55 +540,23 @@ static ssize_t do_readv_writev(int type,
const struct iovec __user * uvector,
unsigned long nr_segs, loff_t *pos)
{
- size_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
struct iovec *iov = iovstack;
ssize_t ret;
- io_fn_t fn;
- iov_fn_t fnv;

- if (!file->f_op) {
- ret = -EINVAL;
- goto out;
- }
-
- ret = rw_copy_check_uvector(type, uvector, nr_segs,
- ARRAY_SIZE(iovstack), iovstack, &iov);
- if (ret <= 0)
+ ret = iov_check_alloc(nr_segs, UIO_FASTIOV, &iov);
+ if (ret < 0)
goto out;

- tot_len = ret;
- ret = rw_verify_area(type, file, pos, tot_len);
- if (ret < 0)
- goto out;
- ret = security_file_permission(file, type == READ ? MAY_READ : MAY_WRITE);
- if (ret)
+ ret = rw_copy_check_uvector(type, uvector, nr_segs, iov);
+ if (ret <= 0)
goto out;

- fnv = NULL;
- if (type == READ) {
- fn = file->f_op->read;
- fnv = file->f_op->aio_read;
- } else {
- fn = (io_fn_t)file->f_op->write;
- fnv = file->f_op->aio_write;
- }
-
- if (fnv)
- ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
- pos, fnv);
- else
- ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
+ ret = do_loop_readv_writev(type, file, iov, nr_segs, pos, ret);

out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0) {
- if (type == READ)
- fsnotify_access(file->f_path.dentry);
- else
- fsnotify_modify(file->f_path.dentry);
- }
return ret;
}

diff -urpN -X dontdiff a/fs/read_write.h b/fs/read_write.h
--- a/fs/read_write.h 2007-01-12 20:25:18.320842889 -0800
+++ b/fs/read_write.h 2007-01-15 13:55:49.408920229 -0800
@@ -5,10 +5,6 @@


typedef ssize_t (*io_fn_t)(struct file *, char __user *, size_t, loff_t *);
-typedef ssize_t (*iov_fn_t)(struct kiocb *, const struct iovec *,
- unsigned long, loff_t);

-ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
- unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn);
-ssize_t do_loop_readv_writev(struct file *filp, struct iovec *iov,
- unsigned long nr_segs, loff_t *ppos, io_fn_t fn);
+ssize_t do_loop_readv_writev(int type, struct file *filp, struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos, size_t count);
diff -urpN -X dontdiff a/fs/smbfs/file.c b/fs/smbfs/file.c
--- a/fs/smbfs/file.c 2007-01-12 20:25:41.022083138 -0800
+++ b/fs/smbfs/file.c 2007-01-13 19:40:25.285836211 -0800
@@ -214,15 +214,15 @@ smb_updatepage(struct file *file, struct
}

static ssize_t
-smb_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+smb_file_aio_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file * file = iocb->ki_filp;
struct dentry * dentry = file->f_path.dentry;
ssize_t status;

VERBOSE("file %s/%s, count=%lu@%lu\n", DENTRY_PATH(dentry),
- (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
+ (unsigned long) iov_length(iov, nr_segs), (unsigned long) *ppos);

status = smb_revalidate_inode(dentry);
if (status) {
@@ -235,7 +235,8 @@ smb_file_aio_read(struct kiocb *iocb, co
(long)dentry->d_inode->i_size,
dentry->d_inode->i_flags, dentry->d_inode->i_atime);

- status = generic_file_aio_read(iocb, iov, nr_segs, pos);
+ status = generic_file_aio_read(file, iov, nr_segs, ppos,
+ endio, endio_data);
out:
return status;
}
@@ -319,16 +320,16 @@ const struct address_space_operations sm
* Write to a file (through the page cache).
*/
static ssize_t
-smb_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+smb_file_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file * file = iocb->ki_filp;
struct dentry * dentry = file->f_path.dentry;
ssize_t result;

VERBOSE("file %s/%s, count=%lu@%lu\n",
DENTRY_PATH(dentry),
- (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
+ (unsigned long) iov_length(iov, nr_segs), (unsigned long) *ppos);

result = smb_revalidate_inode(dentry);
if (result) {
@@ -342,7 +343,8 @@ smb_file_aio_write(struct kiocb *iocb, c
goto out;

if (iov_length(iov, nr_segs) > 0) {
- result = generic_file_aio_write(iocb, iov, nr_segs, pos);
+ result = generic_file_aio_write(file, iov, nr_segs, ppos,
+ endio, endio_data);
VERBOSE("pos=%ld, size=%ld, mtime=%ld, atime=%ld\n",
(long) file->f_pos, (long) dentry->d_inode->i_size,
dentry->d_inode->i_mtime, dentry->d_inode->i_atime);
diff -urpN -X dontdiff a/fs/udf/file.c b/fs/udf/file.c
--- a/fs/udf/file.c 2007-01-12 20:25:34.897176756 -0800
+++ b/fs/udf/file.c 2007-01-13 19:40:25.307828691 -0800
@@ -102,11 +102,11 @@ const struct address_space_operations ud
.commit_write = udf_adinicb_commit_write,
};

-static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t ppos)
+static ssize_t udf_file_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
ssize_t retval;
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_path.dentry->d_inode;
int err, pos;
size_t count = iov_length(iov, nr_segs);
@@ -116,7 +116,7 @@ static ssize_t udf_file_aio_write(struct
if (file->f_flags & O_APPEND)
pos = inode->i_size;
else
- pos = ppos;
+ pos = *ppos;

if (inode->i_sb->s_blocksize < (udf_file_entry_alloc_offset(inode) +
pos + count))
@@ -137,7 +137,8 @@ static ssize_t udf_file_aio_write(struct
}
}

- retval = generic_file_aio_write(iocb, iov, nr_segs, ppos);
+ retval = generic_file_aio_write(file, iov, nr_segs, ppos,
+ endio, endio_data);

if (retval > 0)
mark_inode_dirty(inode);
diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
--- a/fs/xfs/linux-2.6/xfs_file.c 2007-01-12 20:25:41.029080746 -0800
+++ b/fs/xfs/linux-2.6/xfs_file.c 2007-01-13 19:40:25.321823906 -0800
@@ -48,79 +48,91 @@ static struct vm_operations_struct xfs_d

STATIC inline ssize_t
__xfs_file_read(
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
unsigned long nr_segs,
int ioflags,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
bhv_vnode_t *vp = vn_from_inode(file->f_path.dentry->d_inode);

- BUG_ON(iocb->ki_pos != pos);
if (unlikely(file->f_flags & O_DIRECT))
ioflags |= IO_ISDIRECT;
- return bhv_vop_read(vp, iocb, iov, nr_segs, &iocb->ki_pos,
+ return bhv_vop_read(vp, file, iov, nr_segs, ppos, endio, endio_data,
ioflags, NULL);
}

STATIC ssize_t
xfs_file_aio_read(
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
unsigned long nr_segs,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
- return __xfs_file_read(iocb, iov, nr_segs, IO_ISAIO, pos);
+ return __xfs_file_read(file, iov, nr_segs, IO_ISAIO, ppos,
+ endio, endio_data);
}

STATIC ssize_t
xfs_file_aio_read_invis(
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
unsigned long nr_segs,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
- return __xfs_file_read(iocb, iov, nr_segs, IO_ISAIO|IO_INVIS, pos);
+ return __xfs_file_read(file, iov, nr_segs, IO_ISAIO|IO_INVIS, ppos,
+ endio, endio_data);
}

STATIC inline ssize_t
__xfs_file_write(
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
unsigned long nr_segs,
int ioflags,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
bhv_vnode_t *vp = vn_from_inode(inode);

- BUG_ON(iocb->ki_pos != pos);
if (unlikely(file->f_flags & O_DIRECT))
ioflags |= IO_ISDIRECT;
- return bhv_vop_write(vp, iocb, iov, nr_segs, &iocb->ki_pos,
- ioflags, NULL);
+ return bhv_vop_write(vp, file, iov, nr_segs, ppos,
+ endio, endio_data, ioflags, NULL);
}

STATIC ssize_t
xfs_file_aio_write(
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
unsigned long nr_segs,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
- return __xfs_file_write(iocb, iov, nr_segs, IO_ISAIO, pos);
+ return __xfs_file_write(file, iov, nr_segs, IO_ISAIO, ppos,
+ endio, endio_data);
}

STATIC ssize_t
xfs_file_aio_write_invis(
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
unsigned long nr_segs,
- loff_t pos)
+ loff_t *ppos,
+ file_endio_t *endio,
+ void *endio_data)
{
- return __xfs_file_write(iocb, iov, nr_segs, IO_ISAIO|IO_INVIS, pos);
+ return __xfs_file_write(file, iov, nr_segs, IO_ISAIO|IO_INVIS, ppos,
+ endio, endio_data);
}

STATIC ssize_t
diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
--- a/fs/xfs/linux-2.6/xfs_lrw.c 2007-01-12 21:01:00.605776824 -0800
+++ b/fs/xfs/linux-2.6/xfs_lrw.c 2007-01-13 19:40:25.336818779 -0800
@@ -190,14 +190,15 @@ unlock:
ssize_t /* bytes read, or (-) error */
xfs_read(
bhv_desc_t *bdp,
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iovp,
unsigned int segs,
loff_t *offset,
+ file_endio_t *endio,
+ void *endio_data,
int ioflags,
cred_t *credp)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
size_t size = 0;
ssize_t ret;
@@ -280,10 +281,8 @@ xfs_read(
xfs_rw_enter_trace(XFS_READ_ENTER, &ip->i_iocore,
(void *)iovp, segs, *offset, ioflags);

- iocb->ki_pos = *offset;
- ret = generic_file_aio_read(iocb, iovp, segs, *offset);
- if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
- ret = wait_on_sync_kiocb(iocb);
+ ret = generic_file_aio_read(file, iovp, segs, offset,
+ endio, endio_data);
if (ret > 0)
XFS_STATS_ADD(xs_read_bytes, ret);

@@ -627,14 +626,15 @@ out_lock:
ssize_t /* bytes written, or (-) error */
xfs_write(
bhv_desc_t *bdp,
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iovp,
unsigned int nsegs,
loff_t *offset,
+ file_endio_t *endio,
+ void *endio_data,
int ioflags,
cred_t *credp)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
unsigned long segs = nsegs;
@@ -837,7 +837,7 @@ retry:
xfs_rw_enter_trace(XFS_DIOWR_ENTER, io, (void *)iovp, segs,
*offset, ioflags);
ret = generic_file_direct_write(file, iovp, &segs, pos,
- offset, count, ocount, aio_complete, iocb);
+ offset, count, ocount, endio, endio_data);

/*
* direct-io write to a hole: fall through to buffered I/O
@@ -857,15 +857,12 @@ retry:
} else {
xfs_rw_enter_trace(XFS_WRITE_ENTER, io, (void *)iovp, segs,
*offset, ioflags);
- ret = generic_file_buffered_write(iocb, iovp, segs,
- pos, offset, count, ret);
+ ret = generic_file_buffered_write(file, iovp, segs,
+ pos, offset, count, ret, endio, endio_data);
}

current->backing_dev_info = NULL;

- if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
- ret = wait_on_sync_kiocb(iocb);
-
if ((ret == -ENOSPC) &&
DM_EVENT_ENABLED(vp->v_vfsp, xip, DM_EVENT_NOSPACE) &&
!(ioflags & IO_INVIS)) {
diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_lrw.h b/fs/xfs/linux-2.6/xfs_lrw.h
--- a/fs/xfs/linux-2.6/xfs_lrw.h 2007-01-12 20:25:41.042076304 -0800
+++ b/fs/xfs/linux-2.6/xfs_lrw.h 2007-01-13 19:40:25.346815361 -0800
@@ -84,12 +84,14 @@ extern int xfs_dev_is_read_only(struct x

extern int xfs_zero_eof(struct bhv_vnode *, struct xfs_iocore *, xfs_off_t,
xfs_fsize_t, xfs_fsize_t);
-extern ssize_t xfs_read(struct bhv_desc *, struct kiocb *,
+extern ssize_t xfs_read(struct bhv_desc *, struct file *,
const struct iovec *, unsigned int,
- loff_t *, int, struct cred *);
-extern ssize_t xfs_write(struct bhv_desc *, struct kiocb *,
+ loff_t *, file_endio_t *, void *, int,
+ struct cred *);
+extern ssize_t xfs_write(struct bhv_desc *, struct file *,
const struct iovec *, unsigned int,
- loff_t *, int, struct cred *);
+ loff_t *, file_endio_t *, void *, int,
+ struct cred *);
extern ssize_t xfs_sendfile(struct bhv_desc *, struct file *,
loff_t *, int, size_t, read_actor_t,
void *, struct cred *);
diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_vnode.h b/fs/xfs/linux-2.6/xfs_vnode.h
--- a/fs/xfs/linux-2.6/xfs_vnode.h 2007-01-12 20:25:41.051073229 -0800
+++ b/fs/xfs/linux-2.6/xfs_vnode.h 2007-01-15 14:11:03.544506727 -0800
@@ -133,12 +133,14 @@ typedef enum { L_FALSE, L_TRUE } lastclo

typedef int (*vop_open_t)(bhv_desc_t *, struct cred *);
typedef int (*vop_close_t)(bhv_desc_t *, int, lastclose_t, struct cred *);
-typedef ssize_t (*vop_read_t)(bhv_desc_t *, struct kiocb *,
+typedef ssize_t (*vop_read_t)(bhv_desc_t *, struct file *,
const struct iovec *, unsigned int,
- loff_t *, int, struct cred *);
-typedef ssize_t (*vop_write_t)(bhv_desc_t *, struct kiocb *,
+ loff_t *, file_endio_t *, void *, int,
+ struct cred *);
+typedef ssize_t (*vop_write_t)(bhv_desc_t *, struct file *,
const struct iovec *, unsigned int,
- loff_t *, int, struct cred *);
+ loff_t *, file_endio_t *, void *, int,
+ struct cred *);
typedef ssize_t (*vop_sendfile_t)(bhv_desc_t *, struct file *,
loff_t *, int, size_t, read_actor_t,
void *, struct cred *);
@@ -250,10 +252,12 @@ typedef struct bhv_vnodeops {
#define VOP(op, vp) (*((bhv_vnodeops_t *)VNHEAD(vp)->bd_ops)->op)
#define bhv_vop_open(vp, cr) VOP(vop_open, vp)(VNHEAD(vp),cr)
#define bhv_vop_close(vp, f,last,cr) VOP(vop_close, vp)(VNHEAD(vp),f,last,cr)
-#define bhv_vop_read(vp,file,iov,segs,offset,ioflags,cr) \
- VOP(vop_read, vp)(VNHEAD(vp),file,iov,segs,offset,ioflags,cr)
-#define bhv_vop_write(vp,file,iov,segs,offset,ioflags,cr) \
- VOP(vop_write, vp)(VNHEAD(vp),file,iov,segs,offset,ioflags,cr)
+#define bhv_vop_read(vp,file,iov,segs,offset,endio,endio_data,ioflags,cr) \
+ VOP(vop_read, vp)(VNHEAD(vp),file,iov,segs,offset, \
+ endio,endio_data,ioflags,cr)
+#define bhv_vop_write(vp,file,iov,segs,offset,endio,endio_data,ioflags,cr) \
+ VOP(vop_write, vp)(VNHEAD(vp),file,iov,segs,offset, \
+ endio,endio_data,ioflags,cr)
#define bhv_vop_sendfile(vp,f,off,ioflags,cnt,act,targ,cr) \
VOP(vop_sendfile, vp)(VNHEAD(vp),f,off,ioflags,cnt,act,targ,cr)
#define bhv_vop_splice_read(vp,f,o,pipe,cnt,fl,iofl,cr) \
diff -urpN -X dontdiff a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h 2007-01-12 21:01:00.619772040 -0800
+++ b/include/linux/fs.h 2007-01-15 13:56:21.746869412 -0800
@@ -1122,8 +1122,10 @@ struct file_operations {
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
- ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
- ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
+ ssize_t (*aio_read) (struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
+ ssize_t (*aio_write) (struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
@@ -1174,10 +1176,10 @@ struct inode_operations {

struct seq_file;

+ssize_t iov_check_alloc(unsigned long nr_segs, unsigned long fast_segs,
+ struct iovec **ret_ptr);
ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer);
+ unsigned long nr_segs, struct iovec *iov);

extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
@@ -1769,15 +1771,18 @@ extern int generic_file_readonly_mmap(st
extern int file_read_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size);
extern int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size);
int generic_write_checks(struct file *file, loff_t *pos, size_t *count, int isblk);
-extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsigned long, loff_t);
-extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
-extern ssize_t generic_file_aio_write_nolock(struct kiocb *, const struct iovec *,
- unsigned long, loff_t);
+extern ssize_t generic_file_aio_read(struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
+extern ssize_t generic_file_aio_write(struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
+extern ssize_t generic_file_aio_write_nolock(struct file *, const struct iovec *,
+ unsigned long, loff_t *, file_endio_t *, void *);
extern ssize_t generic_file_direct_write(struct file *, const struct iovec *,
unsigned long *, loff_t, loff_t *,
size_t, size_t, file_endio_t, void *);
-extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
- unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern ssize_t generic_file_buffered_write(struct file *, const struct iovec *,
+ unsigned long, loff_t, loff_t *, size_t, ssize_t,
+ file_endio_t *, void *);
extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
extern ssize_t generic_file_sendfile(struct file *, loff_t *, size_t, read_actor_t, void *);
diff -urpN -X dontdiff a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c 2007-01-15 13:30:26.467363895 -0800
+++ b/mm/filemap.c 2007-01-15 13:40:44.752036575 -0800
@@ -1173,23 +1173,24 @@ success:

/**
* generic_file_aio_read - generic filesystem read routine
- * @iocb: kernel I/O control block
+ * @filp: file to read
* @iov: io vector request
* @nr_segs: number of segments in the iovec
- * @pos: current file position
+ * @ppos: current file position
+ * @endio: async end I/O function
+ * @endio_data: endio private data
*
* This is the "read()" routine for all filesystems
* that can use the page cache directly.
*/
ssize_t
-generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+generic_file_aio_read(struct file *filp, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos, file_endio_t *endio,
+ void *endio_data)
{
- struct file *filp = iocb->ki_filp;
ssize_t retval;
unsigned long seg;
size_t count;
- loff_t *ppos = &iocb->ki_pos;

count = 0;
for (seg = 0; seg < nr_segs; seg++) {
@@ -1223,11 +1224,11 @@ generic_file_aio_read(struct kiocb *iocb
if (!count)
goto out; /* skip atime */
size = i_size_read(inode);
- if (pos < size) {
- retval = generic_file_direct_IO(READ, filp, iov, pos,
- nr_segs, aio_complete, iocb);
+ if (*ppos < size) {
+ retval = generic_file_direct_IO(READ, filp, iov, *ppos,
+ nr_segs, endio, endio_data);
if (retval > 0)
- *ppos = pos + retval;
+ *ppos += retval;
}
if (likely(retval != 0)) {
file_accessed(filp);
@@ -2118,11 +2119,11 @@ generic_file_direct_write(struct file *f
EXPORT_SYMBOL(generic_file_direct_write);

ssize_t
-generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
+generic_file_buffered_write(struct file *file, const struct iovec *iov,
unsigned long nr_segs, loff_t pos, loff_t *ppos,
- size_t count, ssize_t written)
+ size_t count, ssize_t written, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space * mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
struct inode *inode = mapping->host;
@@ -2281,10 +2282,10 @@ zero_length_segment:
EXPORT_SYMBOL(generic_file_buffered_write);

static ssize_t
-__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t *ppos)
+__generic_file_aio_write_nolock(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space * mapping = file->f_mapping;
size_t ocount; /* original count */
size_t count; /* after file limit checks */
@@ -2346,7 +2347,7 @@ __generic_file_aio_write_nolock(struct k

written = generic_file_direct_write(file, iov, &nr_segs, pos,
ppos, count, ocount,
- aio_complete, iocb);
+ endio, endio_data);
if (written < 0 || written == count)
goto out;
/*
@@ -2355,9 +2356,9 @@ __generic_file_aio_write_nolock(struct k
*/
pos += written;
count -= written;
- written_buffered = generic_file_buffered_write(iocb, iov,
+ written_buffered = generic_file_buffered_write(file, iov,
nr_segs, pos, ppos, count,
- written);
+ written, endio, endio_data);
/*
* If generic_file_buffered_write() retuned a synchronous error
* then we want to return the number of bytes which were
@@ -2392,62 +2393,58 @@ __generic_file_aio_write_nolock(struct k
*/
}
} else {
- written = generic_file_buffered_write(iocb, iov, nr_segs,
- pos, ppos, count, written);
+ written = generic_file_buffered_write(file, iov, nr_segs,
+ pos, ppos, count, written, endio, endio_data);
}
out:
current->backing_dev_info = NULL;
return written ? written : err;
}

-ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
- const struct iovec *iov, unsigned long nr_segs, loff_t pos)
+ssize_t generic_file_aio_write_nolock(struct file *file,
+ const struct iovec *iov, unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
ssize_t ret;

- BUG_ON(iocb->ki_pos != pos);
-
- ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
- &iocb->ki_pos);
+ ret = __generic_file_aio_write_nolock(file, iov, nr_segs, ppos,
+ endio, endio_data);

if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
ssize_t err;

- err = sync_page_range_nolock(inode, mapping, pos, ret);
+ err = sync_page_range_nolock(inode, mapping, *ppos - ret, ret);
if (err < 0) {
+ *ppos -= ret;
ret = err;
- iocb->ki_pos = pos;
}
}
return ret;
}
EXPORT_SYMBOL(generic_file_aio_write_nolock);

-ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+ssize_t generic_file_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
ssize_t ret;

- BUG_ON(iocb->ki_pos != pos);
-
mutex_lock(&inode->i_mutex);
- ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
- &iocb->ki_pos);
+ ret = __generic_file_aio_write_nolock(file, iov, nr_segs, ppos,
+ endio, endio_data);
mutex_unlock(&inode->i_mutex);

if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
ssize_t err;

- err = sync_page_range(inode, mapping, pos, ret);
+ err = sync_page_range(inode, mapping, *ppos - ret, ret);
if (err < 0) {
+ *ppos -= ret;
ret = err;
- iocb->ki_pos = pos;
}
}
return ret;
diff -urpN -X dontdiff a/net/socket.c b/net/socket.c
--- a/net/socket.c 2007-01-12 20:25:41.069067078 -0800
+++ b/net/socket.c 2007-01-15 13:24:42.254021081 -0800
@@ -94,10 +94,12 @@
#include <linux/netfilter.h>

static int sock_no_open(struct inode *irrelevant, struct file *dontcare);
-static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos);
-static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos);
+static ssize_t sock_aio_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data);
+static ssize_t sock_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data);
static int sock_mmap(struct file *file, struct vm_area_struct *vma);

static int sock_close(struct inode *inode, struct file *file);
@@ -621,15 +623,16 @@ static ssize_t sock_sendpage(struct file
return sock->ops->sendpage(sock, page, offset, size, flags);
}

-static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t sock_aio_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
struct msghdr msg;
- struct socket *sock = iocb->ki_filp->private_data;
+ struct socket *sock = file->private_data;
size_t size = 0;
int i;

- if (pos != 0)
+ if (*ppos != 0)
return -ESPIPE;

if (iov_length(iov, nr_segs) == 0) /* Match SYS5 behaviour */
@@ -644,20 +647,21 @@ static ssize_t sock_aio_read(struct kioc
msg.msg_controllen = 0;
msg.msg_iov = (struct iovec *)iov;
msg.msg_iovlen = nr_segs;
- msg.msg_flags = (iocb->ki_filp->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+ msg.msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;

return sock_recvmsg(sock, &msg, size, msg.msg_flags);
}

-static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t sock_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
struct msghdr msg;
- struct socket *sock = iocb->ki_filp->private_data;
+ struct socket *sock = file->private_data;
size_t size = 0;
int i;

- if (pos != 0)
+ if (*ppos != 0)
return -ESPIPE;

if (iov_length(iov, nr_segs) == 0) /* Match SYS5 behaviour */
@@ -672,7 +676,7 @@ static ssize_t sock_aio_write(struct kio
msg.msg_controllen = 0;
msg.msg_iov = (struct iovec *)iov;
msg.msg_iovlen = nr_segs;
- msg.msg_flags = (iocb->ki_filp->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+ msg.msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
if (sock->type == SOCK_SEQPACKET)
msg.msg_flags |= MSG_EOR;

diff -urpN -X dontdiff a/sound/core/pcm_native.c b/sound/core/pcm_native.c
--- a/sound/core/pcm_native.c 2007-01-12 20:25:41.079063661 -0800
+++ b/sound/core/pcm_native.c 2007-01-13 19:40:25.467774001 -0800
@@ -2854,11 +2854,12 @@ static ssize_t snd_pcm_write(struct file
return result;
}

-static ssize_t snd_pcm_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t snd_pcm_aio_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)

{
- struct snd_pcm_file *pcm_file;
+ struct snd_pcm_file *pcm_file = file->private_data;
struct snd_pcm_substream *substream;
struct snd_pcm_runtime *runtime;
snd_pcm_sframes_t result;
@@ -2866,7 +2867,6 @@ static ssize_t snd_pcm_aio_read(struct k
void __user **bufs;
snd_pcm_uframes_t frames;

- pcm_file = iocb->ki_filp->private_data;
substream = pcm_file->substream;
snd_assert(substream != NULL, return -ENXIO);
runtime = substream->runtime;
@@ -2889,8 +2889,9 @@ static ssize_t snd_pcm_aio_read(struct k
return result;
}

-static ssize_t snd_pcm_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+static ssize_t snd_pcm_aio_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos,
+ file_endio_t *endio, void *endio_data)
{
struct snd_pcm_file *pcm_file;
struct snd_pcm_substream *substream;
@@ -2900,7 +2901,7 @@ static ssize_t snd_pcm_aio_write(struct
void __user **bufs;
snd_pcm_uframes_t frames;

- pcm_file = iocb->ki_filp->private_data;
+ pcm_file = file->private_data;
substream = pcm_file->substream;
snd_assert(substream != NULL, result = -ENXIO; goto end);
runtime = substream->runtime;

2007-01-16 02:05:07

by Nate Diller

Subject: [PATCH -mm 6/10][RFC] aio: make nfs_directIO use file_endio_t

This converts the internals of NFS's directIO support to use a generic endio
function instead of calling aio_complete directly. The conversion is simple
because NFS already has a well-abstracted completion path.
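
For readers new to the pattern, here is a minimal userspace sketch of the
completion hand-off. The file_endio_t signature is an assumption inferred from
the call in nfs_direct_complete() (data pointer, byte count, error); struct
direct_req, direct_complete(), struct result_slot and record_result() are
hypothetical names used only for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Assumed shape of the series' file_endio_t; the real typedef is not shown here. */
typedef void file_endio_t(void *endio_data, ssize_t count, ssize_t error);

/* Cut-down model of struct nfs_direct_req: only the completion fields. */
struct direct_req {
	ssize_t count;		/* bytes actually processed */
	ssize_t error;		/* any reported error */
	file_endio_t *endio;	/* NULL means the submitter blocks instead */
	void *endio_data;	/* opaque cookie handed back to endio */
	int completed;		/* stands in for struct completion */
};

/* Models nfs_direct_complete(): notify an async submitter, then wake waiters. */
void direct_complete(struct direct_req *req)
{
	assert(req != NULL);
	if (req->endio)
		req->endio(req->endio_data, req->count, req->error);
	req->completed = 1;
}

/* Example callback: store the outcome in a caller-supplied slot. */
struct result_slot {
	ssize_t count;
	ssize_t error;
	int calls;
};

void record_result(void *endio_data, ssize_t count, ssize_t error)
{
	struct result_slot *slot = endio_data;

	slot->count = count;
	slot->error = error;
	slot->calls++;
}
```

In this model the aio core would pass its own completion routine and its
request pointer as endio and endio_data, while a synchronous caller would pass
NULL and wait on the completion flag.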

---

diff -urpN -X dontdiff a/fs/nfs/direct.c b/fs/nfs/direct.c
--- a/fs/nfs/direct.c 2007-01-12 14:53:48.000000000 -0800
+++ b/fs/nfs/direct.c 2007-01-12 15:02:30.000000000 -0800
@@ -68,7 +68,6 @@ struct nfs_direct_req {

/* I/O parameters */
struct nfs_open_context *ctx; /* file open context info */
- struct kiocb * iocb; /* controlling i/o request */
struct inode * inode; /* target file of i/o */

/* completion state */
@@ -77,6 +76,8 @@ struct nfs_direct_req {
ssize_t count, /* bytes actually processed */
error; /* any reported error */
struct completion completion; /* wait for i/o completion */
+ file_endio_t *endio; /* async completion function */
+ void *endio_data; /* private completion data */

/* commit state */
struct list_head rewrite_list; /* saved nfs_write_data structs */
@@ -151,7 +152,7 @@ static inline struct nfs_direct_req *nfs
kref_get(&dreq->kref);
init_completion(&dreq->completion);
INIT_LIST_HEAD(&dreq->rewrite_list);
- dreq->iocb = NULL;
+ dreq->endio = NULL;
dreq->ctx = NULL;
spin_lock_init(&dreq->lock);
atomic_set(&dreq->io_count, 0);
@@ -179,7 +180,7 @@ static ssize_t nfs_direct_wait(struct nf
ssize_t result = -EIOCBQUEUED;

/* Async requests don't wait here */
- if (dreq->iocb)
+ if (dreq->endio)
goto out;

result = wait_for_completion_interruptible(&dreq->completion);
@@ -194,14 +195,10 @@ out:
return (ssize_t) result;
}

-/*
- * Synchronous I/O uses a stack-allocated iocb. Thus we can't trust
- * the iocb is still valid here if this is a synchronous request.
- */
static void nfs_direct_complete(struct nfs_direct_req *dreq)
{
- if (dreq->iocb)
- aio_complete(dreq->iocb, dreq->count, dreq->error);
+ if (dreq->endio)
+ dreq->endio(dreq->endio_data, dreq->count, dreq->error);

complete_all(&dreq->completion);

@@ -332,11 +329,13 @@ static ssize_t nfs_direct_read_schedule(
return result < 0 ? (ssize_t) result : -EFAULT;
}

-static ssize_t nfs_direct_read(struct kiocb *iocb, unsigned long user_addr, size_t count, loff_t pos)
+static ssize_t nfs_direct_read(struct file *file, unsigned long user_addr,
+ size_t count, loff_t pos,
+ file_endio_t *endio, void *endio_data)
{
ssize_t result = 0;
sigset_t oldset;
- struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct inode *inode = file->f_mapping->host;
struct rpc_clnt *clnt = NFS_CLIENT(inode);
struct nfs_direct_req *dreq;

@@ -345,9 +344,9 @@ static ssize_t nfs_direct_read(struct ki
return -ENOMEM;

dreq->inode = inode;
- dreq->ctx = get_nfs_open_context((struct nfs_open_context *)iocb->ki_filp->private_data);
- if (!is_sync_kiocb(iocb))
- dreq->iocb = iocb;
+ dreq->ctx = get_nfs_open_context((struct nfs_open_context *)file->private_data);
+ dreq->endio = endio;
+ dreq->endio_data = endio_data;

nfs_add_stats(inode, NFSIOS_DIRECTREADBYTES, count);
rpc_clnt_sigmask(clnt, &oldset);
@@ -663,11 +662,13 @@ static ssize_t nfs_direct_write_schedule
return result < 0 ? (ssize_t) result : -EFAULT;
}

-static ssize_t nfs_direct_write(struct kiocb *iocb, unsigned long user_addr, size_t count, loff_t pos)
+static ssize_t nfs_direct_write(struct file *file, unsigned long user_addr,
+ size_t count, loff_t pos,
+ file_endio_t *endio, void *endio_data)
{
ssize_t result = 0;
sigset_t oldset;
- struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct inode *inode = file->f_mapping->host;
struct rpc_clnt *clnt = NFS_CLIENT(inode);
struct nfs_direct_req *dreq;
size_t wsize = NFS_SERVER(inode)->wsize;
@@ -682,9 +683,9 @@ static ssize_t nfs_direct_write(struct k
sync = FLUSH_STABLE;

dreq->inode = inode;
- dreq->ctx = get_nfs_open_context((struct nfs_open_context *)iocb->ki_filp->private_data);
- if (!is_sync_kiocb(iocb))
- dreq->iocb = iocb;
+ dreq->ctx = get_nfs_open_context((struct nfs_open_context *)file->private_data);
+ dreq->endio = endio;
+ dreq->endio_data = endio_data;

nfs_add_stats(inode, NFSIOS_DIRECTWRITTENBYTES, count);

@@ -701,10 +702,12 @@ static ssize_t nfs_direct_write(struct k

/**
* nfs_file_direct_read - file direct read operation for NFS files
- * @iocb: target I/O control block
+ * @file: target file
* @iov: vector of user buffers into which to read data
* @nr_segs: size of iov vector
* @pos: byte offset in file where reading starts
+ * @endio: async I/O completion function
+ * @endio_data: private completion data
*
* We use this function for direct reads instead of calling
* generic_file_aio_read() in order to avoid gfar's check to see if
@@ -720,11 +723,11 @@ static ssize_t nfs_direct_write(struct k
* client must read the updated atime from the server back into its
* cache.
*/
-ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+ssize_t nfs_file_direct_read(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *pos,
+ file_endio_t *endio, void *endio_data)
{
ssize_t retval = -EINVAL;
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
/* XXX: temporary */
const char __user *buf = iov[0].iov_base;
@@ -733,7 +736,7 @@ ssize_t nfs_file_direct_read(struct kioc
dprintk("nfs: direct read(%s/%s, %lu@%Ld)\n",
file->f_path.dentry->d_parent->d_name.name,
file->f_path.dentry->d_name.name,
- (unsigned long) count, (long long) pos);
+ (unsigned long) count, (long long) *pos);

if (nr_segs != 1)
return -EINVAL;
@@ -751,9 +754,10 @@ ssize_t nfs_file_direct_read(struct kioc
if (retval)
goto out;

- retval = nfs_direct_read(iocb, (unsigned long) buf, count, pos);
+ retval = nfs_direct_read(file, (unsigned long) buf, count, *pos,
+ endio, endio_data);
if (retval > 0)
- iocb->ki_pos = pos + retval;
+ *pos += retval;

out:
return retval;
@@ -761,10 +765,12 @@ out:

/**
* nfs_file_direct_write - file direct write operation for NFS files
- * @iocb: target I/O control block
+ * @file: target file
* @iov: vector of user buffers from which to write data
* @nr_segs: size of iov vector
* @pos: byte offset in file where writing starts
+ * @endio: async I/O completion function
+ * @endio_data: private completion data
*
* We use this function for direct writes instead of calling
* generic_file_aio_write() in order to avoid taking the inode
@@ -784,11 +790,11 @@ out:
* Note that O_APPEND is not supported for NFS direct writes, as there
* is no atomic O_APPEND write facility in the NFS protocol.
*/
-ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
+ssize_t nfs_file_direct_write(struct file *file, const struct iovec *iov,
+ unsigned long nr_segs, loff_t *pos,
+ file_endio_t *endio, void *endio_data)
{
ssize_t retval;
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
/* XXX: temporary */
const char __user *buf = iov[0].iov_base;
@@ -797,12 +803,12 @@ ssize_t nfs_file_direct_write(struct kio
dfprintk(VFS, "nfs: direct write(%s/%s, %lu@%Ld)\n",
file->f_path.dentry->d_parent->d_name.name,
file->f_path.dentry->d_name.name,
- (unsigned long) count, (long long) pos);
+ (unsigned long) count, (long long) *pos);

if (nr_segs != 1)
return -EINVAL;

- retval = generic_write_checks(file, &pos, &count, 0);
+ retval = generic_write_checks(file, pos, &count, 0);
if (retval)
goto out;

@@ -821,10 +827,11 @@ ssize_t nfs_file_direct_write(struct kio
if (retval)
goto out;

- retval = nfs_direct_write(iocb, (unsigned long) buf, count, pos);
+ retval = nfs_direct_write(file, (unsigned long) buf, count, *pos,
+ endio, endio_data);

if (retval > 0)
- iocb->ki_pos = pos + retval;
+ *pos += retval;

out:
return retval;
diff -urpN -X dontdiff a/fs/nfs/file.c b/fs/nfs/file.c
--- a/fs/nfs/file.c 2007-01-12 11:19:45.000000000 -0800
+++ b/fs/nfs/file.c 2007-01-12 15:01:17.000000000 -0800
@@ -208,7 +208,8 @@ nfs_file_read(struct kiocb *iocb, const

#ifdef CONFIG_NFS_DIRECTIO
if (iocb->ki_filp->f_flags & O_DIRECT)
- return nfs_file_direct_read(iocb, iov, nr_segs, pos);
+ return nfs_file_direct_read(iocb->ki_filp, iov, nr_segs,
+ &iocb->ki_pos, aio_complete, iocb);
#endif

dfprintk(VFS, "nfs: read(%s/%s, %lu@%lu)\n",
@@ -350,7 +351,8 @@ static ssize_t nfs_file_write(struct kio

#ifdef CONFIG_NFS_DIRECTIO
if (iocb->ki_filp->f_flags & O_DIRECT)
- return nfs_file_direct_write(iocb, iov, nr_segs, pos);
+ return nfs_file_direct_write(iocb->ki_filp, iov, nr_segs,
+ &iocb->ki_pos, aio_complete, iocb);
#endif

dfprintk(VFS, "nfs: write(%s/%s(%ld), %lu@%Ld)\n",
diff -urpN -X dontdiff a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
--- a/include/linux/nfs_fs.h 2007-01-12 11:19:48.000000000 -0800
+++ b/include/linux/nfs_fs.h 2007-01-12 15:01:17.000000000 -0800
@@ -369,12 +369,12 @@ extern int nfs3_removexattr (struct dent
*/
extern ssize_t nfs_direct_IO(int, struct kiocb *, const struct iovec *, loff_t,
unsigned long);
-extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
+extern ssize_t nfs_file_direct_read(struct file *file,
const struct iovec *iov, unsigned long nr_segs,
- loff_t pos);
-extern ssize_t nfs_file_direct_write(struct kiocb *iocb,
+ loff_t *pos, file_endio_t *endio, void *endio_data);
+extern ssize_t nfs_file_direct_write(struct file *file,
const struct iovec *iov, unsigned long nr_segs,
- loff_t pos);
+ loff_t *pos, file_endio_t *endio, void *endio_data);

/*
* linux/fs/nfs/dir.c

2007-01-16 02:05:51

by Nate Diller

Subject: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

Convert code using iocb->ki_left to use the more generic iov_length() call.
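
As a reminder of what the replacement computes, here is a userspace sketch that
sums the segment lengths of an iovec array. It is meant to mirror what
iov_length() does; the name iov_total_length() is hypothetical and chosen to
avoid suggesting this is the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Total number of bytes described by the first nr_segs entries of iov. */
size_t iov_total_length(const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;
	unsigned long seg;

	assert(iov != NULL || nr_segs == 0);
	for (seg = 0; seg < nr_segs; seg++)
		total += iov[seg].iov_len;
	return total;
}
```

Unlike ki_left, this value depends only on the iovec itself, which is why the
callers below no longer need to look inside the kiocb.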

---

diff -urpN -X dontdiff a/fs/ocfs2/file.c b/fs/ocfs2/file.c
--- a/fs/ocfs2/file.c 2007-01-10 11:50:26.000000000 -0800
+++ b/fs/ocfs2/file.c 2007-01-10 12:42:09.000000000 -0800
@@ -1157,7 +1157,7 @@ static ssize_t ocfs2_file_aio_write(stru
filp->f_path.dentry->d_name.name);

/* happy write of zero bytes */
- if (iocb->ki_left == 0)
+ if (iov_length(iov, nr_segs) == 0)
return 0;

mutex_lock(&inode->i_mutex);
@@ -1177,7 +1177,7 @@ static ssize_t ocfs2_file_aio_write(stru
}

ret = ocfs2_prepare_inode_for_write(filp->f_path.dentry, &iocb->ki_pos,
- iocb->ki_left, appending);
+ iov_length(iov, nr_segs), appending);
if (ret < 0) {
mlog_errno(ret);
goto out;
diff -urpN -X dontdiff a/fs/smbfs/file.c b/fs/smbfs/file.c
--- a/fs/smbfs/file.c 2007-01-10 11:50:28.000000000 -0800
+++ b/fs/smbfs/file.c 2007-01-10 12:42:09.000000000 -0800
@@ -222,7 +222,7 @@ smb_file_aio_read(struct kiocb *iocb, co
ssize_t status;

VERBOSE("file %s/%s, count=%lu@%lu\n", DENTRY_PATH(dentry),
- (unsigned long) iocb->ki_left, (unsigned long) pos);
+ (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);

status = smb_revalidate_inode(dentry);
if (status) {
@@ -328,7 +328,7 @@ smb_file_aio_write(struct kiocb *iocb, c

VERBOSE("file %s/%s, count=%lu@%lu\n",
DENTRY_PATH(dentry),
- (unsigned long) iocb->ki_left, (unsigned long) pos);
+ (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);

result = smb_revalidate_inode(dentry);
if (result) {
@@ -341,7 +341,7 @@ smb_file_aio_write(struct kiocb *iocb, c
if (result)
goto out;

- if (iocb->ki_left > 0) {
+ if (iov_length(iov, nr_segs) > 0) {
result = generic_file_aio_write(iocb, iov, nr_segs, pos);
VERBOSE("pos=%ld, size=%ld, mtime=%ld, atime=%ld\n",
(long) file->f_pos, (long) dentry->d_inode->i_size,
diff -urpN -X dontdiff a/fs/udf/file.c b/fs/udf/file.c
--- a/fs/udf/file.c 2007-01-10 11:53:02.000000000 -0800
+++ b/fs/udf/file.c 2007-01-10 12:42:09.000000000 -0800
@@ -109,7 +109,7 @@ static ssize_t udf_file_aio_write(struct
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_path.dentry->d_inode;
int err, pos;
- size_t count = iocb->ki_left;
+ size_t count = iov_length(iov, nr_segs);

if (UDF_I_ALLOCTYPE(inode) == ICBTAG_FLAG_AD_IN_ICB)
{
diff -urpN -X dontdiff a/net/socket.c b/net/socket.c
--- a/net/socket.c 2007-01-10 12:40:54.000000000 -0800
+++ b/net/socket.c 2007-01-10 12:42:09.000000000 -0800
@@ -632,7 +632,7 @@ static ssize_t sock_aio_read(struct kioc
if (pos != 0)
return -ESPIPE;

- if (iocb->ki_left == 0) /* Match SYS5 behaviour */
+ if (iov_length(iov, nr_segs) == 0) /* Match SYS5 behaviour */
return 0;

for (i = 0; i < nr_segs; i++)
@@ -660,7 +660,7 @@ static ssize_t sock_aio_write(struct kio
if (pos != 0)
return -ESPIPE;

- if (iocb->ki_left == 0) /* Match SYS5 behaviour */
+ if (iov_length(iov, nr_segs) == 0) /* Match SYS5 behaviour */
return 0;

for (i = 0; i < nr_segs; i++)

2007-01-16 02:06:31

by Nate Diller

Subject: [PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops

This removes the aio implementation from the usb gadget file system. Aside
from making very creative (!) use of the aio retry path, it can't be of any
use performance-wise because it always kmalloc()s a bounce buffer for the
*whole* I/O size. Perhaps the only reason to keep it around is the ability
to cancel I/O requests, which only applies when using the user space async
I/O interface. I highly doubt that is enough incentive to justify the extra
complexity here or in user-space, so I think it's a safe bet to remove this.
If that feature is still desired, it would be possible to implement a sync
interface that does an interruptible sleep.

I can be convinced otherwise, but the alternatives are difficult. See for
example the "fuse, get_user_pages, flush_anon_page, aliasing caches and all
that again" LKML thread recently for why it's waaay easier to kmalloc a
bounce buffer here, and (ab)use the retry interface.
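
To make the cost concrete, here is a userspace sketch of the bounce-buffer
approach that the removed ep_aio_write() takes: allocate one buffer for the
whole request and copy every segment into it. malloc() and memcpy() stand in
for kmalloc() and copy_from_user(), and gather_into_bounce_buffer() is a
hypothetical name, not a function from the driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

/*
 * Copy all segments of iov into one freshly allocated buffer.
 * Returns the buffer (caller frees it) and stores its length in *len_out,
 * or returns NULL if the allocation fails.
 */
char *gather_into_bounce_buffer(const struct iovec *iov, unsigned long nr_segs,
				size_t *len_out)
{
	size_t total = 0, copied = 0;
	unsigned long seg;
	char *buf;

	assert(len_out != NULL);
	for (seg = 0; seg < nr_segs; seg++)
		total += iov[seg].iov_len;

	/* Ask for at least one byte so an empty request is not mistaken for failure. */
	buf = malloc(total ? total : 1);
	if (!buf)
		return NULL;

	for (seg = 0; seg < nr_segs; seg++) {
		memcpy(buf + copied, iov[seg].iov_base, iov[seg].iov_len);
		copied += iov[seg].iov_len;
	}
	*len_out = copied;
	return buf;
}
```

Every byte is copied once before the transfer even starts, and the allocation
grows with the request, which is the overhead the description above objects to.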

---

diff -urpN -X dontdiff a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
--- a/drivers/usb/gadget/inode.c 2007-01-10 13:23:46.000000000 -0800
+++ b/drivers/usb/gadget/inode.c 2007-01-10 16:56:09.000000000 -0800
@@ -527,218 +527,6 @@ static int ep_ioctl (struct inode *inode

/*----------------------------------------------------------------------*/

-/* ASYNCHRONOUS ENDPOINT I/O OPERATIONS (bulk/intr/iso) */
-
-struct kiocb_priv {
- struct usb_request *req;
- struct ep_data *epdata;
- void *buf;
- const struct iovec *iv;
- unsigned long nr_segs;
- unsigned actual;
-};
-
-static int ep_aio_cancel(struct kiocb *iocb, struct io_event *e)
-{
- struct kiocb_priv *priv = iocb->private;
- struct ep_data *epdata;
- int value;
-
- local_irq_disable();
- epdata = priv->epdata;
- // spin_lock(&epdata->dev->lock);
- kiocbSetCancelled(iocb);
- if (likely(epdata && epdata->ep && priv->req))
- value = usb_ep_dequeue (epdata->ep, priv->req);
- else
- value = -EINVAL;
- // spin_unlock(&epdata->dev->lock);
- local_irq_enable();
-
- aio_put_req(iocb);
- return value;
-}
-
-static int ep_aio_read_retry(struct kiocb *iocb)
-{
- struct kiocb_priv *priv = iocb->private;
- ssize_t total;
- int i, err = 0;
-
- /* we "retry" to get the right mm context for this: */
-
- /* copy stuff into user buffers */
- total = priv->actual;
- for (i=0; i < priv->nr_segs; i++) {
- ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
-
- if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
- err = -EFAULT;
- break;
- }
-
- total -= this;
- if (total == 0)
- break;
- }
- kfree(priv->buf);
- kfree(priv);
- aio_put_req(iocb);
- return err;
-}
-
-static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req)
-{
- struct kiocb *iocb = req->context;
- struct kiocb_priv *priv = iocb->private;
- struct ep_data *epdata = priv->epdata;
-
- /* lock against disconnect (and ideally, cancel) */
- spin_lock(&epdata->dev->lock);
- priv->req = NULL;
- priv->epdata = NULL;
- if (priv->iv == NULL
- || unlikely(req->actual == 0)
- || unlikely(kiocbIsCancelled(iocb))) {
- kfree(req->buf);
- kfree(priv);
- iocb->private = NULL;
- /* aio_complete() reports bytes-transferred _and_ faults */
- if (unlikely(kiocbIsCancelled(iocb)))
- aio_put_req(iocb);
- else
- aio_complete(iocb, req->actual, req->status);
- } else {
- /* retry() won't report both; so we hide some faults */
- if (unlikely(0 != req->status))
- DBG(epdata->dev, "%s fault %d len %d\n",
- ep->name, req->status, req->actual);
-
- priv->buf = req->buf;
- priv->actual = req->actual;
- kick_iocb(iocb);
- }
- spin_unlock(&epdata->dev->lock);
-
- usb_ep_free_request(ep, req);
- put_ep(epdata);
-}
-
-static ssize_t
-ep_aio_rwtail(
- struct kiocb *iocb,
- char *buf,
- size_t len,
- struct ep_data *epdata,
- const struct iovec *iv,
- unsigned long nr_segs
-)
-{
- struct kiocb_priv *priv;
- struct usb_request *req;
- ssize_t value;
-
- priv = kmalloc(sizeof *priv, GFP_KERNEL);
- if (!priv) {
- value = -ENOMEM;
-fail:
- kfree(buf);
- return value;
- }
- iocb->private = priv;
- priv->iv = iv;
- priv->nr_segs = nr_segs;
-
- value = get_ready_ep(iocb->ki_filp->f_flags, epdata);
- if (unlikely(value < 0)) {
- kfree(priv);
- goto fail;
- }
-
- iocb->ki_cancel = ep_aio_cancel;
- get_ep(epdata);
- priv->epdata = epdata;
- priv->actual = 0;
-
- /* each kiocb is coupled to one usb_request, but we can't
- * allocate or submit those if the host disconnected.
- */
- spin_lock_irq(&epdata->dev->lock);
- if (likely(epdata->ep)) {
- req = usb_ep_alloc_request(epdata->ep, GFP_ATOMIC);
- if (likely(req)) {
- priv->req = req;
- req->buf = buf;
- req->length = len;
- req->complete = ep_aio_complete;
- req->context = iocb;
- value = usb_ep_queue(epdata->ep, req, GFP_ATOMIC);
- if (unlikely(0 != value))
- usb_ep_free_request(epdata->ep, req);
- } else
- value = -EAGAIN;
- } else
- value = -ENODEV;
- spin_unlock_irq(&epdata->dev->lock);
-
- up(&epdata->lock);
-
- if (unlikely(value)) {
- kfree(priv);
- put_ep(epdata);
- } else
- value = (iv ? -EIOCBRETRY : -EIOCBQUEUED);
- return value;
-}
-
-static ssize_t
-ep_aio_read(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t o)
-{
- struct ep_data *epdata = iocb->ki_filp->private_data;
- char *buf;
- size_t len = iov_length(iov, nr_segs);
-
- if (unlikely(epdata->desc.bEndpointAddress & USB_DIR_IN))
- return -EINVAL;
-
- buf = kmalloc(len, GFP_KERNEL);
- if (unlikely(!buf))
- return -ENOMEM;
-
- iocb->ki_retry = ep_aio_read_retry;
- return ep_aio_rwtail(iocb, buf, len, epdata, iov, nr_segs);
-}
-
-static ssize_t
-ep_aio_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t o)
-{
- struct ep_data *epdata = iocb->ki_filp->private_data;
- char *buf;
- size_t len = 0;
- int i = 0;
-
- if (unlikely(!(epdata->desc.bEndpointAddress & USB_DIR_IN)))
- return -EINVAL;
-
- buf = kmalloc(iov_length(iov, nr_segs), GFP_KERNEL);
- if (unlikely(!buf))
- return -ENOMEM;
-
- for (i=0; i < nr_segs; i++) {
- if (unlikely(copy_from_user(&buf[len], iov[i].iov_base,
- iov[i].iov_len) != 0)) {
- kfree(buf);
- return -EFAULT;
- }
- len += iov[i].iov_len;
- }
- return ep_aio_rwtail(iocb, buf, len, epdata, NULL, 0);
-}
-
-/*----------------------------------------------------------------------*/
-
/* used after endpoint configuration */
static const struct file_operations ep_io_operations = {
.owner = THIS_MODULE,
@@ -748,9 +536,6 @@ static const struct file_operations ep_i
.write = ep_write,
.ioctl = ep_ioctl,
.release = ep_release,
-
- .aio_read = ep_aio_read,
- .aio_write = ep_aio_write,
};

/* ENDPOINT INITIALIZATION

2007-01-16 02:06:50

by Nate Diller

Subject: [PATCH -mm 7/10][RFC] aio: make __blockdev_direct_IO use file_endio_t

This converts the internals of __blockdev_direct_IO in fs/direct-io.c to use
a generic endio function, instead of directly calling aio_complete. It also
changes the semantics of dio_iodone to be friendlier to its only users,
xfs and ocfs2: end_io is now called unless the submission returns a real
error (that is, anything other than -EIOCBQUEUED). This lets the caller know
when it must release locks and tear down data structures itself.

It also converts the _own_locking and _no_locking variants of
blockdev_direct_IO to use a generic endio function.
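
The following userspace sketch models the two rules this patch relies on: how
the reordered dio_complete() settles its return code and clamps short reads,
and how a caller decides whether end_io has taken over lock release. struct
dio_model, dio_model_complete() and end_io_owns_unlock() are hypothetical
names, max_to_read is modeled as ssize_t for simplicity, and the value 529 for
EIOCBQUEUED is an assumption about the kernel-internal errno.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

#define MODEL_EIOCBQUEUED 529	/* assumed kernel-internal value, not in userspace errno.h */

struct dio_model {
	int is_read;		/* nonzero for a READ request */
	ssize_t result;		/* bytes transferred so far */
	ssize_t max_to_read;	/* i_size at submission minus the file offset */
	int page_errors;	/* first error seen while mapping user pages */
	int io_error;		/* first error seen in bio completion */
};

/* Models the patched dio_complete(): pick the error code, then clamp short reads. */
int dio_model_complete(struct dio_model *dio, int ret)
{
	assert(dio != NULL);
	if (ret == -MODEL_EIOCBQUEUED)
		ret = 0;
	if (ret == 0)
		ret = dio->page_errors;
	if (ret == 0)
		ret = dio->io_error;

	if (dio->result && dio->is_read && dio->result > dio->max_to_read)
		dio->result = dio->max_to_read;
	return ret;
}

/* Caller-side rule from the ocfs2 hunks: success or "queued" means end_io unlocks. */
int end_io_owns_unlock(ssize_t ret)
{
	return ret >= 0 || ret == -MODEL_EIOCBQUEUED;
}
```

When end_io_owns_unlock() is false, the submission failed with a real error
and, under the contract proposed here, the caller still holds its locks and
must drop them itself.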

---

fs/direct-io.c | 74 ++++++++++++++++++++++++++------------------
fs/gfs2/ops_address.c | 6 +--
fs/ocfs2/aops.c | 15 ++------
fs/ocfs2/aops.h | 8 ----
fs/ocfs2/file.c | 18 ++++------
fs/ocfs2/inode.h | 2 -
fs/xfs/linux-2.6/xfs_aops.c | 33 +++++++------------
include/linux/fs.h | 57 ++++++++++++++++++---------------
8 files changed, 104 insertions(+), 109 deletions(-)

---

diff -urpN -X dontdiff a/fs/direct-io.c b/fs/direct-io.c
--- a/fs/direct-io.c 2007-01-12 14:53:48.000000000 -0800
+++ b/fs/direct-io.c 2007-01-12 15:06:44.000000000 -0800
@@ -67,7 +67,7 @@ struct dio {
struct bio *bio; /* bio under assembly */
struct inode *inode;
int rw;
- loff_t i_size; /* i_size when submitted */
+ unsigned max_to_read; /* (i_size when submitted) - offset */
int lock_type; /* doesn't change */
unsigned blkbits; /* doesn't change */
unsigned blkfactor; /* When we're using an alignment which
@@ -89,6 +89,7 @@ struct dio {
int reap_counter; /* rate limit reaping */
get_block_t *get_block; /* block mapping function */
dio_iodone_t *end_io; /* IO completion function */
+ void *destructor_data; /* private data for completion fn */
sector_t final_block_in_bio; /* current final block in bio + 1 */
sector_t next_block_for_io; /* next block to be put under IO,
in dio_blocks units */
@@ -127,7 +128,8 @@ struct dio {
struct task_struct *waiter; /* waiting task (NULL if none) */

/* AIO related stuff */
- struct kiocb *iocb; /* kiocb */
+ file_endio_t *file_endio; /* aio completion function */
+ void *endio_data; /* private data for aio completion */
int is_async; /* is IO async ? */
int io_error; /* IO error in completion path */
ssize_t result; /* IO result */
@@ -222,7 +224,7 @@ static struct page *dio_get_page(struct
* filesystems can use it to hold additional state between get_block calls and
* dio_complete.
*/
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, int ret)
{
/*
* AIO submission can race with bio completion to get here while
@@ -232,25 +234,21 @@ static int dio_complete(struct dio *dio,
*/
if (ret == -EIOCBQUEUED)
ret = 0;
+ if (ret == 0)
+ ret = dio->page_errors;
+ if (ret == 0)
+ ret = dio->io_error;

if (dio->result) {
/* Check for short read case */
- if ((dio->rw == READ) && ((offset + dio->result) > dio->i_size))
- dio->result = dio->i_size - offset;
+ if ((dio->rw == READ) && (dio->result > dio->max_to_read))
+ dio->result = dio->max_to_read;
}

- if (dio->end_io && dio->result)
- dio->end_io(dio->iocb, offset, dio->result,
- dio->map_bh.b_private);
if (dio->lock_type == DIO_LOCKING)
/* lockdep: non-owner release */
up_read_non_owner(&dio->inode->i_alloc_sem);

- if (ret == 0)
- ret = dio->page_errors;
- if (ret == 0)
- ret = dio->io_error;
-
return ret;
}

@@ -277,8 +275,11 @@ static int dio_bio_end_aio(struct bio *b
spin_unlock_irqrestore(&dio->bio_lock, flags);

if (remaining == 0) {
- int err = dio_complete(dio, dio->iocb->ki_pos, 0);
- aio_complete(dio->iocb, dio->result, err);
+ int err = dio_complete(dio, 0);
+ if (dio->end_io)
+ dio->end_io(dio->destructor_data, dio->result,
+ dio->map_bh.b_private);
+ dio->file_endio(dio->endio_data, dio->result, err);
kfree(dio);
}

@@ -944,10 +945,11 @@ out:
* Releases both i_mutex and i_alloc_sem
*/
static ssize_t
-direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
+direct_io_worker(int rw, struct file *file, struct inode *inode,
const struct iovec *iov, loff_t offset, unsigned long nr_segs,
unsigned blkbits, get_block_t get_block, dio_iodone_t end_io,
- struct dio *dio)
+ void *destructor_data, struct dio *dio, file_endio_t *file_endio,
+ void *endio_data)
{
unsigned long user_addr;
unsigned long flags;
@@ -971,6 +973,7 @@ direct_io_worker(int rw, struct kiocb *i
dio->reap_counter = 0;
dio->get_block = get_block;
dio->end_io = end_io;
+ dio->destructor_data = destructor_data;
dio->map_bh.b_private = NULL;
dio->final_block_in_bio = -1;
dio->next_block_for_io = -1;
@@ -978,8 +981,9 @@ direct_io_worker(int rw, struct kiocb *i
dio->page_errors = 0;
dio->io_error = 0;
dio->result = 0;
- dio->iocb = iocb;
- dio->i_size = i_size_read(inode);
+ dio->file_endio = file_endio;
+ dio->endio_data = endio_data;
+ dio->max_to_read = i_size_read(inode) - offset;

spin_lock_init(&dio->bio_lock);
dio->refcount = 1;
@@ -1103,9 +1107,18 @@ direct_io_worker(int rw, struct kiocb *i
spin_unlock_irqrestore(&dio->bio_lock, flags);
BUG_ON(!dio->is_async && ret2 != 0);
if (ret2 == 0) {
- ret = dio_complete(dio, offset, ret);
- if (ret == 0)
+ ret = dio_complete(dio, ret);
+ if (ret == 0) {
+ /*
+ * we guarantee to call end_io unless we return a
+ * real error, ie. not -EIOCBQUEUED, which can never
+ * happen here, so call it unconditionally
+ */
+ if (dio->end_io)
+ dio->end_io(dio->destructor_data, dio->result,
+ dio->map_bh.b_private);
ret = dio->result; /* bytes transferred */
+ }
kfree(dio);
} else
BUG_ON(ret != -EIOCBQUEUED);
@@ -1135,14 +1148,16 @@ direct_io_worker(int rw, struct kiocb *i
* Additional i_alloc_sem locking requirements described inline below.
*/
ssize_t
-__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
- struct block_device *bdev, const struct iovec *iov, loff_t offset,
- unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
- int dio_lock_type)
+__blockdev_direct_IO(int rw, struct file *file, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs, get_block_t get_block,
+ dio_iodone_t end_io, void *destructor_data, file_endio_t *file_endio,
+ void *endio_data, int dio_lock_type)
{
int seg;
size_t size;
unsigned long addr;
+ struct inode *inode = file->f_mapping->host;
+ struct block_device *bdev = inode->i_sb->s_bdev;
unsigned blkbits = inode->i_blkbits;
unsigned bdev_blkbits = 0;
unsigned blocksize_mask = (1 << blkbits) - 1;
@@ -1202,7 +1217,7 @@ __blockdev_direct_IO(int rw, struct kioc
if (rw == READ && end > offset) {
struct address_space *mapping;

- mapping = iocb->ki_filp->f_mapping;
+ mapping = file->f_mapping;
if (dio_lock_type != DIO_OWN_LOCKING) {
mutex_lock(&inode->i_mutex);
release_i_mutex = 1;
@@ -1232,11 +1247,12 @@ __blockdev_direct_IO(int rw, struct kioc
* even for AIO, we need to wait for i/o to complete before
* returning in this case.
*/
- dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
+ dio->is_async = file_endio && !((rw & WRITE) &&
(end > i_size_read(inode)));

- retval = direct_io_worker(rw, iocb, inode, iov, offset,
- nr_segs, blkbits, get_block, end_io, dio);
+ retval = direct_io_worker(rw, file, inode, iov, offset,
+ nr_segs, blkbits, get_block, end_io,
+ destructor_data, dio, file_endio, endio_data);

if (rw == READ && dio_lock_type == DIO_LOCKING)
release_i_mutex = 0;
diff -urpN -X dontdiff a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
--- a/fs/gfs2/ops_address.c 2007-01-12 11:19:45.000000000 -0800
+++ b/fs/gfs2/ops_address.c 2007-01-12 15:06:44.000000000 -0800
@@ -628,9 +628,9 @@ static ssize_t gfs2_direct_IO(int rw, st
if (rv != 1)
goto out; /* dio not valid, fall back to buffered i/o */

- rv = blockdev_direct_IO_no_locking(rw, iocb, inode, inode->i_sb->s_bdev,
- iov, offset, nr_segs,
- gfs2_get_block_direct, NULL);
+ rv = blockdev_direct_IO_no_locking(rw, file, iov, offset, nr_segs,
+ gfs2_get_block_direct, NULL, NULL,
+ aio_complete, iocb);
out:
gfs2_glock_dq_m(1, &gh);
gfs2_holder_uninit(&gh);
diff -urpN -X dontdiff a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c 2007-01-12 11:19:45.000000000 -0800
+++ b/fs/ocfs2/aops.c 2007-01-12 15:06:44.000000000 -0800
@@ -600,16 +600,12 @@ bail:
* i_alloc_sem, we use the rw_lock DLM lock to protect io on one node from
* truncation on another.
*/
-static void ocfs2_dio_end_io(struct kiocb *iocb,
- loff_t offset,
+static void ocfs2_dio_end_io(void *destructor_data,
ssize_t bytes,
void *private)
{
- struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+ struct inode *inode = destructor_data;

- /* this io's submitter should not have unlocked this before we could */
- BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
- ocfs2_iocb_clear_rw_locked(iocb);
up_read(&inode->i_alloc_sem);
ocfs2_rw_unlock(inode, 0);
}
@@ -644,11 +640,10 @@ static ssize_t ocfs2_direct_IO(int rw,
}
ocfs2_data_unlock(inode, 0);

- ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
- inode->i_sb->s_bdev, iov, offset,
- nr_segs,
+ ret = blockdev_direct_IO_no_locking(rw, file, iov, offset, nr_segs,
ocfs2_direct_IO_get_blocks,
- ocfs2_dio_end_io);
+ ocfs2_dio_end_io, inode,
+ aio_complete, iocb);
out:
mlog_exit(ret);
return ret;
diff -urpN -X dontdiff a/fs/ocfs2/aops.h b/fs/ocfs2/aops.h
--- a/fs/ocfs2/aops.h 2007-01-12 11:18:52.000000000 -0800
+++ b/fs/ocfs2/aops.h 2007-01-12 15:06:44.000000000 -0800
@@ -30,12 +30,4 @@ handle_t *ocfs2_start_walk_page_trans(st
unsigned from,
unsigned to);

-/* all ocfs2_dio_end_io()'s fault */
-#define ocfs2_iocb_is_rw_locked(iocb) \
- test_bit(0, (unsigned long *)&iocb->private)
-#define ocfs2_iocb_set_rw_locked(iocb) \
- set_bit(0, (unsigned long *)&iocb->private)
-#define ocfs2_iocb_clear_rw_locked(iocb) \
- clear_bit(0, (unsigned long *)&iocb->private)
-
#endif /* OCFS2_FILE_H */
diff -urpN -X dontdiff a/fs/ocfs2/file.c b/fs/ocfs2/file.c
--- a/fs/ocfs2/file.c 2007-01-12 14:23:40.000000000 -0800
+++ b/fs/ocfs2/file.c 2007-01-12 15:06:44.000000000 -0800
@@ -1183,9 +1183,6 @@ static ssize_t ocfs2_file_aio_write(stru
goto out;
}

- /* communicate with ocfs2_dio_end_io */
- ocfs2_iocb_set_rw_locked(iocb);
-
ret = generic_file_aio_write_nolock(iocb, iov, nr_segs, iocb->ki_pos);

/* buffered aio wouldn't have proper lock coverage today */
@@ -1196,12 +1193,13 @@ static ssize_t ocfs2_file_aio_write(stru
* function pointer which is called when o_direct io completes so that
* it can unlock our rw lock. (it's the clustered equivalent of
* i_alloc_sem; protects truncate from racing with pending ios).
- * Unfortunately there are error cases which call end_io and others
- * that don't. so we don't have to unlock the rw_lock if either an
- * async dio is going to do it in the future or an end_io after an
- * error has already done it.
+ *
+ * The direct_IO code guarantees that it will call end_io unless it
+ * encountered a real error, ie. not -EIOCBQUEUED, so we don't have
+ * to unlock the rw_lock if either an async dio is going to do it in
+ * the future or an end_io has already done it.
*/
- if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) {
+ if (ret >= 0 || ret == -EIOCBQUEUED) {
rw_level = -1;
have_alloc_sem = 0;
}
@@ -1322,8 +1320,6 @@ static ssize_t ocfs2_file_aio_read(struc
goto bail;
}
rw_level = 0;
- /* communicate with ocfs2_dio_end_io */
- ocfs2_iocb_set_rw_locked(iocb);
}

/*
@@ -1350,7 +1346,7 @@ static ssize_t ocfs2_file_aio_read(struc
BUG_ON(ret == -EIOCBQUEUED && !(filp->f_flags & O_DIRECT));

/* see ocfs2_file_aio_write */
- if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) {
+ if (ret >= 0 || ret == -EIOCBQUEUED) {
rw_level = -1;
have_alloc_sem = 0;
}
diff -urpN -X dontdiff a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
--- a/fs/ocfs2/inode.h 2007-01-12 11:18:52.000000000 -0800
+++ b/fs/ocfs2/inode.h 2007-01-12 15:06:44.000000000 -0800
@@ -139,8 +139,6 @@ void ocfs2_refresh_inode(struct inode *i
int ocfs2_mark_inode_dirty(handle_t *handle,
struct inode *inode,
struct buffer_head *bh);
-int ocfs2_aio_read(struct file *file, struct kiocb *req, struct iocb *iocb);
-int ocfs2_aio_write(struct file *file, struct kiocb *req, struct iocb *iocb);

void ocfs2_set_inode_flags(struct inode *inode);

diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
--- a/fs/xfs/linux-2.6/xfs_aops.c 2007-01-12 11:19:47.000000000 -0800
+++ b/fs/xfs/linux-2.6/xfs_aops.c 2007-01-12 15:06:44.000000000 -0800
@@ -1332,12 +1332,11 @@ xfs_get_blocks_direct(

STATIC void
xfs_end_io_direct(
- struct kiocb *iocb,
- loff_t offset,
+ void *destructor_data,
ssize_t size,
void *private)
{
- xfs_ioend_t *ioend = iocb->private;
+ xfs_ioend_t *ioend = destructor_data;

/*
* Non-NULL private data means we need to issue a transaction to
@@ -1352,19 +1351,11 @@ xfs_end_io_direct(
* go away.
*/
if (private && size > 0) {
- ioend->io_offset = offset;
ioend->io_size = size;
xfs_finish_ioend(ioend);
} else {
xfs_destroy_ioend(ioend);
}
-
- /*
- * blockdev_direct_IO can return an error even after the I/O
- * completion handler was called. Thus we need to protect
- * against double-freeing.
- */
- iocb->private = NULL;
}

STATIC ssize_t
@@ -1379,6 +1370,7 @@ xfs_vm_direct_IO(
struct inode *inode = file->f_mapping->host;
bhv_vnode_t *vp = vn_from_inode(inode);
xfs_iomap_t iomap;
+ xfs_ioend_t *ioend;
int maps = 1;
int error;
ssize_t ret;
@@ -1387,24 +1379,25 @@ xfs_vm_direct_IO(
if (error)
return -error;

- iocb->private = xfs_alloc_ioend(inode, IOMAP_UNWRITTEN);
+ ioend = xfs_alloc_ioend(inode, IOMAP_UNWRITTEN);
+ ioend->io_offset = offset;

if (rw == WRITE) {
- ret = blockdev_direct_IO_own_locking(rw, iocb, inode,
- iomap.iomap_target->bt_bdev,
+ ret = blockdev_direct_IO_own_locking(rw, file,
iov, offset, nr_segs,
xfs_get_blocks_direct,
- xfs_end_io_direct);
+ xfs_end_io_direct, ioend,
+ aio_complete, iocb);
} else {
- ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
- iomap.iomap_target->bt_bdev,
+ ret = blockdev_direct_IO_no_locking(rw, file,
iov, offset, nr_segs,
xfs_get_blocks_direct,
- xfs_end_io_direct);
+ xfs_end_io_direct, ioend,
+ aio_complete, iocb);
}

- if (unlikely(ret != -EIOCBQUEUED && iocb->private))
- xfs_destroy_ioend(iocb->private);
+ if (unlikely(ret < 0 && ret != -EIOCBQUEUED))
+ xfs_destroy_ioend(ioend);
return ret;
}

diff -urpN -X dontdiff a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h 2007-01-12 14:53:48.000000000 -0800
+++ b/include/linux/fs.h 2007-01-12 15:06:44.000000000 -0800
@@ -306,8 +306,8 @@ extern void __init files_init(unsigned l
struct buffer_head;
typedef int (get_block_t)(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
-typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
- ssize_t bytes, void *private);
+typedef void (dio_iodone_t)(void *destructor_data, ssize_t bytes,
+ void *private);

/*
* Attribute flags. These should be or-ed together to figure out what
@@ -1833,10 +1833,10 @@ static inline void do_generic_file_read(
}

#ifdef CONFIG_BLOCK
-ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
- struct block_device *bdev, const struct iovec *iov, loff_t offset,
- unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
- int lock_type);
+ssize_t __blockdev_direct_IO(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+ get_block_t get_block, dio_iodone_t end_io, void *destructor_data,
+ file_endio_t *endio, void *endio_data, int lock_type);

enum {
DIO_LOCKING = 1, /* need locking between buffered and direct access */
@@ -1849,26 +1849,31 @@ static inline ssize_t blockdev_direct_IO
loff_t offset, unsigned long nr_segs, get_block_t get_block,
dio_iodone_t end_io)
{
- return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
- nr_segs, get_block, end_io, DIO_LOCKING);
-}
-
-static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
- struct inode *inode, struct block_device *bdev, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs, get_block_t get_block,
- dio_iodone_t end_io)
-{
- return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
- nr_segs, get_block, end_io, DIO_NO_LOCKING);
-}
-
-static inline ssize_t blockdev_direct_IO_own_locking(int rw, struct kiocb *iocb,
- struct inode *inode, struct block_device *bdev, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs, get_block_t get_block,
- dio_iodone_t end_io)
-{
- return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
- nr_segs, get_block, end_io, DIO_OWN_LOCKING);
+ struct file *file = iocb->ki_filp;
+ file_endio_t *file_endio = &aio_complete;
+ void *endio_data = iocb;
+ return __blockdev_direct_IO(rw, file, iov, offset, nr_segs, get_block,
+ end_io, NULL, file_endio, endio_data, DIO_LOCKING);
+}
+
+static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+ get_block_t get_block, dio_iodone_t end_io, void *destructor_data,
+ file_endio_t file_endio, void *endio_data)
+{
+ return __blockdev_direct_IO(rw, file, iov, offset, nr_segs, get_block,
+ end_io, destructor_data, file_endio, endio_data,
+ DIO_NO_LOCKING);
+}
+
+static inline ssize_t blockdev_direct_IO_own_locking(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+ get_block_t get_block, dio_iodone_t end_io, void *destructor_data,
+ file_endio_t file_endio, void *endio_data)
+{
+ return __blockdev_direct_IO(rw, file, iov, offset, nr_segs, get_block,
+ end_io, destructor_data, file_endio, endio_data,
+ DIO_OWN_LOCKING);
}
#endif

2007-01-16 02:16:30

by Christoph Hellwig

Subject: Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> Convert code using iocb->ki_left to use the more generic iov_length() call.

No way. We need to reduce the number of iovec traversals, not add
more of them.
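
To illustrate the cost being objected to, here is a minimal userspace
sketch. It assumes nothing beyond struct iovec from <sys/uio.h>; the
names sketch_iov_length, sketch_req and sketch_req_init are made up for
this example and are not kernel symbols. The first function sums every
segment the way an iov_length()-style helper has to, so each call is
O(nr_segs). The second caches the total once, which is roughly the role
iocb->ki_left plays today.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Userspace stand-in for an iov_length()-style helper: it walks every
 * segment, so each call costs O(nr_segs). */
size_t sketch_iov_length(const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;
	unsigned long seg;

	for (seg = 0; seg < nr_segs; seg++)
		total += iov[seg].iov_len;
	return total;
}

/* Hypothetical request that caches the byte count once, similar in
 * spirit to iocb->ki_left, so later users read it in O(1). */
struct sketch_req {
	const struct iovec *iov;
	unsigned long nr_segs;
	size_t left;
};

void sketch_req_init(struct sketch_req *req, const struct iovec *iov,
		     unsigned long nr_segs)
{
	assert(req != NULL);
	assert(iov != NULL || nr_segs == 0);

	req->iov = iov;
	req->nr_segs = nr_segs;
	req->left = sketch_iov_length(iov, nr_segs);	/* the only traversal */
}
```

Every call site that switches from the cached field to the helper adds
one more walk over the vector, which is the overhead in question.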

2007-01-16 02:37:51

by Nate Diller

Subject: [PATCH -mm 8/10][RFC] aio: make direct_IO aops use file_endio_t

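This converts the DIO_LOCKING variant of blockdev_direct_IO() and the
->direct_IO address space operation to take a generic file_endio_t
completion function plus its data pointer instead of a kiocb, and updates
all the FS call sites.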
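
As a rough illustration of the calling convention described in the cover
letter, here is a userspace mock, not kernel code. Only the file_endio_t
typedef is taken from this patch; sketch_waiter, sketch_complete and
sketch_direct_io are hypothetical names, and EIOCBQUEUED is defined
locally because userspace <errno.h> does not provide it. A NULL endio
means "complete synchronously and return the byte count"; a non-NULL
endio means the function returns -EIOCBQUEUED and the callback will be
called, or already has been.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

#define EIOCBQUEUED 529	/* kernel-internal errno, absent from userspace */

/* Same shape as the typedef this patch moves within include/linux/fs.h. */
typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

/* Hypothetical caller state standing in for a kiocb. */
struct sketch_waiter {
	int done;
	ssize_t count;
	int err;
};

/* Hypothetical completion callback playing the role of aio_complete. */
void sketch_complete(void *endio_data, ssize_t count, int err)
{
	struct sketch_waiter *w = endio_data;

	assert(w != NULL);
	w->count = count;
	w->err = err;
	w->done = 1;
}

/* Mock direct_IO: "transfers" nbytes without touching any device. */
ssize_t sketch_direct_io(size_t nbytes, file_endio_t *endio, void *endio_data)
{
	if (!endio)
		return (ssize_t)nbytes;	/* synchronous: caller gets the count */

	/* Completion may run before we return to the submitter. */
	endio(endio_data, (ssize_t)nbytes, 0);
	return -EIOCBQUEUED;	/* promise: endio was or will be called */
}
```

The point of the sketch is that the submitter never needs to know what
endio_data points at, which is what lets struct kiocb stay private to
the aio code.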

---

Documentation/filesystems/Locking | 5 +++--
Documentation/filesystems/vfs.txt | 5 +++--
fs/block_dev.c | 9 ++++-----
fs/ext2/inode.c | 12 +++++-------
fs/ext3/inode.c | 11 +++++------
fs/ext4/inode.c | 11 +++++------
fs/fat/inode.c | 12 ++++++------
fs/gfs2/ops_address.c | 8 ++++----
fs/hfs/inode.c | 13 ++++++-------
fs/hfsplus/inode.c | 13 ++++++-------
fs/jfs/inode.c | 12 +++++-------
fs/nfs/direct.c | 8 +++++---
fs/ocfs2/aops.c | 9 +++++----
fs/reiserfs/inode.c | 13 +++++--------
fs/xfs/linux-2.6/xfs_aops.c | 11 ++++++-----
fs/xfs/linux-2.6/xfs_lrw.c | 4 ++--
include/linux/fs.h | 28 +++++++++++++---------------
include/linux/nfs_fs.h | 4 ++--
mm/filemap.c | 34 ++++++++++++++++++----------------
19 files changed, 108 insertions(+), 114 deletions(-)

---

diff -urpN -X dontdiff a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
--- a/Documentation/filesystems/Locking 2007-01-12 20:26:06.000000000 -0800
+++ b/Documentation/filesystems/Locking 2007-01-12 20:42:37.000000000 -0800
@@ -169,8 +169,9 @@ prototypes:
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
- int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs);
+ int (*direct_IO)(int, struct file *, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data);
int (*launder_page) (struct page *);

locking rules:
diff -urpN -X dontdiff a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
--- a/Documentation/filesystems/vfs.txt 2007-01-12 20:26:06.000000000 -0800
+++ b/Documentation/filesystems/vfs.txt 2007-01-12 20:42:37.000000000 -0800
@@ -537,8 +537,9 @@ struct address_space_operations {
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
- ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs);
+ ssize_t (*direct_IO)(int, struct file *, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data);
struct page* (*get_xip_page)(struct address_space *, sector_t,
int);
/* migrate the contents of a page to the specified target */
diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c
--- a/fs/block_dev.c 2007-01-12 20:29:02.000000000 -0800
+++ b/fs/block_dev.c 2007-01-12 20:42:37.000000000 -0800
@@ -222,10 +222,11 @@ static void blk_unget_page(struct page *
}

static ssize_t
-blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
- loff_t pos, unsigned long nr_segs)
+blkdev_direct_IO(int rw, struct file *file, const struct iovec *iov,
+ loff_t pos, unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct inode *inode = file->f_mapping->host;
unsigned blkbits = blksize_bits(bdev_hardsect_size(I_BDEV(inode)));
unsigned blocksize_mask = (1 << blkbits) - 1;
unsigned long seg = 0; /* iov segment iterator */
@@ -239,8 +240,6 @@ blkdev_direct_IO(int rw, struct kiocb *i
loff_t size; /* size of block device */
struct bio *bio;
struct bdev_aio stack_io, *io;
- file_endio_t *endio = aio_complete;
- void *endio_data = iocb;
struct page *page;
struct pvec pvec;

diff -urpN -X dontdiff a/fs/ext2/inode.c b/fs/ext2/inode.c
--- a/fs/ext2/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/ext2/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -752,14 +752,12 @@ static sector_t ext2_bmap(struct address
}

static ssize_t
-ext2_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs)
+ext2_direct_IO(int rw, struct file *file, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
-
- return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs, ext2_get_block, NULL);
+ return blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ ext2_get_block, endio, endio_data);
}

static int
diff -urpN -X dontdiff a/fs/ext3/inode.c b/fs/ext3/inode.c
--- a/fs/ext3/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/ext3/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -1681,11 +1681,11 @@ static int ext3_releasepage(struct page
* If the O_DIRECT write is intantiating holes inside i_size and the machine
* crashes then stale disk data _may_ be exposed inside the file.
*/
-static ssize_t ext3_direct_IO(int rw, struct kiocb *iocb,
+static ssize_t ext3_direct_IO(int rw, struct file *file,
const struct iovec *iov, loff_t offset,
- unsigned long nr_segs)
+ unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
struct ext3_inode_info *ei = EXT3_I(inode);
handle_t *handle = NULL;
@@ -1710,9 +1710,8 @@ static ssize_t ext3_direct_IO(int rw, st
}
}

- ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs,
- ext3_get_block, NULL);
+ ret = blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ ext3_get_block, endio, endio_data);

/*
* Reacquire the handle: ext3_get_block() can restart the transaction
diff -urpN -X dontdiff a/fs/ext4/inode.c b/fs/ext4/inode.c
--- a/fs/ext4/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/ext4/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -1680,11 +1680,11 @@ static int ext4_releasepage(struct page
* If the O_DIRECT write is intantiating holes inside i_size and the machine
* crashes then stale disk data _may_ be exposed inside the file.
*/
-static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb,
+static ssize_t ext4_direct_IO(int rw, struct file *file,
const struct iovec *iov, loff_t offset,
- unsigned long nr_segs)
+ unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
struct ext4_inode_info *ei = EXT4_I(inode);
handle_t *handle = NULL;
@@ -1709,9 +1709,8 @@ static ssize_t ext4_direct_IO(int rw, st
}
}

- ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs,
- ext4_get_block, NULL);
+ ret = blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ ext4_get_block, endio, endio_data);

/*
* Reacquire the handle: ext4_get_block() can restart the transaction
diff -urpN -X dontdiff a/fs/fat/inode.c b/fs/fat/inode.c
--- a/fs/fat/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/fat/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -159,11 +159,11 @@ static int fat_commit_write(struct file
return err;
}

-static ssize_t fat_direct_IO(int rw, struct kiocb *iocb,
- const struct iovec *iov,
- loff_t offset, unsigned long nr_segs)
+static ssize_t fat_direct_IO(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset,
+ unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;

if (rw == WRITE) {
@@ -183,8 +183,8 @@ static ssize_t fat_direct_IO(int rw, str
* FAT need to use the DIO_LOCKING for avoiding the race
* condition of fat_get_block() and ->truncate().
*/
- return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs, fat_get_block, NULL);
+ return blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ fat_get_block, endio, endio_data);
}

static sector_t _fat_bmap(struct address_space *mapping, sector_t block)
diff -urpN -X dontdiff a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
--- a/fs/gfs2/ops_address.c 2007-01-12 20:57:42.000000000 -0800
+++ b/fs/gfs2/ops_address.c 2007-01-12 20:42:37.000000000 -0800
@@ -602,11 +602,11 @@ static int gfs2_ok_for_dio(struct gfs2_i



-static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
+static ssize_t gfs2_direct_IO(int rw, struct file *file,
const struct iovec *iov, loff_t offset,
- unsigned long nr_segs)
+ unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_holder gh;
@@ -630,7 +630,7 @@ static ssize_t gfs2_direct_IO(int rw, st

rv = blockdev_direct_IO_no_locking(rw, file, iov, offset, nr_segs,
gfs2_get_block_direct, NULL, NULL,
- aio_complete, iocb);
+ endio, endio_data);
out:
gfs2_glock_dq_m(1, &gh);
gfs2_holder_uninit(&gh);
diff -urpN -X dontdiff a/fs/hfs/inode.c b/fs/hfs/inode.c
--- a/fs/hfs/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/hfs/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -98,14 +98,13 @@ static int hfs_releasepage(struct page *
return res ? try_to_free_buffers(page) : 0;
}

-static ssize_t hfs_direct_IO(int rw, struct kiocb *iocb,
- const struct iovec *iov, loff_t offset, unsigned long nr_segs)
+static ssize_t hfs_direct_IO(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset,
+ unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host;
-
- return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs, hfs_get_block, NULL);
+ return blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ hfs_get_block, endio, endio_data);
}

static int hfs_writepages(struct address_space *mapping,
diff -urpN -X dontdiff a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
--- a/fs/hfsplus/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/hfsplus/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -93,14 +93,13 @@ static int hfsplus_releasepage(struct pa
return res ? try_to_free_buffers(page) : 0;
}

-static ssize_t hfsplus_direct_IO(int rw, struct kiocb *iocb,
- const struct iovec *iov, loff_t offset, unsigned long nr_segs)
+static ssize_t hfsplus_direct_IO(int rw, struct file *file,
+ const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host;
-
- return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs, hfsplus_get_block, NULL);
+ return blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ hfsplus_get_block, endio, endio_data);
}

static int hfsplus_writepages(struct address_space *mapping,
diff -urpN -X dontdiff a/fs/jfs/inode.c b/fs/jfs/inode.c
--- a/fs/jfs/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/jfs/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -287,14 +287,12 @@ static sector_t jfs_bmap(struct address_
return generic_block_bmap(mapping, block, jfs_get_block);
}

-static ssize_t jfs_direct_IO(int rw, struct kiocb *iocb,
- const struct iovec *iov, loff_t offset, unsigned long nr_segs)
+static ssize_t jfs_direct_IO(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
-
- return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs, jfs_get_block, NULL);
+ return blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ jfs_get_block, endio, endio_data);
}

const struct address_space_operations jfs_aops = {
diff -urpN -X dontdiff a/fs/nfs/direct.c b/fs/nfs/direct.c
--- a/fs/nfs/direct.c 2007-01-12 20:29:14.000000000 -0800
+++ b/fs/nfs/direct.c 2007-01-12 20:55:00.000000000 -0800
@@ -104,7 +104,7 @@ static inline int put_dreq(struct nfs_di
/**
* nfs_direct_IO - NFS address space operation for direct I/O
* @rw: direction (read or write)
- * @iocb: target I/O control block
+ * @file: target file
* @iov: array of vectors that define I/O buffer
* @pos: offset in file to begin the operation
* @nr_segs: size of iovec array
@@ -114,10 +114,12 @@ static inline int put_dreq(struct nfs_di
* read and write requests before the VFS gets them, so this method
* should never be called.
*/
-ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, loff_t pos, unsigned long nr_segs)
+ssize_t nfs_direct_IO(int rw, struct file *file, const struct iovec *iov,
+ loff_t pos, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data)
{
dprintk("NFS: nfs_direct_IO (%s) off/no(%Ld/%lu) EINVAL\n",
- iocb->ki_filp->f_path.dentry->d_name.name,
+ file->f_path.dentry->d_name.name,
(long long) pos, nr_segs);

return -EINVAL;
diff -urpN -X dontdiff a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c 2007-01-12 20:57:42.000000000 -0800
+++ b/fs/ocfs2/aops.c 2007-01-12 20:42:37.000000000 -0800
@@ -611,12 +611,13 @@ static void ocfs2_dio_end_io(void *destr
}

static ssize_t ocfs2_direct_IO(int rw,
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
loff_t offset,
- unsigned long nr_segs)
+ unsigned long nr_segs,
+ file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host;
int ret;

@@ -643,7 +644,7 @@ static ssize_t ocfs2_direct_IO(int rw,
ret = blockdev_direct_IO_no_locking(rw, file, iov, offset, nr_segs,
ocfs2_direct_IO_get_blocks,
ocfs2_dio_end_io, inode,
- aio_complete, iocb);
+ endio, endio_data);
out:
mlog_exit(ret);
return ret;
diff -urpN -X dontdiff a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c 2007-01-12 20:26:06.000000000 -0800
+++ b/fs/reiserfs/inode.c 2007-01-12 20:42:37.000000000 -0800
@@ -2888,16 +2888,13 @@ static int reiserfs_releasepage(struct p

/* We thank Mingming Cao for helping us understand in great detail what
to do in this section of the code. */
-static ssize_t reiserfs_direct_IO(int rw, struct kiocb *iocb,
+static ssize_t reiserfs_direct_IO(int rw, struct file *file,
const struct iovec *iov, loff_t offset,
- unsigned long nr_segs)
+ unsigned long nr_segs, file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
-
- return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
- offset, nr_segs,
- reiserfs_get_blocks_direct_io, NULL);
+ return blockdev_direct_IO(rw, file, iov, offset, nr_segs,
+ reiserfs_get_blocks_direct_io, endio, endio_data);
}

int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
--- a/fs/xfs/linux-2.6/xfs_aops.c 2007-01-12 20:57:42.000000000 -0800
+++ b/fs/xfs/linux-2.6/xfs_aops.c 2007-01-12 20:42:38.000000000 -0800
@@ -1361,12 +1361,13 @@ xfs_end_io_direct(
STATIC ssize_t
xfs_vm_direct_IO(
int rw,
- struct kiocb *iocb,
+ struct file *file,
const struct iovec *iov,
loff_t offset,
- unsigned long nr_segs)
+ unsigned long nr_segs,
+ file_endio_t *endio,
+ void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
bhv_vnode_t *vp = vn_from_inode(inode);
xfs_iomap_t iomap;
@@ -1387,13 +1388,13 @@ xfs_vm_direct_IO(
iov, offset, nr_segs,
xfs_get_blocks_direct,
xfs_end_io_direct, ioend,
- aio_complete, iocb);
+ endio, endio_data);
} else {
ret = blockdev_direct_IO_no_locking(rw, file,
iov, offset, nr_segs,
xfs_get_blocks_direct,
xfs_end_io_direct, ioend,
- aio_complete, iocb);
+ endio, endio_data);
}

if (unlikely(ret < 0 && ret != -EIOCBQUEUED))
diff -urpN -X dontdiff a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
--- a/fs/xfs/linux-2.6/xfs_lrw.c 2007-01-12 20:26:07.000000000 -0800
+++ b/fs/xfs/linux-2.6/xfs_lrw.c 2007-01-12 20:42:38.000000000 -0800
@@ -836,8 +836,8 @@ retry:

xfs_rw_enter_trace(XFS_DIOWR_ENTER, io, (void *)iovp, segs,
*offset, ioflags);
- ret = generic_file_direct_write(iocb, iovp,
- &segs, pos, offset, count, ocount);
+ ret = generic_file_direct_write(file, iovp, &segs, pos,
+ offset, count, ocount, aio_complete, iocb);

/*
* direct-io write to a hole: fall through to buffered I/O
diff -urpN -X dontdiff a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h 2007-01-12 20:57:42.000000000 -0800
+++ b/include/linux/fs.h 2007-01-12 20:42:38.000000000 -0800
@@ -309,6 +309,8 @@ typedef int (get_block_t)(struct inode *
typedef void (dio_iodone_t)(void *destructor_data, ssize_t bytes,
void *private);

+typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);
+
/*
* Attribute flags. These should be or-ed together to figure out what
* has been changed!
@@ -421,8 +423,9 @@ struct address_space_operations {
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, gfp_t);
- ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs);
+ ssize_t (*direct_IO)(int rw, struct file *file, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data);
struct page* (*get_xip_page)(struct address_space *, sector_t,
int);
/* migrate the contents of a page to the specified target */
@@ -1109,8 +1112,6 @@ typedef int (*read_actor_t)(read_descrip
#define HAVE_COMPAT_IOCTL 1
#define HAVE_UNLOCKED_IOCTL 1

-typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);
-
/*
* NOTE:
* read, write, poll, fsync, readv, writev, unlocked_ioctl and compat_ioctl
@@ -1772,8 +1773,9 @@ extern ssize_t generic_file_aio_read(str
extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
extern ssize_t generic_file_aio_write_nolock(struct kiocb *, const struct iovec *,
unsigned long, loff_t);
-extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
- unsigned long *, loff_t, loff_t *, size_t, size_t);
+extern ssize_t generic_file_direct_write(struct file *, const struct iovec *,
+ unsigned long *, loff_t, loff_t *,
+ size_t, size_t, file_endio_t, void *);
extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
@@ -1844,16 +1846,12 @@ enum {
DIO_OWN_LOCKING, /* filesystem locks buffered and direct internally */
};

-static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
- struct inode *inode, struct block_device *bdev, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs, get_block_t get_block,
- dio_iodone_t end_io)
-{
- struct file *file = iocb->ki_filp;
- file_endio_t *file_endio = &aio_complete;
- void *endio_data = iocb;
+static inline ssize_t blockdev_direct_IO(int rw, struct file *file,
+ const struct iovec *iov, loff_t offset, unsigned long nr_segs,
+ get_block_t get_block, file_endio_t file_endio, void *endio_data)
+{
return __blockdev_direct_IO(rw, file, iov, offset, nr_segs, get_block,
- end_io, NULL, file_endio, endio_data, DIO_LOCKING);
+ NULL, NULL, file_endio, endio_data, DIO_LOCKING);
}

static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct file *file,
diff -urpN -X dontdiff a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
--- a/include/linux/nfs_fs.h 2007-01-12 20:29:14.000000000 -0800
+++ b/include/linux/nfs_fs.h 2007-01-12 20:42:38.000000000 -0800
@@ -367,8 +367,8 @@ extern int nfs3_removexattr (struct dent
/*
* linux/fs/nfs/direct.c
*/
-extern ssize_t nfs_direct_IO(int, struct kiocb *, const struct iovec *, loff_t,
- unsigned long);
+extern ssize_t nfs_direct_IO(int, struct file *, const struct iovec *, loff_t,
+ unsigned long, file_endio_t, void *);
extern ssize_t nfs_file_direct_read(struct file *file,
const struct iovec *iov, unsigned long nr_segs,
loff_t *pos, file_endio_t *endio, void *endio_data);
diff -urpN -X dontdiff a/mm/filemap.c b/mm/filemap.c
--- a/mm/filemap.c 2007-01-12 20:26:07.000000000 -0800
+++ b/mm/filemap.c 2007-01-12 20:42:38.000000000 -0800
@@ -41,8 +41,9 @@
#include <asm/mman.h>

static ssize_t
-generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs);
+generic_file_direct_IO(int rw, struct file *file, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data);

/*
* Shared mappings implemented 30.11.1994. It's not fully working yet,
@@ -1223,8 +1224,8 @@ generic_file_aio_read(struct kiocb *iocb
goto out; /* skip atime */
size = i_size_read(inode);
if (pos < size) {
- retval = generic_file_direct_IO(READ, iocb,
- iov, pos, nr_segs);
+ retval = generic_file_direct_IO(READ, filp, iov, pos,
+ nr_segs, aio_complete, iocb);
if (retval > 0)
*ppos = pos + retval;
}
@@ -2078,11 +2079,10 @@ inline int generic_write_checks(struct f
EXPORT_SYMBOL(generic_write_checks);

ssize_t
-generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long *nr_segs, loff_t pos, loff_t *ppos,
- size_t count, size_t ocount)
+generic_file_direct_write(struct file *file, const struct iovec *iov,
+ unsigned long *nr_segs, loff_t pos, loff_t *ppos, size_t count,
+ size_t ocount, file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
ssize_t written;
@@ -2090,7 +2090,8 @@ generic_file_direct_write(struct kiocb *
if (count != ocount)
*nr_segs = iov_shorten((struct iovec *)iov, *nr_segs, count);

- written = generic_file_direct_IO(WRITE, iocb, iov, pos, *nr_segs);
+ written = generic_file_direct_IO(WRITE, file, iov, pos,
+ *nr_segs, endio, endio_data);
if (written > 0) {
loff_t end = pos + written;
if (end > i_size_read(inode) && !S_ISBLK(inode->i_mode)) {
@@ -2343,8 +2344,9 @@ __generic_file_aio_write_nolock(struct k
loff_t endbyte;
ssize_t written_buffered;

- written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
- ppos, count, ocount);
+ written = generic_file_direct_write(file, iov, &nr_segs, pos,
+ ppos, count, ocount,
+ aio_complete, iocb);
if (written < 0 || written == count)
goto out;
/*
@@ -2457,10 +2459,10 @@ EXPORT_SYMBOL(generic_file_aio_write);
* went wrong during pagecache shootdown.
*/
static ssize_t
-generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
- loff_t offset, unsigned long nr_segs)
+generic_file_direct_IO(int rw, struct file *file, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs,
+ file_endio_t *endio, void *endio_data)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
ssize_t retval;
size_t write_len = 0;
@@ -2478,8 +2480,8 @@ generic_file_direct_IO(int rw, struct ki

retval = filemap_write_and_wait(mapping);
if (retval == 0) {
- retval = mapping->a_ops->direct_IO(rw, iocb, iov,
- offset, nr_segs);
+ retval = mapping->a_ops->direct_IO(rw, file, iov, offset,
+ nr_segs, endio, endio_data);
if (rw == WRITE && mapping->nrpages) {
pgoff_t end = (offset + write_len - 1)
>> PAGE_CACHE_SHIFT;

2007-01-16 02:38:01

by Nate Diller

Subject: [PATCH -mm 2/10][RFC] aio: net use struct socket for io

Remove unused arg from socket operations

The sendmsg and recvmsg socket operations take a kiocb pointer, but none of
the functions actually use it. There is no need for it even in theory, and
having it there at all is quite ugly. Removing it also paves the way for a
more generic completion path in the file_operations.
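
For reference, the resulting shape of the two operations can be sketched
as below. This is a trimmed-down userspace illustration only:
sketch_proto_ops, sketch_sendmsg, sketch_recvmsg and sketch_ops are
hypothetical names, and the struct types are left opaque rather than
taken from the kernel headers. The signatures follow the converted
pppoe_sendmsg() and pppoe_recvmsg() in this patch.

```c
#include <assert.h>
#include <stddef.h>

struct socket;	/* opaque stand-ins; the real types live in the kernel */
struct msghdr;

/* Hypothetical, trimmed-down ops table using the post-patch signatures:
 * no struct kiocb argument on either operation. */
struct sketch_proto_ops {
	int (*sendmsg)(struct socket *sock, struct msghdr *m,
		       size_t total_len);
	int (*recvmsg)(struct socket *sock, struct msghdr *m,
		       size_t total_len, int flags);
};

/* Dummy implementation: pretends the whole message was sent. */
int sketch_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
{
	(void)sock;
	(void)m;
	return (int)total_len;
}

/* Dummy implementation: pretends nothing was available to receive. */
int sketch_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
		   int flags)
{
	(void)sock;
	(void)m;
	(void)total_len;
	assert(flags >= 0);
	return 0;
}

const struct sketch_proto_ops sketch_ops = {
	.sendmsg = sketch_sendmsg,
	.recvmsg = sketch_recvmsg,
};
```

Callers simply drop the first argument; nothing else about the call
changes.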

---

drivers/net/pppoe.c | 8 +++----
include/linux/net.h | 18 +++++++----------
include/net/bluetooth/bluetooth.h | 2 -
include/net/inet_common.h | 3 --
include/net/sock.h | 19 ++++++++----------
include/net/tcp.h | 6 ++---
include/net/udp.h | 3 --
net/appletalk/ddp.c | 5 +---
net/atm/common.c | 6 +----
net/atm/common.h | 7 ++----
net/ax25/af_ax25.c | 7 ++----
net/bluetooth/af_bluetooth.c | 4 +--
net/bluetooth/hci_sock.c | 7 ++----
net/bluetooth/l2cap.c | 2 -
net/bluetooth/rfcomm/sock.c | 8 +++----
net/bluetooth/sco.c | 3 --
net/core/sock.c | 12 ++++-------
net/dccp/dccp.h | 8 +++----
net/dccp/probe.c | 3 --
net/dccp/proto.c | 7 ++----
net/decnet/af_decnet.c | 7 ++----
net/econet/af_econet.c | 7 ++----
net/ipv4/af_inet.c | 5 +---
net/ipv4/raw.c | 8 ++-----
net/ipv4/tcp.c | 7 ++----
net/ipv4/tcp_probe.c | 3 --
net/ipv4/udp.c | 9 +++-----
net/ipv4/udp_impl.h | 2 -
net/ipv6/raw.c | 6 +----
net/ipv6/udp.c | 10 +++------
net/ipv6/udp_impl.h | 6 +----
net/ipx/af_ipx.c | 7 ++----
net/irda/af_irda.c | 29 +++++++++++++---------------
net/key/af_key.c | 6 +----
net/llc/af_llc.c | 7 ++----
net/netlink/af_netlink.c | 6 +----
net/netrom/af_netrom.c | 7 ++----
net/packet/af_packet.c | 11 ++++------
net/rose/af_rose.c | 7 ++----
net/sctp/socket.c | 9 +++-----
net/socket.c | 32 ++++++-------------------------
net/tipc/socket.c | 28 +++++++++------------------
net/unix/af_unix.c | 39 +++++++++++++++-----------------------
net/wanrouter/af_wanpipe.c | 7 ++----
net/x25/af_x25.c | 6 +----
45 files changed, 166 insertions(+), 243 deletions(-)

---

diff -urpN -X dontdiff a/drivers/net/pppoe.c b/drivers/net/pppoe.c
--- a/drivers/net/pppoe.c 2007-01-12 11:18:47.244855016 -0800
+++ b/drivers/net/pppoe.c 2007-01-12 11:29:21.179177108 -0800
@@ -746,8 +746,8 @@ static int pppoe_ioctl(struct socket *so
}


-static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len)
+static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
+ size_t total_len)
{
struct sk_buff *skb = NULL;
struct sock *sk = sock->sk;
@@ -912,8 +912,8 @@ static struct ppp_channel_ops pppoe_chan
.start_xmit = pppoe_xmit,
};

-static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len, int flags)
+static int pppoe_recvmsg(struct socket *sock, struct msghdr *m,
+ size_t total_len, int flags)
{
struct sock *sk = sock->sk;
struct sk_buff *skb = NULL;
diff -urpN -X dontdiff a/include/linux/net.h b/include/linux/net.h
--- a/include/linux/net.h 2007-01-12 11:18:56.683629587 -0800
+++ b/include/linux/net.h 2007-01-12 11:29:21.185175058 -0800
@@ -118,7 +118,6 @@ struct socket {

struct vm_area_struct;
struct page;
-struct kiocb;
struct sockaddr;
struct msghdr;
struct module;
@@ -156,11 +155,10 @@ struct proto_ops {
int optname, char __user *optval, int optlen);
int (*compat_getsockopt)(struct socket *sock, int level,
int optname, char __user *optval, int __user *optlen);
- int (*sendmsg) (struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len);
- int (*recvmsg) (struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len,
- int flags);
+ int (*sendmsg) (struct socket *sock, struct msghdr *m,
+ size_t total_len);
+ int (*recvmsg) (struct socket *sock, struct msghdr *m,
+ size_t total_len, int flags);
int (*mmap) (struct file *file, struct socket *sock,
struct vm_area_struct * vma);
ssize_t (*sendpage) (struct socket *sock, struct page *page,
@@ -276,10 +274,10 @@ SOCKCALL_WRAP(name, setsockopt, (struct
char __user *optval, int optlen), (sock, level, optname, optval, optlen)) \
SOCKCALL_WRAP(name, getsockopt, (struct socket *sock, int level, int optname, \
char __user *optval, int __user *optlen), (sock, level, optname, optval, optlen)) \
-SOCKCALL_WRAP(name, sendmsg, (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t len), \
- (iocb, sock, m, len)) \
-SOCKCALL_WRAP(name, recvmsg, (struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t len, int flags), \
- (iocb, sock, m, len, flags)) \
+SOCKCALL_WRAP(name, sendmsg, (struct socket *sock, struct msghdr *m, size_t len), \
+ (sock, m, len)) \
+SOCKCALL_WRAP(name, recvmsg, (struct socket *sock, struct msghdr *m, size_t len, int flags), \
+ (sock, m, len, flags)) \
SOCKCALL_WRAP(name, mmap, (struct file *file, struct socket *sock, struct vm_area_struct *vma), \
(file, sock, vma)) \
\
diff -urpN -X dontdiff a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
--- a/include/net/bluetooth/bluetooth.h 2006-11-29 13:57:37.000000000 -0800
+++ b/include/net/bluetooth/bluetooth.h 2007-01-12 11:29:21.191173008 -0800
@@ -119,7 +119,7 @@ int bt_sock_register(int proto, struct
int bt_sock_unregister(int proto);
void bt_sock_link(struct bt_sock_list *l, struct sock *s);
void bt_sock_unlink(struct bt_sock_list *l, struct sock *s);
-int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t len, int flags);
+int bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags);
uint bt_sock_poll(struct file * file, struct socket *sock, poll_table *wait);
int bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);

diff -urpN -X dontdiff a/include/net/inet_common.h b/include/net/inet_common.h
--- a/include/net/inet_common.h 2006-11-29 13:57:37.000000000 -0800
+++ b/include/net/inet_common.h 2007-01-12 11:29:21.196171300 -0800
@@ -25,8 +25,7 @@ extern int inet_dgram_connect(struct s
int addr_len, int flags);
extern int inet_accept(struct socket *sock,
struct socket *newsock, int flags);
-extern int inet_sendmsg(struct kiocb *iocb,
- struct socket *sock,
+extern int inet_sendmsg(struct socket *sock,
struct msghdr *msg,
size_t size);
extern int inet_shutdown(struct socket *sock, int how);
diff -urpN -X dontdiff a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h 2007-01-12 11:28:23.510888256 -0800
+++ b/include/net/sock.h 2007-01-12 11:29:21.201169591 -0800
@@ -535,10 +535,9 @@ struct proto {
int level,
int optname, char __user *optval,
int __user *option);
- int (*sendmsg)(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len);
- int (*recvmsg)(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg,
+ int (*sendmsg)(struct sock *sk, struct msghdr *msg,
+ size_t len);
+ int (*recvmsg)(struct sock *sk, struct msghdr *msg,
size_t len, int noblock, int flags,
int *addr_len);
int (*sendpage)(struct sock *sk, struct page *page,
@@ -813,10 +812,10 @@ extern int sock_no_getsockopt(struct s
char __user *, int __user *);
extern int sock_no_setsockopt(struct socket *, int, int,
char __user *, int);
-extern int sock_no_sendmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t);
-extern int sock_no_recvmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t, int);
+extern int sock_no_sendmsg(struct socket *, struct msghdr *,
+ size_t);
+extern int sock_no_recvmsg(struct socket *, struct msghdr *,
+ size_t, int);
extern int sock_no_mmap(struct file *file,
struct socket *sock,
struct vm_area_struct *vma);
@@ -831,8 +830,8 @@ extern ssize_t sock_no_sendpage(struct
*/
extern int sock_common_getsockopt(struct socket *sock, int level, int optname,
char __user *optval, int __user *optlen);
-extern int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags);
+extern int sock_common_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags);
extern int sock_common_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, int optlen);
extern int compat_sock_common_getsockopt(struct socket *sock, int level,
diff -urpN -X dontdiff a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h 2007-01-12 11:18:57.743267486 -0800
+++ b/include/net/tcp.h 2007-01-12 11:29:21.206167883 -0800
@@ -273,8 +273,8 @@ extern int tcp_v4_remember_stamp(struc

extern int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);

-extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t size);
+extern int tcp_sendmsg(struct sock *sk, struct msghdr *msg,
+ size_t size);
extern ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags);

extern int tcp_ioctl(struct sock *sk,
@@ -364,7 +364,7 @@ extern int compat_tcp_setsockopt(struc
int level, int optname,
char __user *optval, int optlen);
extern void tcp_set_keepalive(struct sock *sk, int val);
-extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
+extern int tcp_recvmsg(struct sock *sk,
struct msghdr *msg,
size_t len, int nonblock,
int flags, int *addr_len);
diff -urpN -X dontdiff a/include/net/udp.h b/include/net/udp.h
--- a/include/net/udp.h 2007-01-12 11:18:57.766259629 -0800
+++ b/include/net/udp.h 2007-01-12 11:29:21.210166516 -0800
@@ -127,8 +127,7 @@ extern int udp_get_port(struct sock *sk,
int (*saddr_cmp)(const struct sock *, const struct sock *));
extern void udp_err(struct sk_buff *, u32);

-extern int udp_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len);
+extern int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);

extern int udp_rcv(struct sk_buff *skb);
extern int udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
diff -urpN -X dontdiff a/net/appletalk/ddp.c b/net/appletalk/ddp.c
--- a/net/appletalk/ddp.c 2007-01-12 11:18:58.625965849 -0800
+++ b/net/appletalk/ddp.c 2007-01-12 11:29:21.215164808 -0800
@@ -1527,8 +1527,7 @@ freeit:
return 0;
}

-static int atalk_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
- size_t len)
+static int atalk_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct atalk_sock *at = at_sk(sk);
@@ -1688,7 +1687,7 @@ static int atalk_sendmsg(struct kiocb *i
return len;
}

-static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
+static int atalk_recvmsg(struct socket *sock, struct msghdr *msg,
size_t size, int flags)
{
struct sock *sk = sock->sk;
diff -urpN -X dontdiff a/net/atm/common.c b/net/atm/common.c
--- a/net/atm/common.c 2007-01-12 11:19:49.454595039 -0800
+++ b/net/atm/common.c 2007-01-12 11:29:21.220163099 -0800
@@ -472,8 +472,7 @@ int vcc_connect(struct socket *sock, int
}


-int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
- size_t size, int flags)
+int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags)
{
struct sock *sk = sock->sk;
struct atm_vcc *vcc;
@@ -511,8 +510,7 @@ int vcc_recvmsg(struct kiocb *iocb, stru
}


-int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
- size_t total_len)
+int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
{
struct sock *sk = sock->sk;
DEFINE_WAIT(wait);
diff -urpN -X dontdiff a/net/atm/common.h b/net/atm/common.h
--- a/net/atm/common.h 2006-11-29 13:57:37.000000000 -0800
+++ b/net/atm/common.h 2007-01-12 11:29:21.224161733 -0800
@@ -13,10 +13,9 @@
int vcc_create(struct socket *sock, int protocol, int family);
int vcc_release(struct socket *sock);
int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
-int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
- size_t size, int flags);
-int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
- size_t total_len);
+int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
+ int flags);
+int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len);
unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
int vcc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
int vcc_setsockopt(struct socket *sock, int level, int optname,
diff -urpN -X dontdiff a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
--- a/net/ax25/af_ax25.c 2007-01-12 11:18:58.769916658 -0800
+++ b/net/ax25/af_ax25.c 2007-01-12 11:29:21.229160024 -0800
@@ -1417,8 +1417,7 @@ out:
return err;
}

-static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int ax25_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sockaddr_ax25 *usax = (struct sockaddr_ax25 *)msg->msg_name;
struct sock *sk = sock->sk;
@@ -1604,8 +1603,8 @@ out:
return err;
}

-static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int ax25_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct sk_buff *skb;
diff -urpN -X dontdiff a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
--- a/net/bluetooth/af_bluetooth.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/bluetooth/af_bluetooth.c 2007-01-12 11:29:21.234158316 -0800
@@ -193,8 +193,8 @@ struct sock *bt_accept_dequeue(struct so
}
EXPORT_SYMBOL(bt_accept_dequeue);

-int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len, int flags)
+int bt_sock_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t len, int flags)
{
int noblock = flags & MSG_DONTWAIT;
struct sock *sk = sock->sk;
diff -urpN -X dontdiff a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
--- a/net/bluetooth/hci_sock.c 2007-01-12 11:19:49.548562920 -0800
+++ b/net/bluetooth/hci_sock.c 2007-01-12 11:29:21.238156949 -0800
@@ -348,8 +348,8 @@ static inline void hci_sock_cmsg(struct
}
}

-static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len, int flags)
+static int hci_sock_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t len, int flags)
{
int noblock = flags & MSG_DONTWAIT;
struct sock *sk = sock->sk;
@@ -385,8 +385,7 @@ static int hci_sock_recvmsg(struct kiocb
return err ? : copied;
}

-static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct hci_dev *hdev;
diff -urpN -X dontdiff a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
--- a/net/bluetooth/l2cap.c 2007-01-12 11:18:58.824897870 -0800
+++ b/net/bluetooth/l2cap.c 2007-01-12 11:29:21.243155241 -0800
@@ -906,7 +906,7 @@ fail:
return err;
}

-static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t len)
+static int l2cap_sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
int err = 0;
diff -urpN -X dontdiff a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
--- a/net/bluetooth/rfcomm/sock.c 2007-01-12 11:19:49.562558136 -0800
+++ b/net/bluetooth/rfcomm/sock.c 2007-01-12 11:29:21.250152849 -0800
@@ -551,8 +551,8 @@ static int rfcomm_sock_getname(struct so
return 0;
}

-static int rfcomm_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int rfcomm_sock_sendmsg(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
struct sock *sk = sock->sk;
struct rfcomm_dlc *d = rfcomm_pi(sk)->dlc;
@@ -631,8 +631,8 @@ static long rfcomm_sock_data_wait(struct
return timeo;
}

-static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int rfcomm_sock_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
int err = 0;
diff -urpN -X dontdiff a/net/bluetooth/sco.c b/net/bluetooth/sco.c
--- a/net/bluetooth/sco.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/bluetooth/sco.c 2007-01-12 11:29:21.268146698 -0800
@@ -627,8 +627,7 @@ static int sco_sock_getname(struct socke
return 0;
}

-static int sco_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int sco_sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
int err = 0;
diff -urpN -X dontdiff a/net/core/sock.c b/net/core/sock.c
--- a/net/core/sock.c 2007-01-12 11:18:59.001837406 -0800
+++ b/net/core/sock.c 2007-01-12 11:29:21.274144648 -0800
@@ -1352,14 +1352,12 @@ int sock_no_getsockopt(struct socket *so
return -EOPNOTSUPP;
}

-int sock_no_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
- size_t len)
+int sock_no_sendmsg(struct socket *sock, struct msghdr *m, size_t len)
{
return -EOPNOTSUPP;
}

-int sock_no_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
- size_t len, int flags)
+int sock_no_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int flags)
{
return -EOPNOTSUPP;
}
@@ -1605,14 +1603,14 @@ int compat_sock_common_getsockopt(struct
EXPORT_SYMBOL(compat_sock_common_getsockopt);
#endif

-int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+int sock_common_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
int addr_len = 0;
int err;

- err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
+ err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
if (err >= 0)
msg->msg_namelen = addr_len;
diff -urpN -X dontdiff a/net/dccp/dccp.h b/net/dccp/dccp.h
--- a/net/dccp/dccp.h 2007-01-12 11:18:59.097804612 -0800
+++ b/net/dccp/dccp.h 2007-01-12 11:29:21.278143282 -0800
@@ -257,10 +257,10 @@ extern int compat_dccp_setsockopt(str
char __user *optval, int optlen);
#endif
extern int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg);
-extern int dccp_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t size);
-extern int dccp_recvmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len, int nonblock,
+extern int dccp_sendmsg(struct sock *sk, struct msghdr *msg,
+ size_t size);
+extern int dccp_recvmsg(struct sock *sk, struct msghdr *msg,
+ size_t len, int nonblock,
int flags, int *addr_len);
extern void dccp_shutdown(struct sock *sk, int how);
extern int inet_dccp_listen(struct socket *sock, int backlog);
diff -urpN -X dontdiff a/net/dccp/probe.c b/net/dccp/probe.c
--- a/net/dccp/probe.c 2007-01-12 11:18:59.137790948 -0800
+++ b/net/dccp/probe.c 2007-01-12 11:29:21.282141915 -0800
@@ -75,8 +75,7 @@ static void printl(const char *fmt, ...)
wake_up(&dccpw.wait);
}

-static int jdccp_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t size)
+static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
const struct dccp_minisock *dmsk = dccp_msk(sk);
const struct inet_sock *inet = inet_sk(sk);
diff -urpN -X dontdiff a/net/dccp/proto.c b/net/dccp/proto.c
--- a/net/dccp/proto.c 2007-01-12 11:18:59.142789240 -0800
+++ b/net/dccp/proto.c 2007-01-12 11:29:21.287140206 -0800
@@ -634,8 +634,7 @@ int compat_dccp_getsockopt(struct sock *
EXPORT_SYMBOL_GPL(compat_dccp_getsockopt);
#endif

-int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len)
+int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
const struct dccp_sock *dp = dccp_sk(sk);
const int flags = msg->msg_flags;
@@ -690,8 +689,8 @@ out_discard:

EXPORT_SYMBOL_GPL(dccp_sendmsg);

-int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len, int nonblock, int flags, int *addr_len)
+int dccp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
+ int flags, int *addr_len)
{
const struct dccp_hdr *dh;
long timeo;
diff -urpN -X dontdiff a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
--- a/net/decnet/af_decnet.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/decnet/af_decnet.c 2007-01-12 11:29:21.293138156 -0800
@@ -1666,8 +1666,8 @@ static int dn_data_ready(struct sock *sk
}


-static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int dn_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct dn_scp *scp = DN_SK(sk);
@@ -1903,8 +1903,7 @@ static inline struct sk_buff *dn_alloc_s
return skb;
}

-static int dn_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size)
+static int dn_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
struct sock *sk = sock->sk;
struct dn_scp *scp = DN_SK(sk);
diff -urpN -X dontdiff a/net/econet/af_econet.c b/net/econet/af_econet.c
--- a/net/econet/af_econet.c 2007-01-12 11:19:49.657525676 -0800
+++ b/net/econet/af_econet.c 2007-01-12 11:29:21.298136448 -0800
@@ -114,8 +114,8 @@ static void econet_insert_socket(struct
* If necessary we block.
*/

-static int econet_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len, int flags)
+static int econet_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t len, int flags)
{
struct sock *sk = sock->sk;
struct sk_buff *skb;
@@ -260,8 +260,7 @@ static void ec_tx_done(struct sk_buff *s
* and hence whether to use real Econet or the UDP emulation.
*/

-static int econet_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int econet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct sockaddr_ec *saddr=(struct sockaddr_ec *)msg->msg_name;
diff -urpN -X dontdiff a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c 2007-01-12 11:19:49.671520892 -0800
+++ b/net/ipv4/af_inet.c 2007-01-12 11:29:21.303134739 -0800
@@ -655,8 +655,7 @@ int inet_getname(struct socket *sock, st
return 0;
}

-int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
- size_t size)
+int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
struct sock *sk = sock->sk;

@@ -664,7 +663,7 @@ int inet_sendmsg(struct kiocb *iocb, str
if (!inet_sk(sk)->num && inet_autobind(sk))
return -EAGAIN;

- return sk->sk_prot->sendmsg(iocb, sk, msg, size);
+ return sk->sk_prot->sendmsg(sk, msg, size);
}


diff -urpN -X dontdiff a/net/ipv4/raw.c b/net/ipv4/raw.c
--- a/net/ipv4/raw.c 2007-01-12 11:18:59.767575737 -0800
+++ b/net/ipv4/raw.c 2007-01-12 11:29:21.307133373 -0800
@@ -48,7 +48,6 @@
#include <linux/stddef.h>
#include <linux/slab.h>
#include <linux/errno.h>
-#include <linux/aio.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/sockios.h>
@@ -376,8 +375,7 @@ static int raw_probe_proto_opt(struct fl
return 0;
}

-static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len)
+static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
struct inet_sock *inet = inet_sk(sk);
struct ipcm_cookie ipc;
@@ -574,8 +572,8 @@ out: return ret;
* we return it, otherwise we block.
*/

-static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len, int noblock, int flags, int *addr_len)
+static int raw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+ int noblock, int flags, int *addr_len)
{
struct inet_sock *inet = inet_sk(sk);
size_t copied = 0;
diff -urpN -X dontdiff a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c 2007-01-12 11:19:49.806474764 -0800
+++ b/net/ipv4/tcp.c 2007-01-12 11:29:21.313131323 -0800
@@ -658,8 +658,7 @@ static inline int select_size(struct soc
return tmp;
}

-int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t size)
+int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
struct iovec *iov;
struct tcp_sock *tp = tcp_sk(sk);
@@ -1097,8 +1096,8 @@ int tcp_read_sock(struct sock *sk, read_
* Probably, code can be easily improved even more.
*/

-int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len, int nonblock, int flags, int *addr_len)
+int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+ int nonblock, int flags, int *addr_len)
{
struct tcp_sock *tp = tcp_sk(sk);
int copied = 0;
diff -urpN -X dontdiff a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
--- a/net/ipv4/tcp_probe.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/ipv4/tcp_probe.c 2007-01-12 11:29:21.319129273 -0800
@@ -78,8 +78,7 @@ static void printl(const char *fmt, ...)
wake_up(&tcpw.wait);
}

-static int jtcp_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t size)
+static int jtcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
const struct tcp_sock *tp = tcp_sk(sk);
const struct inet_sock *inet = inet_sk(sk);
diff -urpN -X dontdiff a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c 2007-01-12 11:18:59.882536452 -0800
+++ b/net/ipv4/udp.c 2007-01-12 11:29:21.324127564 -0800
@@ -504,8 +504,7 @@ out:
return err;
}

-int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len)
+int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
struct inet_sock *inet = inet_sk(sk);
struct udp_sock *up = udp_sk(sk);
@@ -723,7 +722,7 @@ int udp_sendpage(struct sock *sk, struct
* sendpage interface can't pass.
* This will succeed only when the socket is connected.
*/
- ret = udp_sendmsg(NULL, sk, &msg, 0);
+ ret = udp_sendmsg(sk, &msg, 0);
if (ret < 0)
return ret;
}
@@ -803,8 +802,8 @@ int udp_ioctl(struct sock *sk, int cmd,
* return it, otherwise we block.
*/

-int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len, int noblock, int flags, int *addr_len)
+int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+ int noblock, int flags, int *addr_len)
{
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
diff -urpN -X dontdiff a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
--- a/net/ipv4/udp_impl.h 2007-01-12 11:18:59.885535427 -0800
+++ b/net/ipv4/udp_impl.h 2007-01-12 11:29:21.328126197 -0800
@@ -25,7 +25,7 @@ extern int compat_udp_setsockopt(struct
extern int compat_udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen);
#endif
-extern int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
+extern int udp_recvmsg(struct sock *sk, struct msghdr *msg,
size_t len, int noblock, int flags, int *addr_len);
extern int udp_sendpage(struct sock *sk, struct page *page, int offset,
size_t size, int flags);
diff -urpN -X dontdiff a/net/ipv6/raw.c b/net/ipv6/raw.c
--- a/net/ipv6/raw.c 2007-01-12 11:19:49.875451187 -0800
+++ b/net/ipv6/raw.c 2007-01-12 11:29:21.333124489 -0800
@@ -391,8 +391,7 @@ int rawv6_rcv(struct sock *sk, struct sk
* we return it, otherwise we block.
*/

-static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len,
+static int rawv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int noblock, int flags, int *addr_len)
{
struct ipv6_pinfo *np = inet6_sk(sk);
@@ -667,8 +666,7 @@ static int rawv6_probe_proto_opt(struct
return 0;
}

-static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len)
+static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
struct ipv6_txoptions opt_space;
struct sockaddr_in6 * sin6 = (struct sockaddr_in6 *) msg->msg_name;
diff -urpN -X dontdiff a/net/ipv6/udp.c b/net/ipv6/udp.c
--- a/net/ipv6/udp.c 2007-01-12 11:19:49.884448112 -0800
+++ b/net/ipv6/udp.c 2007-01-12 11:29:21.341121755 -0800
@@ -113,8 +113,7 @@ static struct sock *__udp6_lib_lookup(st
* return it, otherwise we block.
*/

-int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len,
+int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int noblock, int flags, int *addr_len)
{
struct ipv6_pinfo *np = inet6_sk(sk);
@@ -545,8 +544,7 @@ out:
return err;
}

-int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len)
+int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
struct ipv6_txoptions opt_space;
struct udp_sock *up = udp_sk(sk);
@@ -607,12 +605,12 @@ int udpv6_sendmsg(struct kiocb *iocb, st
do_udp_sendmsg:
if (__ipv6_only_sock(sk))
return -ENETUNREACH;
- return udp_sendmsg(iocb, sk, msg, len);
+ return udp_sendmsg(sk, msg, len);
}
}

if (up->pending == AF_INET)
- return udp_sendmsg(iocb, sk, msg, len);
+ return udp_sendmsg(sk, msg, len);

/* Rough check on arithmetic overflow,
better check is made in ip6_build_xmit
diff -urpN -X dontdiff a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
--- a/net/ipv6/udp_impl.h 2007-01-12 11:19:00.080468814 -0800
+++ b/net/ipv6/udp_impl.h 2007-01-12 11:29:21.346120047 -0800
@@ -20,10 +20,8 @@ extern int compat_udpv6_setsockopt(struc
extern int compat_udpv6_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen);
#endif
-extern int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len);
-extern int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len,
+extern int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
+extern int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int noblock, int flags, int *addr_len);
extern int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb);
extern int udpv6_destroy_sock(struct sock *sk);
diff -urpN -X dontdiff a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
--- a/net/ipx/af_ipx.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/ipx/af_ipx.c 2007-01-12 11:29:21.351118339 -0800
@@ -1692,8 +1692,7 @@ out:
return rc;
}

-static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int ipx_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct ipx_sock *ipxs = ipx_sk(sk);
@@ -1757,8 +1756,8 @@ out:
}


-static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int ipx_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct ipx_sock *ipxs = ipx_sk(sk);
diff -urpN -X dontdiff a/net/irda/af_irda.c b/net/irda/af_irda.c
--- a/net/irda/af_irda.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/irda/af_irda.c 2007-01-12 11:29:21.357116288 -0800
@@ -1263,14 +1263,13 @@ static int irda_release(struct socket *s
}

/*
- * Function irda_sendmsg (iocb, sock, msg, len)
+ * Function irda_sendmsg (sock, msg, len)
*
* Send message down to TinyTP. This function is used for both STREAM and
* SEQPACK services. This is possible since it forces the client to
* fragment the message if necessary
*/
-static int irda_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int irda_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct irda_sock *self;
@@ -1340,13 +1339,13 @@ static int irda_sendmsg(struct kiocb *io
}

/*
- * Function irda_recvmsg_dgram (iocb, sock, msg, size, flags)
+ * Function irda_recvmsg_dgram (sock, msg, size, flags)
*
* Try to receive message and copy it to user. The frame is discarded
* after being read, regardless of how much the user actually read
*/
-static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int irda_recvmsg_dgram(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct irda_sock *self = irda_sk(sk);
@@ -1395,10 +1394,10 @@ static int irda_recvmsg_dgram(struct kio
}

/*
- * Function irda_recvmsg_stream (iocb, sock, msg, size, flags)
+ * Function irda_recvmsg_stream (sock, msg, size, flags)
*/
-static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int irda_recvmsg_stream(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct irda_sock *self = irda_sk(sk);
@@ -1519,14 +1518,14 @@ static int irda_recvmsg_stream(struct ki
}

/*
- * Function irda_sendmsg_dgram (iocb, sock, msg, len)
+ * Function irda_sendmsg_dgram (sock, msg, len)
*
* Send message down to TinyTP for the unreliable sequenced
* packet service...
*
*/
-static int irda_sendmsg_dgram(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int irda_sendmsg_dgram(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
struct sock *sk = sock->sk;
struct irda_sock *self;
@@ -1589,14 +1588,14 @@ static int irda_sendmsg_dgram(struct kio
}

/*
- * Function irda_sendmsg_ultra (iocb, sock, msg, len)
+ * Function irda_sendmsg_ultra (sock, msg, len)
*
* Send message down to IrLMP for the unreliable Ultra
* packet service...
*/
#ifdef CONFIG_IRDA_ULTRA
-static int irda_sendmsg_ultra(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int irda_sendmsg_ultra(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
struct sock *sk = sock->sk;
struct irda_sock *self;
diff -urpN -X dontdiff a/net/key/af_key.c b/net/key/af_key.c
--- a/net/key/af_key.c 2007-01-12 11:19:00.142447635 -0800
+++ b/net/key/af_key.c 2007-01-12 11:29:21.363114238 -0800
@@ -3118,8 +3118,7 @@ static int pfkey_send_new_mapping(struct
return pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_REGISTERED, NULL);
}

-static int pfkey_sendmsg(struct kiocb *kiocb,
- struct socket *sock, struct msghdr *msg, size_t len)
+static int pfkey_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct sk_buff *skb = NULL;
@@ -3160,8 +3159,7 @@ out:
return err ? : len;
}

-static int pfkey_recvmsg(struct kiocb *kiocb,
- struct socket *sock, struct msghdr *msg, size_t len,
+static int pfkey_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
int flags)
{
struct sock *sk = sock->sk;
diff -urpN -X dontdiff a/net/llc/af_llc.c b/net/llc/af_llc.c
--- a/net/llc/af_llc.c 2007-01-12 11:19:00.148445585 -0800
+++ b/net/llc/af_llc.c 2007-01-12 11:29:21.368112530 -0800
@@ -656,8 +656,8 @@ out:
* Copy received data to the socket user.
* Returns non-negative upon success, negative otherwise.
*/
-static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len, int flags)
+static int llc_ui_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t len, int flags)
{
struct sockaddr_llc *uaddr = (struct sockaddr_llc *)msg->msg_name;
const int nonblock = flags & MSG_DONTWAIT;
@@ -818,8 +818,7 @@ copy_uaddr:
* Transmit data provided by the socket user.
* Returns non-negative upon success, negative otherwise.
*/
-static int llc_ui_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct llc_sock *llc = llc_sk(sk);
diff -urpN -X dontdiff a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c 2007-01-12 11:28:23.515886548 -0800
+++ b/net/netlink/af_netlink.c 2007-01-12 11:29:21.373110822 -0800
@@ -1103,8 +1103,7 @@ static inline void netlink_rcv_wake(stru
wake_up_interruptible(&nlk->wait);
}

-static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct netlink_sock *nlk = nlk_sk(sk);
@@ -1182,8 +1181,7 @@ out:
return err;
}

-static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
- struct msghdr *msg, size_t len,
+static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
int flags)
{
struct scm_cookie scm;
diff -urpN -X dontdiff a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
--- a/net/netrom/af_netrom.c 2007-01-12 11:19:00.396360867 -0800
+++ b/net/netrom/af_netrom.c 2007-01-12 11:29:21.378109113 -0800
@@ -1009,8 +1009,7 @@ int nr_rx_frame(struct sk_buff *skb, str
return 1;
}

-static int nr_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int nr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct nr_sock *nr = nr_sk(sk);
@@ -1123,8 +1122,8 @@ out:
return err;
}

-static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int nr_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct sockaddr_ax25 *sax = (struct sockaddr_ax25 *)msg->msg_name;
diff -urpN -X dontdiff a/net/packet/af_packet.c b/net/packet/af_packet.c
--- a/net/packet/af_packet.c 2007-01-12 11:19:49.994410526 -0800
+++ b/net/packet/af_packet.c 2007-01-12 11:29:21.383107405 -0800
@@ -324,8 +324,8 @@ oom:
* protocol layers and you must therefore supply it with a complete frame
*/

-static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
struct sock *sk = sock->sk;
struct sockaddr_pkt *saddr=(struct sockaddr_pkt *)msg->msg_name;
@@ -697,8 +697,7 @@ ring_is_full:
#endif


-static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int packet_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct sockaddr_ll *saddr=(struct sockaddr_ll *)msg->msg_name;
@@ -1048,8 +1047,8 @@ out:
* If necessary we block.
*/

-static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len, int flags)
+static int packet_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t len, int flags)
{
struct sock *sk = sock->sk;
struct sk_buff *skb;
diff -urpN -X dontdiff a/net/rose/af_rose.c b/net/rose/af_rose.c
--- a/net/rose/af_rose.c 2007-01-12 11:19:00.415354377 -0800
+++ b/net/rose/af_rose.c 2007-01-12 11:29:21.389105355 -0800
@@ -1009,8 +1009,7 @@ int rose_rx_call_request(struct sk_buff
return 1;
}

-static int rose_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int rose_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct rose_sock *rose = rose_sk(sk);
@@ -1179,8 +1178,8 @@ static int rose_sendmsg(struct kiocb *io
}


-static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+static int rose_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct sock *sk = sock->sk;
struct rose_sock *rose = rose_sk(sk);
diff -urpN -X dontdiff a/net/sctp/socket.c b/net/sctp/socket.c
--- a/net/sctp/socket.c 2007-01-12 11:19:00.627281957 -0800
+++ b/net/sctp/socket.c 2007-01-12 11:29:21.397102621 -0800
@@ -1348,8 +1348,8 @@ static int sctp_error(struct sock *sk, i

SCTP_STATIC int sctp_msghdr_parse(const struct msghdr *, sctp_cmsgs_t *);

-SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t msg_len)
+SCTP_STATIC int sctp_sendmsg(struct sock *sk, struct msghdr *msg,
+ size_t msg_len)
{
struct sctp_sock *sp;
struct sctp_endpoint *ep;
@@ -1803,9 +1803,8 @@ static int sctp_skb_pull(struct sk_buff
*/
static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);

-SCTP_STATIC int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
- struct msghdr *msg, size_t len, int noblock,
- int flags, int *addr_len)
+SCTP_STATIC int sctp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+ int noblock, int flags, int *addr_len)
{
struct sctp_ulpevent *event = NULL;
struct sctp_sock *sp = sctp_sk(sk);
diff -urpN -X dontdiff a/net/socket.c b/net/socket.c
--- a/net/socket.c 2007-01-12 11:28:23.521884498 -0800
+++ b/net/socket.c 2007-01-12 11:29:21.442087245 -0800
@@ -548,8 +548,7 @@ void sock_release(struct socket *sock)
sock->file = NULL;
}

-static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size)
+int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
int err;

@@ -557,15 +556,7 @@ static inline int __sock_sendmsg(struct
if (err)
return err;

- return sock->ops->sendmsg(iocb, sock, msg, size);
-}
-
-int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
-{
- struct kiocb iocb;
-
- init_sync_kiocb(&iocb, NULL);
- return __sock_sendmsg(&iocb, sock, msg, size);
+ return sock->ops->sendmsg(sock, msg, size);
}

int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
@@ -586,8 +577,8 @@ int kernel_sendmsg(struct socket *sock,
return result;
}

-static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size, int flags)
+int sock_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
int err;

@@ -595,16 +586,7 @@ static inline int __sock_recvmsg(struct
if (err)
return err;

- return sock->ops->recvmsg(iocb, sock, msg, size, flags);
-}
-
-int sock_recvmsg(struct socket *sock, struct msghdr *msg,
- size_t size, int flags)
-{
- struct kiocb iocb;
-
- init_sync_kiocb(&iocb, NULL);
- return __sock_recvmsg(&iocb, sock, msg, size, flags);
+ return sock->ops->recvmsg(sock, msg, size, flags);
}

int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
@@ -664,7 +646,7 @@ static ssize_t sock_aio_read(struct kioc
msg.msg_iovlen = nr_segs;
msg.msg_flags = (iocb->ki_filp->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;

- return __sock_recvmsg(iocb, sock, &msg, size, msg.msg_flags);
+ return sock_recvmsg(sock, &msg, size, msg.msg_flags);
}

static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
@@ -694,7 +676,7 @@ static ssize_t sock_aio_write(struct kio
if (sock->type == SOCK_SEQPACKET)
msg.msg_flags |= MSG_EOR;

- return __sock_sendmsg(iocb, sock, &msg, size);
+ return sock_sendmsg(sock, &msg, size);
}

/*
diff -urpN -X dontdiff a/net/tipc/socket.c b/net/tipc/socket.c
--- a/net/tipc/socket.c 2006-11-29 13:57:37.000000000 -0800
+++ b/net/tipc/socket.c 2007-01-12 11:29:21.447085537 -0800
@@ -431,7 +431,6 @@ static int dest_name_check(struct sockad

/**
* send_msg - send message in connectionless manner
- * @iocb: (unused)
* @sock: socket structure
* @m: message to send
* @total_len: length of message
@@ -444,8 +443,7 @@ static int dest_name_check(struct sockad
* Returns the number of bytes sent on success, or errno otherwise
*/

-static int send_msg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len)
+static int send_msg(struct socket *sock, struct msghdr *m, size_t total_len)
{
struct tipc_sock *tsock = tipc_sk(sock->sk);
struct sockaddr_tipc *dest = (struct sockaddr_tipc *)m->msg_name;
@@ -537,7 +535,6 @@ exit:

/**
* send_packet - send a connection-oriented message
- * @iocb: (unused)
* @sock: socket structure
* @m: message to send
* @total_len: length of message
@@ -547,8 +544,7 @@ exit:
* Returns the number of bytes sent on success, or errno otherwise
*/

-static int send_packet(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len)
+static int send_packet(struct socket *sock, struct msghdr *m, size_t total_len)
{
struct tipc_sock *tsock = tipc_sk(sock->sk);
struct sockaddr_tipc *dest = (struct sockaddr_tipc *)m->msg_name;
@@ -557,7 +553,7 @@ static int send_packet(struct kiocb *ioc
/* Handle implied connection establishment */

if (unlikely(dest))
- return send_msg(iocb, sock, m, total_len);
+ return send_msg(sock, m, total_len);

if (down_interruptible(&tsock->sem)) {
return -ERESTARTSYS;
@@ -592,7 +588,6 @@ exit:

/**
* send_stream - send stream-oriented data
- * @iocb: (unused)
* @sock: socket structure
* @m: data to send
* @total_len: total length of data to be sent
@@ -604,8 +599,7 @@ exit:
*/


-static int send_stream(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len)
+static int send_stream(struct socket *sock, struct msghdr *m, size_t total_len)
{
struct msghdr my_msg;
struct iovec my_iov;
@@ -618,7 +612,7 @@ static int send_stream(struct kiocb *ioc
int res;

if (likely(total_len <= TIPC_MAX_USER_MSG_SIZE))
- return send_packet(iocb, sock, m, total_len);
+ return send_packet(sock, m, total_len);

/* Can only send large data streams if already connected */

@@ -657,7 +651,7 @@ static int send_stream(struct kiocb *ioc
? curr_left : TIPC_MAX_USER_MSG_SIZE;
my_iov.iov_base = curr_start;
my_iov.iov_len = bytes_to_send;
- if ((res = send_packet(iocb, sock, &my_msg, 0)) < 0) {
+ if ((res = send_packet(sock, &my_msg, 0)) < 0) {
return bytes_sent ? bytes_sent : res;
}
curr_left -= bytes_to_send;
@@ -792,7 +786,6 @@ static int anc_data_recv(struct msghdr *

/**
* recv_msg - receive packet-oriented message
- * @iocb: (unused)
* @m: descriptor for message info
* @buf_len: total size of user buffer area
* @flags: receive flags
@@ -803,8 +796,8 @@ static int anc_data_recv(struct msghdr *
* Returns size of returned message data, errno otherwise
*/

-static int recv_msg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t buf_len, int flags)
+static int recv_msg(struct socket *sock, struct msghdr *m,
+ size_t buf_len, int flags)
{
struct tipc_sock *tsock = tipc_sk(sock->sk);
struct sk_buff *buf;
@@ -924,7 +917,6 @@ exit:

/**
* recv_stream - receive stream-oriented data
- * @iocb: (unused)
* @m: descriptor for message info
* @buf_len: total size of user buffer area
* @flags: receive flags
@@ -935,8 +927,8 @@ exit:
* Returns size of returned message data, errno otherwise
*/

-static int recv_stream(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t buf_len, int flags)
+static int recv_stream(struct socket *sock, struct msghdr *m,
+ size_t buf_len, int flags)
{
struct tipc_sock *tsock = tipc_sk(sock->sk);
struct sk_buff *buf;
diff -urpN -X dontdiff a/net/unix/af_unix.c b/net/unix/af_unix.c
--- a/net/unix/af_unix.c 2007-01-12 11:28:23.526882789 -0800
+++ b/net/unix/af_unix.c 2007-01-12 11:29:21.453083487 -0800
@@ -478,18 +478,13 @@ static int unix_getname(struct socket *,
static unsigned int unix_poll(struct file *, struct socket *, poll_table *);
static int unix_ioctl(struct socket *, unsigned int, unsigned long);
static int unix_shutdown(struct socket *, int);
-static int unix_stream_sendmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t);
-static int unix_stream_recvmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t, int);
-static int unix_dgram_sendmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t);
-static int unix_dgram_recvmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t, int);
+static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t);
+static int unix_stream_recvmsg(struct socket *, struct msghdr *, size_t, int);
+static int unix_dgram_sendmsg(struct socket *, struct msghdr *, size_t);
+static int unix_dgram_recvmsg(struct socket *, struct msghdr *, size_t, int);
static int unix_dgram_connect(struct socket *, struct sockaddr *,
int, int);
-static int unix_seqpacket_sendmsg(struct kiocb *, struct socket *,
- struct msghdr *, size_t);
+static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t);

static const struct proto_ops unix_stream_ops = {
.family = PF_UNIX,
@@ -1264,8 +1259,8 @@ static void unix_attach_fds(struct scm_c
* Send AF_UNIX data.
*/

-static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
struct sock *sk = sock->sk;
struct unix_sock *u = unix_sk(sk);
@@ -1413,8 +1408,8 @@ out:
}


-static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
struct sock *sk = sock->sk;
struct sock *other = NULL;
@@ -1516,8 +1511,8 @@ out_err:
return sent ? : err;
}

-static int unix_seqpacket_sendmsg(struct kiocb *kiocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg,
+ size_t len)
{
int err;
struct sock *sk = sock->sk;
@@ -1532,7 +1527,7 @@ static int unix_seqpacket_sendmsg(struct
if (msg->msg_namelen)
msg->msg_namelen = 0;

- return unix_dgram_sendmsg(kiocb, sock, msg, len);
+ return unix_dgram_sendmsg(sock, msg, len);
}

static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
@@ -1546,9 +1541,8 @@ static void unix_copy_addr(struct msghdr
}
}

-static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size,
- int flags)
+static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct scm_cookie scm;
struct sock *sk = sock->sk;
@@ -1655,9 +1649,8 @@ static long unix_stream_data_wait(struct



-static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size,
- int flags)
+static int unix_stream_recvmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, int flags)
{
struct scm_cookie scm;
struct sock *sk = sock->sk;
diff -urpN -X dontdiff a/net/wanrouter/af_wanpipe.c b/net/wanrouter/af_wanpipe.c
--- a/net/wanrouter/af_wanpipe.c 2007-01-12 11:19:00.807220468 -0800
+++ b/net/wanrouter/af_wanpipe.c 2007-01-12 11:29:21.459081437 -0800
@@ -542,8 +542,7 @@ static struct sock *wanpipe_alloc_socket
* a packet is queued into sk->sk_write_queue.
*===========================================================*/

-static int wanpipe_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, int len)
+static int wanpipe_sendmsg(struct socket *sock, struct msghdr *msg, int len)
{
wanpipe_opt *wp;
struct sock *sk = sock->sk;
@@ -1546,8 +1545,8 @@ static int wanpipe_create(struct socket
* to the user. If necessary we block.
*===========================================================*/

-static int wanpipe_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, int len, int flags)
+static int wanpipe_recvmsg(struct socket *sock, struct msghdr *msg,
+ int len, int flags)
{
struct sock *sk = sock->sk;
struct sk_buff *skb;
diff -urpN -X dontdiff a/net/x25/af_x25.c b/net/x25/af_x25.c
--- a/net/x25/af_x25.c 2007-01-12 11:19:00.816217393 -0800
+++ b/net/x25/af_x25.c 2007-01-12 11:29:21.464079728 -0800
@@ -958,8 +958,7 @@ out_clear_request:
goto out;
}

-static int x25_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t len)
+static int x25_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
struct sock *sk = sock->sk;
struct x25_sock *x25 = x25_sk(sk);
@@ -1134,8 +1133,7 @@ out_kfree_skb:
}


-static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *msg, size_t size,
+static int x25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags)
{
struct sock *sk = sock->sk;

2007-01-16 03:24:13

by Christoph Hellwig

Subject: Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> This series is an attempt to generalize the async I/O paths to be
> implementation agnostic. It completely eliminates knowledge of
> the kiocb structure in the generic code and makes it private within the
> current aio code. Things get noticeably cleaner without that layering
> violation.
>
> The new interface takes a file_endio_t function pointer, and a private data
> pointer, which would normally be aio_complete and a kiocb pointer,
> respectively. If the aio submission function gets back EIOCBQUEUED, that is
> a guarantee that the endio function will be called, or *already has been
> called*. If the file_endio_t pointer provided to aio_[read|write] is NULL,
> the FS must block on I/O completion, then return either the number of bytes
> read, or an error.

I don't really like this patchset at all. At some point it's a lot nicer
to put a lot of related parameters that are passed down a long
callchain into a structure, and I think the aio code is over that threshold.
The completion function cleanups look okay to me, but I'd rather add
that completion function to struct kiocb instead of removing kiocb use.

I have a slight feeling you want to use these completions for something
other than the current aio code. If that's the case, it would help
if you could explain briefly in what direction you're heading.

2007-01-16 04:25:21

by Nate Diller

Subject: Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

On 1/15/07, Christoph Hellwig wrote:
> On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> > This series is an attempt to generalize the async I/O paths to be
> > implementation agnostic. It completely eliminates knowledge of
> > the kiocb structure in the generic code and makes it private within the
> > current aio code. Things get noticeably cleaner without that layering
> > violation.
> >
> > The new interface takes a file_endio_t function pointer, and a private data
> > pointer, which would normally be aio_complete and a kiocb pointer,
> > respectively. If the aio submission function gets back EIOCBQUEUED, that is
> > a guarantee that the endio function will be called, or *already has been
> > called*. If the file_endio_t pointer provided to aio_[read|write] is NULL,
> > the FS must block on I/O completion, then return either the number of bytes
> > read, or an error.
>
> I don't really like this patchset at all. At some point it's a lot nicer
> to put a lot of related parameters that are passed down a long
> callchain into a structure, and I think the aio code is over that threshold.
> The completion function cleanups look okay to me, but I'd rather add
> that completion function to struct kiocb instead of removing kiocb use.
>
> I have a slight feeling you want to use these completions for something
> other than the current aio code. If that's the case, it would help
> if you could explain briefly in what direction you're heading.

Actually I agree with you more than you might think. I had intended
this to mesh with your struct iodesc idea, where iodesc would contain
the iovec pointer, nr_segs, iov_length, and whatever else needs to be
there, potentially even the endio function and its private data, tying
those to the iovec instead of a separate structure that needs to be
kept in sync. There's a distinct layering that should exist between
things that should accompany the iovec transparently, and private data
that should be attached opaquely by layers above.

The biggest thing I have in mind for this patch, actually, is to fix
up the *sync* paths. I don't think we should be waiting on sync I/O
at the *top* of the call stack, like with wait_on_sync_kiocb(), I'd
say the best place to wait is at the *bottom*, down in the I/O
scheduler. This would make it a lot easier to clean up the completion
paths, because in the sync case, you'd be right back in process
context again as you traverse upward through the RAID, encryption,
loopback, directIO, FS log commit, etc. It doesn't by itself
eliminate the need for all the threads and workqueues and such that
those layers each own, but it is a step in the right direction.
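
A minimal userspace sketch of the submission contract from the cover
letter, offered as an illustration and not as the code in this series.
Everything prefixed model_ is hypothetical, including the callback
signature (the real file_endio_t may look different) and the stand-in
value for EIOCBQUEUED. It shows the two cases: a NULL endio means the
callee waits and returns the byte count, while a non-NULL endio is
queued and called exactly once on completion.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Stand-in value so the model compiles outside the kernel. */
#define MODEL_EIOCBQUEUED 529

/* Assumed callback shape; not taken from the patch. */
typedef void (*model_endio_t)(void *data, ssize_t result);

/* One request slot standing in for the bottom of the I/O stack. */
struct model_queue {
	model_endio_t endio;
	void *data;
	ssize_t result;
	int pending;
};

/*
 * Submit nbytes of I/O.  With endio == NULL the caller is synchronous,
 * so the "wait" happens here, at the bottom, and the byte count is
 * returned directly.  Otherwise the request is queued and the caller
 * gets -MODEL_EIOCBQUEUED, which promises a later endio call.
 */
ssize_t model_submit(struct model_queue *q, size_t nbytes,
		     model_endio_t endio, void *data)
{
	assert(q != NULL && !q->pending);

	if (endio == NULL)
		return (ssize_t)nbytes;

	q->endio = endio;
	q->data = data;
	q->result = (ssize_t)nbytes;
	q->pending = 1;
	return -MODEL_EIOCBQUEUED;
}

/* Completion side: run the callback once, then forget the request. */
void model_complete(struct model_queue *q)
{
	assert(q != NULL);

	if (!q->pending)
		return;
	q->pending = 0;
	q->endio(q->data, q->result);
}

/* Example callback: store the result where 'data' points. */
void model_record_result(void *data, ssize_t result)
{
	*(ssize_t *)data = result;
}
```

In this model the caller never inspects the queue itself; the callback
and its opaque data pointer are the only things that cross the layer
boundary.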

Now if you want to talk about long-term vaporware style ideas, yeah, I
do have my own thoughts on how aio should work. And from Agami's
perspective, this patch also makes it easier for us to do certain
debugging traces that we wish to hack together, in order to profile
performance on our platform. But I'd be hesitant to make those
arguments, cause they are largely irrelevant (we can obviously carry
the patch for debugging without buy-in from the community). This is
the right thing to do from a design perspective. Hopefully it enables
a new architecture that can reduce context switches in I/O completion,
and reduce overhead. That's the real motive ;)

NATE

2007-01-16 05:37:35

by Nate Diller

Subject: Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

On 1/15/07, Christoph Hellwig wrote:
> On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> > Convert code using iocb->ki_left to use the more generic iov_length() call.
>
> No way. We need to reduce the number of iovec traversals, not add
> more of them.

ok, I can work on a version of this that uses struct iodesc. Maybe
something like this?

struct iodesc {
	struct iovec *iov;
	unsigned long nr_segs;
	size_t nbytes;
};

I suppose it's worth doing the iodesc thing along with this patchset
anyway, since it'll avoid an extra round of interface churn.
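
As a rough illustration of how such a descriptor could address the
traversal concern, here is a userspace sketch, not kernel code. It
assumes a hypothetical helper, iodesc_init(), that walks the iovec once
at submission time and caches the total in nbytes, so lower layers read
the cached value instead of recomputing the length.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Same fields as the struct proposed above. */
struct iodesc {
	struct iovec *iov;
	unsigned long nr_segs;
	size_t nbytes;
};

/*
 * Hypothetical helper: the only place the segments are summed.
 * Everything further down the call chain uses desc->nbytes.
 */
void iodesc_init(struct iodesc *desc, struct iovec *iov,
		 unsigned long nr_segs)
{
	unsigned long seg;

	assert(desc != NULL);
	assert(iov != NULL || nr_segs == 0);

	desc->iov = iov;
	desc->nr_segs = nr_segs;
	desc->nbytes = 0;
	for (seg = 0; seg < nr_segs; seg++)
		desc->nbytes += iov[seg].iov_len;
}
```

The trade-off is that nbytes has to be kept in sync if any layer trims
or advances the iovec, which is the cost of avoiding repeated walks.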

NATE

2007-01-16 05:46:15

by Stephen Hemminger

Subject: Re: [PATCH -mm 2/10][RFC] aio: net use struct socket for io

On Mon, 15 Jan 2007 17:54:50 -0800
Nate Diller wrote:

> Remove unused arg from socket operations
>
> The sendmsg and recvmsg socket operations take a kiocb pointer, but none of
> the functions actually use it. There's really no need even theoretically,
> it's really quite ugly having it there at all. Also, removing it will pave
> the way for a more generic completion path in the file_operations.
>
> ---

Would getting rid of these make later implementation of AIO networking
harder?

2007-01-16 07:24:52

by David Brownell

Subject: Re: [PATCH -mm 4/10][RFC] aio: convert aio_complete to file_endio_t

On Monday 15 January 2007 5:54 pm, Nate Diller wrote:
> --- a/drivers/usb/gadget/inode.c 2007-01-12 14:42:29.000000000 -0800
> +++ b/drivers/usb/gadget/inode.c 2007-01-12 14:25:34.000000000 -0800
> @@ -559,35 +559,32 @@ static int ep_aio_cancel(struct kiocb *i
> return value;
> }
>
> -static ssize_t ep_aio_read_retry(struct kiocb *iocb)
> +static int ep_aio_read_retry(struct kiocb *iocb)
> {
> struct kiocb_priv *priv = iocb->private;
> - ssize_t len, total;
> - int i;
> + ssize_t total;
> + int i, err = 0;
>
> /* we "retry" to get the right mm context for this: */
>
> /* copy stuff into user buffers */
> total = priv->actual;
> - len = 0;
> for (i=0; i < priv->nr_segs; i++) {
> ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
>
> if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
> - if (len == 0)
> - len = -EFAULT;
> + err = -EFAULT;

Discarding the capability to report partial success, e.g. that the first N
bytes were properly transferred? I don't see any virtue in that change.
Quite the opposite in fact.

I think you're also expecting that if N bytes were requested, that's always
how many will be received. That's not true for packetized I/O such as USB
isochronous transfers ... where it's quite legit (and in some cases routine)
for the other end to send packets that are shorter than the maximum allowed.
Sending a zero length packet is not the same as sending no packet at all,
for another example.
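
To make the partial-success point concrete, here is a userspace sketch
of a scatter copy that keeps it. This is a model under stated
assumptions, not the gadgetfs code: model_copy_to_user() and
scatter_copy() are hypothetical names, a NULL destination stands in for
a faulting user pointer, and the source pointer is advanced per segment
as a choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */
#include <sys/uio.h>	/* struct iovec */

/*
 * Stand-in for copy_to_user(): nonzero means "fault".
 * Assumption for this sketch only: NULL models a bad user pointer.
 */
int model_copy_to_user(void *dst, const void *src, size_t n)
{
	if (dst == NULL)
		return 1;
	memcpy(dst, src, n);
	return 0;
}

/*
 * Scatter 'actual' bytes from 'buf' into the segments of 'iv'.
 * 'actual' is what the hardware really delivered, which may be less
 * than the space requested (short packet) or even zero.
 * Returns the bytes delivered before any fault; -EFAULT is returned
 * only when nothing at all could be delivered.
 */
ssize_t scatter_copy(const struct iovec *iv, unsigned long nr_segs,
		     const char *buf, size_t actual)
{
	size_t left = actual;
	ssize_t len = 0;
	unsigned long i;

	assert(iv != NULL || nr_segs == 0);

	for (i = 0; i < nr_segs && left > 0; i++) {
		size_t chunk = iv[i].iov_len < left ? iv[i].iov_len : left;

		if (model_copy_to_user(iv[i].iov_base, buf + len, chunk)) {
			if (len == 0)
				len = -EFAULT;
			break;
		}
		left -= chunk;
		len += chunk;
	}
	return len;
}
```

A caller can then tell a short transfer (return value below the
requested size), a zero length packet (return value 0), and a fault
before any data was copied (-EFAULT) apart.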

2007-01-16 07:24:57

by David Brownell

Subject: Re: [PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops

On Monday 15 January 2007 5:54 pm, Nate Diller wrote:
> This removes the aio implementation from the usb gadget file system.

NAK. I see a deep misunderstanding here.


> Aside
> from making very creative (!) use of the aio retry path, it can't be of any
> use performance-wise

Other than the basic win of letting one userspace thread keep an I/O
stream active while at the same time processing the data it reads or
writes?? That's the "async" part of AIO.

There's a not-so-little thing called "I/O overlap" ... which is the only
way to prevent wasting bandwidth between (non-cacheable) I/O requests,
and thus is the only way to let userspace code achieve anything close
to the maximum I/O bandwidth the hardware can achieve.
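
A back-of-the-envelope model of that overlap argument, as a sketch
only. It assumes every request takes the same transfer time t_io and
the same processing time t_cpu (any unit), and that at most one
transfer runs while one buffer is being processed. The function names
are hypothetical, and real hardware will not be this regular.

```c
#include <assert.h>

/*
 * Blocking read()/write() style: the next transfer is not started
 * until the previous buffer has been transferred and processed.
 */
unsigned long serial_time(unsigned long n, unsigned long t_io,
			  unsigned long t_cpu)
{
	return n * (t_io + t_cpu);
}

/*
 * Async style: the next transfer is already in flight while the
 * current buffer is processed, so each steady-state step costs only
 * the slower of the two stages.
 */
unsigned long overlapped_time(unsigned long n, unsigned long t_io,
			      unsigned long t_cpu)
{
	unsigned long stage = t_io > t_cpu ? t_io : t_cpu;
	unsigned long total;

	if (n == 0)
		return 0;

	/* first transfer + (n - 1) overlapped steps + last processing */
	total = t_io + (n - 1) * stage + t_cpu;

	/* In this model overlap can never be slower than the serial case. */
	assert(total <= serial_time(n, t_io, t_cpu));
	return total;
}
```

With four requests, t_io = 100 and t_cpu = 60, the serial path takes
640 units and the overlapped path 460; the link sits idle for 60 units
per request in the serial case and stays busy in the overlapped one.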

We want to see the host side "usbfs" evolve to support AIO like this
too, for the same reasons. (Currently it has fairly ugly AIO code
that looks unlike any other AIO code in Linux. Recent updates to
support a file-per-endpoint device model are a necessary precursor
to switching over to standard AIO syscalls.)


> because it always kmalloc()s a bounce buffer for the
> *whole* I/O size.

By and large that's a negligible factor compared to being able to
achieve I/O overlap. ISTR the reason for not doing fancy DMA magic
was that the cost of this style AIO was under 1 KByte object code
on ARM, which was easy to justify ... while DMA magic to do that
sort of stuff would be much fatter, as well as more error prone.

(And that's why the "creative" use of the retry path. As I've
observed before, "retry" is a misnomer in the general sense of
an async I/O framework. It's more of a semi-completion callback;
I/O can't in general be "retried" on error or fault, and even in
the current usage it's not really a "retry".)


Now that high speed peripheral hardware is becoming more common on
embedded Linuxes -- TI has DaVinci, OMAP 2430, TUSB6010 (as found
in the new Nokia 800 tablets); Atmel AVR32 AP7000; at least a couple
parts that should be able to use the same musb_hdrc driver as those
TI parts; and a few other chips I've heard of -- there may be some
virtue in eliminating the memcpy, since those CPUs don't have many
MIPS to waste. (Iff the memcpy turns out to be a real issue...)


> Perhaps the only reason to keep it around is the ability
> to cancel I/O requests, which only applies when using the user space async
> I/O interface.

It's good to have almost the complete kernel API functionality
exposed to userspace, and having I/O cancelation is an inevitable
consequence of a complete AIO framework ... but that particular
issue was not a driving concern.


The reason for AIO is to have a *STANDARD* userspace interface
for *ASYNC I/O* which otherwise can't exist. You know, the kind
of I/O interface that can't be implemented with read() and write()
syscalls, which for non-buffered I/O necessarily preclude all I/O
overlap. AIO itself is a direct match to most I/O frameworks'
primitives. (AIOCB being directly analogous to peripheral side
"struct usb_request" and host side "struct urb".)


You know, I've always thought that one reason the AIO discussions
seemed strange is that they weren't really focussed on I/O (the
lowlevel after-the-caches stuff) so much as filesystems (several
layers up in the stack, with intervening caching frameworks).

The first several implementations of AIO that I saw were restricted
to "real" I/O and not applicable to disk backed files. So while I
was glad the Linux approach didn't make that mistake, it's seemed
that it might be wanting to make a converse mistake: neglecting I/O
that isn't aimed at data stored on disks.


> I highly doubt that is enough incentive to justify the extra
> complexity here or in user-space, so I think it's a safe bet to remove this.
> If that feature is still desired, it would be possible to implement a sync
> interface that does an interruptible sleep.

What's needed is an async, non-sleeping, interface ... with I/O
overlap. That's antithetical to using read()/write() calls, so
your proposed approach couldn't possibly work.

- Dave


2007-01-16 08:22:43

by David Brownell

Subject: Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

On Monday 15 January 2007 8:25 pm, Nate Diller wrote:

> I don't think we should be waiting on sync I/O
> at the *top* of the call stack, like with wait_on_sync_kiocb(), I'd
> say the best place to wait is at the *bottom*, down in the I/O
> scheduler.

Erm ... *what* I/O scheduler? These I/O requests may go directly
to the end of the hardware I/O queue, which already has an I/O model
where each request can correspond directly to a KIOCB. And which
does not include any synchronous primitives.

No such scheduler has previously been, or _should_ be, required.

2007-01-16 09:13:50

by Nate Diller

Subject: Re: [PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops

On 1/15/07, David Brownell wrote:
> On Monday 15 January 2007 5:54 pm, Nate Diller wrote:
> > This removes the aio implementation from the usb gadget file system.
>
> NAK. I see a deep misunderstanding here.
>
>
> > Aside
> > from making very creative (!) use of the aio retry path, it can't be of any
> > use performance-wise
>
> Other than the basic win of letting one userspace thread keep an I/O
> stream active while at the same time processing the data it reads or
> writes?? That's the "async" part of AIO.
>
> There's a not-so-little thing called "I/O overlap" ... which is the only
> way to prevent wasting bandwidth between (non-cacheable) I/O requests,
> and thus is the only way to let userspace code achieve anything close
> to the maximum I/O bandwidth the hardware can achieve.
>
> We want to see the host side "usbfs" evolve to support AIO like this
> too, for the same reasons. (Currently it has fairly ugly AIO code
> that looks unlike any other AIO code in Linux. Recent updates to
> support a file-per-endpoint device model are a necessary precursor
> to switching over to standard AIO syscalls.)
>
>
> > because it always kmalloc()s a bounce buffer for the
> > *whole* I/O size.
>
> By and large that's a negligible factor compared to being able to
> achieve I/O overlap. ISTR the reason for not doing fancy DMA magic
> was that the cost of this style AIO was under 1 KByte object code
> on ARM, which was easy to justify ... while DMA magic to do that
> sort of stuff would be much fatter, as well as more error prone.
>
> (And that's why the "creative" use of the retry path. As I've
> observed before, "retry" is a misnomer in the general sense of
> an async I/O framework. It's more of a semi-completion callback;
> I/O can't in general be "retried" on error or fault, and even in
> the current usage it's not really a "retry".)
>
>
> Now that high speed peripheral hardware is becoming more common on
> embedded Linuxes -- TI has DaVinci, OMAP 2430, TUSB6010 (as found
> in the new Nokia 800 tablets); Atmel AVR32 AP7000; at least a couple
> parts that should be able to use the same musb_hdrc driver as those
> TI parts; and a few other chips I've heard of -- there may be some
> virtue in eliminating the memcpy, since those CPUs don't have many
> MIPS to waste. (Iff the memcpy turns out to be a real issue...)
>
>
> > Perhaps the only reason to keep it around is the ability
> > to cancel I/O requests, which only applies when using the user space async
> > I/O interface.
>
> It's good to have almost the complete kernel API functionality
> exposed to userspace, and having I/O cancelation is an inevitable
> consequence of a complete AIO framework ... but that particular
> issue was not a driving concern.
>
>
> The reason for AIO is to have a *STANDARD* userspace interface
> for *ASYNC I/O* which otherwise can't exist. You know, the kind
> of I/O interface that can't be implemented with read() and write()
> syscalls, which for non-buffered I/O necessarily preclude all I/O
> overlap. AIO itself is a direct match to most I/O frameworks'
> primitives. (AIOCB being directly analogous to peripheral side
> "struct usb_request" and host side "struct urb".)
>
>
> You know, I've always thought that one reason the AIO discussions
> seemed strange is that they weren't really focussed on I/O (the
> lowlevel after-the-caches stuff) so much as filesystems (several
> layers up in the stack, with intervening caching frameworks).
>
> The first several implementations of AIO that I saw were restricted
> to "real" I/O and not applicable to disk backed files. So while I
> was glad the Linux approach didn't make that mistake, it's seemed
> that it might be wanting to make a converse mistake: neglecting I/O
> that isn't aimed at data stored on disks.
>
>
> > I highly doubt that is enough incentive to justify the extra
> > complexity here or in user-space, so I think it's a safe bet to remove this.
> > If that feature is still desired, it would be possible to implement a sync
> > interface that does an interruptible sleep.
>
> What's needed is an async, non-sleeping, interface ... with I/O
> overlap. That's antithetical to using read()/write() calls, so
> your proposed approach couldn't possibly work.

haha, wow ok you convinced me :)

I got a bit impatient when I was working on this, it took some time
just to figure out the intention of the code, and I'm trying to hold
to a bit of a schedule here. Without any clear (to me) reason, I
didn't want to spend a lot of effort fixing this up.

There's really no big difference between the usb drivers here and the
disk I/O scheduler queue, AFAICT, so it seems like the solution I want
is to do a kmap() on the user buffer and then do the I/O straight out
of that. That will eliminate the need for the bounce buffer. I'll
post a new version along with the iodesc changes later this week.

NATE

2007-01-16 09:21:28

by Nate Diller

Subject: Re: [PATCH -mm 4/10][RFC] aio: convert aio_complete to file_endio_t

On 1/15/07, David Brownell <[email protected]> wrote:
> On Monday 15 January 2007 5:54 pm, Nate Diller wrote:
> > --- a/drivers/usb/gadget/inode.c 2007-01-12 14:42:29.000000000 -0800
> > +++ b/drivers/usb/gadget/inode.c 2007-01-12 14:25:34.000000000 -0800
> > @@ -559,35 +559,32 @@ static int ep_aio_cancel(struct kiocb *i
> >          return value;
> >  }
> >
> > -static ssize_t ep_aio_read_retry(struct kiocb *iocb)
> > +static int ep_aio_read_retry(struct kiocb *iocb)
> >  {
> >          struct kiocb_priv *priv = iocb->private;
> > -        ssize_t len, total;
> > -        int i;
> > +        ssize_t total;
> > +        int i, err = 0;
> >
> >          /* we "retry" to get the right mm context for this: */
> >
> >          /* copy stuff into user buffers */
> >          total = priv->actual;
> > -        len = 0;
> >          for (i=0; i < priv->nr_segs; i++) {
> >                  ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
> >
> >                  if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
> > -                        if (len == 0)
> > -                                len = -EFAULT;
> > +                        err = -EFAULT;
>
> Discarding the capability to report partial success, e.g. that the first N
> bytes were properly transferred? I don't see any virtue in that change.
> Quite the opposite in fact.
>
> I think you're also expecting that if N bytes were requested, that's always
> how many will be received. That's not true for packetized I/O such as USB
> isochronous transfers ... where it's quite legit (and in some cases routine)
> for the other end to send packets that are shorter than the maximum allowed.
> Sending a zero length packet is not the same as sending no packet at all,
> for another example.

I will convert this (usb) code to use the standard completion path,
which you will notice *gained* the ability to properly report both an
error and a partial success as part of this patch. In fact, fixing
this up was my intention when I wrote this patch, and the later patch
was a compromise intended to get this whole bundle out for review in a
timely manner :)
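
To make the "both an error and a partial success" point concrete, here is a
minimal userspace sketch, not the code from the patch. struct xfer_result,
scatter_copy() and fake_copy_to_user() are hypothetical names invented for
illustration; the stand-in copy routine simply fails on a NULL destination.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical result pair: bytes transferred plus the first error seen. */
struct xfer_result {
	ssize_t done;	/* bytes copied before any failure */
	int err;	/* 0, or a negative errno such as -EFAULT */
};

/* Userspace stand-in for copy_to_user(): nonzero return means failure. */
static int fake_copy_to_user(void *dst, const void *src, size_t n)
{
	if (!dst)
		return 1;
	memcpy(dst, src, n);
	return 0;
}

/*
 * Scatter 'actual' bytes from buf into the segments.  'actual' may be
 * smaller than the total segment length (a short packet is legitimate).
 * On a fault, stop and report the error alongside the bytes already copied.
 */
static struct xfer_result scatter_copy(const struct iovec *iv,
				       unsigned long nr_segs,
				       const char *buf, size_t actual)
{
	struct xfer_result res = { 0, 0 };
	size_t left = actual;
	unsigned long i;

	for (i = 0; i < nr_segs && left > 0; i++) {
		size_t this = iv[i].iov_len < left ? iv[i].iov_len : left;

		if (fake_copy_to_user(iv[i].iov_base, buf + res.done, this)) {
			res.err = -EFAULT;
			break;
		}
		res.done += this;
		left -= this;
	}
	return res;
}
```

The caller gets both fields back and can decide how to present them, instead
of the error overwriting the byte count.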

NATE

2007-01-16 10:30:39

by Evgeniy Polyakov

Subject: Re: [PATCH -mm 2/10][RFC] aio: net use struct socket for io

On Mon, Jan 15, 2007 at 09:44:27PM -0800, Stephen Hemminger ([email protected]) wrote:
> > The sendmsg and recvmsg socket operations take a kiocb pointer, but none of
> > the functions actually use it. There's really no need even theoretically,
> > it's really quite ugly having it there at all. Also, removing it will pave
> > the way for a more generic completion path in the file_operations.
> >
> > ---
>
> Would getting rid of these make later implementation of AIO networking
> harder?

That depends on what kind of AIO it will be.
Mainstream AIO does stand on kiocb, but if socket operations are
extended with additional async_read/write (as is done in kevent
AIO), there is no need to have this pointer in the sync operations (until
people want sync AIO to be just async plus waiting for completion).

So the real question is what comes next: how will network AIO be implemented?

--
Evgeniy Polyakov

2007-01-16 18:36:51

by David Brownell

Subject: Re: [PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops

On Tuesday 16 January 2007 1:13 am, Nate Diller wrote:
> On 1/15/07, David Brownell <[email protected]> wrote:
> > What's needed is an async, non-sleeping, interface ... with I/O
> > overlap. That's antithetical to using read()/write() calls, so
> > your proposed approach couldn't possibly work.
>
> haha, wow ok you convinced me :)

Good. :)


> I got a bit impatient when I was working on this, it took some time
> just to figure out the intention of the code, and I'm trying to hold
> to a bit of a schedule here. Without any clear (to me) reason, I
> didn't want to spend a lot of effort fixing this up.

Thing is, it's not OK to break other subsystems like that.


> There's really no big difference between the usb drivers here and the
> disk I/O scheduler queue, AFAICT,

The disk scheduler queue is more complex, as I understand things,
since it can combine operations. For USB, "combining" would break
essential semantics relied on by both sides of the transaction.

Maybe the best way to view this is to accept that with USB, all
scheduler operations (e.g. for bandwidth reservation management)
are out of scope of the AIO request model. AIO requests are no
more (or less) than "append this to the endpoint's I/O queue",
with the (host side) I/O scheduling handled separately.


> so it seems like the solution I want
> is to do a kmap() on the user buffer and then do the I/O straight out
> of that. That will eliminate the need for the bounce buffer. I'll
> post a new version along with the iodesc changes later this week.

Sounds more complex, but it would be nice to have that code become
zero-copy instead of single-copy. That'd let some platforms work
with high bandwidth ISO from userspace, which previously would not
have had enough CPU bandwidth. ("High bandwidth" means sustained
8-24 MByte/sec data streaming. Processing pixels at that rate may
require a companion DSP...) Testing will be a different issue though.

- Dave

2007-01-16 23:36:31

by Ingo Oeser

Subject: Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

On Tuesday, 16. January 2007 06:37, Nate Diller wrote:
> On 1/15/07, Christoph Hellwig <[email protected]> wrote:
> > On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> > > Convert code using iocb->ki_left to use the more generic iov_length() call.
> >
> > > No way. We need to reduce the number of iovec traversals, not add
> > > more of them.
>
> ok, I can work on a version of this that uses struct iodesc. Maybe
> something like this?
>
> struct iodesc {
>         struct iovec *iov;
>         unsigned long nr_segs;
>         size_t nbytes;
> };
>
> I suppose it's worth doing the iodesc thing along with this patchset
> anyway, since it'll avoid an extra round of interface churn.

What about this instead

struct iodesc {
        struct iovec *iov;
        unsigned long nr_segs;
        unsigned long seg_limit;
        size_t nr_bytes;
};

That will enable resizeable iodescs with partial completion state and
will enable successive filling of an iodesc with iovs.

This will be needed anyway. I built a complete short userspace
module for that already. I can post and GPLv2 it somewhere, if people
are interested.
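
As a rough illustration of the successive-filling idea, here is a minimal
userspace sketch. It is not the module mentioned above; iodesc_add() is a
hypothetical helper, and reading seg_limit as "capacity of the iov array" is
an assumption about the intended meaning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

struct iodesc {
	struct iovec *iov;
	unsigned long nr_segs;		/* segments currently filled in */
	unsigned long seg_limit;	/* assumed: capacity of the iov array */
	size_t nr_bytes;		/* cached total length of all segments */
};

/*
 * Hypothetical helper: append one segment and keep nr_bytes current, so
 * nobody has to walk the iovec again just to learn the total length.
 */
static int iodesc_add(struct iodesc *io, void *base, size_t len)
{
	if (io->nr_segs >= io->seg_limit)
		return -ENOSPC;

	io->iov[io->nr_segs].iov_base = base;
	io->iov[io->nr_segs].iov_len = len;
	io->nr_segs++;
	io->nr_bytes += len;
	return 0;
}
```

Because nr_bytes is maintained on every append, a caller that would
otherwise use iov_length() can read the field directly.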

Regards

Ingo Oeser

2007-01-17 21:52:56

by Benjamin LaHaise

Subject: Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:
> the right thing to do from a design perspective. Hopefully it enables
> a new architecture that can reduce context switches in I/O completion,
> and reduce overhead. That's the real motive ;)

And it's a broken motive. Context switches per se are not bad, as they
make it possible to properly schedule code in a busy system (which is
*very* important when realtime concerns come into play). Have a look
at how things were done in the 2.4 aio code to see how completion would
get done with a non-retry method, typically in interrupt context. I had
code that did direct I/O rather differently by sharing code with the
read/write code paths at some point, the catch being that it was pretty
invasive, which meant that it never got merged with the changes to handle
writeback pressure and other work that happened during 2.5.

That said, you can't make kiocb private without completely removing the
ability of the rest of the kernel to complete an aio sanely from irq context.
You need some form of i/o descriptor, and a kiocb is just that. Adding more
layering is just going to make things messier and slower for no real gain.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2007-01-17 23:32:47

by Nate Diller

Subject: Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

On Wed, 17 Jan 2007, Benjamin LaHaise wrote:

> On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote:
>> the right thing to do from a design perspective. Hopefully it enables
>> a new architecture that can reduce context switches in I/O completion,
>> and reduce overhead. That's the real motive ;)
>
> And it's a broken motive. Context switches per se are not bad, as they
> make it possible to properly schedule code in a busy system (which is
> *very* important when realtime concerns come into play). Have a look
> at how things were done in the 2.4 aio code to see how completion would
> get done with a non-retry method, typically in interrupt context. I had
> code that did direct I/O rather differently by sharing code with the
> read/write code paths at some point, the catch being that it was pretty
> invasive, which meant that it never got merged with the changes to handle
> writeback pressure and other work that happened during 2.5.

I'm having some trouble understanding your concern. From my perspective,
any unnecessary context switch represents not only performance loss, but
extra complexity in the code. In this case, I'm not suggesting that the
aio.c code causes problems, quite the opposite. The code I'd like to change
is at the FS and md levels, where context switches happen because of timers,
workqueues, and worker threads. For sync I/O, these layers could be doing
their completion work in process context, but because waiting on sync I/O is
done in layers above, they must resort to other means, even for the common
case. The dm-crypt module is the most straightforward example.

I took a look at some 2.4.18 aio patches in kernel.org/.../bcrl/aio/, and if
I understand what you did, you were basically operating at the aops level
rather than f_ops. I actually like that idea, it's nicer than having the
direct-io code do its work separately from the aio code. Part of where I'm
going with this patch is a better integration between the block layer
(make_request), page layer (aops), and FS layer (f_ops), particularly in the
completion paths. The direct-io code is an improvement over the common code
on that point, do_readahead() and friends all wait on individual pages to
become uptodate. I'd like to bring some improvements from the directIO
architecture into use in the common case, which I hope will help
performance.

I know that might seem somewhat unrelated, but I don't think it is. This
change goes hand in hand with using completion handlers in the aops. That
will link together the completion callback in the bio with the aio callback,
so that the whole stack can finish its work in one context.

> That said, you can't make kiocb private without completely removing the
> ability of the rest of the kernel to complete an aio sanely from irq context.
> You need some form of i/o descriptor, and a kiocb is just that. Adding more
> layering is just going to make things messier and slower for no real gain.

This patchset does not change how or when I/O completion happens,
aio_complete() will still get called from direct-io.c, nfs-direct.c, et al.
The iocb structure is still passed to aio_complete, just like before. The
only difference is that the lower level code doesn't know that it's got an
iocb, all it sees is an opaque cookie. It's more like enforcing a layer
that's already in place, and I think things got simpler rather than messier.
Whether things are slower or not remains to be seen, but I expect no
measurable changes either way with this patch.
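
For illustration only, the opaque-cookie arrangement looks roughly like the
userspace sketch below. The endio_fn typedef, struct my_iocb and both
functions are hypothetical stand-ins; in particular, this is not the actual
file_endio_t signature from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback type, standing in for file_endio_t. */
typedef void (*endio_fn)(void *cookie, ssize_t bytes, int err);

/* Private to the aio side; the lower layer never sees this type. */
struct my_iocb {
	ssize_t result;
	int error;
	int completed;
};

/* The aio side's completion handler: it alone knows what the cookie is. */
static void my_aio_complete(void *cookie, ssize_t bytes, int err)
{
	struct my_iocb *iocb = cookie;

	iocb->result = bytes;
	iocb->error = err;
	iocb->completed = 1;
}

/*
 * Lower layer (think direct-io): it holds a function pointer and an opaque
 * pointer, and calls one with the other when the I/O finishes.
 */
static void lower_layer_finish(endio_fn endio, void *cookie,
			       ssize_t bytes, int err)
{
	endio(cookie, bytes, err);
}
```

The completion still lands in the same handler with the same structure; the
only thing removed is the lower layer's knowledge of the structure's type.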

I'm releasing a new version of the patch soon, it will use a new iodesc
structure to keep track of iovec state, which simplifies things further. It
also will have a new version of the usb gadget code, and some general
cleanups. I hope you'll take a look at it.

NATE

2007-01-18 04:23:39

by Suparna Bhattacharya

Subject: Vectored AIO breakage for sockets and pipes ?


The call to aio_advance_iovec() in aio_rw_vect_retry() becomes problematic
when it comes to pipe and socket operations which internally modify/advance
the iovec themselves. As a result AIO writes to sockets fail to return
the correct result.

I'm not sure what the best way to fix this is. One option is to always make
a copy of the iovec and pass that down. Any other thoughts ?
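
A minimal userspace sketch of the copy option, under stated assumptions:
consuming_op() is a hypothetical stand-in for a socket or pipe operation that
advances the iovec it is handed, and advance_iovec() plays the role that
aio_advance_iovec() plays in the retry path. Neither is the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Consume up to n bytes from the front of the iovec, in place. */
static size_t advance_iovec(struct iovec *iov, unsigned long nr_segs, size_t n)
{
	size_t done = 0;
	unsigned long i;

	for (i = 0; i < nr_segs && done < n; i++) {
		size_t left = n - done;
		size_t this = iov[i].iov_len < left ? iov[i].iov_len : left;

		iov[i].iov_base = (char *)iov[i].iov_base + this;
		iov[i].iov_len -= this;
		done += this;
	}
	return done;
}

/* Hypothetical callee that modifies the iovec it is given. */
static ssize_t consuming_op(struct iovec *iov, unsigned long nr_segs,
			    size_t want)
{
	return (ssize_t)advance_iovec(iov, nr_segs, want);
}

/*
 * Hand the callee a scratch copy, then advance the caller's iovec exactly
 * once, based on the returned byte count.
 */
static ssize_t call_with_copy(struct iovec *iov, unsigned long nr_segs,
			      size_t want)
{
	struct iovec *tmp = malloc(nr_segs * sizeof(*tmp));
	ssize_t ret;

	if (!tmp)
		return -ENOMEM;
	memcpy(tmp, iov, nr_segs * sizeof(*tmp));
	ret = consuming_op(tmp, nr_segs, want);
	free(tmp);

	if (ret > 0)
		advance_iovec(iov, nr_segs, (size_t)ret);
	return ret;
}
```

Without the scratch copy, the callee and the caller would both advance the
same array and the second advance would skip data. The cost is an allocation
and a copy per call.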

Regards
Suparna

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2007-01-18 21:40:47

by Zach Brown

Subject: Re: Vectored AIO breakage for sockets and pipes ?

> I'm not sure what the best way to fix this is. One option is to
> always make
> a copy of the iovec and pass that down. Any other thoughts ?

Can we use this as another motivation to introduce an iovec container
struct instead of passing a raw iov/seg? The transition could turn
hand-rolled functions like pipe_iov_copy_to_user() into functions
that this iovec struct API provides.

I don't know if this would specifically help aio_rw_vect_retry() to
know if it should advance the iovec on behalf of its callee who
returned positive result codes.

Maybe it could use the API to discover a case where ret < size &&
cur_pos(iov_struct) == initial_pos(iov_struct) via some iovec pos
query before rw_op is called?
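
Something like the following userspace sketch, where every name (struct
iov_cursor, cursor_pos(), run_op(), the two ops) is made up for illustration
and the check is simplified to "did the position move at all":

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical iovec container: the raw array plus a consumed-bytes position. */
struct iov_cursor {
	const struct iovec *iov;
	unsigned long nr_segs;
	size_t pos;
};

static size_t cursor_pos(const struct iov_cursor *c)
{
	return c->pos;
}

static void cursor_advance(struct iov_cursor *c, size_t n)
{
	c->pos += n;
}

typedef ssize_t (*rw_op_fn)(struct iov_cursor *c, size_t want);

/* File-like op: reports bytes done but leaves the cursor alone. */
static ssize_t op_no_advance(struct iov_cursor *c, size_t want)
{
	(void)c;
	return (ssize_t)want;
}

/* Socket-like op: advances the cursor itself. */
static ssize_t op_self_advance(struct iov_cursor *c, size_t want)
{
	cursor_advance(c, want);
	return (ssize_t)want;
}

/* Advance on the callee's behalf only if the callee did not move the cursor. */
static ssize_t run_op(struct iov_cursor *c, rw_op_fn op, size_t want)
{
	size_t before = cursor_pos(c);
	ssize_t ret = op(c, want);

	if (ret > 0 && cursor_pos(c) == before)
		cursor_advance(c, (size_t)ret);
	return ret;
}
```

Either kind of callee ends up with the position advanced exactly once. A
callee that advances by less than it returns would still fool this check.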

Or maybe the introduction of the API could normalize where the
responsibility of advancing the iovec lies. That might be a bit much.

Just talkin' here.

- z