2014-11-04 08:31:50

by Herbert Xu

[permalink] [raw]
Subject: [PATCH 1/4] inet: Add skb_copy_datagram_iter

This patch adds skb_copy_datagram_iter, which is identical to
skb_copy_datagram_iovec except that it operates on iov_iter
instead of iovec.

Eventually all users of skb_copy_datagram_iovec should switch
over to iov_iter and then we can remove skb_copy_datagram_iovec.

Signed-off-by: Herbert Xu <[email protected]>
---

include/linux/skbuff.h | 3 +
net/core/datagram.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6c8b6f6..5ff7054 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -148,6 +148,7 @@
struct net_device;
struct scatterlist;
struct pipe_inode_info;
+struct iov_iter;

#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
struct nf_conntrack {
@@ -2641,6 +2642,8 @@ int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *frm,
int skb_copy_datagram_const_iovec(const struct sk_buff *from, int offset,
const struct iovec *to, int to_offset,
int size);
+int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
+ struct iov_iter *to, int size);
void skb_free_datagram(struct sock *sk, struct sk_buff *skb);
void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb);
int skb_kill_datagram(struct sock *sk, struct sk_buff *skb, unsigned int flags);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index fdbc9a8..45a9d4d 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -49,6 +49,7 @@
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/pagemap.h>
+#include <linux/uio.h>

#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -482,6 +483,87 @@ fault:
EXPORT_SYMBOL(skb_copy_datagram_const_iovec);

/**
+ * skb_copy_datagram_iter - Copy a datagram to an iovec iterator.
+ * @skb: buffer to copy
+ * @offset: offset in the buffer to start copying from
+ * @to: iovec iterator to copy to
+ * @len: amount of data to copy from buffer to iovec
+ */
+int skb_copy_datagram_iter(const struct sk_buff *skb, int offset,
+ struct iov_iter *to, int len)
+{
+ int start = skb_headlen(skb);
+ int i, copy = start - offset;
+ struct sk_buff *frag_iter;
+
+ trace_skb_copy_datagram_iovec(skb, len);
+
+ /* Copy header. */
+ if (copy > 0) {
+ if (copy > len)
+ copy = len;
+ if (copy_to_iter(skb->data + offset, copy, to))
+ goto fault;
+ if ((len -= copy) == 0)
+ return 0;
+ offset += copy;
+ }
+
+ /* Copy paged appendix. Hmm... why does this look so complicated? */
+ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+ int end;
+ const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+ WARN_ON(start > offset + len);
+
+ end = start + skb_frag_size(frag);
+ if ((copy = end - offset) > 0) {
+ int err;
+ u8 *vaddr;
+ struct page *page = skb_frag_page(frag);
+
+ if (copy > len)
+ copy = len;
+ vaddr = kmap(page);
+ err = copy_to_iter(vaddr + frag->page_offset +
+ offset - start, copy, to);
+ kunmap(page);
+ if (err)
+ goto fault;
+ if (!(len -= copy))
+ return 0;
+ offset += copy;
+ }
+ start = end;
+ }
+
+ skb_walk_frags(skb, frag_iter) {
+ int end;
+
+ WARN_ON(start > offset + len);
+
+ end = start + frag_iter->len;
+ if ((copy = end - offset) > 0) {
+ if (copy > len)
+ copy = len;
+ if (skb_copy_datagram_iter(frag_iter, offset - start,
+ to, copy))
+ goto fault;
+ if ((len -= copy) == 0)
+ return 0;
+ offset += copy;
+ }
+ start = end;
+ }
+ if (!len)
+ return 0;
+
+fault:
+ return -EFAULT;
+}
+EXPORT_SYMBOL(skb_copy_datagram_iter);
+
+/**
* skb_copy_datagram_from_iovec - Copy a datagram from an iovec.
* @skb: buffer to copy
* @offset: offset in the buffer to start copying to


2014-11-04 14:32:09

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 04:31:34PM +0800, Herbert Xu wrote:
> This patch adds skb_copy_datagram_iter, which is identical to
> skb_copy_datagram_iovec except that it operates on iov_iter
> instead of iovec.
>
> Eventually all users of skb_copy_datagram_iovec should switch
> over to iov_iter and then we can remove skb_copy_datagram_iovec.

Too noisy, IMO. How about skb_copy_datagram_msg() first? The fewer
places have to even think of iovec or iov_iter, the better...

2014-11-04 14:35:42

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 02:32:00PM +0000, Al Viro wrote:
> On Tue, Nov 04, 2014 at 04:31:34PM +0800, Herbert Xu wrote:
> > This patch adds skb_copy_datagram_iter, which is identical to
> > skb_copy_datagram_iovec except that it operates on iov_iter
> > instead of iovec.
> >
> > Eventually all users of skb_copy_datagram_iovec should switch
> > over to iov_iter and then we can remove skb_copy_datagram_iovec.
>
> Too noisy, IMO. How about skb_copy_datagram_msg() first? The fewer
> places have to even think of iovec or iov_iter, the better...

PS: "too noisy" is about turning every callsite of skb_copy_datagram_iovec
into that of skb_copy_datagram_iter; the helper itself is just fine.

2014-11-04 14:43:11

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 02:32:00PM +0000, Al Viro wrote:
>
> Too noisy, IMO. How about skb_copy_datagram_msg() first? The fewer
> places have to even think of iovec or iov_iter, the better...

We have places like tcp ucopy and tun that do not have msghdr.
So doing skb_copy_datagram_msg means that we'd have to create
a fake msghdr wrapper around them. The point is not everything
comes in via sendmsg/recvmsg.

What is your motivation for hiding iov/iov_iter? Do you plan to
change their API at some future point?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-04 14:44:27

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 02:35:36PM +0000, Al Viro wrote:
> On Tue, Nov 04, 2014 at 02:32:00PM +0000, Al Viro wrote:
>
> > Too noisy, IMO. How about skb_copy_datagram_msg() first? The fewer
> > places have to even think of iovec or iov_iter, the better...
>
> PS: "too noisy" is about turning every callsite of skb_copy_datagram_iovec
> into that of skb_copy_datagram_iter; the helper itself is just fine.

Hmm if that is your concern then I don't see how skb_copy_datagram_msg
changes things as you'd still have to convert every existing caller
of skb_copy_datagram_iovec. Colour me confused.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-04 14:52:28

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 10:44:16PM +0800, Herbert Xu wrote:
> On Tue, Nov 04, 2014 at 02:35:36PM +0000, Al Viro wrote:
> > On Tue, Nov 04, 2014 at 02:32:00PM +0000, Al Viro wrote:
> >
> > > Too noisy, IMO. How about skb_copy_datagram_msg() first? The fewer
> > > places have to even think of iovec or iov_iter, the better...
> >
> > PS: "too noisy" is about turning every callsite of skb_copy_datagram_iovec
> > into that of skb_copy_datagram_iter; the helper itself is just fine.
>
> Hmm if that is your concern then I don't see how skb_copy_datagram_msg
> changes things as you'd still have to convert every existing caller
> of skb_copy_datagram_iovec. Colour me confused.

Fewer places having to even think of iovec/iov_iter...

2014-11-04 14:55:47

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 02:52:22PM +0000, Al Viro wrote:
>
> > Hmm if that is your concern then I don't see how skb_copy_datagram_msg
> > changes things as you'd still have to convert every existing caller
> > of skb_copy_datagram_iovec. Colour me confused.
>
> Fewer places having to even think of iovec/iov_iter...

Well it's the difference between

skb_copy_datagram_iter(..., &kmsghdr->iov_iter, ...)

and

skb_copy_datagram_msg(..., kmsghdr, ...)

Heck we could even make skb_copy_datagram_msg an inline wrapper
around skb_copy_datagram_iter if you like.

Anyway, the point is that not everything comes with a kmsghdr.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-04 15:14:03

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 10:42:58PM +0800, Herbert Xu wrote:
> On Tue, Nov 04, 2014 at 02:32:00PM +0000, Al Viro wrote:
> >
> > Too noisy, IMO. How about skb_copy_datagram_msg() first? The fewer
> > places have to even think of iovec or iov_iter, the better...
>
> We have places like tcp ucopy and tun that do not have msghdr.
> So doing skb_copy_datagram_msg means that we'd have to create
> a fake msghdr wrapper around them. The point is not everything
> comes in via sendmsg/recvmsg.

I'm certainly not suggesting it as a primitive.

> What is your motivation for hiding iov/iov_iter? Do you plan to
> change their API at some future point?

Think of it that way: every sendmsg/recvmsg path leading to memcpy_fromiovec
and its friends (including the open-coded ones) would need to be changed
at some point. Assuming we do not end up passing struct iov_iter * as
an extra argument through a fairly large part of net/* (and that would
be prohibitively hard and messy, not to mention the effects on the stack
footprint, etc.), the most obvious strategy is to have that thing passed
where msg_iov/msg_iovlen are - in struct msghdr. *IF* we go that way,
it makes a whole lot of sense to start with a bunch of cleanups that
will make sense on their own (most of callers of skb_copy_datagram_iovec
do look like skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied); might
as well give it an inlined helper) and will reduce the amount of places
where ->msg_iov is used. With such cleanups standing on their own and
being splittable from the rest of the queue. And leaving us with fewer
places in code that deal with ->msg_iov and need to be dealt with.

Please, look through my yesterday posting upthread. Outline of the
proposed strategy is there...

FWIW, this is from the beginning of April queue - rebased to current,
but very likely incomplete. Variant taking iov_iter would come later
and yes, it would replace the ..._iovec() one as primitive. With much
fewer places to worry about.

commit 8241142acab3451239029085286b717ca30aac33
Author: Al Viro <[email protected]>
Date: Sun Apr 6 18:41:28 2014 -0400

new helper: skb_copy_datagram_msg()

Absolute majority of skb_copy_datagram_iovec() callers (49 out of 56)
are passing it msg->msg_iov as iovec. Provide a trivial wrapper that
takes msg as argument instead of iovec.

Signed-off-by: Al Viro <[email protected]>

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1be8228..dcbd858 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -163,7 +163,7 @@ mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
memcpy(skb_push(skb, MISDN_HEADER_LEN), mISDN_HEAD_P(skb),
MISDN_HEADER_LEN);

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);

mISDN_sock_cmsg(sk, msg, skb);

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 6c9c16d..443cbbf 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -981,7 +981,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,

if (skb) {
total_len = min_t(size_t, total_len, skb->len);
- error = skb_copy_datagram_iovec(skb, 0, m->msg_iov, total_len);
+ error = skb_copy_datagram_msg(skb, 0, m, total_len);
if (error == 0) {
consume_skb(skb);
return total_len;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6c8b6f6..379ab46 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2662,6 +2662,12 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
struct sk_buff *skb_vlan_untag(struct sk_buff *skb);

+static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
+ struct msghdr *msg, int size)
+{
+ return skb_copy_datagram_iovec(from, offset, msg->msg_iov, size);
+}
+
struct skb_checksum_ops {
__wsum (*update)(const void *mem, int len, __wsum wsum);
__wsum (*combine)(__wsum csum, __wsum csum2, int offset, int len);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index c00897f..425942d 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1758,7 +1758,7 @@ static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
copied = size;
msg->msg_flags |= MSG_TRUNC;
}
- err = skb_copy_datagram_iovec(skb, offset, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, offset, msg, copied);

if (!err && msg->msg_name) {
DECLARE_SOCKADDR(struct sockaddr_at *, sat, msg->msg_name);
diff --git a/net/atm/common.c b/net/atm/common.c
index 6a76515..9cd1cca 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -554,7 +554,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
msg->msg_flags |= MSG_TRUNC;
}

- error = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ error = skb_copy_datagram_msg(skb, 0, msg, copied);
if (error)
return error;
sock_recv_ts_and_drops(msg, sk, skb);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c35c3f4..f4f835e19 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1634,7 +1634,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msg(skb, 0, msg, copied);

if (msg->msg_name) {
ax25_digi digi;
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 339c74a..0a7cc56 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -237,7 +237,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err == 0) {
sock_recv_ts_and_drops(msg, sk, skb);

@@ -328,7 +328,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
}

chunk = min_t(unsigned int, skb->len, size);
- if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, chunk)) {
+ if (skb_copy_datagram_msg(skb, 0, msg, chunk)) {
skb_queue_head(&sk->sk_receive_queue, skb);
if (!copied)
copied = -EFAULT;
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 115f149..29e1ec7 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -878,7 +878,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);

switch (hci_pi(sk)->channel) {
case HCI_CHANNEL_RAW:
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 43f750e..fbcd156 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -293,7 +293,7 @@ static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
copylen = len;
}

- ret = skb_copy_datagram_iovec(skb, 0, m->msg_iov, copylen);
+ ret = skb_copy_datagram_msg(skb, 0, m, copylen);
if (ret)
goto out_free;

diff --git a/net/core/sock.c b/net/core/sock.c
index 15e0c67..ac56dd0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2457,7 +2457,7 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 5ab6627..8e6ae94 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -896,7 +896,7 @@ verify_sock_status:
else if (len < skb->len)
msg->msg_flags |= MSG_TRUNC;

- if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len)) {
+ if (skb_copy_datagram_msg(skb, 0, msg, len)) {
/* Exception. Bailout! */
len = -EFAULT;
break;
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index ef2ad8a..fc9193e 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -324,7 +324,7 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
}

/* FIXME: skip headers if necessary ?! */
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 9d1f648..73a4d53 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -195,7 +195,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index c373a9a..21894df 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -424,7 +424,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 57f7c98..736236c 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -875,7 +875,7 @@ int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
}

/* Don't bother checking the checksum */
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 739db31..ee8fa4b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -718,7 +718,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 39ec0c3..c239f47 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1377,7 +1377,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
/* XXX -- need to support SO_PEEK_OFF */

skb_queue_walk(&sk->sk_write_queue, skb) {
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, skb->len);
+ err = skb_copy_datagram_msg(skb, 0, msg, skb->len);
if (err)
break;

@@ -1833,8 +1833,7 @@ do_prequeue:
}

if (!(flags & MSG_TRUNC)) {
- err = skb_copy_datagram_iovec(skb, offset,
- msg->msg_iov, used);
+ err = skb_copy_datagram_msg(skb, offset, msg, used);
if (err) {
/* Exception. Bailout! */
if (!copied)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd0db54..d7266f7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1281,8 +1281,8 @@ try_again:
}

if (skb_csum_unnecessary(skb))
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
- msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
+ msg, copied);
else {
err = skb_copy_and_csum_datagram_iovec(skb,
sizeof(struct udphdr),
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2cdc383..5c6996e 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -351,7 +351,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

@@ -445,7 +445,7 @@ int ipv6_recv_rxpmtu(struct sock *sk, struct msghdr *msg, int len,
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 896af88..f642598 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -486,11 +486,11 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
}

if (skb_csum_unnecessary(skb)) {
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
} else if (msg->msg_flags&MSG_TRUNC) {
if (__skb_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
} else {
err = skb_copy_and_csum_datagram_iovec(skb, 0, msg->msg_iov);
if (err == -EINVAL)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f6ba535..5f68cd72 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -424,8 +424,8 @@ try_again:
}

if (skb_csum_unnecessary(skb))
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
- msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
+ msg, copied);
else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 91729b8..8b7ca1c 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1805,7 +1805,7 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov,
+ rc = skb_copy_datagram_msg(skb, sizeof(struct ipxhdr), msg,
copied);
if (rc)
goto out_free;
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 3f3a6cb..3f1a37b 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1394,7 +1394,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
copied = size;
msg->msg_flags |= MSG_TRUNC;
}
- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msg(skb, 0, msg, copied);

skb_free_datagram(sk, skb);

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index a089b6b..057b564 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1355,7 +1355,7 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
sk->sk_shutdown = sk->sk_shutdown | RCV_SHUTDOWN;

cskb = skb;
- if (skb_copy_datagram_iovec(cskb, offset, msg->msg_iov, copied)) {
+ if (skb_copy_datagram_msg(cskb, offset, msg, copied)) {
if (!(flags & MSG_PEEK))
skb_queue_head(&sk->sk_receive_queue, skb);
return -EFAULT;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1847ec4..e588309 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3654,7 +3654,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free;

diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 369a982..a6cc1fe 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -528,7 +528,7 @@ static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 0edb263..2177b96 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -672,7 +672,7 @@ static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index b704a93..c559bcd 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -208,7 +208,7 @@ static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
else if (len < skb->len)
msg->msg_flags |= MSG_TRUNC;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len);
+ err = skb_copy_datagram_msg(skb, 0, msg, len);
if (likely(err == 0))
err = len;

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index bb9cbc1..8fa230b 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -819,8 +819,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
used = len;

if (!(flags & MSG_TRUNC)) {
- int rc = skb_copy_datagram_iovec(skb, offset,
- msg->msg_iov, used);
+ int rc = skb_copy_datagram_msg(skb, offset,
+ msg, used);
if (rc) {
/* Exception. Bailout! */
if (!copied)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f1de72d..580b794 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2401,7 +2401,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
}

skb_reset_transport_header(data_skb);
- err = skb_copy_datagram_iovec(data_skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(data_skb, 0, msg, copied);

if (msg->msg_name) {
DECLARE_SOCKADDR(struct sockaddr_nl *, addr, msg->msg_name);
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 1b06a1f..7e13f6a 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1167,7 +1167,7 @@ static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- er = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ er = skb_copy_datagram_msg(skb, 0, msg, copied);
if (er < 0) {
skb_free_datagram(sk, skb);
release_sock(sk);
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 51f077a..83bc785 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -832,7 +832,7 @@ static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
copied = min_t(unsigned int, rlen, len);

cskb = skb;
- if (skb_copy_datagram_iovec(cskb, 0, msg->msg_iov, copied)) {
+ if (skb_copy_datagram_msg(cskb, 0, msg, copied)) {
if (!(flags & MSG_PEEK))
skb_queue_head(&sk->sk_receive_queue, skb);
return -EFAULT;
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index 11c3544..9d7d2b7 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -269,7 +269,7 @@ static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
copied = len;
}

- rc = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ rc = skb_copy_datagram_msg(skb, 0, msg, copied);

skb_free_datagram(sk, skb);

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 87d20f4..4cd13d8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2953,7 +2953,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free;

diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 290352c..0918bc2 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -150,7 +150,7 @@ static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
copylen = len;
}

- rval = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copylen);
+ rval = skb_copy_datagram_msg(skb, 0, msg, copylen);
if (rval) {
rval = -EFAULT;
goto out;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 70a547e..44b2123 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -1296,7 +1296,7 @@ copy:
else
len = skb->len;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len);
+ err = skb_copy_datagram_msg(skb, 0, msg, len);
if (!err)
err = (flags & MSG_TRUNC) ? skb->len : len;

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index a85c1a0..9b600c2 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1249,7 +1249,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msg(skb, 0, msg, copied);

if (msg->msg_name) {
struct sockaddr_rose *srose;
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e9aaa65..4575485 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -180,7 +180,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
if (copy > len - copied)
copy = len - copied;

- ret = skb_copy_datagram_iovec(skb, offset, msg->msg_iov, copy);
+ ret = skb_copy_datagram_msg(skb, offset, msg, copy);

if (ret < 0)
goto copy_error;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 634a2ab..2120292 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2095,7 +2095,7 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
if (copied > len)
copied = len;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);

event = sctp_skb2event(skb);

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 51bddc2..f726eaa 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1372,8 +1372,8 @@ restart:
sz = buf_len;
m->msg_flags |= MSG_TRUNC;
}
- res = skb_copy_datagram_iovec(buf, msg_hdr_sz(msg),
- m->msg_iov, sz);
+ res = skb_copy_datagram_msg(buf, msg_hdr_sz(msg),
+ m, sz);
if (res)
goto exit;
res = sz;
@@ -1473,8 +1473,8 @@ restart:
needed = (buf_len - sz_copied);
sz_to_copy = (sz <= needed) ? sz : needed;

- res = skb_copy_datagram_iovec(buf, msg_hdr_sz(msg) + offset,
- m->msg_iov, sz_to_copy);
+ res = skb_copy_datagram_msg(buf, msg_hdr_sz(msg) + offset,
+ m, sz_to_copy);
if (res)
goto exit;

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e968843..350771a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1825,7 +1825,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
else if (size < skb->len - skip)
msg->msg_flags |= MSG_TRUNC;

- err = skb_copy_datagram_iovec(skb, skip, msg->msg_iov, size);
+ err = skb_copy_datagram_msg(skb, skip, msg, size);
if (err)
goto out_free;

@@ -2030,8 +2030,8 @@ again:
}

chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
- if (skb_copy_datagram_iovec(skb, UNIXCB(skb).consumed + skip,
- msg->msg_iov, chunk)) {
+ if (skb_copy_datagram_msg(skb, UNIXCB(skb).consumed + skip,
+ msg, chunk)) {
if (copied == 0)
copied = -EFAULT;
break;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 9bb63ff..a57ddef 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1773,8 +1773,7 @@ static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
}

/* Place the datagram payload in the user's iovec. */
- err = skb_copy_datagram_iovec(skb, sizeof(*dg), msg->msg_iov,
- payload_len);
+ err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
if (err)
goto out;

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5ad4418..59e785b 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1335,7 +1335,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
/* Currently, each datagram always contains a complete record */
msg->msg_flags |= MSG_EOR;

- rc = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ rc = skb_copy_datagram_msg(skb, 0, msg, copied);
if (rc)
goto out_free_dgram;

2014-11-05 02:23:26

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 03:13:55PM +0000, Al Viro wrote:
>
> Think of it that way: every sendmsg/recvmsg path leading to memcpy_fromiovec
> and its friends (including the open-coded ones) would need to be changed
> at some point. Assuming we do not end up passing struct iov_iter * as
> an extra argument through a fairly large part of net/* (and that would
> be prohibitively hard and messy, not to mention the effects on the stack
> footprint, etc.), the most obvious strategy is to have that thing passed
> where msg_iov/msg_iovlen are - in struct msghdr. *IF* we go that way,
> it makes a whole lot of sense to start with a bunch of cleanups that
> will make sense on their own (most of callers of skb_copy_datagram_iovec
> do look like skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied); might
> as well give it an inlined helper) and will reduce the amount of places
> where ->msg_iov is used. With such cleanups standing on their own and
> being splittable from the rest of the queue. And leaving us with fewer
> places in code that deal with ->msg_iov and need to be dealt with.

I think your solution is great. However, I don't see how my four
patches impede in anyway the work that you're doing. I presume
your first patch will make skb_copy_datagram_msg just a wrapper
around skb_copy_datagram_iovec.

Since I'm not removing skb_copy_datagram_iovec (it can't be removed
until all users are gone) you can still do that and when you're
ready to switch over to iov_iter you can just move the wrapper over
to skb_copy_datagram_iter. No?

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-05 03:27:34

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Herbert Xu <[email protected]>
Date: Wed, 5 Nov 2014 10:22:51 +0800

> On Tue, Nov 04, 2014 at 03:13:55PM +0000, Al Viro wrote:
>>
>> Think of it that way: every sendmsg/recvmsg path leading to memcpy_fromiovec
>> and its friends (including the open-coded ones) would need to be changed
>> at some point. Assuming we do not end up passing struct iov_iter * as
>> an extra argument through a fairly large part of net/* (and that would
>> be prohibitively hard and messy, not to mention the effects on the stack
>> footprint, etc.), the most obvious strategy is to have that thing passed
>> where msg_iov/msg_iovlen are - in struct msghdr. *IF* we go that way,
>> it makes a whole lot of sense to start with a bunch of cleanups that
>> will make sense on their own (most of callers of skb_copy_datagram_iovec
>> do look like skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied); might
>> as well give it an inlined helper) and will reduce the amount of places
>> where ->msg_iov is used. With such cleanups standing on their own and
>> being splittable from the rest of the queue. And leaving us with fewer
>> places in code that deal with ->msg_iov and need to be dealt with.
>
> I think your solution is great. However, I don't see how my four
> patches impede in anyway the work that you're doing. I presume
> your first patch will make skb_copy_datagram_msg just a wrapper
> around skb_copy_datagram_iovec.
>
> Since I'm not removing skb_copy_datagram_iovec (it can't be removed
> until all users are gone) you can still do that and when you're
> ready to switch over to iov_iter you can just move the wrapper over
> to skb_copy_datagram_iter. No?

Agreed, I think both efforts can proceed in parallel.

Al, is this the helper you are talking about?

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1be8228..a08057d 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -163,7 +163,7 @@ mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
memcpy(skb_push(skb, MISDN_HEADER_LEN), mISDN_HEAD_P(skb),
MISDN_HEADER_LEN);

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);

mISDN_sock_cmsg(sk, msg, skb);

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 6c9c16d..25234d9 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -981,7 +981,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,

if (skb) {
total_len = min_t(size_t, total_len, skb->len);
- error = skb_copy_datagram_iovec(skb, 0, m->msg_iov, total_len);
+ error = skb_copy_datagram_msghdr(skb, m, total_len);
if (error == 0) {
consume_skb(skb);
return total_len;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5ad9675..19fe8cc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -17,6 +17,7 @@
#include <linux/kernel.h>
#include <linux/kmemcheck.h>
#include <linux/compiler.h>
+#include <linux/socket.h>
#include <linux/time.h>
#include <linux/bug.h>
#include <linux/cache.h>
@@ -2637,6 +2638,11 @@ unsigned int datagram_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait);
int skb_copy_datagram_iovec(const struct sk_buff *from, int offset,
struct iovec *to, int size);
+static inline int skb_copy_datagram_msghdr(const struct sk_buff *from,
+ struct msghdr *msg, int size)
+{
+ return skb_copy_datagram_iovec(from, 0, msg->msg_iov, size);
+}
int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, int hlen,
struct iovec *iov);
int skb_copy_datagram_from_iovec(struct sk_buff *skb, int offset,
diff --git a/net/atm/common.c b/net/atm/common.c
index 6a76515..7e42bbe 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -554,7 +554,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
msg->msg_flags |= MSG_TRUNC;
}

- error = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ error = skb_copy_datagram_msghdr(skb, msg, copied);
if (error)
return error;
sock_recv_ts_and_drops(msg, sk, skb);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c35c3f4..a91075c 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1634,7 +1634,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msghdr(skb, msg, copied);

if (msg->msg_name) {
ax25_digi digi;
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 339c74a..a68dd75 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -237,7 +237,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err == 0) {
sock_recv_ts_and_drops(msg, sk, skb);

@@ -328,7 +328,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
}

chunk = min_t(unsigned int, skb->len, size);
- if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, chunk)) {
+ if (skb_copy_datagram_msghdr(skb, msg, chunk)) {
skb_queue_head(&sk->sk_receive_queue, skb);
if (!copied)
copied = -EFAULT;
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 115f149..45d4fba 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -878,7 +878,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);

switch (hci_pi(sk)->channel) {
case HCI_CHANNEL_RAW:
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 43f750e..67e63b6 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -293,7 +293,7 @@ static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
copylen = len;
}

- ret = skb_copy_datagram_iovec(skb, 0, m->msg_iov, copylen);
+ ret = skb_copy_datagram_msghdr(skb, m, copylen);
if (ret)
goto out_free;

diff --git a/net/core/sock.c b/net/core/sock.c
index 15e0c67..220c791 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2457,7 +2457,7 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 5ab6627..7ccf58f 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -896,7 +896,7 @@ verify_sock_status:
else if (len < skb->len)
msg->msg_flags |= MSG_TRUNC;

- if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len)) {
+ if (skb_copy_datagram_msghdr(skb, msg, len)) {
/* Exception. Bailout! */
len = -EFAULT;
break;
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index ef2ad8a..7017055 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -324,7 +324,7 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
}

/* FIXME: skip headers if necessary ?! */
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto done;

diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 9d1f648..5dd893a 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -195,7 +195,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index c373a9a..d643882 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -424,7 +424,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 57f7c98..a6e0197 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -875,7 +875,7 @@ int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
}

/* Don't bother checking the checksum */
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 739db31..2f4bb30 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -718,7 +718,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 39ec0c3..3638679 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1377,7 +1377,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
/* XXX -- need to support SO_PEEK_OFF */

skb_queue_walk(&sk->sk_write_queue, skb) {
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, skb->len);
+ err = skb_copy_datagram_msghdr(skb, msg, skb->len);
if (err)
break;

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2cdc383..4bd84fd 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -351,7 +351,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto out_free_skb;

@@ -445,7 +445,7 @@ int ipv6_recv_rxpmtu(struct sock *sk, struct msghdr *msg, int len,
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 075a0fb..5d37aa1 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -486,11 +486,11 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
}

if (skb_csum_unnecessary(skb)) {
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
} else if (msg->msg_flags&MSG_TRUNC) {
if (__skb_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
} else {
err = skb_copy_and_csum_datagram_iovec(skb, 0, msg->msg_iov);
if (err == -EINVAL)
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 92fafd4..54db6dc 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1396,7 +1396,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
copied = size;
msg->msg_flags |= MSG_TRUNC;
}
- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msghdr(skb, msg, copied);

skb_free_datagram(sk, skb);

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1847ec4..f09a848 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3654,7 +3654,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto out_free;

diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 369a982..2913198 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -528,7 +528,7 @@ static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto done;

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 0edb263..8613881 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -672,7 +672,7 @@ static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto done;

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index b704a93..5f3c0f5 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -208,7 +208,7 @@ static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
else if (len < skb->len)
msg->msg_flags |= MSG_TRUNC;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len);
+ err = skb_copy_datagram_msghdr(skb, msg, len);
if (likely(err == 0))
err = len;

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f1de72d..123cffd 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2401,7 +2401,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
}

skb_reset_transport_header(data_skb);
- err = skb_copy_datagram_iovec(data_skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(data_skb, msg, copied);

if (msg->msg_name) {
DECLARE_SOCKADDR(struct sockaddr_nl *, addr, msg->msg_name);
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 1b06a1f..6bc3556 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1167,7 +1167,7 @@ static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- er = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ er = skb_copy_datagram_msghdr(skb, msg, copied);
if (er < 0) {
skb_free_datagram(sk, skb);
release_sock(sk);
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 51f077a..e962c07 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -832,7 +832,7 @@ static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
copied = min_t(unsigned int, rlen, len);

cskb = skb;
- if (skb_copy_datagram_iovec(cskb, 0, msg->msg_iov, copied)) {
+ if (skb_copy_datagram_msghdr(cskb, msg, copied)) {
if (!(flags & MSG_PEEK))
skb_queue_head(&sk->sk_receive_queue, skb);
return -EFAULT;
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index 11c3544..4467b2c 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -269,7 +269,7 @@ static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
copied = len;
}

- rc = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ rc = skb_copy_datagram_msghdr(skb, msg, copied);

skb_free_datagram(sk, skb);

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 87d20f4..390b776 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2953,7 +2953,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);
if (err)
goto out_free;

diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 290352c..b4835bc 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -150,7 +150,7 @@ static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
copylen = len;
}

- rval = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copylen);
+ rval = skb_copy_datagram_msghdr(skb, msg, copylen);
if (rval) {
rval = -EFAULT;
goto out;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 70a547e..f544658 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -1296,7 +1296,7 @@ copy:
else
len = skb->len;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len);
+ err = skb_copy_datagram_msghdr(skb, msg, len);
if (!err)
err = (flags & MSG_TRUNC) ? skb->len : len;

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index a85c1a0..b660504 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1249,7 +1249,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msghdr(skb, msg, copied);

if (msg->msg_name) {
struct sockaddr_rose *srose;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 634a2ab..0fca34c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2095,7 +2095,7 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
if (copied > len)
copied = len;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msghdr(skb, msg, copied);

event = sctp_skb2event(skb);

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5ad4418..fad0702 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1335,7 +1335,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
/* Currently, each datagram always contains a complete record */
msg->msg_flags |= MSG_EOR;

- rc = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ rc = skb_copy_datagram_msghdr(skb, msg, copied);
if (rc)
goto out_free_dgram;

2014-11-05 03:55:43

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Tue, Nov 04, 2014 at 10:27:27PM -0500, David Miller wrote:

> Al, is this the helper you are talking about?

Mostly, except that I kept it 4-argument (and used skb_copy_datagram_msg()
for name). Matter of taste - the ones you've missed because of that are

net/appletalk/ddp.c:1761: err = skb_copy_datagram_iovec(skb, offset, msg->msg_iov, copied);
net/ipv4/tcp.c:1836: err = skb_copy_datagram_iovec(skb, offset,
net/ipv4/udp.c:1284: err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
net/ipv6/udp.c:427: err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
net/ipx/af_ipx.c:1808: rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov,
net/iucv/af_iucv.c:1358: if (skb_copy_datagram_iovec(cskb, offset, msg->msg_iov, copied)) {
net/llc/af_llc.c:822: int rc = skb_copy_datagram_iovec(skb, offset,
net/rxrpc/ar-recvmsg.c:183: ret = skb_copy_datagram_iovec(skb, offset, msg->msg_iov, copy);
net/tipc/socket.c:1375: res = skb_copy_datagram_iovec(buf, msg_hdr_sz(msg),
net/tipc/socket.c:1476: res = skb_copy_datagram_iovec(buf, msg_hdr_sz(msg) + offset,
net/unix/af_unix.c:1828: err = skb_copy_datagram_iovec(skb, skip, msg->msg_iov, size);
net/unix/af_unix.c:2033: if (skb_copy_datagram_iovec(skb, UNIXCB(skb).consumed + skip,
net/vmw_vsock/vmci_transport.c:1776: err = skb_copy_datagram_iovec(skb, sizeof(*dg), msg->msg_iov,

and back then I decided that 13 more converted instances might be worth keeping
it in 4-argument form...

What do you think of the trick with user_msghdr, BTW?

2014-11-05 04:12:42

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Wed, Nov 05, 2014 at 03:55:36AM +0000, Al Viro wrote:
> What do you think of the trick with user_msghdr, BTW?

PS: where do you prefer the branches to be based off?
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net#master, mainline,
something else? I can certainly do that as patches over email, the
question is what's best used as base... FWIW, the analysis I've posted
was in 3.18-rc3 and it looks like it ought to be valid in net#master
as well.

2014-11-05 20:24:15

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter


Herbert, please provide a cover letter for this series, and the most recent
version of patch #2 gets various rejects when I try to apply it to net-next.

Thanks.

2014-11-05 20:50:59

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Wed, 5 Nov 2014 03:55:36 +0000

> On Tue, Nov 04, 2014 at 10:27:27PM -0500, David Miller wrote:
>
>> Al, is this the helper you are talking about?
>
> Mostly, except that I kept it 4-argument (and used skb_copy_datagram_msg()
> for name). Matter of taste - the ones you've missed because of that are
...
> and back then I decided that 13 more converted instances might be worth keeping
> it in 4-argument form...

Ok, fixed up patch below:

> What do you think of the trick with user_msghdr, BTW?

I think we can get away with it if, as you say, we don't export a 'msghdr'
from any uapi headers.

And indeed, double checking, it's purely a linux/socket.h thing.

If this patch is OK, mind if I toss it into net-next Al?

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1be8228..dcbd858 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -163,7 +163,7 @@ mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
memcpy(skb_push(skb, MISDN_HEADER_LEN), mISDN_HEAD_P(skb),
MISDN_HEADER_LEN);

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);

mISDN_sock_cmsg(sk, msg, skb);

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 6c9c16d..443cbbf 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -981,7 +981,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,

if (skb) {
total_len = min_t(size_t, total_len, skb->len);
- error = skb_copy_datagram_iovec(skb, 0, m->msg_iov, total_len);
+ error = skb_copy_datagram_msg(skb, 0, m, total_len);
if (error == 0) {
consume_skb(skb);
return total_len;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5ad9675..31cdb7e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -21,6 +21,7 @@
#include <linux/bug.h>
#include <linux/cache.h>
#include <linux/rbtree.h>
+#include <linux/socket.h>

#include <linux/atomic.h>
#include <asm/types.h>
@@ -2637,6 +2638,11 @@ unsigned int datagram_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait);
int skb_copy_datagram_iovec(const struct sk_buff *from, int offset,
struct iovec *to, int size);
+static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
+ struct msghdr *msg, int size)
+{
+ return skb_copy_datagram_iovec(from, offset, msg->msg_iov, size);
+}
int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, int hlen,
struct iovec *iov);
int skb_copy_datagram_from_iovec(struct sk_buff *skb, int offset,
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index c00897f..425942d 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1758,7 +1758,7 @@ static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
copied = size;
msg->msg_flags |= MSG_TRUNC;
}
- err = skb_copy_datagram_iovec(skb, offset, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, offset, msg, copied);

if (!err && msg->msg_name) {
DECLARE_SOCKADDR(struct sockaddr_at *, sat, msg->msg_name);
diff --git a/net/atm/common.c b/net/atm/common.c
index 6a76515..9cd1cca 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -554,7 +554,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
msg->msg_flags |= MSG_TRUNC;
}

- error = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ error = skb_copy_datagram_msg(skb, 0, msg, copied);
if (error)
return error;
sock_recv_ts_and_drops(msg, sk, skb);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c35c3f4..f4f835e 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1634,7 +1634,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msg(skb, 0, msg, copied);

if (msg->msg_name) {
ax25_digi digi;
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 339c74a..0a7cc56 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -237,7 +237,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err == 0) {
sock_recv_ts_and_drops(msg, sk, skb);

@@ -328,7 +328,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
}

chunk = min_t(unsigned int, skb->len, size);
- if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, chunk)) {
+ if (skb_copy_datagram_msg(skb, 0, msg, chunk)) {
skb_queue_head(&sk->sk_receive_queue, skb);
if (!copied)
copied = -EFAULT;
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 115f149..29e1ec7 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -878,7 +878,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);

switch (hci_pi(sk)->channel) {
case HCI_CHANNEL_RAW:
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 43f750e..fbcd156 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -293,7 +293,7 @@ static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
copylen = len;
}

- ret = skb_copy_datagram_iovec(skb, 0, m->msg_iov, copylen);
+ ret = skb_copy_datagram_msg(skb, 0, m, copylen);
if (ret)
goto out_free;

diff --git a/net/core/sock.c b/net/core/sock.c
index 15e0c67..ac56dd0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2457,7 +2457,7 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 5ab6627..8e6ae94 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -896,7 +896,7 @@ verify_sock_status:
else if (len < skb->len)
msg->msg_flags |= MSG_TRUNC;

- if (skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len)) {
+ if (skb_copy_datagram_msg(skb, 0, msg, len)) {
/* Exception. Bailout! */
len = -EFAULT;
break;
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index ef2ad8a..fc9193e 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -324,7 +324,7 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
}

/* FIXME: skip headers if necessary ?! */
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 9d1f648..73a4d53 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -195,7 +195,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index c373a9a..21894df 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -424,7 +424,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 57f7c98..736236c 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -875,7 +875,7 @@ int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
}

/* Don't bother checking the checksum */
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 739db31..ee8fa4b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -718,7 +718,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 39ec0c3..c239f47 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1377,7 +1377,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
/* XXX -- need to support SO_PEEK_OFF */

skb_queue_walk(&sk->sk_write_queue, skb) {
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, skb->len);
+ err = skb_copy_datagram_msg(skb, 0, msg, skb->len);
if (err)
break;

@@ -1833,8 +1833,7 @@ do_prequeue:
}

if (!(flags & MSG_TRUNC)) {
- err = skb_copy_datagram_iovec(skb, offset,
- msg->msg_iov, used);
+ err = skb_copy_datagram_msg(skb, offset, msg, used);
if (err) {
/* Exception. Bailout! */
if (!copied)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3f001db..df19027 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1281,8 +1281,8 @@ try_again:
}

if (skb_csum_unnecessary(skb))
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
- msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
+ msg, copied);
else {
err = skb_copy_and_csum_datagram_iovec(skb,
sizeof(struct udphdr),
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2cdc383..5c6996e 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -351,7 +351,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

@@ -445,7 +445,7 @@ int ipv6_recv_rxpmtu(struct sock *sk, struct msghdr *msg, int len,
msg->msg_flags |= MSG_TRUNC;
copied = len;
}
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free_skb;

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 075a0fb..0cbcf98 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -486,11 +486,11 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
}

if (skb_csum_unnecessary(skb)) {
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
} else if (msg->msg_flags&MSG_TRUNC) {
if (__skb_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
} else {
err = skb_copy_and_csum_datagram_iovec(skb, 0, msg->msg_iov);
if (err == -EINVAL)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f6ba535..9b68092 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -424,8 +424,8 @@ try_again:
}

if (skb_csum_unnecessary(skb))
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
- msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
+ msg, copied);
else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 313ef46..a0c7536 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1805,8 +1805,7 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov,
- copied);
+ rc = skb_copy_datagram_msg(skb, sizeof(struct ipxhdr), msg, copied);
if (rc)
goto out_free;
if (skb->tstamp.tv64)
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 92fafd4..980bc26 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1396,7 +1396,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
copied = size;
msg->msg_flags |= MSG_TRUNC;
}
- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msg(skb, 0, msg, copied);

skb_free_datagram(sk, skb);

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index a089b6b..057b564 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1355,7 +1355,7 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
sk->sk_shutdown = sk->sk_shutdown | RCV_SHUTDOWN;

cskb = skb;
- if (skb_copy_datagram_iovec(cskb, offset, msg->msg_iov, copied)) {
+ if (skb_copy_datagram_msg(cskb, offset, msg, copied)) {
if (!(flags & MSG_PEEK))
skb_queue_head(&sk->sk_receive_queue, skb);
return -EFAULT;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1847ec4..e588309 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3654,7 +3654,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
}

skb_reset_transport_header(skb);
- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free;

diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 369a982..a6cc1fe 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -528,7 +528,7 @@ static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 0edb263..2177b96 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -672,7 +672,7 @@ static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
copied = len;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto done;

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index b704a93..c559bcd 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -208,7 +208,7 @@ static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
else if (len < skb->len)
msg->msg_flags |= MSG_TRUNC;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len);
+ err = skb_copy_datagram_msg(skb, 0, msg, len);
if (likely(err == 0))
err = len;

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index bb9cbc1..af66266 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -819,8 +819,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
used = len;

if (!(flags & MSG_TRUNC)) {
- int rc = skb_copy_datagram_iovec(skb, offset,
- msg->msg_iov, used);
+ int rc = skb_copy_datagram_msg(skb, offset, msg, used);
if (rc) {
/* Exception. Bailout! */
if (!copied)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f1de72d..580b794 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2401,7 +2401,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
}

skb_reset_transport_header(data_skb);
- err = skb_copy_datagram_iovec(data_skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(data_skb, 0, msg, copied);

if (msg->msg_name) {
DECLARE_SOCKADDR(struct sockaddr_nl *, addr, msg->msg_name);
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 1b06a1f..7e13f6a 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1167,7 +1167,7 @@ static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- er = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ er = skb_copy_datagram_msg(skb, 0, msg, copied);
if (er < 0) {
skb_free_datagram(sk, skb);
release_sock(sk);
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 51f077a..83bc785 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -832,7 +832,7 @@ static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
copied = min_t(unsigned int, rlen, len);

cskb = skb;
- if (skb_copy_datagram_iovec(cskb, 0, msg->msg_iov, copied)) {
+ if (skb_copy_datagram_msg(cskb, 0, msg, copied)) {
if (!(flags & MSG_PEEK))
skb_queue_head(&sk->sk_receive_queue, skb);
return -EFAULT;
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index 11c3544..9d7d2b7 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -269,7 +269,7 @@ static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
copied = len;
}

- rc = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ rc = skb_copy_datagram_msg(skb, 0, msg, copied);

skb_free_datagram(sk, skb);

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 87d20f4..4cd13d8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2953,7 +2953,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);
if (err)
goto out_free;

diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 290352c..0918bc2 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -150,7 +150,7 @@ static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
copylen = len;
}

- rval = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copylen);
+ rval = skb_copy_datagram_msg(skb, 0, msg, copylen);
if (rval) {
rval = -EFAULT;
goto out;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 70a547e..44b2123 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -1296,7 +1296,7 @@ copy:
else
len = skb->len;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, len);
+ err = skb_copy_datagram_msg(skb, 0, msg, len);
if (!err)
err = (flags & MSG_TRUNC) ? skb->len : len;

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index a85c1a0..9b600c2 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1249,7 +1249,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_TRUNC;
}

- skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ skb_copy_datagram_msg(skb, 0, msg, copied);

if (msg->msg_name) {
struct sockaddr_rose *srose;
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e9aaa65..4575485 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -180,7 +180,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
if (copy > len - copied)
copy = len - copied;

- ret = skb_copy_datagram_iovec(skb, offset, msg->msg_iov, copy);
+ ret = skb_copy_datagram_msg(skb, offset, msg, copy);

if (ret < 0)
goto copy_error;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 634a2ab..2120292 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2095,7 +2095,7 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
if (copied > len)
copied = len;

- err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ err = skb_copy_datagram_msg(skb, 0, msg, copied);

event = sctp_skb2event(skb);

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index ad8a1a1..591bbfa 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1372,8 +1372,7 @@ restart:
sz = buf_len;
m->msg_flags |= MSG_TRUNC;
}
- res = skb_copy_datagram_iovec(buf, msg_hdr_sz(msg),
- m->msg_iov, sz);
+ res = skb_copy_datagram_msg(buf, msg_hdr_sz(msg), m, sz);
if (res)
goto exit;
res = sz;
@@ -1473,8 +1472,8 @@ restart:
needed = (buf_len - sz_copied);
sz_to_copy = (sz <= needed) ? sz : needed;

- res = skb_copy_datagram_iovec(buf, msg_hdr_sz(msg) + offset,
- m->msg_iov, sz_to_copy);
+ res = skb_copy_datagram_msg(buf, msg_hdr_sz(msg) + offset,
+ m, sz_to_copy);
if (res)
goto exit;

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e968843..5eee625 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1825,7 +1825,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
else if (size < skb->len - skip)
msg->msg_flags |= MSG_TRUNC;

- err = skb_copy_datagram_iovec(skb, skip, msg->msg_iov, size);
+ err = skb_copy_datagram_msg(skb, skip, msg, size);
if (err)
goto out_free;

@@ -2030,8 +2030,8 @@ again:
}

chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
- if (skb_copy_datagram_iovec(skb, UNIXCB(skb).consumed + skip,
- msg->msg_iov, chunk)) {
+ if (skb_copy_datagram_msg(skb, UNIXCB(skb).consumed + skip,
+ msg, chunk)) {
if (copied == 0)
copied = -EFAULT;
break;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 9bb63ff..a57ddef 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1773,8 +1773,7 @@ static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
}

/* Place the datagram payload in the user's iovec. */
- err = skb_copy_datagram_iovec(skb, sizeof(*dg), msg->msg_iov,
- payload_len);
+ err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
if (err)
goto out;

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5ad4418..59e785b 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1335,7 +1335,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
/* Currently, each datagram always contains a complete record */
msg->msg_flags |= MSG_EOR;

- rc = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+ rc = skb_copy_datagram_msg(skb, 0, msg, copied);
if (rc)
goto out_free_dgram;

2014-11-05 20:51:43

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Wed, 5 Nov 2014 04:12:32 +0000

> On Wed, Nov 05, 2014 at 03:55:36AM +0000, Al Viro wrote:
>> What do you think of the trick with user_msghdr, BTW?
>
> PS: where do you prefer the branches to be based off?
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net#master, mainline,
> something else? I can certainly do that as patches over email, the
> question is what's best used as base... FWIW, the analysis I've posted
> was in 3.18-rc3 and it looks like it ought to be valid in net#master
> as well.

Let's work against net-next, ie:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next#master

I can integrate, your, mine, and Herbert's changes all into the same
place.

Thanks.

2014-11-05 21:07:52

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Wed, Nov 05, 2014 at 03:50:54PM -0500, David Miller wrote:
> I think we can get away with it if, as you say, we don't export a 'msghdr'
> from any uapi headers.

OK. I'm about halfway through the review of struct msghdr instances in the
current tree right now, will post user_msghdr patch once I'm done. Already
found a dumb bug in o2net_send_tcp_msg() while doing that - broken by me
back in 3.15 ;-/ Will send a fix to Linus in an hour or so...

> And indeed, double checking, it's purely a linux/socket.h thing.
>
> If this patch is OK, mind if I toss it into net-next Al?

Sure, no problem - AFAICS, the only real difference from rebase of April one
I've quoted upthread is that you add include of socket.h into skbuff.h; the
rest of the differences is pure whitespace noise.

Ping me when you put it there, OK? I'll rebase the rest of old stuff on
top of it (similar helpers, mostly).

2014-11-05 21:57:26

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Wed, 5 Nov 2014 21:07:45 +0000

> Ping me when you put it there, OK? I'll rebase the rest of old stuff on
> top of it (similar helpers, mostly).

I just pushed it into net-next, thanks Al.

2014-11-06 03:25:41

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Wed, Nov 05, 2014 at 04:57:19PM -0500, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Wed, 5 Nov 2014 21:07:45 +0000
>
> > Ping me when you put it there, OK? I'll rebase the rest of old stuff on
> > top of it (similar helpers, mostly).
>
> I just pushed it into net-next, thanks Al.

OK, I've taken the beginning of the old queue on top of net-next; it's
in git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.

>From the quick look at the remaining ->msg_iov users:

* I'll need to add several iov_iter primitives - counterparts of
checksum.h stuff (copy_and_csum_{from,to}_iter(), maybe some more). Not
a big deal, I'll do that tomorrow. That will give us a clean iov_iter-based
counterpart of skb_copy_and_csum_datagram_iovec().

* a new helper: zerocopy_sg_from_iter(). I have it, actually,
but I'd rather not step on Herbert's toes - it's too close to the areas
his series will touch, so that's probably for when his series goes in.
It will be needed for complete macvtap conversion...

* why doesn't verify_iovec() use rw_copy_check_uvector()? The only
real differences I see is that (a) you do allocation in callers (same as
rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case of
too long vector, while rw_copy_check_uvector() returns EINVAL in that case
and (c) you don't do access_ok(). The last one is described as optimization,
but for iov_iter primitives it's a serious PITA - for iovec-backed instances
they are using __copy_from_user()/__copy_to_user(), etc.
It certainly would be nice to have the same code doing all copying
of iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc. Am I
missing something subtle semantical difference in there? EMSGSIZE vs EINVAL
is trivial (we can lift that check into the callers, if nothing else), but
I could miss something more interesting...

* various getfrag will need to grow iov_iter-based counterparts,
but ip_append_output() needs no changes, AFAICS.

* crypto stuff will be easy to convert - iov_iter_get_pages()
would suffice for a primitive

* there's some really weird stuff in there. Just what is this
static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
{
struct iovec *iov;
u8 __user *type = NULL;
u8 __user *code = NULL;
int probed = 0;
unsigned int i;

if (!msg->msg_iov)
return 0;

for (i = 0; i < msg->msg_iovlen; i++) {
iov = &msg->msg_iov[i];
if (!iov)
continue;
trying to do? "If non-NULL pointer + i somehow happened to be NULL, skip it
and try to use the same pointer + i + 1"? Huh? Had been that way since
the function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code
when sending packets via raw socket.", according to historical tree)...

* rds, bluetooth and vsock are doing something odd; need to RTFS some
more.

* not sure I understand what TIPC is doing - does it prohibit too
short first segment of ->msg_iov? net/tipc/socket.c:dest_name_check() looks
odd _and_ potentially racy - we read the same data twice and hope our checks
still apply. I asked TIPC folks about that race back in April, but it
looks like that fell through the cracks...

Overall, so far it looks more or less feasible - other than the missing csum
primitives, current mm/iov_iter.c should suffice. I have _not_ seriously
looked into sendpage yet; that might very well require some more.

2014-11-06 05:50:42

by Herbert Xu

[permalink] [raw]
Subject: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 03:25:34AM +0000, Al Viro wrote:
>
> * there's some really weird stuff in there. Just what is this
> static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
> {

It looks like newbie coding that's all. There's nothing tricky
here as far as I can tell. We're just trying to fetch the ICMP
header to seed the IPsec lookup.

So how about this rewrite? I'm assuming that you're not going
to get rid of memcpy_fromiovecend/memcpy_toiovecend, if you
are, let me know and I'll redo this with iterators.

ipv4: Use standard iovec primitive in raw_probe_proto_opt

The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets. In doing so it's processing iovec by hand and
overcomplicating things.

This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.

Signed-off-by: Herbert Xu <[email protected]>

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 739db31..04f67e1 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -422,48 +422,20 @@ error:

static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
{
- struct iovec *iov;
- u8 __user *type = NULL;
- u8 __user *code = NULL;
- int probed = 0;
- unsigned int i;
+ struct icmphdr icmph;
+ int err;

- if (!msg->msg_iov)
+ if (fl4->flowi4_proto != IPPROTO_ICMP)
return 0;

- for (i = 0; i < msg->msg_iovlen; i++) {
- iov = &msg->msg_iov[i];
- if (!iov)
- continue;
-
- switch (fl4->flowi4_proto) {
- case IPPROTO_ICMP:
- /* check if one-byte field is readable or not. */
- if (iov->iov_base && iov->iov_len < 1)
- break;
-
- if (!type) {
- type = iov->iov_base;
- /* check if code field is readable or not. */
- if (iov->iov_len > 1)
- code = type + 1;
- } else if (!code)
- code = iov->iov_base;
-
- if (type && code) {
- if (get_user(fl4->fl4_icmp_type, type) ||
- get_user(fl4->fl4_icmp_code, code))
- return -EFAULT;
- probed = 1;
- }
- break;
- default:
- probed = 1;
- break;
- }
- if (probed)
- break;
- }
+ /* We only need the first two bytes. */
+ err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
+ if (err)
+ return err;
+
+ fl4->fl4_icmp_type = icmph.type;
+ fl4->fl4_icmp_code = icmph.code;
+
return 0;
}


Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-06 06:43:24

by Al Viro

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> + /* We only need the first two bytes. */
> + err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> + if (err)
> + return err;
> +
> + fl4->fl4_icmp_type = icmph.type;
> + fl4->fl4_icmp_code = icmph.code;

That's more readable, but that exposes another problem in there - we read
the same piece of userland data twice, with no promise whatsoever that we'll
get the same value both times...

2014-11-06 06:46:39

by Herbert Xu

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > + /* We only need the first two bytes. */
> > + err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > + if (err)
> > + return err;
> > +
> > + fl4->fl4_icmp_type = icmph.type;
> > + fl4->fl4_icmp_code = icmph.code;
>
> That's more readable, but that exposes another problem in there - we read
> the same piece of userland data twice, with no promise whatsoever that we'll
> get the same value both times...

Sure, but you have to be root anyway to write to raw sockets.

Patches are welcome :)

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-06 07:11:14

by Al Viro

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 02:46:29PM +0800, Herbert Xu wrote:
> On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> > On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > > + /* We only need the first two bytes. */
> > > + err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > > + if (err)
> > > + return err;
> > > +
> > > + fl4->fl4_icmp_type = icmph.type;
> > > + fl4->fl4_icmp_code = icmph.code;
> >
> > That's more readable, but that exposes another problem in there - we read
> > the same piece of userland data twice, with no promise whatsoever that we'll
> > get the same value both times...
>
> Sure, but you have to be root anyway to write to raw sockets.

Point, but that might very well be a pattern to watch for - there's at least
one more instance in TIPC (also not exploitable, according to TIPC folks)
and such bugs are easily repeated...

BTW, I've picked the tun and macvtap related bits from another part of old
queue; see vfs.git#untested-macvtap - it's on top of #iov_iter-net and it's
really completely untested. Back then I was mostly interested in killing
as many ->aio_write() instances as I could, so it's only the "send" side of
things.

2014-11-06 08:23:52

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Wed, Nov 05, 2014 at 03:24:10PM -0500, David Miller wrote:
>
> Herbert, please provide a cover letter for this series, and the most recent
> version of patch #2 gets various rejects when I try to apply it to net-next.

Sure, I'll regenerate them. However, while doing so I noticed that
a number of my patches on tun/macvtap that you have previously set
as accepted are missing from net-next. Could this be why you got
the rejects?

For example, this patch wasn't in net-next when I just did a pull.

https://patchwork.ozlabs.org/patch/405966/

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-06 08:27:14

by Herbert Xu

[permalink] [raw]
Subject: [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version

Hi Dave:

This patch series adds the helper skb_copy_datagram_iter, which
is meant to replace both skb_copy_datagram_iovec and its evil
twin skb_copy_datagram_const_iovec.

It then converts tun and macvtap over to the new helper and finally
removes skb_copy_datagram_const_iovec which is only used by tun
and macvtap.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-06 10:06:11

by Jon Maloy

[permalink] [raw]
Subject: RE: [PATCH 1/4] inet: Add skb_copy_datagram_iter



> -----Original Message-----
> From: [email protected] [mailto:netdev-
> [email protected]] On Behalf Of Al Viro
> Sent: November-06-14 4:26 AM
> To: David Miller
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
>
> On Wed, Nov 05, 2014 at 04:57:19PM -0500, David Miller wrote:
> > From: Al Viro <[email protected]>
> > Date: Wed, 5 Nov 2014 21:07:45 +0000
> >
> > > Ping me when you put it there, OK? I'll rebase the rest of old
> > > stuff on top of it (similar helpers, mostly).
> >
> > I just pushed it into net-next, thanks Al.
>
> OK, I've taken the beginning of the old queue on top of net-next; it's in
> git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.
>
> From the quick look at the remaining ->msg_iov users:
>
> * I'll need to add several iov_iter primitives - counterparts of
> checksum.h stuff (copy_and_csum_{from,to}_iter(), maybe some more).
> Not a big deal, I'll do that tomorrow. That will give us a clean iov_iter-based
> counterpart of skb_copy_and_csum_datagram_iovec().
>
> * a new helper: zerocopy_sg_from_iter(). I have it, actually, but I'd
> rather not step on Herbert's toes - it's too close to the areas his series will
> touch, so that's probably for when his series goes in.
> It will be needed for complete macvtap conversion...
>
> * why doesn't verify_iovec() use rw_copy_check_uvector()? The
> only real differences I see is that (a) you do allocation in callers (same as
> rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case
> of too long vector, while rw_copy_check_uvector() returns EINVAL in that
> case and (c) you don't do access_ok(). The last one is described as
> optimization, but for iov_iter primitives it's a serious PITA - for iovec-backed
> instances they are using __copy_from_user()/__copy_to_user(), etc.
> It certainly would be nice to have the same code doing all copying of
> iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc. Am I
> missing something subtle semantical difference in there? EMSGSIZE vs
> EINVAL is trivial (we can lift that check into the callers, if nothing else), but I
> could miss something more interesting...
>
> * various getfrag will need to grow iov_iter-based counterparts, but
> ip_append_output() needs no changes, AFAICS.
>
> * crypto stuff will be easy to convert - iov_iter_get_pages() would
> suffice for a primitive
>
> * there's some really weird stuff in there. Just what is this static int
> raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg) {
> struct iovec *iov;
> u8 __user *type = NULL;
> u8 __user *code = NULL;
> int probed = 0;
> unsigned int i;
>
> if (!msg->msg_iov)
> return 0;
>
> for (i = 0; i < msg->msg_iovlen; i++) {
> iov = &msg->msg_iov[i];
> if (!iov)
> continue;
> trying to do? "If non-NULL pointer + i somehow happened to be NULL, skip it
> and try to use the same pointer + i + 1"? Huh? Had been that way since the
> function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code
> when sending packets via raw socket.", according to historical tree)...
>
> * rds, bluetooth and vsock are doing something odd; need to RTFS
> some more.
>
> * not sure I understand what TIPC is doing - does it prohibit too short
> first segment of ->msg_iov?

Yes, that is the purpose. It was needed in early versions of TIPC, which was
using TIPC itself, instead of netlink, as carrier of configuration commands.
This option is long gone, and we can safely remove those checks. I'll
post a patch.

///jon

net/tipc/socket.c:dest_name_check() looks
> odd _and_ potentially racy - we read the same data twice and hope our
> checks still apply. I asked TIPC folks about that race back in April, but it looks
> like that fell through the cracks...
>
> Overall, so far it looks more or less feasible - other than the missing csum
> primitives, current mm/iov_iter.c should suffice. I have _not_ seriously
> looked into sendpage yet; that might very well require some more.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2014-11-06 10:10:38

by Jon Maloy

[permalink] [raw]
Subject: RE: ipv4: Use standard iovec primitive in raw_probe_proto_opt



> -----Original Message-----
> From: [email protected] [mailto:netdev-
> [email protected]] On Behalf Of Al Viro
> Sent: November-06-14 8:11 AM
> To: Herbert Xu
> Cc: David Miller; [email protected]; [email protected];
> [email protected]; Masahide Nakamura; Hideaki YOSHIFUJI
> Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt
>
> On Thu, Nov 06, 2014 at 02:46:29PM +0800, Herbert Xu wrote:
> > On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> > > On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > > > + /* We only need the first two bytes. */
> > > > + err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > > > + if (err)
> > > > + return err;
> > > > +
> > > > + fl4->fl4_icmp_type = icmph.type;
> > > > + fl4->fl4_icmp_code = icmph.code;
> > >
> > > That's more readable, but that exposes another problem in there - we
> > > read the same piece of userland data twice, with no promise
> > > whatsoever that we'll get the same value both times...
> >
> > Sure, but you have to be root anyway to write to raw sockets.
>
> Point, but that might very well be a pattern to watch for - there's at least one
> more instance in TIPC (also not exploitable, according to TIPC folks) and such

I don't recall this, and I can't see where it would be either. Can you please
point to where it is?

///jon

> bugs are easily repeated...
>
> BTW, I've picked the tun and macvtap related bits from another part of old
> queue; see vfs.git#untested-macvtap - it's on top of #iov_iter-net and it's
> really completely untested. Back then I was mostly interested in killing as
> many ->aio_write() instances as I could, so it's only the "send" side of things.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2014-11-06 17:25:06

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Herbert Xu <[email protected]>
Date: Thu, 6 Nov 2014 16:23:38 +0800

> On Wed, Nov 05, 2014 at 03:24:10PM -0500, David Miller wrote:
>>
>> Herbert, please provide a cover letter for this series, and the most recent
>> version of patch #2 gets various rejects when I try to apply it to net-next.
>
> Sure, I'll regenerate them. However, while doing so I noticed that
> a number of my patches on tun/macvtap that you have previously set
> as accepted are missing from net-next. Could this be why you got
> the rejects?

Those were bug fixes so went into plain 'net', they will show up next
time I do a merge and I will deal with the conflicts, if any.

2014-11-06 21:28:16

by David Miller

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

From: Herbert Xu <[email protected]>
Date: Thu, 6 Nov 2014 14:46:29 +0800

> On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
>> On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
>> > + /* We only need the first two bytes. */
>> > + err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
>> > + if (err)
>> > + return err;
>> > +
>> > + fl4->fl4_icmp_type = icmph.type;
>> > + fl4->fl4_icmp_code = icmph.code;
>>
>> That's more readable, but that exposes another problem in there - we read
>> the same piece of userland data twice, with no promise whatsoever that we'll
>> get the same value both times...
>
> Sure, but you have to be root anyway to write to raw sockets.
>
> Patches are welcome :)

I'd agree with this root-only argument maybe 15 years ago, but with
containers and stuff like that we want to prevent root X from messing
up the machine for root Y.

This is a recurring topic, and I'd strongly like to avoid adding new
ways that these kinds of problems can happen.

For example, I'm still on the hook to address the AF_NETLINK mmap TX
code, which has a similarly abusable issue.

2014-11-06 22:16:18

by Al Viro

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 09:55:31AM +0000, Jon Maloy wrote:
> > Point, but that might very well be a pattern to watch for - there's at least one
> > more instance in TIPC (also not exploitable, according to TIPC folks) and such
>
> I don't recall this, and I can't see where it would be either. Can you please
> point to where it is?

The same dest_name_check() thing. This
if (copy_from_user(&hdr, m->msg_iov[0].iov_base, sizeof(hdr)))
return -EFAULT;
if ((ntohs(hdr.tcm_type) & 0xC000) && (!capable(CAP_NET_ADMIN)))
return -EACCES;
is easily bypassed. Suppose you want to send a packet with these two
bits in ->tcm_type not being 00, and you don't have CAP_NET_ADMIN.
Not a problem - spawn two threads sharing memory, have one trying to
call sendmsg() while another keeps flipping these two bits. Sooner
of later you'll get the timing right and have these bits observed as 00
in dest_name_check() and 11 when it comes to memcpy_fromiovecend() actually
copying the whole thing. And considering that the interval between those
two is much longer than the loop in the second thread would take on
each iteration, I'd expect the odds around 25% per attempted sendmsg().

IOW, this test is either pointless and can be removed completely, or there's
an exploitable race. As far as I understand from your replies both back then
and in another branch of this thread, it's the former and the proper fix is
to remove at least that part of dest_name_check(). So this case is also
not something exploitable, but it certainly matches the same pattern.

My point was simply that this pattern is worth watching for - recurrent bug
classes like that have a good chance to spawn an instance that will be
exploitable.

2014-11-07 01:59:53

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Thu, Nov 06, 2014 at 12:25:00PM -0500, David Miller wrote:
> From: Herbert Xu <[email protected]>
> Date: Thu, 6 Nov 2014 16:23:38 +0800
>
> > On Wed, Nov 05, 2014 at 03:24:10PM -0500, David Miller wrote:
> >>
> >> Herbert, please provide a cover letter for this series, and the most recent
> >> version of patch #2 gets various rejects when I try to apply it to net-next.
> >
> > Sure, I'll regenerate them. However, while doing so I noticed that
> > a number of my patches on tun/macvtap that you have previously set
> > as accepted are missing from net-next. Could this be why you got
> > the rejects?
>
> Those were bug fixes so went into plain 'net', they will show up next
> time I do a merge and I will deal with the conflicts, if any.

I see. In that case it might be best to wait until those fixes hit
net-next first before applying these patches as otherwise Stephen will
get hit with some nasty merge conflicts.

I'll repost them for RFC with the problems that Al pointed out fixed
in the mean time.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-07 02:00:35

by Herbert Xu

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 04:28:08PM -0500, David Miller wrote:
> From: Herbert Xu <[email protected]>
> Date: Thu, 6 Nov 2014 14:46:29 +0800
>
> > On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> >> On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> >> > + /* We only need the first two bytes. */
> >> > + err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> >> > + if (err)
> >> > + return err;
> >> > +
> >> > + fl4->fl4_icmp_type = icmph.type;
> >> > + fl4->fl4_icmp_code = icmph.code;
> >>
> >> That's more readable, but that exposes another problem in there - we read
> >> the same piece of userland data twice, with no promise whatsoever that we'll
> >> get the same value both times...
> >
> > Sure, but you have to be root anyway to write to raw sockets.
> >
> > Patches are welcome :)
>
> I'd agree with this root-only argument maybe 15 years ago, but with
> containers and stuff like that we want to prevent root X from messing
> up the machine for root Y.
>
> This is a recurring topic, and I'd strongly like to avoid adding new
> ways that these kinds of problems can happen.
>
> For example, I'm still on the hook to address the AF_NETLINK mmap TX
> code, which has a similarly abusable issue.

Fair enough. Even though the bug existed prior to my patch I'll
see if we could get rid of it.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-07 03:13:19

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Herbert Xu <[email protected]>
Date: Fri, 7 Nov 2014 09:59:44 +0800

> In that case it might be best to wait until those fixes hit net-next
> first before applying these patches as otherwise Stephen will get
> hit with some nasty merge conflicts.

I just merged net into net-next, so this barrier should no longer
be present.

Thanks.

2014-11-07 13:21:52

by Herbert Xu

[permalink] [raw]
Subject: [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version

Hi Dave:

This patch series adds the helper skb_copy_datagram_iter, which
is meant to replace both skb_copy_datagram_iovec and its evil
twin skb_copy_datagram_const_iovec.

It then converts tun and macvtap over to the new helper and finally
removes skb_copy_datagram_const_iovec which is only used by tun
and macvtap.

The copy_to_iter return value issue pointed out by Al has now been
fixed.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-07 13:26:13

by Herbert Xu

[permalink] [raw]
Subject: [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice

Hi Dave:

This series rewrites the function raw_probe_proto_opt in a more
readable fasion, and then fixes the long-standing bug where we
read the probed bytes twice which means that what we're using to
probe may in fact be invalid.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-07 21:49:09

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Thu, 6 Nov 2014 03:25:34 +0000

> OK, I've taken the beginning of the old queue on top of net-next; it's
> in git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.

What I see in there looks good. I wonder if we can somehow make msghdr
pointer args const... but this is not so important now.

Some minor coding style nits, comments:

/* Like
* this.
*/

and for multi-line function calls in the networking, align the second
and subsequent lines at the first column after the openning parenthesis
of the first line.

Thanks.

2014-11-07 21:52:23

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Thu, 6 Nov 2014 03:25:34 +0000

> * a new helper: zerocopy_sg_from_iter(). I have it, actually,
> but I'd rather not step on Herbert's toes - it's too close to the areas
> his series will touch, so that's probably for when his series goes in.
> It will be needed for complete macvtap conversion...

Just a heads up, his series is applied to net-next.

> * why doesn't verify_iovec() use rw_copy_check_uvector()? The only
> real differences I see is that (a) you do allocation in callers (same as
> rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case of
> too long vector, while rw_copy_check_uvector() returns EINVAL in that case
> and (c) you don't do access_ok(). The last one is described as optimization,
> but for iov_iter primitives it's a serious PITA - for iovec-backed instances
> they are using __copy_from_user()/__copy_to_user(), etc.

The answer is that nobody knew abuot it and looked, that's why.

> It certainly would be nice to have the same code doing all copying
> of iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc. Am I
> missing something subtle semantical difference in there? EMSGSIZE vs EINVAL
> is trivial (we can lift that check into the callers, if nothing else), but
> I could miss something more interesting...

We also need compat counterparts.

> * various getfrag will need to grow iov_iter-based counterparts,
> but ip_append_output() needs no changes, AFAICS.

Right.

> * there's some really weird stuff in there. Just what is this
> static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
> {
> struct iovec *iov;
> u8 __user *type = NULL;
> u8 __user *code = NULL;
> int probed = 0;
> unsigned int i;
>
> if (!msg->msg_iov)
> return 0;
>
> for (i = 0; i < msg->msg_iovlen; i++) {
> iov = &msg->msg_iov[i];
> if (!iov)
> continue;
> trying to do? "If non-NULL pointer + i somehow happened to be NULL, skip it
> and try to use the same pointer + i + 1"? Huh? Had been that way since
> the function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code
> when sending packets via raw socket.", according to historical tree)...

This is probably just bogus, because this address-of will never evaluate to
NULL.

> * rds, bluetooth and vsock are doing something odd; need to RTFS some
> more.

It is not surprising.... :-/

2014-11-07 22:11:21

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Fri, Nov 07, 2014 at 04:48:59PM -0500, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Thu, 6 Nov 2014 03:25:34 +0000
>
> > OK, I've taken the beginning of the old queue on top of net-next; it's
> > in git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.
>
> What I see in there looks good. I wonder if we can somehow make msghdr
> pointer args const... but this is not so important now.
>
> Some minor coding style nits, comments:
>
> /* Like
> * this.
> */
>
> and for multi-line function calls in the networking, align the second
> and subsequent lines at the first column after the openning parenthesis
> of the first line.

OK... I played with csum side of things a bit, and I've noticed something
really nasty - iov_iter primitives all assume that iovec has been validated,
_including_ access_ok() on all ranges. That allows us to use __copy_...()
in primitives, and on the read/write/readv/writev/aio side of things we have
that validation done when we copy iovec from userland (or set a single-element
iovec over the userland-supplied range, as in read(2)/write(2)).

net/* primitives, OTOH, do access_ok() themselves - syscalls do _not_ check
access_ok() on iovec from untrusted source and rely on the low-level stuff
to do such checks.

Result: regular IO syscalls on sockets (i.e. not recvmsg/sendmsg, usual
read/write) do these checks (at least) twice and use of copy_from_iter()
in ->recvmsg() opens quite a nasty hole - one can call recvmsg(2) with
kernel address in ->msg_iov[0].base and have such an instance of ->recvmsg()
stomp on the kernel memory. At the very least, with Herbert's patches
we need to validate that somewhere on the way to tun and macvtap recvmsg
instances. We can do that right there, and as a stopgap measure it might
be a good idea. However, it's not a sane long-term solution.

We could, of course, add those access_ok() in mm/iov_iter.c and drop them
from fs/read_write.c and fs/aio.c, but I really don't see the point - why
not do that along with the checks we do in verify_iovec() anyway? And drop
them from memcpy_fromiovec() et.al.

I'm looking through the tree right now; so far it looks like we can just
move those suckers to the point where we validate iovec and lose them
from low-level iovec and csum copying completely. I still haven't finished
tracing all possible paths for address to arrive at the points where we
currently check that stuff, but so far it looks very doable.

Comments?

2014-11-07 22:31:58

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Fri, Nov 07, 2014 at 10:11:14PM +0000, Al Viro wrote:

> I'm looking through the tree right now; so far it looks like we can just
> move those suckers to the point where we validate iovec and lose them
> from low-level iovec and csum copying completely. I still haven't finished
> tracing all possible paths for address to arrive at the points where we
> currently check that stuff, but so far it looks very doable.

BTW, csum side of that is also chock-full of duplicate access_ok() -
e.g. generic csum_and_copy_from_user() checks before calling
csum_partial_copy_from_user(). And generic instance of that is using
__copy_from_user(), all right, but a _lot_ of non-default instances
repeat that access_ok().

2014-11-07 22:36:03

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Fri, Nov 07, 2014 at 10:31:53PM +0000, Al Viro wrote:
> On Fri, Nov 07, 2014 at 10:11:14PM +0000, Al Viro wrote:
>
> > I'm looking through the tree right now; so far it looks like we can just
> > move those suckers to the point where we validate iovec and lose them
> > from low-level iovec and csum copying completely. I still haven't finished
> > tracing all possible paths for address to arrive at the points where we
> > currently check that stuff, but so far it looks very doable.
>
> BTW, csum side of that is also chock-full of duplicate access_ok() -
> e.g. generic csum_and_copy_from_user() checks before calling
> csum_partial_copy_from_user(). And generic instance of that is using
> __copy_from_user(), all right, but a _lot_ of non-default instances
> repeat that access_ok().

While we are at it: here's the default csum_and_copy_to_user()
static __inline__ __wsum csum_and_copy_to_user
(const void *src, void __user *dst, int len, __wsum sum, int *err_ptr)
{
sum = csum_partial(src, len, sum);

if (access_ok(VERIFY_WRITE, dst, len)) {
if (copy_to_user(dst, src, len) == 0)
return sum;
}
if (len)
*err_ptr = -EFAULT;

return (__force __wsum)-1; /* invalid checksum */
}

Note that we do that access_ok() and follow it with copy_to_user() on exact
same range, i.e. repeat the same damn check...

2014-11-07 23:43:00

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Fri, Nov 07, 2014 at 10:11:14PM +0000, Al Viro wrote:

> I'm looking through the tree right now; so far it looks like we can just
> move those suckers to the point where we validate iovec and lose them
> from low-level iovec and csum copying completely. I still haven't finished
> tracing all possible paths for address to arrive at the points where we
> currently check that stuff, but so far it looks very doable.

Definitely doable. The only remaining interesting part is drivers/vhost
with the stuff it puts in vq->iov[]. If we can guarantee that it satisfies
the sanity checks (access_ok() and size-related ones), we are done -
making verify_iovec() use rw_copy_check_uvector() (and verify_compat_iov()
use compat_rw_copy_check_uvector()) will suffice to guarantee that none of
csum_partial_copy_fromiovecend
memcpy_fromiovec
memcpy_toiovec
memcpy_toiovecend
memcpy_fromiovecend
skb_copy_datagram_iovec
skb_copy_datagram_iter
skb_copy_datagram_from_iter
zerocopy_sg_from_iter
skb_copy_and_csum_datagram
skb_copy_and_csum_datagram_iovec
csum_and_copy_from_user
csum_and_copy_to_user
csum_partial_copy_from_user
will ever see an address that doesn't satisfy access_ok() checks. And
having looked at the data flow... we definitely want to do those checks
on intake of iovec - as it is, we usually repeat them quite a few times
for the same iovec segment, and we practically never end up _not_ doing them
for some segment of iovec, unless we hit a failure exit before we get around
to copying any data at all.

I'll finish RTFS drivers/vhost and if it turns out to be OK I'll post the
series moving those checks to the moment of copying iovec from userland,
so that kernel-side we could always rely on ->msg_iov elements having been
verified.

2014-11-08 02:21:32

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Fri, Nov 07, 2014 at 11:42:53PM +0000, Al Viro wrote:
> I'll finish RTFS drivers/vhost and if it turns out to be OK I'll post the
> series moving those checks to the moment of copying iovec from userland,
> so that kernel-side we could always rely on ->msg_iov elements having been
> verified.

Great thanks!

Dave, please hold off on my skb_copy_datagram_iter series until
this verify_iovec change is added.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2014-11-09 21:19:16

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

[Michael Cc'd]

On Fri, Nov 07, 2014 at 11:42:53PM +0000, Al Viro wrote:

> I'll finish RTFS drivers/vhost and if it turns out to be OK I'll post the
> series moving those checks to the moment of copying iovec from userland,
> so that kernel-side we could always rely on ->msg_iov elements having been
> verified.

Two questions:
1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?
AFAICS, x86 and ppc just check that address is OK for 64bit process -
if a 32bit process passes the kernel an address that would be valid
for 64bit process, but not for 32bit one, we just get a pagefault in
__copy_from_user() and friends. No kernel objects are going to have
a virtual address in that range, so access_ok() doesn't bother preventing
such access attempts there...

2) shouldn't vhost_dev_cleanup() stop the worker thread before doing anything
else? AFAICS, we do parts of vhost_dev teardown while the thread is
still running; granted, we keep dev->mm pinned down until after it stops
(or we would be _really_ screwed), but is it safe to do all those fput()s, etc.
while it's still running? Michael?

2014-11-10 05:20:29

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Sun, 9 Nov 2014 21:19:08 +0000

> 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?

sparc64 will just fault no matter what kind of task it is.

It is impossible for a user task to generate a reference to
a kernel virtual address, as kernel and user accesses each
go via a separate address space identifier.

2014-11-10 06:58:23

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Mon, Nov 10, 2014 at 12:20:20AM -0500, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Sun, 9 Nov 2014 21:19:08 +0000
>
> > 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?
>
> sparc64 will just fault no matter what kind of task it is.
>
> It is impossible for a user task to generate a reference to
> a kernel virtual address, as kernel and user accesses each
> go via a separate address space identifier.

Sure, but why do we have access_ok() there at all? I.e. why not just have
it constant 1?

2014-11-10 07:30:07

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Mon, 10 Nov 2014 06:58:17 +0000

> On Mon, Nov 10, 2014 at 12:20:20AM -0500, David Miller wrote:
>> From: Al Viro <[email protected]>
>> Date: Sun, 9 Nov 2014 21:19:08 +0000
>>
>> > 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?
>>
>> sparc64 will just fault no matter what kind of task it is.
>>
>> It is impossible for a user task to generate a reference to
>> a kernel virtual address, as kernel and user accesses each
>> go via a separate address space identifier.
>
> Sure, but why do we have access_ok() there at all? I.e. why not just have
> it constant 1?

Since access_ok() is in fact constant 1 on sparc64, where we use it,
does it really matter?

2014-11-10 09:10:01

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Mon, Nov 10, 2014 at 02:30:00AM -0500, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Mon, 10 Nov 2014 06:58:17 +0000
>
> > On Mon, Nov 10, 2014 at 12:20:20AM -0500, David Miller wrote:
> >> From: Al Viro <[email protected]>
> >> Date: Sun, 9 Nov 2014 21:19:08 +0000
> >>
> >> > 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?
> >>
> >> sparc64 will just fault no matter what kind of task it is.
> >>
> >> It is impossible for a user task to generate a reference to
> >> a kernel virtual address, as kernel and user accesses each
> >> go via a separate address space identifier.
> >
> > Sure, but why do we have access_ok() there at all? I.e. why not just have
> > it constant 1?
>
> Since access_ok() is in fact constant 1 on sparc64, where we use it,
> does it really matter?

*blink*

My apologies - I've got confused by the maze of twisty includes, all alike.
Right you are; in biarch case it *doesn't* depend on 32bit vs. 64bit.
STACK_TOP-using one is sparc32 variant where we obviously don't have
biarch at all.

Anyway, the series switching to {compat_,}rw_copy_check_uvector() and
getting rid of duplicate checks is in vfs.git#iov_iter-net. Warning:
it's almost completely untested. It survives boot, ssh into it and
runltp -f syscalls (no regressions), but that's about it. BTW, what's
the usual regression suite used for net/* stuff?

3 commits in there, on top of net-next#master; head should be at 555126.
There's a bunch of fairly obvious followups becoming possible after that,
but let's keep those separate...

2014-11-10 10:14:21

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

On Sun, Nov 09, 2014 at 09:19:08PM +0000, Al Viro wrote:
> [Michael Cc'd]
>
> On Fri, Nov 07, 2014 at 11:42:53PM +0000, Al Viro wrote:
>
> > I'll finish RTFS drivers/vhost and if it turns out to be OK I'll post the
> > series moving those checks to the moment of copying iovec from userland,
> > so that kernel-side we could always rely on ->msg_iov elements having been
> > verified.
>
> Two questions:
> 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?
> AFAICS, x86 and ppc just check that address is OK for 64bit process -
> if a 32bit process passes the kernel an address that would be valid
> for 64bit process, but not for 32bit one, we just get a pagefault in
> __copy_from_user() and friends. No kernel objects are going to have
> a virtual address in that range, so access_ok() doesn't bother preventing
> such access attempts there...
>
> 2) shouldn't vhost_dev_cleanup() stop the worker thread before doing anything
> else?
> AFAICS, we do parts of vhost_dev teardown while the thread is
> still running; granted, we keep dev->mm pinned down until after it stops
> (or we would be _really_ screwed), but is it safe to do all those fput()s, etc.
> while it's still running? Michael?

Before invoking vhost_dev_cleanup,
the caller for vhost-net (vhost_net_release) sets private data to NULL
(using vhost_net_stop_vq) which guarantees thread will do nothing at all.
vhost scsi does it in vhost_scsi_clear_endpoint.


--
MST

2014-11-10 16:19:12

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro <[email protected]>
Date: Mon, 10 Nov 2014 09:09:55 +0000

> BTW, what's the usual regression suite used for net/* stuff?

There's a regression suite? :-)

2014-11-10 19:26:24

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice

From: Herbert Xu <[email protected]>
Date: Fri, 7 Nov 2014 21:25:53 +0800

> This series rewrites the function raw_probe_proto_opt in a more
> readable fasion, and then fixes the long-standing bug where we
> read the probed bytes twice which means that what we're using to
> probe may in fact be invalid.

Series applied to net-next, thanks Herbert.

2014-11-28 05:14:30

by Al Viro

[permalink] [raw]
Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt

On Thu, Nov 06, 2014 at 10:16:08PM +0000, Al Viro wrote:
> On Thu, Nov 06, 2014 at 09:55:31AM +0000, Jon Maloy wrote:
> > > Point, but that might very well be a pattern to watch for - there's at least one
> > > more instance in TIPC (also not exploitable, according to TIPC folks) and such
> >
> > I don't recall this, and I can't see where it would be either. Can you please
> > point to where it is?
>
> The same dest_name_check() thing. This
> if (copy_from_user(&hdr, m->msg_iov[0].iov_base, sizeof(hdr)))
> return -EFAULT;
> if ((ntohs(hdr.tcm_type) & 0xC000) && (!capable(CAP_NET_ADMIN)))
> return -EACCES;
> is easily bypassed. Suppose you want to send a packet with these two
> bits in ->tcm_type not being 00, and you don't have CAP_NET_ADMIN.
> Not a problem - spawn two threads sharing memory, have one trying to
> call sendmsg() while another keeps flipping these two bits. Sooner
> of later you'll get the timing right and have these bits observed as 00
> in dest_name_check() and 11 when it comes to memcpy_fromiovecend() actually
> copying the whole thing. And considering that the interval between those
> two is much longer than the loop in the second thread would take on
> each iteration, I'd expect the odds around 25% per attempted sendmsg().
>
> IOW, this test is either pointless and can be removed completely, or there's
> an exploitable race. As far as I understand from your replies both back then
> and in another branch of this thread, it's the former and the proper fix is
> to remove at least that part of dest_name_check(). So this case is also
> not something exploitable, but it certainly matches the same pattern.
>
> My point was simply that this pattern is worth watching for - recurrent bug
> classes like that have a good chance to spawn an instance that will be
> exploitable.

Ping? Can we simply remove dest_name_check() completely? That's one of the
few remaining obstacles to making ->sendmsg() iov_iter-clean. For now I'm
simply commenting its call out in tipc_sendmsg(); if it _is_ needed for
anything, we'll need to get rid of that double copying from userland. I can
do that, but my impression from your comments back in April is that you
planned to removed the damn check anyway.

Another question: in tipc_send_stream() we have
mtu = tsk->max_pkt;
send = min_t(uint, dsz - sent, TIPC_MAX_USER_MSG_SIZE);
__skb_queue_head_init(&head);
rc = tipc_msg_build(mhdr, m, sent, send, mtu, &head);
if (unlikely(rc < 0))
goto exit;
do {
if (likely(!tsk_conn_cong(tsk))) {
rc = tipc_link_xmit(&head, dnode, ref);
if (likely(!rc)) {
tsk->sent_unacked++;
sent += send;
if (sent == dsz)
break;
goto next;
}
if (rc == -EMSGSIZE) {
tsk->max_pkt = tipc_node_get_mtu(dnode, ref);
goto next;
}

How can it hit that EMSGSIZE? AFAICS, it can come only from
int __tipc_link_xmit(struct tipc_link *link, struct sk_buff_head *list)
{
struct tipc_msg *msg = buf_msg(skb_peek(list));
uint psz = msg_size(msg);
...
uint mtu = link->max_pkt;
...
/* Has valid packet limit been used ? */
if (unlikely(psz > mtu)) {
__skb_queue_purge(list);
return -EMSGSIZE;
}

and msg_size() is basically the bits copied into skb by tipc_msg_build() and
set by msg_set_size() in there. And unless I'm seriously misreading that
function, it can't be more than pktmax argument, i.e. mtu. So unless something
manages to crap into our skb or change mtu right under us, it shouldn't be
possible. And mtu (i.e. ->max_pkt) ought to be protected by lock_sock() there.

What's going on there?